Microsoft DP-203 Practice Test - Questions Answers, Page 8


You plan to implement an Azure Data Lake Storage Gen2 container that will contain CSV files. The size of the files will vary based on the number of events that occur per hour. File sizes range from 4 KB to 5 GB.

You need to ensure that the files stored in the container are optimized for batch processing. What should you do?

A. Convert the files to JSON
B. Convert the files to Avro
C. Compress the files
D. Merge the files
Suggested answer: B

Explanation:

Avro is well suited to batch processing and is also widely used in streaming scenarios.

Note: Avro is a framework developed within Apache's Hadoop project. It is a row-based storage format that is widely used for serialization. Avro stores its schema in JSON format, making it easy to read and interpret by any program, while the data itself is stored in a compact, efficient binary format.

Reference:

https://www.adaltas.com/en/2020/07/23/benchmark-study-of-different-file-format/

You are designing a financial transactions table in an Azure Synapse Analytics dedicated SQL pool. The table will have a clustered columnstore index and will include the following columns:

TransactionType: 40 million rows per transaction type

CustomerSegment: 4 million rows per customer segment

TransactionMonth: 65 million rows per month

AccountType: 500 million rows per account type

You have the following query requirements:

Analysts will most commonly analyze transactions for a given month.

Transactions analysis will typically summarize transactions by transaction type, customer segment, and/or account type.

You need to recommend a partition strategy for the table to minimize query times. On which column should you recommend partitioning the table?

A. CustomerSegment
B. AccountType
C. TransactionType
D. TransactionMonth
Suggested answer: D
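
Partitioning on TransactionMonth aligns the physical partitions with the most common query filter, so monthly queries can eliminate entire partitions. A minimal sketch of such a table (column types and partition boundary values are assumptions, not given in the question):

```sql
-- Hypothetical DDL: partition the fact table on the month column,
-- which analysts filter on most often.
CREATE TABLE dbo.FinancialTransactions
(
    TransactionID    BIGINT NOT NULL,
    TransactionType  INT NOT NULL,
    CustomerSegment  INT NOT NULL,
    AccountType      INT NOT NULL,
    TransactionMonth INT NOT NULL  -- e.g. 202101 for January 2021
)
WITH
(
    DISTRIBUTION = HASH(TransactionID),
    CLUSTERED COLUMNSTORE INDEX,
    PARTITION (TransactionMonth RANGE RIGHT FOR VALUES (202102, 202103, 202104))
);
```

A query with a WHERE clause on TransactionMonth then scans only the matching partition instead of the full table.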

You plan to ingest streaming social media data by using Azure Stream Analytics. The data will be stored in files in Azure Data Lake Storage, and then consumed by using Azure Databricks and PolyBase in Azure Synapse Analytics. You need to recommend a Stream Analytics data output format to ensure that the queries from Databricks and PolyBase against the files encounter the fewest possible errors. The solution must ensure that the files can be queried quickly and that the data type information is retained.

What should you recommend?

A. JSON
B. Parquet
C. CSV
D. Avro
Suggested answer: B

Explanation:

Parquet is supported by both Databricks and PolyBase, preserves data type information in its schema, and its columnar layout allows the files to be queried quickly.

Reference:

https://docs.microsoft.com/en-us/sql/t-sql/statements/create-external-file-format-transact-sql
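
As a sketch, Parquet files are exposed to PolyBase in Synapse by defining an external file format (the object name and compression codec shown here are illustrative choices, not part of the question):

```sql
-- Hypothetical example: a Parquet external file format for PolyBase,
-- later referenced by external tables over the Data Lake files.
CREATE EXTERNAL FILE FORMAT ParquetFileFormat
WITH
(
    FORMAT_TYPE = PARQUET,
    DATA_COMPRESSION = 'org.apache.hadoop.io.compress.SnappyCodec'
);
```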

You have an Azure Synapse Analytics dedicated SQL pool named Pool1. Pool1 contains a partitioned fact table named dbo.Sales and a staging table named stg.Sales that has the matching table and partition definitions. You need to overwrite the content of the first partition in dbo.Sales with the content of the same partition in stg.Sales. The solution must minimize load times. What should you do?

A. Insert the data from stg.Sales into dbo.Sales.
B. Switch the first partition from dbo.Sales to stg.Sales.
C. Switch the first partition from stg.Sales to dbo.Sales.
D. Update dbo.Sales from stg.Sales.
Suggested answer: C

Explanation:

Reference:

https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/best-practices-dedicated-sql-pool
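
Partition switching is a metadata-only operation, so it completes almost instantly regardless of how many rows the partition holds. A minimal sketch, assuming the first partition is partition number 1:

```sql
-- Hypothetical example: replace partition 1 of dbo.Sales with the
-- matching partition from stg.Sales. SWITCH changes metadata only,
-- so no rows are physically moved; TRUNCATE_TARGET discards the
-- existing contents of the target partition.
ALTER TABLE stg.Sales SWITCH PARTITION 1 TO dbo.Sales PARTITION 1
    WITH (TRUNCATE_TARGET = ON);
```

This requires that stg.Sales and dbo.Sales have matching column and partition definitions, which the question states they do.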

You are designing a slowly changing dimension (SCD) for supplier data in an Azure Synapse Analytics dedicated SQL pool.

You plan to keep a record of changes to the available fields.

The supplier data contains the following columns.

Which three additional columns should you add to the data to create a Type 2 SCD? Each correct answer presents part of the solution.

NOTE: Each correct selection is worth one point.

A. surrogate primary key
B. effective start date
C. business key
D. last modified date
E. effective end date
F. foreign key
Suggested answer: A, B, E

Explanation:

A Type 2 SCD preserves history by inserting a new row for each change, so each version needs a surrogate primary key to uniquely identify it, plus effective start and end dates that define when that version was valid.

Reference:

https://learn.microsoft.com/en-us/training/modules/populate-slowly-changing-dimensions-azure-synapse-analytics-pipelines/3-choose-between-dimension-types
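
A minimal sketch of such a dimension table (the supplier columns from the question are not reproduced in this dump, so all column names and types here are assumptions):

```sql
-- Hypothetical Type 2 SCD dimension table for supplier data.
CREATE TABLE dbo.DimSupplier
(
    SupplierSK          INT IDENTITY(1,1) NOT NULL, -- surrogate primary key (answer A)
    SupplierBusinessKey INT NOT NULL,               -- natural key from the source system
    SupplierName        NVARCHAR(100) NOT NULL,
    EffectiveStartDate  DATETIME2 NOT NULL,         -- answer B
    EffectiveEndDate    DATETIME2 NULL,             -- answer E; NULL for the current version
    IsCurrent           BIT NOT NULL                -- optional flag for the active row
);
```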

You are designing a partition strategy for a fact table in an Azure Synapse Analytics dedicated SQL pool. The table has the following specifications:

Contain sales data for 20,000 products.

Use hash distribution on a column named ProductID.

Contain 2.4 billion records for the years 2019 and 2020.

Which number of partition ranges provides optimal compression and performance for the clustered columnstore index?

A. 40
B. 240
C. 400
D. 2,400
Suggested answer: A

Explanation:

Each partition should hold around 1 million rows per distribution for optimal columnstore compression, and a dedicated SQL pool already spreads every table across 60 distributions. This gives the formula: partitions = records / (1,000,000 x 60).

Partitions = 2,400,000,000 / (1,000,000 x 60) = 40

Note: Having too many partitions can reduce the effectiveness of clustered columnstore indexes if each partition has fewer than 1 million rows. Because dedicated SQL pools automatically spread your data across 60 distributions, a table created with 100 partitions actually results in 6,000 partitions.

Reference:

https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/best-practices-dedicated-sql-pool
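
The arithmetic can be checked directly; with 40 partitions, each partition in each of the 60 distributions holds exactly the 1 million rows targeted for columnstore compression:

```sql
-- 2.4 billion rows spread over 40 partitions x 60 distributions.
SELECT 2400000000.0 / (40 * 60) AS rows_per_partition_per_distribution;
-- Result: 1000000.0
```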

You are designing a fact table named FactPurchase in an Azure Synapse Analytics dedicated SQL pool. The table contains purchases from suppliers for a retail store. FactPurchase will contain the following columns.

FactPurchase will have 1 million rows of data added daily and will contain three years of data.

Transact-SQL queries similar to the following query will be executed daily.

SELECT SupplierKey, StockItemKey, COUNT(*)
FROM FactPurchase
WHERE DateKey >= 20210101
  AND DateKey <= 20210131
GROUP BY SupplierKey, StockItemKey

Which table distribution will minimize query times?

A. replicated
B. hash-distributed on PurchaseKey
C. round-robin
D. hash-distributed on DateKey
Suggested answer: B

Explanation:

Hash-distributed tables improve query performance on large fact tables. Round-robin tables are useful for improving loading speed.

Incorrect:

Not D: Do not use a date column. All data for the same date lands in the same distribution, so if several users are all filtering on the same date, only 1 of the 60 distributions does all the processing work.

Reference:

https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-distribute
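
As a sketch, the distribution choice is declared when the table is created (the column list is abbreviated and the types are assumptions):

```sql
-- Hypothetical DDL: hash-distribute on PurchaseKey, a high-cardinality
-- key that spreads rows evenly across all 60 distributions instead of
-- concentrating a single date's rows in one distribution.
CREATE TABLE dbo.FactPurchase
(
    PurchaseKey  BIGINT NOT NULL,
    DateKey      INT NOT NULL,
    SupplierKey  INT NOT NULL,
    StockItemKey INT NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(PurchaseKey),
    CLUSTERED COLUMNSTORE INDEX
);
```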

You are implementing a batch dataset in the Parquet format. Data files will be produced by using Azure Data Factory and stored in Azure Data Lake Storage Gen2. The files will be consumed by an Azure Synapse Analytics serverless SQL pool. You need to minimize storage costs for the solution.

What should you do?

A. Use Snappy compression for files.
B. Use OPENROWSET to query the Parquet files.
C. Create an external table that contains a subset of columns from the Parquet files.
D. Store all data as string in the Parquet files.
Suggested answer: C

Explanation:

An external table points to data located in Hadoop, Azure Storage blob, or Azure Data Lake Storage. External tables are used to read data from files or write data to files in Azure Storage. With Synapse SQL, you can use external tables to read external data using dedicated SQL pool or serverless SQL pool.

Reference:

https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/develop-tables-external-tables
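
A minimal sketch of such an external table in a serverless SQL pool (the data source, file path, and columns are hypothetical, and the referenced data source and file format objects are assumed to exist already):

```sql
-- Hypothetical example: an external table exposing only a subset of
-- the columns stored in the Parquet files.
CREATE EXTERNAL TABLE dbo.SalesSubset
(
    SaleID   BIGINT,
    SaleDate DATE,
    Amount   DECIMAL(18, 2)
)
WITH
(
    LOCATION = '/sales/*.parquet',
    DATA_SOURCE = MyDataLake,        -- assumed external data source
    FILE_FORMAT = ParquetFileFormat  -- assumed Parquet file format
);
```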

You are designing a data mart for the human resources (HR) department at your company. The data mart will contain employee information and employee transactions.

From a source system, you have a flat extract that has the following fields:

EmployeeID

FirstName

LastName

Recipient

GrossAmount

TransactionID

GovernmentID

NetAmountPaid

TransactionDate

You need to design a star schema data model in an Azure Synapse Analytics dedicated SQL pool for the data mart.

Which two tables should you create? Each correct answer presents part of the solution.

NOTE: Each correct selection is worth one point.

A. a dimension table for Transaction
B. a dimension table for EmployeeTransaction
C. a dimension table for Employee
D. a fact table for Employee
E. a fact table for Transaction
Suggested answer: C, E

Explanation:

C: Dimension tables contain attribute data that might change but usually changes infrequently. For example, a customer's name and address are stored in a dimension table and updated only when the customer's profile changes. To minimize the size of a large fact table, the customer's name and address don't need to be in every row of a fact table. Instead, the fact table and the dimension table can share a customer ID. A query can join the two tables to associate a customer's profile and transactions.

E: Fact tables contain quantitative data that are commonly generated in a transactional system, and then loaded into the dedicated SQL pool. For example, a retail business generates sales transactions every day, and then loads the data into a dedicated SQL pool fact table for analysis.

Reference:

https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-overview
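
Mapping the flat extract onto a star schema, the employee attributes go to the dimension and the transaction measures go to the fact. A minimal sketch (column types and the key relationship are assumptions):

```sql
-- Hypothetical star schema: employee dimension plus transaction fact.
CREATE TABLE dbo.DimEmployee
(
    EmployeeKey  INT IDENTITY(1,1) NOT NULL,  -- surrogate key
    EmployeeID   INT NOT NULL,                -- business key from the extract
    FirstName    NVARCHAR(50),
    LastName     NVARCHAR(50),
    GovernmentID NVARCHAR(20)
);

CREATE TABLE dbo.FactTransaction
(
    TransactionID   BIGINT NOT NULL,
    EmployeeKey     INT NOT NULL,             -- joins to DimEmployee
    TransactionDate DATE NOT NULL,
    GrossAmount     DECIMAL(18, 2),
    NetAmountPaid   DECIMAL(18, 2)
);
```

The fact table carries only the employee surrogate key, so employee name and ID details are not repeated on every transaction row.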

You are designing a dimension table for a data warehouse. The table will track the value of the dimension attributes over time and preserve the history of the data by adding new rows as the data changes. Which type of slowly changing dimension (SCD) should you use?

A. Type 0
B. Type 1
C. Type 2
D. Type 3
Suggested answer: C

Explanation:

A Type 2 SCD supports versioning of dimension members. Often the source system doesn't store versions, so the data warehouse load process detects and manages changes in a dimension table. In this case, the dimension table must use a surrogate key to provide a unique reference to a version of the dimension member. It also includes columns that define the date range validity of the version (for example, StartDate and EndDate) and possibly a flag column (for example, IsCurrent) to easily filter by current dimension members.

Incorrect Answers:

B: A Type 1 SCD always reflects the latest values, and when changes in source data are detected, the dimension table data is overwritten.

D: A Type 3 SCD supports storing two versions of a dimension member as separate columns. The table includes a column for the current value of a member plus either the original or previous value of the member. So Type 3 uses additional columns to track one key instance of history, rather than storing additional rows to track each change as in a Type 2 SCD.

Reference:

https://docs.microsoft.com/en-us/learn/modules/populate-slowly-changing-dimensions-azure-synapse-analytics-pipelines/3-choose-between-dimension-types
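
The row-versioning behavior described above can be sketched as a two-step change load (the table, columns, and variables here are hypothetical):

```sql
-- Hypothetical Type 2 change handling: close the current version,
-- then insert the changed data as a new row (new version).
UPDATE dbo.DimCustomer
SET EndDate = GETDATE(), IsCurrent = 0
WHERE CustomerID = @CustomerID AND IsCurrent = 1;

INSERT INTO dbo.DimCustomer (CustomerID, CustomerName, StartDate, EndDate, IsCurrent)
VALUES (@CustomerID, @NewName, GETDATE(), NULL, 1);
```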

Total 320 questions