
Microsoft DP-203 Practice Test - Questions Answers, Page 27

Question 261

DRAG DROP

You are batch loading a table in an Azure Synapse Analytics dedicated SQL pool. You need to load data from a staging table to the target table. The solution must ensure that if an error occurs while loading the data to the target table, all the inserts in that batch are undone.

How should you complete the Transact-SQL code? To answer, drag the appropriate values to the correct targets. Each value may be used once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content.

NOTE: Each correct selection is worth one point.

Correct answer: (answer image not reproduced)
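The answer image is not reproduced, but the construct this question targets is a transaction around the INSERT...SELECT that is rolled back in a CATCH block. A minimal T-SQL sketch; the staging and target table and column names (dbo.StageSales, dbo.FactSales) are hypothetical:

BEGIN TRY
    BEGIN TRANSACTION;

    -- Load the batch from staging into the target (hypothetical tables/columns)
    INSERT INTO dbo.FactSales (SaleId, SaleDate, Amount)
    SELECT SaleId, SaleDate, Amount
    FROM dbo.StageSales;

    COMMIT TRANSACTION;  -- make the whole batch of inserts permanent
END TRY
BEGIN CATCH
    -- Any failure undoes every insert in the batch
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;
    THROW;
END CATCH;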

Question 262

You have an Azure subscription that contains an Azure SQL database named DB1 and a storage account named storage1. The storage1 account contains a file named File1.txt. File1.txt contains the names of selected tables in DB1. You need to use an Azure Synapse pipeline to copy data from the selected tables in DB1 to the files in storage1. The solution must meet the following requirements:

• The Copy activity in the pipeline must be parameterized to use the data in File1.txt to identify the source and destination of the copy.

• Copy activities must occur in parallel as often as possible.

Which two pipeline activities should you include in the pipeline? Each correct answer presents part of the solution. NOTE: Each correct selection is worth one point.

A. If Condition

B. ForEach

C. Lookup

D. Get Metadata

Suggested answer: B, C

Explanation:

A Lookup activity reads the table names from File1.txt, and a ForEach activity iterates over the Lookup output and runs the parameterized Copy activity once per table. With the ForEach activity left in its default (non-sequential) mode, the copies run in parallel. Get Metadata returns only metadata such as file names and sizes, not the file contents, and no If Condition is required.

Question 263

You have an Azure data factory that connects to a Microsoft Purview account. The data factory is registered in Microsoft Purview. You update a Data Factory pipeline.

You need to ensure that the updated lineage is available in Microsoft Purview. What should you do first?

A. Locate the related asset in the Microsoft Purview portal.

B. Execute the pipeline.

C. Disconnect the Microsoft Purview account from the data factory.

D. Execute an Azure DevOps build pipeline.

Suggested answer: B

Explanation:

Data Factory reports lineage to Microsoft Purview when a pipeline runs, so the updated lineage becomes available only after the pipeline is executed.

Question 264

You plan to use an Apache Spark pool in Azure Synapse Analytics to load data to an Azure Data Lake Storage Gen2 account. You need to recommend which file format to use to store the data in the Data Lake Storage account. The solution must meet the following requirements:

• Column names and data types must be defined within the files loaded to the Data Lake Storage account.

• Data must be accessible by using queries from an Azure Synapse Analytics serverless SQL pool.

• Partition elimination must be supported without having to specify a specific partition.

What should you recommend?

A. Delta Lake

B. JSON

C. CSV

D. ORC

Suggested answer: A

Explanation:

Delta Lake stores column names and data types with the data (in the Parquet files and the Delta transaction log), can be queried from a serverless SQL pool by using OPENROWSET with FORMAT = 'DELTA', and supports automatic partition elimination without a partition being specified in the query path. ORC is also self-describing, but a serverless SQL pool cannot query ORC files.
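A minimal serverless-pool sketch; the storage path and the SaleDate column are hypothetical, and the WHERE clause triggers partition elimination without naming any partition folder:

SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'https://datalake1.dfs.core.windows.net/files/sales_delta/',  -- hypothetical Delta folder
    FORMAT = 'DELTA'
) AS result
WHERE result.SaleDate >= '2024-01-01';  -- partitions pruned via the Delta transaction log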

Question 265

HOTSPOT

You have two Azure SQL databases named DB1 and DB2.

DB1 contains a table named Table1. Table1 contains a timestamp column named LastModifiedOn. LastModifiedOn contains the timestamp of the most recent update for each individual row. DB2 contains a table named Watermark. Watermark contains a single timestamp column named WatermarkValue.

You plan to create an Azure Data Factory pipeline that will incrementally upload into Azure Blob Storage all the rows in Table1 for which the LastModifiedOn column contains a timestamp newer than the most recent value of the WatermarkValue column in Watermark.

You need to identify which activities to include in the pipeline. The solution must meet the following requirements:

• Minimize the effort to author the pipeline.

• Ensure that the number of data integration units allocated to the upload operation can be controlled.

What should you identify? To answer, select the appropriate options in the answer area.

Correct answer: (answer image not reproduced)
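The answer image is not reproduced, but the question describes the classic watermark (delta-load) pattern: presumably a Lookup activity fetches the watermark and a Copy activity performs the upload (the Copy activity is where data integration units can be set). A sketch of the queries involved, using the table and column names from the question; @watermark stands in for the value a pipeline expression would inject:

-- Lookup activity: fetch the current watermark from DB2
SELECT MAX(WatermarkValue) AS WatermarkValue
FROM dbo.Watermark;

-- Copy activity source query against DB1
SELECT *
FROM dbo.Table1
WHERE LastModifiedOn > @watermark;  -- @watermark bound from the Lookup output

-- After a successful copy, advance the watermark in DB2
UPDATE dbo.Watermark
SET WatermarkValue = (SELECT MAX(LastModifiedOn) FROM dbo.Table1);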

Question 266

HOTSPOT

You have an Azure Synapse serverless SQL pool.

You need to read JSON documents from a file by using the OPENROWSET function. How should you complete the query? To answer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point.

Correct answer: (answer image not reproduced)
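The completed query is shown only as an image, but the documented serverless-pool pattern for reading line-delimited JSON documents with OPENROWSET looks like the following sketch; the storage path and the id and name properties are hypothetical:

SELECT
    JSON_VALUE(doc, '$.id') AS id,      -- hypothetical JSON properties
    JSON_VALUE(doc, '$.name') AS name
FROM OPENROWSET(
    BULK 'https://datalake1.dfs.core.windows.net/files/docs.json',  -- hypothetical path
    FORMAT = 'CSV',
    FIELDTERMINATOR = '0x0b',  -- terminators that never occur in the data,
    FIELDQUOTE = '0x0b'        -- so each line arrives as one NVARCHAR(MAX) value
) WITH (doc NVARCHAR(MAX)) AS result;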

Question 267

HOTSPOT

You have an Azure Data Factory pipeline shown in the following exhibit.

The execution log for the first pipeline run is shown in the following exhibit.

The execution log for the second pipeline run is shown in the following exhibit.

For each of the following statements, select Yes if the statement is true. Otherwise, select No. NOTE: Each correct selection is worth one point.

Correct answer: (answer image not reproduced)

Question 268

You are designing a solution that will use tables in Delta Lake on Azure Databricks. You need to minimize how long it takes to perform the following:

• Queries against non-partitioned tables

• Joins on non-partitioned columns

Which two options should you include in the solution? Each correct answer presents part of the solution.

A. Z-Ordering

B. Apache Spark caching

C. dynamic file pruning (DFP)

D. the clone command

Suggested answer: A, C

Explanation:

Z-Ordering colocates related information in the same set of files. Delta Lake data-skipping algorithms use this colocality automatically, which dramatically reduces the amount of data that Delta Lake on Azure Databricks must read and thereby speeds up queries against non-partitioned tables.

Dynamic file pruning (DFP) skips data files at run time based on filters derived from join predicates, so it accelerates selective joins even when the join columns are not partition columns.

Apache Spark caching helps only repeated reads of data that has already been scanned, and the clone command merely copies a table; neither addresses the stated requirements.

Reference:

Delta Lake on Databricks: https://docs.databricks.com/delta/index.html

Best Practices for Delta Lake on Databricks: https://databricks.com/blog/2020/05/14/best-practices-for-delta-lake-on-databricks.html
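A short Databricks SQL sketch of the two techniques; the sales and top_customers tables and the customer_id column are hypothetical:

-- Z-Ordering: co-locate rows with similar values of the join/filter column
OPTIMIZE sales ZORDER BY (customer_id);

-- Dynamic file pruning is enabled by default; this is the relevant switch
SET spark.databricks.optimizer.dynamicFilePruning = true;

-- A selective join on the non-partitioned column can now skip most data files
SELECT s.*
FROM sales AS s
JOIN top_customers AS c
  ON s.customer_id = c.customer_id;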

Question 269

You are deploying a lake database by using an Azure Synapse database template. You need to add additional tables to the database. The solution must use the same grouping method as the template tables. Which grouping method should you use?

A. business area

B. size

C. facts and dimensions

D. partition style

Suggested answer: A

Explanation:

Business area is how the Azure Synapse database templates group tables by default. Each template consists of one or more enterprise templates that contain tables grouped by business area. For example, the Retail template has business areas such as Customer, Product, Sales, and Store. Using the same grouping method as the template tables helps you maintain consistency and compatibility with the industry-specific data model.

https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/database-templates-in-azure-synapse-analytics/ba-p/2929112

Question 270

HOTSPOT

You have an Azure Blob storage account that contains a folder. The folder contains 120,000 files. Each file contains 62 columns. Each day, 1,500 new files are added to the folder.

You plan to incrementally load five data columns from each new file into an Azure Synapse Analytics workspace. You need to minimize how long it takes to perform the incremental loads.

What should you use to store the files, and in which format?

Correct answer: (answer image not reproduced)

Explanation:

Box 1 = Timeslice partitioning in the folders

Organize the files into folders based on a time attribute, such as year, month, day, or hour; for example, /yyyy/mm/dd/. That way, each day's new files can be identified and loaded by applying a time filter in the Azure Synapse pipeline, and both loading and querying scan far fewer files.

Box 2 = Apache Parquet

Parquet is a columnar file format that stores and compresses data with many columns efficiently, so an incremental load can read only the five required columns instead of all 62. Parquet files can also be partitioned by a time attribute, and they are supported by both dedicated SQL pools and serverless SQL pools in Azure Synapse Analytics.
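A serverless-pool sketch of reading only five columns from one day's timeslice folder; the container path, column names, and folder layout are hypothetical:

SELECT Col1, Col2, Col3, Col4, Col5   -- only the five required columns
FROM OPENROWSET(
    BULK 'https://storage1.blob.core.windows.net/files/*/*/*/*.parquet',  -- /yyyy/mm/dd/ layout
    FORMAT = 'PARQUET'
) AS result
WHERE result.filepath(1) = '2024'   -- yyyy folder
  AND result.filepath(2) = '06'     -- mm folder
  AND result.filepath(3) = '15';    -- dd folder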

