Microsoft DP-203 Practice Test - Questions Answers, Page 12

Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.

After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.

You are designing an Azure Stream Analytics solution that will analyze Twitter data. You need to count the tweets in each 10-second window. The solution must ensure that each tweet is counted only once.

Solution: You use a hopping window that uses a hop size of 5 seconds and a window size of 10 seconds.

Does this meet the goal?

A. Yes
B. No
Suggested answer: B

Explanation:

A hopping window with a hop size (5 seconds) that is smaller than the window size (10 seconds) produces overlapping windows, so a tweet can be counted in more than one window. Instead, use a tumbling window: tumbling windows are a series of fixed-sized, non-overlapping, and contiguous time intervals, so each tweet is counted exactly once.
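A minimal sketch of a query that meets the goal with a tumbling window, assuming a streaming input named TwitterStream with a timestamp column named CreatedAt (both names are hypothetical):

SELECT Count(*) AS TweetCount
FROM TwitterStream TIMESTAMP BY CreatedAt
GROUP BY TumblingWindow(second, 10)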

Reference:

https://docs.microsoft.com/en-us/stream-analytics-query/tumbling-window-azure-stream-analytics

You are creating an Azure Data Factory data flow that will ingest data from a CSV file, cast columns to specified types of data, and insert the data into a table in an Azure Synapse Analytics dedicated SQL pool. The CSV file contains three columns named username, comment, and date.

The data flow already contains the following:

A source transformation.

A Derived Column transformation to set the appropriate types of data.
A sink transformation to land the data in the pool.

You need to ensure that the data flow meets the following requirements:

All valid rows must be written to the destination table.

Truncation errors in the comment column must be avoided proactively.
Any rows containing comment values that will cause truncation errors upon insert must be written to a file in blob storage.

Which two actions should you perform? Each correct answer presents part of the solution.

NOTE: Each correct selection is worth one point.

A. To the data flow, add a sink transformation to write the rows to a file in blob storage.
B. To the data flow, add a Conditional Split transformation to separate the rows that will cause truncation errors.
C. To the data flow, add a filter transformation to filter out rows that will cause truncation errors.
D. Add a select transformation to select only the rows that will cause truncation errors.
Suggested answer: A, B

Explanation:

B: Example (from the referenced documentation, which uses a "title" column):

1. A Conditional Split transformation defines the maximum length of "title" to be five. Any row that is less than or equal to five will go into the GoodRows stream. Any row that is larger than five will go into the BadRows stream.

A:

2. Now we need to log the rows that failed. Add a sink transformation to the BadRows stream for logging. Here, we'll "auto-map" all of the fields so that we have logging of the complete transaction record. This is a text-delimited CSV file output to a single file in Blob Storage. We'll call the log file "badrows.csv".

3. The completed data flow splits off the error rows to avoid the SQL truncation errors and writes those entries to a log file. Meanwhile, successful rows continue to write to the target database.

Reference:

https://docs.microsoft.com/en-us/azure/data-factory/how-to-data-flow-error-rows

You have an Azure Storage account and a data warehouse in Azure Synapse Analytics in the UK South region. You need to copy blob data from the storage account to the data warehouse by using Azure Data Factory. The solution must meet the following requirements:

Ensure that the data remains in the UK South region at all times.
Minimize administrative effort.

Which type of integration runtime should you use?

A. Azure integration runtime
B. Azure-SSIS integration runtime
C. Self-hosted integration runtime
Suggested answer: A

Explanation:

The Azure integration runtime can be configured to use a specific region, such as UK South, so the data stays within that region, and it is fully managed, which minimizes administrative effort.

Incorrect Answers:

C: A self-hosted integration runtime is intended for on-premises or private-network data sources and requires you to install and manage the runtime yourself.

Reference:

https://docs.microsoft.com/en-us/azure/data-factory/concepts-integration-runtime

You have an Azure Stream Analytics job that receives clickstream data from an Azure event hub. You need to define a query in the Stream Analytics job. The query must meet the following requirements:

Count the number of clicks within each 10-second window based on the country of a visitor.
Ensure that each click is NOT counted more than once.

How should you define the query?

A. SELECT Country, Avg(*) AS Average FROM ClickStream TIMESTAMP BY CreatedAt GROUP BY Country, SlidingWindow(second, 10)
B. SELECT Country, Count(*) AS Count FROM ClickStream TIMESTAMP BY CreatedAt GROUP BY Country, TumblingWindow(second, 10)
C. SELECT Country, Avg(*) AS Average FROM ClickStream TIMESTAMP BY CreatedAt GROUP BY Country, HoppingWindow(second, 10, 2)
D. SELECT Country, Count(*) AS Count FROM ClickStream TIMESTAMP BY CreatedAt GROUP BY Country, SessionWindow(second, 5, 10)
Suggested answer: B

Explanation:

Tumbling window functions are used to segment a data stream into distinct time segments and perform a function against them. The key differentiators of a tumbling window are that it repeats, does not overlap, and an event cannot belong to more than one tumbling window.
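For instance, the query in option B (reformatted here across multiple lines) counts clicks per country in fixed, non-overlapping 10-second windows, so each click is counted exactly once:

SELECT Country, Count(*) AS Count
FROM ClickStream TIMESTAMP BY CreatedAt
GROUP BY Country, TumblingWindow(second, 10)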

Incorrect Answers:

A: Sliding windows, unlike tumbling or hopping windows, output events only for points in time when the content of the window actually changes, that is, when an event enters or exits the window. Every window has at least one event, and, as with hopping windows, events can belong to more than one sliding window.

C: Hopping window functions hop forward in time by a fixed period. They can be thought of as tumbling windows that can overlap, so events can belong to more than one hopping window result set. To make a hopping window the same as a tumbling window, specify the hop size to be the same as the window size.

D: Session windows group events that arrive at similar times, filtering out periods of time where there is no data.
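To illustrate the last point about hopping windows, a hop size equal to the window size, as in the sketch below (reusing the question's ClickStream input and CreatedAt column), behaves the same as TumblingWindow(second, 10), whereas option C's HoppingWindow(second, 10, 2) produces overlapping windows in which a click can be counted more than once:

SELECT Country, Count(*) AS Count
FROM ClickStream TIMESTAMP BY CreatedAt
GROUP BY Country, HoppingWindow(second, 10, 10)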

Reference:

https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-window-functions

You need to schedule an Azure Data Factory pipeline to execute when a new file arrives in an Azure Data Lake Storage Gen2 container. Which type of trigger should you use?

A. on-demand
B. tumbling window
C. schedule
D. event
Suggested answer: D

Explanation:

Event-driven architecture (EDA) is a common data integration pattern that involves production, detection, consumption, and reaction to events. Data integration scenarios often require Data Factory customers to trigger pipelines based on events happening in a storage account, such as the arrival or deletion of a file in an Azure Blob Storage account.

Reference:

https://docs.microsoft.com/en-us/azure/data-factory/how-to-create-event-trigger

You have two Azure Data Factory instances named ADFdev and ADFprod. ADFdev connects to an Azure DevOps Git repository. You publish changes from the main branch of the Git repository to ADFdev. You need to deploy the artifacts from ADFdev to ADFprod.

What should you do first?

A. From ADFdev, modify the Git configuration.
B. From ADFdev, create a linked service.
C. From Azure DevOps, create a release pipeline.
D. From Azure DevOps, update the main branch.
Suggested answer: C

Explanation:

In Azure Data Factory, continuous integration and delivery (CI/CD) means moving Data Factory pipelines from one environment (development, test, production) to another.

Note: The following is a guide for setting up an Azure Pipelines release that automates the deployment of a data factory to multiple environments:

1. In Azure DevOps, open the project that's configured with your data factory.
2. On the left side of the page, select Pipelines, and then select Releases.
3. Select New pipeline, or, if you have existing pipelines, select New and then New release pipeline.
4. In the Stage name box, enter the name of your environment.
5. Select Add artifact, and then select the git repository configured with your development data factory. Select the publish branch of the repository for the Default branch. By default, this publish branch is adf_publish.
6. Select the Empty job template.

Reference:

https://docs.microsoft.com/en-us/azure/data-factory/continuous-integration-deployment

You are developing a solution that will stream to Azure Stream Analytics. The solution will have both streaming data and reference data. Which input type should you use for the reference data?

A. Azure Cosmos DB
B. Azure Blob storage
C. Azure IoT Hub
D. Azure Event Hubs
Suggested answer: B

Explanation:

Stream Analytics supports Azure Blob storage and Azure SQL Database as the storage layer for Reference Data.
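Reference data stored in Azure Blob storage is joined to the streaming input with an ordinary JOIN; no temporal condition is required for a reference input. A minimal sketch, assuming a streaming input named ClickStream with a CreatedAt timestamp and a reference input named CountryNames that share a CountryCode column (all names are hypothetical):

SELECT c.CountryCode, r.CountryName
FROM ClickStream c TIMESTAMP BY CreatedAt
JOIN CountryNames r
ON c.CountryCode = r.CountryCode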

Reference:

https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-use-reference-data

You are designing an Azure Stream Analytics job to process incoming events from sensors in retail environments. You need to process the events to produce a running average of shopper counts during the previous 15 minutes, calculated at five-minute intervals. Which type of window should you use?

A. snapshot
B. tumbling
C. hopping
D. sliding
Suggested answer: C

Explanation:

Hopping window functions hop forward in time by a fixed period, so consecutive windows can overlap. A hopping window with a window size of 15 minutes and a hop size of five minutes recalculates the average over the previous 15 minutes every five minutes, which is exactly what the requirement describes. A tumbling window would not work here, because tumbling windows are fixed-sized, non-overlapping, and contiguous, so a 15-minute average could be produced only every 15 minutes.
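A minimal sketch of such a query, assuming a streaming input named ShopperEvents with a timestamp column EventTime and a numeric ShopperCount column (all names are hypothetical):

SELECT Avg(ShopperCount) AS AverageShopperCount
FROM ShopperEvents TIMESTAMP BY EventTime
GROUP BY HoppingWindow(minute, 15, 5)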

Reference:

https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-window-functions

You are designing an Azure Databricks table. The table will ingest an average of 20 million streaming events per day. You need to persist the events in the table for use in incremental load pipeline jobs in Azure Databricks. The solution must minimize storage costs and incremental load times. What should you include in the solution?

A. Partition by DateTime fields.
B. Sink to Azure Queue storage.
C. Include a watermark column.
D. Use a JSON format for physical data storage.
Suggested answer: B

Explanation:

The Databricks ABS-AQS connector uses Azure Queue Storage (AQS) to provide an optimized file source that lets you find new files written to an Azure Blob storage (ABS) container without repeatedly listing all of the files. This provides two major advantages:

Lower latency: no need to list nested directory structures on ABS, which is slow and resource intensive.
Lower costs: no more costly LIST API requests made to ABS.

Reference:

https://docs.microsoft.com/en-us/azure/databricks/spark/latest/structured-streaming/aqs

You have an Azure Databricks workspace named workspace1 in the Standard pricing tier. You need to configure workspace1 to support autoscaling all-purpose clusters. The solution must meet the following requirements:

Automatically scale down workers when the cluster is underutilized for three minutes.
Minimize the time it takes to scale to the maximum number of workers.
Minimize costs.

What should you do first?

A. Enable container services for workspace1.
B. Upgrade workspace1 to the Premium pricing tier.
C. Set Cluster Mode to High Concurrency.
D. Create a cluster policy in workspace1.
Suggested answer: B

Explanation:

For clusters running Databricks Runtime 6.4 and above, optimized autoscaling is used by all-purpose clusters in the Premium plan.

Optimized autoscaling:

Scales up from min to max in 2 steps.
Can scale down even if the cluster is not idle by looking at shuffle file state.
Scales down based on a percentage of current nodes.
On job clusters, scales down if the cluster is underutilized over the last 40 seconds.
On all-purpose clusters, scales down if the cluster is underutilized over the last 150 seconds.

The spark.databricks.aggressiveWindowDownS Spark configuration property specifies in seconds how often a cluster makes down-scaling decisions. Increasing the value causes a cluster to scale down more slowly. The maximum value is 600.

Note: Standard autoscaling

Starts with adding 8 nodes. Thereafter, scales up exponentially, but can take many steps to reach the max. You can customize the first step by setting the spark.databricks.autoscaling.standardFirstStepUp Spark configuration property.
Scales down only when the cluster is completely idle and it has been underutilized for the last 10 minutes.
Scales down exponentially, starting with 1 node.

Reference:

https://docs.databricks.com/clusters/configure.html
