
Microsoft DP-203 Practice Test - Questions Answers, Page 11


Question 101


Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.

After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.

You plan to create an Azure Databricks workspace that has a tiered structure. The workspace will contain the following three workloads:

A workload for data engineers who will use Python and SQL.

A workload for jobs that will run notebooks that use Python, Scala, and SQL.

A workload that data scientists will use to perform ad hoc analysis in Scala and R.

The enterprise architecture team at your company identifies the following standards for Databricks environments:

The data engineers must share a cluster.

The job cluster will be managed by using a request process whereby data scientists and data engineers provide packaged notebooks for deployment to the cluster.

All the data scientists must be assigned their own cluster that terminates automatically after 120 minutes of inactivity. Currently, there are three data scientists.

You need to create the Databricks clusters for the workloads.

Solution: You create a Standard cluster for each data scientist, a High Concurrency cluster for the data engineers, and a Standard cluster for the jobs.

Does this meet the goal?

A. Yes
B. No
Suggested answer: B
Explanation:

We would need a High Concurrency cluster for the jobs.

Note:

Standard clusters are recommended for a single user. Standard can run workloads developed in any language: Python, R, Scala, and SQL. A high concurrency cluster is a managed cloud resource. The key benefits of high concurrency clusters are that they provide Apache Spark-native fine-grained sharing for maximum resource utilization and minimum query latencies.

Reference: https://docs.azuredatabricks.net/clusters/configure.html
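For illustration only (not part of the exam answer), the sketch below shows how the per-data-scientist Standard clusters with the required 120-minute auto-termination might be created through the Databricks Clusters API. The workspace URL, token, runtime version, and node type are placeholder assumptions.

```python
# Hedged sketch: create one Standard cluster per data scientist, each terminating
# automatically after 120 minutes of inactivity, via the Databricks Clusters API
# (POST /api/2.0/clusters/create). All concrete values below are placeholders.
import requests

WORKSPACE_URL = "https://adb-0000000000000000.0.azuredatabricks.net"  # placeholder
TOKEN = "dapi-xxxxxxxxxxxxxxxx"                                       # placeholder personal access token


def create_standard_cluster(name: str) -> str:
    payload = {
        "cluster_name": name,
        "spark_version": "10.4.x-scala2.12",  # placeholder runtime version
        "node_type_id": "Standard_DS3_v2",    # placeholder VM size
        "num_workers": 2,
        "autotermination_minutes": 120,       # terminate after 120 minutes of inactivity
    }
    resp = requests.post(
        f"{WORKSPACE_URL}/api/2.0/clusters/create",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json=payload,
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["cluster_id"]


# Three data scientists in the scenario, so three separate Standard clusters.
for owner in ["ds1", "ds2", "ds3"]:
    print(owner, create_standard_cluster(f"adhoc-{owner}"))

# The shared data-engineering cluster and the job cluster would be created with similar
# calls; the High Concurrency mode is an additional cluster setting that is omitted here.
```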


Question 102


Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.

After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.

You plan to create an Azure Databricks workspace that has a tiered structure. The workspace will contain the following three workloads:

A workload for data engineers who will use Python and SQL.

A workload for jobs that will run notebooks that use Python, Scala, and SQL.

A workload that data scientists will use to perform ad hoc analysis in Scala and R.

The enterprise architecture team at your company identifies the following standards for Databricks environments:

The data engineers must share a cluster.

The job cluster will be managed by using a request process whereby data scientists and data engineers provide packaged notebooks for deployment to the cluster.

All the data scientists must be assigned their own cluster that terminates automatically after 120 minutes of inactivity. Currently, there are three data scientists.

You need to create the Databricks clusters for the workloads.

Solution: You create a Standard cluster for each data scientist, a High Concurrency cluster for the data engineers, and a High Concurrency cluster for the jobs.

Does this meet the goal?

A. Yes
B. No
Suggested answer: A
Explanation:

We need a High Concurrency cluster for the data engineers and the jobs.

Note: Standard clusters are recommended for a single user. Standard can run workloads developed in any language: Python, R, Scala, and SQL. A high concurrency cluster is a managed cloud resource. The key benefits of high concurrency clusters are that they provide Apache Spark-native fine-grained sharing for maximum resource utilization and minimum query latencies.

Reference: https://docs.azuredatabricks.net/clusters/configure.html


Question 103


A company has a real-time data analysis solution that is hosted on Microsoft Azure. The solution uses Azure Event Hub to ingest data and an Azure Stream Analytics cloud job to analyze the data. The cloud job is configured to use 120 Streaming Units (SU).

You need to optimize performance for the Azure Stream Analytics job. Which two actions should you perform? Each correct answer presents part of the solution.

NOTE: Each correct selection is worth one point.

A. Implement event ordering.
B. Implement Azure Stream Analytics user-defined functions (UDF).
C. Implement query parallelization by partitioning the data output.
D. Scale the SU count for the job up.
E. Scale the SU count for the job down.
F. Implement query parallelization by partitioning the data input.
Suggested answer: D, F
Explanation:

D: Scaling the Streaming Unit (SU) count up gives the job more compute resources to process the query.

F: Partitioning the data input lets the job scale out the query by processing each input partition separately.

Reference: https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-parallelization
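As a hedged illustration of input partitioning (the job's actual query is not given in the question), a parallelized Stream Analytics query could align its GROUP BY with the Event Hubs partitions. The query below is Stream Analytics SQL held in a Python string; the input name, output name, and window size are invented.

```python
# Illustrative only: an input-partitioned Stream Analytics query (Stream Analytics SQL),
# kept here as a Python string. Input name, output name, and window size are assumptions.
PARTITIONED_QUERY = """
SELECT
    PartitionId,
    COUNT(*) AS EventCount
INTO [PartitionedOutput]
FROM [EventHubInput] PARTITION BY PartitionId   -- explicit PARTITION BY (pre-1.2 compatibility levels)
GROUP BY PartitionId, TumblingWindow(second, 10)
"""

print(PARTITIONED_QUERY)
```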


Question 104


You need to trigger an Azure Data Factory pipeline when a file arrives in an Azure Data Lake Storage Gen2 container. Which resource provider should you enable?

A. Microsoft.Sql
B. Microsoft.Automation
C. Microsoft.EventGrid
D. Microsoft.EventHub
Suggested answer: C
Explanation:

Event-driven architecture (EDA) is a common data integration pattern that involves production, detection, consumption, and reaction to events. Data integration scenarios often require Data Factory customers to trigger pipelines based on events happening in a storage account, such as the arrival or deletion of a file in an Azure Blob Storage account. Data Factory natively integrates with Azure Event Grid, which lets you trigger pipelines on such events.

Reference: https://docs.microsoft.com/en-us/azure/data-factory/how-to-create-event-trigger https://docs.microsoft.com/en-us/azure/data-factory/concepts-pipeline-execution-triggers
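As a hedged sketch (not quoted from the documentation), the Microsoft.EventGrid resource provider could be registered on the subscription with the Azure SDK for Python; the subscription ID below is a placeholder.

```python
# Hedged sketch: register the Microsoft.EventGrid resource provider so that
# Data Factory storage event triggers can be created. Subscription ID is a placeholder.
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient

subscription_id = "00000000-0000-0000-0000-000000000000"  # placeholder
client = ResourceManagementClient(DefaultAzureCredential(), subscription_id)

provider = client.providers.register("Microsoft.EventGrid")
print(provider.registration_state)  # e.g. "Registering" until registration completes
```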


Question 105


You plan to perform batch processing in Azure Databricks once daily. Which type of Databricks cluster should you use?

A. High Concurrency
B. automated
C. interactive
Suggested answer: B
Explanation:

Azure Databricks has two types of clusters: interactive and automated. You use interactive clusters to analyze data collaboratively with interactive notebooks. You use automated clusters to run fast and robust automated jobs.

Example: Scheduled batch workloads (data engineers running ETL jobs). This scenario involves running batch job JARs and notebooks on a regular cadence through the Databricks platform. The suggested best practice is to launch a new cluster for each run of critical jobs. This helps avoid any issues (failures, missing SLA, and so on) due to an existing workload (noisy neighbor) on a shared cluster.

Reference: https://docs.databricks.com/administration-guide/cloud-configurations/aws/cmbp.html#scenario-3-scheduled-batch-workloads-data-engineers-running-etl-jobs
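To make the distinction concrete, here is a hedged sketch of a once-daily batch job defined through the Databricks Jobs API; it runs on a new automated (job) cluster created for each run. The workspace URL, token, notebook path, runtime, and node type are placeholders.

```python
# Hedged sketch: a daily batch job that runs a notebook on a fresh automated (job)
# cluster, created via the Databricks Jobs API (POST /api/2.1/jobs/create).
# All concrete values are placeholders.
import requests

WORKSPACE_URL = "https://adb-0000000000000000.0.azuredatabricks.net"  # placeholder
TOKEN = "dapi-xxxxxxxxxxxxxxxx"                                       # placeholder

job_spec = {
    "name": "daily-batch-etl",
    "schedule": {
        "quartz_cron_expression": "0 0 2 * * ?",  # once daily at 02:00
        "timezone_id": "UTC",
    },
    "tasks": [
        {
            "task_key": "etl",
            "notebook_task": {"notebook_path": "/Jobs/daily_etl"},  # placeholder notebook
            "new_cluster": {                      # automated cluster created per run
                "spark_version": "10.4.x-scala2.12",
                "node_type_id": "Standard_DS3_v2",
                "num_workers": 4,
            },
        }
    ],
}

resp = requests.post(
    f"{WORKSPACE_URL}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=job_spec,
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["job_id"])
```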


Question 106


You have an Azure Data Factory instance that contains two pipelines named Pipeline1 and Pipeline2.

Pipeline1 has the activities shown in the following exhibit.

[Exhibit: Pipeline1 activities]

Pipeline2 has the activities shown in the following exhibit.

[Exhibit: Pipeline2 activities]

You execute Pipeline2, and Stored procedure1 in Pipeline1 fails.

What is the status of the pipeline runs?

A. Pipeline1 and Pipeline2 succeeded.
B. Pipeline1 and Pipeline2 failed.
C. Pipeline1 succeeded and Pipeline2 failed.
D. Pipeline1 failed and Pipeline2 succeeded.
Suggested answer: A
Explanation:

Activities are linked together via dependencies. A dependency has a condition of one of the following: Succeeded, Failed, Skipped, or Completed.

Consider Pipeline1:

If we have a pipeline with two activities where Activity2 has a failure dependency on Activity1, the pipeline will not fail just because Activity1 failed. If Activity1 fails and Activity2 succeeds, the pipeline will succeed. This scenario is treated as a try-catch block by Data Factory.

[Explanation exhibit: failure dependency between the activities in Pipeline1]

The failure dependency means this pipeline reports success.

Note:

If we have a pipeline containing Activity1 and Activity2, and Activity2 has a success dependency on Activity1, it will only execute if Activity1 is successful. In this scenario, if Activity1 fails, the pipeline will fail.

Reference:

https://datasavvy.me/category/azure-data-factory/
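To show what such a failure dependency looks like, here is a hedged sketch of the relevant fragment of a pipeline definition, written as a Python dict. The downstream activity name and type are invented, since the exhibits are not reproduced in this dump.

```python
# Illustrative only: the shape of an ADF pipeline in which a downstream activity has a
# *Failed* dependency on "Stored procedure1" (a try-catch pattern). The downstream
# activity name and type are placeholders.
pipeline1_fragment = {
    "name": "Pipeline1",
    "properties": {
        "activities": [
            {"name": "Stored procedure1", "type": "SqlServerStoredProcedure"},
            {
                "name": "HandleFailure",                     # placeholder activity
                "type": "SetVariable",                       # placeholder type
                "dependsOn": [
                    {
                        "activity": "Stored procedure1",
                        "dependencyConditions": ["Failed"],  # run only if the upstream activity fails
                    }
                ],
            },
        ]
    },
}
```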


Question 107


You have an Azure Data Factory that contains 10 pipelines.

You need to label each pipeline with its main purpose of either ingest, transform, or load. The labels must be available for grouping and filtering when using the monitoring experience in Data Factory. What should you add to each pipeline?

A. a resource tag
B. a correlation ID
C. a run group ID
D. an annotation
Suggested answer: D
Explanation:

Annotations are additional, informative tags that you can add to specific factory resources: pipelines, datasets, linked services, and triggers. By adding annotations, you can easily filter and search for specific factory resources.

Reference:

https://www.cathrinewilhelmsen.net/annotations-user-properties-azure-data-factory/
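As a hedged sketch, an annotation is simply part of the pipeline's JSON definition; the fragment below, expressed as a Python dict, tags a pipeline with "ingest" so it can be grouped and filtered in the monitoring view. The pipeline name is a placeholder.

```python
# Illustrative only: an "annotations" entry in an ADF pipeline definition, which the
# monitoring experience exposes for grouping and filtering. Pipeline name is a placeholder.
ingest_pipeline = {
    "name": "CopySalesData",            # placeholder pipeline name
    "properties": {
        "annotations": ["ingest"],      # label the pipeline's main purpose: ingest / transform / load
        "activities": [],               # activities omitted for brevity
    },
}
```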


Question 108


You are designing a statistical analysis solution that will use custom proprietary Python functions on near real-time data from Azure Event Hubs. You need to recommend which Azure service to use to perform the statistical analysis. The solution must minimize latency. What should you recommend?

A. Azure Synapse Analytics
B. Azure Databricks
C. Azure Stream Analytics
D. Azure SQL Database
Suggested answer: C
Explanation:

Reference: https://docs.microsoft.com/en-us/azure/event-hubs/process-data-azure-stream-analytics


Question 109


You need to implement a Type 3 slowly changing dimension (SCD) for product category data in an Azure Synapse Analytics dedicated SQL pool.

You have a table that was created by using the following Transact-SQL statement. Which two columns should you add to the table? Each correct answer presents part of the solution.

NOTE: Each correct selection is worth one point.

A. [EffectiveStartDate] [datetime] NOT NULL,
B. [CurrentProductCategory] [nvarchar] (100) NOT NULL,
C. [EffectiveEndDate] [datetime] NULL,
D. [ProductCategory] [nvarchar] (100) NOT NULL,
E. [OriginalProductCategory] [nvarchar] (100) NOT NULL,
Suggested answer: B, E
Explanation:

A Type 3 SCD supports storing two versions of a dimension member as separate columns. The table includes a column for the current value of a member plus either the original or previous value of the member. So Type 3 uses additional columns to track one key instance of history, rather than storing additional rows to track each change like in a Type 2 SCD.

This type of tracking may be used for one or two columns in a dimension table. It is not common to use it for many members of the same table. It is often used in combination with Type 1 or Type 2 members.

[Explanation exhibit: Type 3 SCD example table]

Reference:

https://k21academy.com/microsoft-azure/azure-data-engineer-dp203-q-a-day-2-live-session-review/
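For illustration (the original CREATE TABLE statement is not reproduced in this dump), the two Type 3 columns could be added with an ALTER TABLE statement such as the one below, executed here from Python. The table name, connection string, and default values are assumptions.

```python
# Hedged sketch: add the two Type 3 SCD columns to the dimension table in the dedicated
# SQL pool. Table name, connection details, and the '' defaults (needed because the new
# columns are NOT NULL on an existing table) are assumptions.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=myworkspace.sql.azuresynapse.net;DATABASE=SQLPool1;"  # placeholders
    "UID=sqladminuser;PWD=<password>"
)
cursor = conn.cursor()
cursor.execute(
    """
    ALTER TABLE dbo.DimProductCategory
    ADD [CurrentProductCategory]  nvarchar(100) NOT NULL DEFAULT (''),
        [OriginalProductCategory] nvarchar(100) NOT NULL DEFAULT ('');
    """
)
conn.commit()
```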


Question 110


Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.

After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.

You are designing an Azure Stream Analytics solution that will analyze Twitter data. You need to count the tweets in each 10-second window. The solution must ensure that each tweet is counted only once.

Solution: You use a hopping window that uses a hop size of 10 seconds and a window size of 10 seconds.

Does this meet the goal?

A. Yes
B. No
Suggested answer: B
Explanation:

Instead use a tumbling window. Tumbling windows are a series of fixed-sized, non-overlapping and contiguous time intervals.

Reference:

https://docs.microsoft.com/en-us/stream-analytics-query/tumbling-window-azure-stream-analytics
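For contrast, a tumbling-window version of the count (the approach the explanation recommends) might look like the Stream Analytics SQL below, shown as a Python string; the input, output, and timestamp column names are invented.

```python
# Illustrative only: a tumbling-window count (Stream Analytics SQL) in which every tweet
# falls into exactly one non-overlapping 10-second window. Input, output, and timestamp
# column names are assumptions.
TUMBLING_COUNT_QUERY = """
SELECT
    COUNT(*) AS TweetCount,
    System.Timestamp() AS WindowEnd
INTO [TweetCounts]
FROM [TwitterStream] TIMESTAMP BY CreatedAt
GROUP BY TumblingWindow(second, 10)
"""

print(TUMBLING_COUNT_QUERY)
```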
