Microsoft DP-600 Practice Test - Questions Answers, Page 5

You have a Fabric tenant that contains a warehouse.

Several times a day, the performance of all warehouse queries degrades. You suspect that Fabric is throttling the compute used by the warehouse.

What should you use to identify whether throttling is occurring?

A. the Capacity settings
B. the Monitoring hub
C. dynamic management views (DMVs)
D. the Microsoft Fabric Capacity Metrics app
Suggested answer: B

Explanation:

To identify whether throttling is occurring, use the Monitoring hub (B). The Monitoring hub provides a centralized place to monitor and manage the health, performance, and reliability of your data estate and to see whether compute resources are being throttled. Reference = The use of the Monitoring hub for performance monitoring and troubleshooting is described in the Microsoft Fabric documentation.

You have a Fabric tenant that contains a warehouse.

A user discovers that a report that usually takes two minutes to render has been running for 45 minutes and has still not rendered.

You need to identify what is preventing the report query from completing.

Which dynamic management view (DMV) should you use?

A. sys.dm_exec_requests
B. sys.dm_exec_sessions
C. sys.dm_exec_connections
D. sys.dm_pdw_exec_requests
Suggested answer: D

Explanation:

The correct DMV to identify what is preventing the report query from completing is sys.dm_pdw_exec_requests (D). This DMV is available in Analytics Platform System (previously known as Parallel Data Warehouse) and in Azure Synapse Analytics dedicated SQL pools, the environment assumed here. It returns information about all queries and load commands that are currently running or that have recently run, including their status and elapsed time. Reference = You can find more about this DMV in the Microsoft documentation for Analytics Platform System and Azure Synapse Analytics.
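
As an illustration only, the following Python sketch queries that DMV through pyodbc. The server, database, and authentication values are placeholders to replace with your own connection details, and the column list is limited to commonly used columns.

import pyodbc

# Placeholder connection details (assumptions) -- replace with your own SQL endpoint and database.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=<your-sql-endpoint>;"
    "DATABASE=<your-warehouse>;"
    "Authentication=ActiveDirectoryInteractive;"
)

# List requests that are still running or suspended, longest-running first,
# so the report query that is not completing can be located.
query = """
SELECT request_id, session_id, status, submit_time, total_elapsed_time, command
FROM sys.dm_pdw_exec_requests
WHERE status IN ('Running', 'Suspended')
ORDER BY total_elapsed_time DESC;
"""
for row in conn.cursor().execute(query):
    print(row)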

You need to create a data loading pattern for a Type 1 slowly changing dimension (SCD).

Which two actions should you include in the process? Each correct answer presents part of the solution.

NOTE: Each correct answer is worth one point.

A. Update rows when the non-key attributes have changed.
B. Insert new rows when the natural key exists in the dimension table, and the non-key attribute values have changed.
C. Update the effective end date of rows when the non-key attribute values have changed.
D. Insert new records when the natural key is a new value in the table.
Suggested answer: A, D

Explanation:

For a Type 1 SCD, you should include actions that update rows when non-key attributes have changed (A), and insert new records when the natural key is a new value in the table (D). A Type 1 SCD does not track historical data, so you always overwrite the old data with the new data for a given key. Reference = Details on Type 1 slowly changing dimension patterns can be found in data warehousing literature and Microsoft's official documentation.
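
A minimal PySpark sketch of this pattern, assuming a Delta dimension table named dim_customer with natural key customer_id, illustrative non-key columns name and city, and staged source rows in a DataFrame named updates_df (all of these names are assumptions for illustration):

from delta.tables import DeltaTable

# Load the existing Type 1 dimension table (illustrative name).
dim = DeltaTable.forName(spark, "dim_customer")

(dim.alias("t")
    .merge(updates_df.alias("s"), "t.customer_id = s.customer_id")
    # Action A: overwrite non-key attributes in place when they have changed (no history is kept).
    .whenMatchedUpdate(
        condition="t.name <> s.name OR t.city <> s.city",
        set={"name": "s.name", "city": "s.city"})
    # Action D: insert rows whose natural key is new to the dimension.
    .whenNotMatchedInsertAll()
    .execute())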

You are analyzing customer purchases in a Fabric notebook by using PySpark. You have the following DataFrames:

You need to join the DataFrames on the customer_id column. The solution must minimize data shuffling. You write the following code.

Which code should you run to populate the results DataFrame?

A)

B)

C)

D)

A. Option A
B. Option B
C. Option C
D. Option D
Suggested answer: A

Explanation:

The correct code to populate the results DataFrame with minimal data shuffling is Option A. Using the broadcast function in PySpark is a way to minimize data movement by broadcasting the smaller DataFrame (customers) to each node in the cluster. This is ideal when one DataFrame is much smaller than the other, as in this case with customers. Reference = You can refer to the official Apache Spark documentation for more details on joins and the broadcast hint.
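
A minimal sketch of such a join, assuming the DataFrames from the scenario are named customers (the smaller side) and purchases:

from pyspark.sql.functions import broadcast

# Ship the small customers DataFrame to every executor so the join on
# customer_id avoids shuffling the larger purchases DataFrame.
results = purchases.join(broadcast(customers), on="customer_id", how="inner")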

You have a Fabric tenant that contains a new semantic model in OneLake.

You use a Fabric notebook to read the data into a Spark DataFrame.

You need to evaluate the data to calculate the min, max, mean, and standard deviation values for all the string and numeric columns.

Solution: You use the following PySpark expression:

df.explain()

Does this meet the goal?

A. Yes
B. No
Suggested answer: B

Explanation:

The df.explain() method does not meet the goal. It displays the plan that Spark will use to execute the DataFrame; it does not calculate statistics such as min, max, mean, or standard deviation. Reference = The correct usage of the explain() function can be found in the PySpark documentation.

You have a Fabric tenant that contains a new semantic model in OneLake.

You use a Fabric notebook to read the data into a Spark DataFrame.

You need to evaluate the data to calculate the min, max, mean, and standard deviation values for all the string and numeric columns.

Solution: You use the following PySpark expression:

df.show()

Does this meet the goal?

A. Yes
B. No
Suggested answer: B

Explanation:

The df.show() method also does not meet the goal. It displays the rows of the DataFrame; it does not compute statistics. Reference = The usage of the show() function is documented in the PySpark API documentation.

You have a Fabric tenant that contains a new semantic model in OneLake.

You use a Fabric notebook to read the data into a Spark DataFrame.

You need to evaluate the data to calculate the min, max, mean, and standard deviation values for all the string and numeric columns.

Solution: You use the following PySpark expression:

df.summary()

Does this meet the goal?

A. Yes
B. No
Suggested answer: A

Explanation:

Yes, the df.summary() method does meet the goal. This method is used to compute specified statistics for numeric and string columns. By default, it provides statistics such as count, mean, stddev, min, and max. Reference = The PySpark API documentation details the summary() function and the statistics it provides.
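
A minimal sketch of the working expression in a notebook cell, restricting the output to the statistics the scenario asks for (the DataFrame name df comes from the question):

# summary() computes statistics for numeric and string columns; called with no
# arguments it returns count, mean, stddev, min, percentiles, and max.
df.summary("min", "max", "mean", "stddev").show()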

You have a Fabric tenant that contains a lakehouse named Lakehouse1. Lakehouse1 contains a Delta table named Customer.

When you query Customer, you discover that the query is slow to execute. You suspect that maintenance was NOT performed on the table.

You need to identify whether maintenance tasks were performed on Customer.

Solution: You run the following Spark SQL statement:

DESCRIBE HISTORY customer

Does this meet the goal?

A. Yes
B. No
Suggested answer: A

Explanation:

Yes, the DESCRIBE HISTORY statement does meet the goal. It returns the history of operations performed on a Delta table, including maintenance operations such as OPTIMIZE and VACUUM, so you can see whether and when maintenance was performed. Reference = The functionality of the DESCRIBE HISTORY statement can be verified in the Delta Lake documentation.
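
A minimal PySpark sketch that runs the statement from a notebook and narrows the result to typical maintenance operations (the table name comes from the question; the operation values are standard Delta Lake history entries):

# Return the Delta transaction log history for the table, then keep only rows
# written by maintenance commands such as OPTIMIZE and VACUUM.
history = spark.sql("DESCRIBE HISTORY customer")
history.filter(history.operation.isin("OPTIMIZE", "VACUUM START", "VACUUM END")).show(truncate=False)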

You have a Fabric tenant that contains a lakehouse named Lakehouse1. Lakehouse1 contains a Delta table named Customer.

When you query Customer, you discover that the query is slow to execute. You suspect that maintenance was NOT performed on the table.

You need to identify whether maintenance tasks were performed on Customer.

Solution: You run the following Spark SQL statement:

REFRESH TABLE customer

Does this meet the goal?

A. Yes
B. No
Suggested answer: B

Explanation:

No, the REFRESH TABLE statement does not provide information on whether maintenance tasks were performed. It only updates the metadata of a table to reflect any changes on the data files. Reference = The use and effects of the REFRESH TABLE command are explained in the Spark SQL documentation.

You have a Fabric tenant that contains a lakehouse named Lakehouse1. Lakehouse1 contains a Delta table named Customer.

When you query Customer, you discover that the query is slow to execute. You suspect that maintenance was NOT performed on the table.

You need to identify whether maintenance tasks were performed on Customer.

Solution: You run the following Spark SQL statement:

EXPLAIN TABLE customer

Does this meet the goal?

A. Yes
B. No
Suggested answer: B

Explanation:

No, the EXPLAIN statement does not identify whether maintenance tasks were performed on a table. It shows the execution plan that Spark SQL will use for a statement. Reference = The usage and output of the EXPLAIN command can be found in the Spark SQL documentation.
