
Microsoft DP-203 Practice Test - Questions Answers, Page 24

You have an Azure Synapse Analytics serverless SQL pool named Pool1 and an Azure Data Lake Storage Gen2 account named storage1. The AllowedBlobPublicAccess property is disabled for storage1. You need to create an external data source that can be used by Azure Active Directory (Azure AD) users to access storage1 from Pool1. What should you create first?

A. an external resource pool
B. a remote service binding
C. database scoped credentials
D. an external library
Suggested answer: C

Explanation:

Security: A user must have SELECT permission on an external table to read the data. External tables access the underlying Azure storage by using the database scoped credential defined in the external data source.

Note: A database scoped credential is a record that contains the authentication information that is required to connect to a resource outside SQL Server. Most credentials include a Windows user and password.

Reference:

https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/develop-tables-external-tables

https://docs.microsoft.com/en-us/sql/t-sql/statements/create-database-scoped-credential-transact-sql
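
As a hedged sketch only (the object names, container path, and the choice of the workspace managed identity are assumptions, not part of the question): in the serverless SQL pool database you would first create a master key and a database scoped credential, and then reference that credential from the external data source.

-- Sketch: names and the container path are illustrative.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';

-- One common option is the workspace managed identity; SAS tokens are another.
CREATE DATABASE SCOPED CREDENTIAL storage1_cred
WITH IDENTITY = 'Managed Identity';

CREATE EXTERNAL DATA SOURCE storage1_ds
WITH (
    LOCATION = 'https://storage1.dfs.core.windows.net/container1',
    CREDENTIAL = storage1_cred
);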

You are designing the folder structure for an Azure Data Lake Storage Gen2 account. You identify the following usage patterns:

• Users will query data by using Azure Synapse Analytics serverless SQL pools and Azure Synapse Analytics serverless Apache Spark pools.

• Most queries will include a filter on the current year or week.

• Data will be secured by data source.

You need to recommend a folder structure that meets the following requirements:

• Supports the usage patterns

• Simplifies folder security

• Minimizes query times

Which folder structure should you recommend?

A.
B.
C.
D.
E.
Suggested answer: C

Explanation:

Data will be secured by data source -> use the data source as the top-level folder. Most queries will include a filter on the current year or week -> use /YYYY/WW/ as subfolders, giving a layout such as /DataSource/YYYY/WW/.

Common use case: filtering data stored in a date (and possibly time) folder structure such as /YYYY/MM/DD/ or /YYYY/MM/YYYY-MM-DD/. As new data is generated, sent, copied, or moved to the storage account, a new folder is created for each time period. This strategy organises data into a maintainable folder structure.

Reference: https://www.serverlesssql.com/optimisation/azurestoragefilteringusingfilepath/
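
To illustrate why this layout minimizes query times, the sketch below filters on the folder names by using the filepath() function from a serverless SQL pool. The storage account, container, data source name, and file format are assumptions.

SELECT TOP 100 *
FROM OPENROWSET(
    BULK 'https://storage1.dfs.core.windows.net/data/DataSource1/*/*/*.parquet',
    FORMAT = 'PARQUET'
) AS r
WHERE r.filepath(1) = '2024'   -- first wildcard: the year folder
  AND r.filepath(2) = '23';    -- second wildcard: the week folder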

You are designing an Azure Synapse Analytics workspace.

You need to recommend a solution to provide double encryption of all the data at rest. Which two components should you include in the recommendation? Each correct answer presents part of the solution. NOTE: Each correct selection is worth one point.

A. an X509 certificate
B. an RSA key
C. an Azure key vault that has purge protection enabled
D. an Azure virtual network that has a network security group (NSG)
E. an Azure Policy initiative
Suggested answer: B, C

Explanation:

Synapse workspaces support double encryption of the data at rest by adding a second layer of encryption with a customer-managed key. The key must be an RSA (or RSA-HSM) key of size 2048 or larger, and it must be stored in an Azure key vault that has soft delete and purge protection enabled.

Reference: https://docs.microsoft.com/en-us/azure/synapse-analytics/security/workspaces-encryption

You need to design a solution that will process streaming data from an Azure Event Hub and output the data to Azure Data Lake Storage. The solution must ensure that analysts can interactively query the streaming data. What should you use?

A. event triggers in Azure Data Factory
B. Azure Stream Analytics and Azure Synapse notebooks
C. Structured Streaming in Azure Databricks
D. Azure Queue storage and read-access geo-redundant storage (RA-GRS)
Suggested answer: C

Explanation:

Apache Spark Structured Streaming is a fast, scalable, and fault-tolerant stream processing API. You can use it to perform analytics on your streaming data in near real-time. With Structured Streaming, you can use SQL queries to process streaming data in the same way that you would process static data.

Azure Event Hubs is a scalable real-time data ingestion service that can process millions of events per second. It can receive large amounts of data from multiple sources and stream the prepared data to Azure Data Lake Storage or Azure Blob storage.

Azure Event Hubs can be integrated with Spark Structured Streaming to perform the processing of messages in near real-time. You can query and analyze the processed data as it comes by using a Structured Streaming query and Spark SQL.

Reference:

https://k21academy.com/microsoft-azure/data-engineer/structured-streaming-with-azure-event-hubs/
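
As a hedged illustration of the interactive-query side only (the Delta output path and the enqueuedTime column are assumptions): once the Structured Streaming job writes its results to Azure Data Lake Storage, analysts can run Spark SQL queries such as the following from a Databricks notebook.

-- Count events per 5-minute window over the streamed output (sketch only)
SELECT window(enqueuedTime, '5 minutes') AS time_window,
       COUNT(*) AS event_count
FROM delta.`abfss://data@storage1.dfs.core.windows.net/stream/events`
GROUP BY window(enqueuedTime, '5 minutes');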

You have an Azure Synapse Analytics dedicated SQL pool named pool1. You plan to implement a star schema in pool1 and create a new table named DimCustomer by using the following code.

You need to ensure that DimCustomer has the necessary columns to support a Type 2 slowly changing dimension (SCD). Which two columns should you add? Each correct answer presents part of the solution. NOTE: Each correct selection is worth one point.

A. [HistoricalSalesPerson] [nvarchar] (256) NOT NULL
B. [EffectiveEndDate] [datetime] NOT NULL
C. [PreviousModifiedDate] [datetime] NOT NULL
D. [RowID] [bigint] NOT NULL
E. [EffectiveStartDate] [datetime] NOT NULL
Suggested answer: A, B

You have an Azure Data Factory pipeline named Pipeline1. Pipeline1 contains a copy activity that sends data to an Azure Data Lake Storage Gen2 account. Pipeline1 is executed by a schedule trigger. You change the copy activity sink to a new storage account and merge the changes into the collaboration branch. After Pipeline1 executes, you discover that data is NOT copied to the new storage account. You need to ensure that the data is copied to the new storage account. What should you do?

A. Publish from the collaboration branch.
B. Configure the change feed of the new storage account.
C. Create a pull request.
D. Modify the schedule trigger.
Suggested answer: A

Explanation:

CI/CD lifecycle

A development data factory is created and configured with Azure Repos Git. All developers should have permission to author Data Factory resources like pipelines and datasets. A developer creates a feature branch to make a change and debugs their pipeline runs with their most recent changes. After a developer is satisfied with their changes, they create a pull request from their feature branch to the main or collaboration branch to get their changes reviewed by peers.

After a pull request is approved and changes are merged in the main branch, the changes get published to the development factory.

Reference: https://docs.microsoft.com/en-us/azure/data-factory/continuous-integration-delivery

You have an Azure Databricks workspace that contains a Delta Lake dimension table named Table1. Table1 is a Type 2 slowly changing dimension (SCD) table. You need to apply updates from a source table to Table1. Which Apache Spark SQL operation should you use?

A. CREATE
B. UPDATE
C. MERGE
D. ALTER
Suggested answer: C

Explanation:

Delta Lake can infer the schema of the input data, which reduces the effort required to manage schema changes. A Type 2 slowly changing dimension (SCD) records all changes made to each key in the dimension table. These operations require updating the existing rows to mark the previous values as no longer current and then inserting new rows with the latest values. Given a source table with the updates and a target table with the dimensional data, SCD Type 2 can be expressed with a merge.

Example:

// Implementing an SCD Type 2 update by using the Delta Lake merge API
customersTable
  .as("customers")
  .merge(
    stagedUpdates.as("staged_updates"),
    "customers.customerId = mergeKey")
  // Expire the current row when a tracked attribute has changed
  .whenMatched("customers.current = true AND customers.address <> staged_updates.address")
  .updateExpr(Map(
    "current" -> "false",
    "endDate" -> "staged_updates.effectiveDate"))
  // Insert rows that have no matching current row
  .whenNotMatched()
  .insertExpr(Map(
    "customerid" -> "staged_updates.customerId",
    "address" -> "staged_updates.address",
    "current" -> "true",
    "effectiveDate" -> "staged_updates.effectiveDate",
    "endDate" -> "null"))
  .execute()

Reference:

https://www.projectpro.io/recipes/what-is-slowly-changing-data-scd-type-2-operation-delta-tabledatabricks

You have an Azure Synapse Analytics dedicated SQL pool named Pool1 that contains a table named Sales. Sales has row-level security (RLS) applied. RLS uses the following predicate filter.

A user named SalesUser1 is assigned the db_datareader role for Pool1. Which rows in the Sales table are returned when SalesUser1 queries the table?

A. only the rows for which the value in the User_Name column is SalesUser1
B. all the rows
C. only the rows for which the value in the SalesRep column is Manager
D. only the rows for which the value in the SalesRep column is SalesUser1
Suggested answer: A
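
The predicate filter exhibit is not reproduced above; a typical row-level security setup that matches the suggested answer looks like the following hedged sketch (the schema, function, and policy names are illustrative).

CREATE SCHEMA Security;
GO

-- Return a row (allow access) only when the User_Name column matches the querying user.
CREATE FUNCTION Security.fn_securitypredicate (@User_Name AS nvarchar(128))
    RETURNS TABLE
    WITH SCHEMABINDING
AS
    RETURN SELECT 1 AS fn_securitypredicate_result
    WHERE @User_Name = USER_NAME();
GO

CREATE SECURITY POLICY SalesFilter
    ADD FILTER PREDICATE Security.fn_securitypredicate(User_Name) ON dbo.Sales
    WITH (STATE = ON);

With such a predicate in place, a member of db_datareader such as SalesUser1 sees only the rows whose User_Name value equals their own user name.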

You are designing a dimension table in an Azure Synapse Analytics dedicated SQL pool. You need to create a surrogate key for the table. The solution must provide the fastest query performance. What should you use for the surrogate key?

A. a GUID column
B. a sequence object
C. an IDENTITY column
Suggested answer: C

Explanation:

Use IDENTITY to create surrogate keys in a dedicated SQL pool in Azure Synapse Analytics. Note: A surrogate key on a table is a column with a unique identifier for each row. The key is not generated from the table data. Data modelers like to create surrogate keys on their tables when they design data warehouse models. You can use the IDENTITY property to achieve this goal simply and effectively without affecting load performance.

Reference: https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-datawarehouse-tables-identity
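
A minimal sketch of a dimension table that uses IDENTITY for its surrogate key in a dedicated SQL pool (the table, column names, and distribution choice are assumptions):

CREATE TABLE dbo.DimProduct
(
    ProductKey    INT IDENTITY(1, 1) NOT NULL,  -- surrogate key generated by the pool
    ProductAltKey NVARCHAR(25)       NOT NULL,  -- business (natural) key from the source
    ProductName   NVARCHAR(100)      NOT NULL
)
WITH
(
    DISTRIBUTION = REPLICATE,
    CLUSTERED COLUMNSTORE INDEX
);

IDENTITY values in a dedicated SQL pool are unique but not guaranteed to be sequential, which is acceptable for a surrogate key.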

You are designing an Azure Data Lake Storage solution that will transform raw JSON files for use in an analytical workload. You need to recommend a format for the transformed files. The solution must meet the following requirements:

• Contain information about the data types of each column in the files.

• Support querying a subset of columns in the files.

• Support read-heavy analytical workloads.

• Minimize the file size.

What should you recommend?

A. JSON
B. CSV
C. Apache Avro
D. Apache Parquet
Suggested answer: D

Explanation:

Parquet, an open-source file format for Hadoop, stores nested data structures in a flat columnar format. Compared to a traditional row-oriented format, Parquet is more efficient in terms of storage and performance. It is especially good for queries that read particular columns from a “wide” table (one with many columns), since only the needed columns are read and I/O is minimized.

Reference: https://www.clairvoyant.ai/blog/big-data-file-formats
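
A hedged example of how these requirements play out when querying Parquet from a serverless SQL pool (the path and column names are assumptions): the column names and types come from the Parquet metadata, and selecting a subset of columns reads only those column chunks.

SELECT s.SaleId, s.Amount   -- only these columns are read from the files (column pruning)
FROM OPENROWSET(
    BULK 'https://storage1.dfs.core.windows.net/curated/sales/*.parquet',
    FORMAT = 'PARQUET'       -- schema and data types are inferred from the Parquet metadata
) AS s;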
