ExamGecko
Question 119 - DP-203 discussion


You are designing an Azure Databricks table. The table will ingest an average of 20 million streaming events per day. You need to persist the events in the table for use in incremental load pipeline jobs in Azure Databricks. The solution must minimize storage costs and incremental load times. What should you include in the solution?

A. Partition by DateTime fields.
B. Sink to Azure Queue storage.
C. Include a watermark column.
D. Use a JSON format for physical data storage.

Suggested answer: B

Explanation:

The Databricks ABS-AQS connector uses Azure Queue Storage (AQS) to provide an optimized file source that lets you find new files written to an Azure Blob storage (ABS) container without repeatedly listing all of the files. This provides two major advantages:

Lower latency: no need to list nested directory structures on ABS, which is slow and resource intensive.

Lower costs: no more costly LIST API requests made to ABS.
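
To make the difference concrete, here is a minimal, purely illustrative sketch in plain Python. It does not use the real connector or any Azure API: `ToyContainer`, `discover_by_listing`, and `discover_by_queue` are hypothetical names invented for this example, and the counters only approximate the idea that repeated listing rescans every file while queue notifications are read once each.

```python
from collections import deque


class ToyContainer:
    """Hypothetical stand-in for a blob container plus a notification queue."""

    def __init__(self):
        self.files = []
        self.queue = deque()      # one message per newly written file
        self.list_calls = 0       # simulated LIST requests
        self.entries_scanned = 0  # file entries returned by those requests

    def write(self, path):
        self.files.append(path)
        self.queue.append(path)   # simulated "file created" notification

    def list_all(self):
        self.list_calls += 1
        self.entries_scanned += len(self.files)
        return list(self.files)


def discover_by_listing(container, seen):
    """Find new files by listing everything and filtering out known paths."""
    new = [p for p in container.list_all() if p not in seen]
    seen.update(new)
    return new


def discover_by_queue(container):
    """Find new files by draining the notification queue; no listing needed."""
    new = []
    while container.queue:
        new.append(container.queue.popleft())
    return new


container = ToyContainer()
seen = set()
listing_found, queue_found = [], []

# Three micro-batches, one new file arriving before each.
for batch in range(3):
    container.write(f"events/part-{batch}.json")
    listing_found += discover_by_listing(container, seen)
    queue_found += discover_by_queue(container)

queue_messages_read = len(queue_found)
```

Both approaches discover the same files, but in this toy model the listing path rescans every existing entry on each batch, so its work grows with the size of the container, while the queue path reads one message per new file.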

Reference:

https://docs.microsoft.com/en-us/azure/databricks/spark/latest/structured-streaming/aqs

asked 02/10/2024
Naveen Nama