Question 237 - DP-203 discussion


You have an Azure Databricks workspace that contains a Delta Lake dimension table named Table1. Table1 is a Type 2 slowly changing dimension (SCD) table. You need to apply updates from a source table to Table1. Which Apache Spark SQL operation should you use?

A. CREATE
B. UPDATE
C. MERGE
D. ALTER
Suggested answer: C

Explanation:

Delta Lake can infer the schema of incoming data, which reduces the effort needed to manage schema changes. A Type 2 slowly changing dimension (SCD) records every change made to each key in the dimension table. Applying updates therefore requires two actions: updating existing rows to mark the previous values as no longer current, and inserting new rows carrying the latest values. Given a source table containing the updates and a target table containing the dimensional data, both actions can be expressed in a single MERGE operation.

Example:

// Implementing an SCD Type 2 update using the Delta Lake merge API (Scala)
customersTable
  .as("customers")
  .merge(
    stagedUpdates.as("staged_updates"),
    "customers.customerId = mergeKey")
  // Expire the current row when the address has changed
  .whenMatched("customers.current = true AND customers.address <> staged_updates.address")
  .updateExpr(Map(
    "current" -> "false",
    "endDate" -> "staged_updates.effectiveDate"))
  // Insert the new version of the row
  .whenNotMatched()
  .insertExpr(Map(
    "customerId" -> "staged_updates.customerId",
    "address" -> "staged_updates.address",
    "current" -> "true",
    "effectiveDate" -> "staged_updates.effectiveDate",
    "endDate" -> "null"))
  .execute()
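Since the question asks for the Spark SQL operation, the same Type 2 logic can also be written directly as a MERGE INTO statement. The sketch below assumes a target dimension table Table1 and a source table named Updates with the same columns as the Scala example; the table and column names are illustrative, not taken from the question.

-- Hypothetical tables: Table1 (target dimension) and Updates (source)
MERGE INTO Table1 AS t
USING Updates AS s
ON t.customerId = s.customerId AND t.current = true
WHEN MATCHED AND t.address <> s.address THEN
  -- Expire the current row
  UPDATE SET t.current = false, t.endDate = s.effectiveDate
WHEN NOT MATCHED THEN
  -- Insert the new row as the current version
  INSERT (customerId, address, current, effectiveDate, endDate)
  VALUES (s.customerId, s.address, true, s.effectiveDate, null)

Note that a complete Type 2 merge must also insert a new row for keys whose attributes changed, not only expire the old one; this is typically handled by unioning the changed rows into the staged updates with a precomputed merge key, which is what the mergeKey column in the Scala example above represents.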

Reference:

https://www.projectpro.io/recipes/what-is-slowly-changing-data-scd-type-2-operation-delta-tabledatabricks

asked 02/10/2024
Ghalem benhameurlaine