Snowflake DEA-C01 Practice Test - Questions Answers
List of questions
Related questions
Question 1
Streams cannot be created to query change data on which of the following objects? [Select All that Apply]
Explanation:
Streams supports all the listed objects except Query Log tables.
Question 2
Tasks may optionally use table streams to provide a convenient way to continuously process new or changed dat a. A task can transform new or changed rows that a stream surfaces. Each time a task is scheduled to run, it can verify whether a stream contains change data for a table and either consume the change data or skip the current run if no change data exists. Which System Function can be used by Data engineer to verify whether a stream contains changed data for a table?
Explanation:
SYSTEM$STREAM_HAS_DATA Indicates whether a specified stream contains change data capture (CDC) records.
Question 3
Question 4
Mark a Data Engineer, looking to implement streams on local views & want to use change tracking metadata for one of its Data Loading use case. Please select the incorrect understanding points of Mark with respect to usage of Streams on Views?
Explanation:
A stream object records data manipulation language (DML) changes made to tables, including inserts, updates, and deletes, as well as metadata about each change, so that actions can be taken using the changed data. This process is referred to as change data capture (CDC). An individual table stream tracks the changes made to rows in a source table. A table stream (also referred to as simply a "stream") makes a "change table" available of what changed, at the row level, between two transactional points of time in a table. This allows querying and consuming a sequence of change records in a transactional fashion.
Streams can be created to query change data on the following objects:
· Standard tables, including shared tables.
· Views, including secure views · Directory tables · External tables
When created, a stream logically takes an initial snapshot of every row in the source object (e.g. table, external table, or the underlying tables for a view) by initializing a point in time (called an offset) as the current transactional version of the object. The change tracking system utilized by the stream then records information about the DML changes after this snapshot was taken. Change records provide the state of a row before and after the change. Change information mirrors the column structure of the tracked source object and includes additional metadata columns that describe each change event.
Note that a stream itself does not contain any table data. A stream only stores an offset for the source object and returns CDC records by leveraging the versioning history for the source object.
When the first stream for a table is created, a pair of hidden columns are added to the source table and begin storing change tracking metadata. These columns consume a small amount of storage. The CDC records returned when querying a stream rely on a combination of the offset stored in the stream and the change tracking metadata stored in the table. Note that for streams on views, change tracking must be enabled explicitly for the view and underlying tables to add the hidden columns to these tables.
Streams on views support both local views and views shared using Snowflake Secure Data Sharing, including secure views. Currently, streams cannot track changes in materialized views.
Views with the following operations are not yet supported:
· GROUP BY clauses · QUALIFY clauses · Subqueries not in the FROM clause · Correlated subqueries · LIMIT clauses Change Tracking:
Change tracking must be enabled in the underlying tables.
Prior to creating a stream on a view, you must enable change tracking on the underlying tables for the view.
Set the CHANGE_TRACKING parameter when creating a view (using CREATE VIEW) or later (using ALTER VIEW).
As an alternative to streams, Snowflake supports querying change tracking metadata for tables or views using the CHANGES clause for SELECT statements. The CHANGES clause enables query-ing change tracking metadata between two points in time without having to create a stream with an explicit transactional offset.
Question 5
To advance the offset of a stream to the current table version without consuming the change data in a DML operation, which of the following operations can be done by Data Engineer? [Select 2]
Explanation:
When created, a stream logically takes an initial snapshot of every row in the source object (e.g. table, external table, or the underlying tables for a view) by initializing a point in time (called an offset) as the current transactional version of the object. The change tracking system utilized by the stream then records information about the DML changes after this snapshot was taken. Change records provide the state of a row before and after the change. Change information mirrors the column structure of the tracked source object and includes additional metadata columns that describe each change event.
Note that a stream itself does not contain any table data. A stream only stores an offset for the source object and returns CDC records by leveraging the versioning history for the source object.
A new table version is created whenever a transaction that includes one or more DML statements is committed to the table.
In the transaction history for a table, a stream offset is located between two table versions. Querying a stream returns the changes caused by transactions committed after the offset and at or before the current time.
Multiple queries can independently consume the same change data from a stream without changing the offset. A stream advances the offset only when it is used in a DML transaction. This behavior applies to both explicit and autocommit transactions. (By default, when a DML statement is execut-ed, an autocommit transaction is implicitly started and the transaction is committed at the completion of the statement. This behavior is controlled with the AUTOCOMMIT parameter.) Querying a stream alone does not advance its offset, even within an explicit transaction; the stream contents must be consumed in a DML statement.
To advance the offset of a stream to the current table version without consuming the change data in a DML operation, complete either of the following actions:
· Recreate the stream (using the CREATE OR REPLACE STREAM syntax).
Insert the current change data into a temporary table. In the INSERT statement, query the stream but include a WHERE clause that filters out all of the change data (e.g. WHERE 0 = 1).
Question 6
Data Engineer is performing below steps in sequence while working on Stream s1 created on table t1.
Step 1: Begin transaction.
Step 2: Query stream s1 on table t1.
Step 3: Update rows in table t1.
Step 4: Query stream s1.
Step 5: Commit transaction.
Step 6: Begin transaction.
Step 7: Query stream s1.
Mark the Incorrect Operational statements:
Explanation:
Streams support repeatable read isolation. In repeatable read mode, multiple SQL statements within a transaction see the same set of records in a stream. This differs from the read committed mode supported for tables, in which statements see any changes made by previous statements executed within the same transaction, even though those changes are not yet committed.
The delta records returned by streams in a transaction is the range from the current position of the stream until the transaction start time. The stream position advances to the transaction start time if the transaction commits; otherwise, it stays at the same position.
Within Transaction 1, all queries to stream s1 see the same set of records. DML changes to table t1 are recorded to the stream only when the transaction is committed.
In Transaction 2, queries to the stream see the changes recorded to the table in Transaction 1. Note that if Transaction 2 had begun before Transaction 1 was committed, queries to the stream would have returned a snapshot of the stream from the position of the stream to the beginning time of Transaction 2 and would not see any changes committed by Transaction 1.
Question 7
Streams record the differences between two offsets. If a row is added and then updated in the current offset, what will be the value of METADATA^^SUPDATE Columns in this scenario?
Explanation:
Stream Columns
A stream stores an offset for the source object and not any actual table columns or data. When queried, a stream accesses and returns the historic data in the same shape as the source object (i.e. the same column names and ordering) with the following additional columns:
METADATA$ACTION Indicates the DML operation (INSERT, DELETE) recorded.
METADATA^^SUPDATE Indicates whether the operation was part of an UPDATE statement. Updates to rows in the source object are represented as a pair of DELETE and INSERT records in the stream with a metadata column METADATA^^SUPDATE values set to TRUE.
METADATA$ROW_ID Specifies the unique and immutable ID for the row, which can be used to track changes to specific rows over time.
Note that streams record the differences between two offsets. If a row is added and then updated in the current offset, the delta change is a new row. The METADATA^^SUPDATE row records a FALSE value.
Question 8
Mark the Incorrect Statements with respect to types of streams supported by Snowflake?
Explanation:
Standard Stream:
Supported for streams on tables, directory tables, or views. A standard (i.e. delta) stream tracks all
DML changes to the source object, including inserts, updates, and deletes (including table truncates).
This stream type performs a join on inserted and deleted rows in the change set to provide the row level delta. As a net effect, for example, a row that is inserted and then deleted between two transactional points of time in a table is removed in the delta (i.e. is not returned when the stream is queried).
Append-only Stream:
Supported for streams on standard tables, directory tables, or views. An append-only stream tracks row inserts only. Update and delete operations (including table truncates) are not recorded. For example, if 10 rows are inserted into a table and then 5 of those rows are deleted before the offset for an append-only stream is advanced, the stream records 10 rows.
An append-only stream returns the appended rows only and therefore can be much more performant than a standard stream for extract, load, transform (ELT) and similar scenarios that depend exclu-sively on row inserts. For example, a source table can be truncated immediately after the rows in an append-only stream are consumed, and the record deletions do not contribute to the overhead the next time the stream is queried or consumed.
Insert-only Stream:
Supported for streams on external tables only. An insert-only stream tracks row inserts only; they do not record delete operations that remove rows from an inserted set (i.e. no-ops). For example, inbetween any two offsets, if File1 is removed from the cloud storage location referenced by the external table, and File2 is added, the stream returns records for the rows in File2 only. Unlike when tracking CDC data for standard tables, Snowflake cannot access the historical records for files in cloud storage.
Question 9
Stuart, a Lead Data Engineer in MACRO Data Company created streams on set of External tables. He has been asked to extend the data retention period of the stream for 90 days, which parameter he can utilize to enable this extension?
Explanation:
External tables do not have data retention period applicable.
Good to Understand other Options available.
DATA_RETENTION_TIME_IN_DAYS Type: Object (for databases, schemas, and tables) — Can be set for Account » Database » Schema » Table
Description: Number of days for which Snowflake retains historical data for performing Time Trav-el actions (SELECT, CLONE, UNDROP) on the object. A value of 0 effectively disables Time Travel for the specified database, schema, or table.
Values:
0 or 1 (for Standard Edition)
0 to 90 (for Enterprise Edition or higher)
Default:
1
MAX_DATA_EXTENSION_TIME_IN_DAYS
Type: Object (for databases, schemas, and tables) — Can be set for Account » Database » Schema »
Table
Description: Maximum number of days for which Snowflake can extend the data retention period for tables to prevent streams on the tables from becoming stale. By default, if the DATA_ RETENTION_TIME_IN_DAYS setting for a source table is less than 14 days, and a stream has not been consumed, Snowflake temporarily extends this period to the stream's offset, up to a maximum of 14 days, regardless of the Snowflake Edition for your account. The MAX_DATA_EXTENSION_TIME_IN_DAYS parameter enables you to limit this automatic ex-tension period to control storage costs for data retention or for compliance reasons.
This parameter can be set at the account, database, schema, and table levels. Note that setting the parameter at the account or schema level only affects tables for which the parameter has not already been explicitly set at a lower level (e.g. at the table level by the table owner). A value of 0 effective-ly disables the automatic extension for the specified database, schema, or table.
Values:
0 to 90 (i.e. 90 days) — a value of 0 disables the automatic extension of the data retention period. To increase the maximum value for tables in your account, Client needs to contact Snowflake Sup-port.
Default: 14
Question 10
Ron, Snowflake Developer needs to capture change data (insert only) on the source views, for that he follows the below steps:
Enable change tracking on the source views & its underlying tables.
Inserted the data via Scripts scheduled with the help of Tasks.
then simply run the below Select statements.
Explanation:
As an alternative to streams, Snowflake supports querying change tracking metadata for tables or views using the CHANGES clause for SELECT statements. The CHANGES clause enables query-ing change tracking metadata between two points in time without having to create a stream with an explicit transactional offset.
To Know more about Snowflake CHANGES clause, please refer the mentioned link:
https://docs.snowflake.com/en/sql-reference/constructs/changes
Question