
Snowflake DSA-C02 Practice Test - Questions Answers, Page 3


Mark the incorrect understanding of a Data Scientist about Streams:

A. Streams on views support both local views and views shared using Snowflake Secure Data Sharing, including secure views.
B. Streams can track changes in materialized views.
C. Streams itself does not contain any table data.
D. Streams do not support repeatable read isolation.
Suggested answer: B, D

Explanation:

Streams on views support both local views and views shared using Snowflake Secure Data Sharing, including secure views. Currently, streams cannot track changes in materialized views.

A stream itself does not contain any table data. A stream only stores an offset for the source object and returns CDC records by leveraging the versioning history for the source object. When the first stream for a table is created, several hidden columns are added to the source table and begin storing change tracking metadata. These columns consume a small amount of storage. The CDC records returned when querying a stream rely on a combination of the offset stored in the stream and the change tracking metadata stored in the table. Note that for streams on views, change tracking must be enabled explicitly for the view and underlying tables to add the hidden columns to these tables.

Streams support repeatable read isolation. In repeatable read mode, multiple SQL statements within a transaction see the same set of records in a stream. This differs from the read committed mode supported for tables, in which statements see any changes made by previous statements executed within the same transaction, even though those changes are not yet committed.

The delta records returned by a stream in a transaction cover the range from the current position of the stream up to the transaction start time. The stream position advances to the transaction start time if the transaction commits; otherwise it stays at the same position.
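
A minimal Snowpark Python sketch of this behavior, assuming an existing source table named raw_events and a connection_parameters dict (both hypothetical):

from snowflake.snowpark import Session

# connection_parameters is a hypothetical dict with account, user, password, etc.
session = Session.builder.configs(connection_parameters).create()

# A stream stores only an offset; CDC records are derived from the source
# table's versioning history, not copied into the stream.
session.sql("CREATE OR REPLACE STREAM raw_events_stream ON TABLE raw_events").collect()

# Querying the stream returns the source columns plus metadata columns such as
# METADATA$ACTION and METADATA$ISUPDATE. Within a single transaction, every
# statement sees the same set of change records (repeatable read isolation).
changes = session.sql("SELECT * FROM raw_events_stream").collect()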

A Data Scientist used streams in ELT (extract, load, transform) processes where new data inserted into a staging table is tracked by a stream. A set of SQL statements transforms and inserts the stream contents into a set of production tables. The raw data arrives in JSON format, but for analysis it needs to be transformed into relational columns in the production tables. Which of the following data transformation SQL functions can be used to achieve this?

A. He could not apply Transformation on Stream table data.
B. lateral flatten()
C. METADATA$ACTION ()
D. Transpose()
Suggested answer: B

Explanation:

To learn about the LATERAL FLATTEN SQL construct, please refer to:

https://docs.snowflake.com/en/sql-reference/constructs/join-lateral#example-of-using-lateral-with-flatten
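
As an illustrative sketch (the table, stream, and column names below are hypothetical, not taken from the question), LATERAL FLATTEN can unnest a JSON array held in a VARIANT column into relational rows while consuming the stream:

# Assumes an active Snowpark session; raw_json_stream is a hypothetical stream
# on a staging table with a VARIANT column named v.
session.sql("""
    INSERT INTO production_orders (order_id, item_sku, item_qty)
    SELECT s.v:order_id::NUMBER,
           f.value:sku::STRING,
           f.value:qty::NUMBER
    FROM raw_json_stream s,
         LATERAL FLATTEN(input => s.v:items) f
""").collect()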

Which command manually triggers a single run of a scheduled task (either a standalone task or the root task in a DAG) independent of the schedule defined for the task?

A. RUN TASK
B. CALL TASK
C. EXECUTE TASK
D. RUN ROOT TASK
Suggested answer: C

Explanation:

The EXECUTE TASK command manually triggers a single run of a scheduled task (either a standalone task or the root task in a DAG) independent of the schedule defined for the task. A successful run of a root task triggers a cascading run of child tasks in the DAG as their precedent task completes, as though the root task had run on its defined schedule.

This SQL command is useful for testing new or modified standalone tasks and DAGs before you enable them to execute SQL code in production.

Call this SQL command directly in scripts or in stored procedures. In addition, this command supports integrating tasks in external data pipelines. Any third-party services that can authenticate into your Snowflake account and authorize SQL actions can execute the EXECUTE TASK command to run tasks.
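
For example, a script or external pipeline holding a Snowpark session could trigger a run of a hypothetical root task like this:

# Manually trigger one run of the (hypothetical) task, independent of its
# schedule; for a root task, child tasks run as their precedents complete.
session.sql("EXECUTE TASK my_pipeline_root_task").collect()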

Which of the following Snowflake parameters can be used to automatically suspend tasks running data science pipelines after a specified number of failed runs?

A. SUSPEND_TASK
B. SUSPEND_TASK_AUTO_NUM_FAILURES
C. SUSPEND_TASK_AFTER_NUM_FAILURES
D. There is none as such available.
Suggested answer: C

Explanation:

Automatically Suspend Tasks After Failed Runs

Optionally suspend tasks automatically after a specified number of consecutive runs that either fail or time out. This feature can reduce costs by suspending tasks that consume Snowflake credits but fail to run to completion. Failed task runs include runs in which the SQL code in the task body either produces a user error or times out. Task runs that are skipped, canceled, or that fail due to a system error are considered indeterminate and are not included in the count of failed task runs.

Set the SUSPEND_TASK_AFTER_NUM_FAILURES = num parameter on a standalone task or the root task in a DAG. When the parameter is set to a value greater than 0, the following behavior applies to runs of the standalone task or DAG:

Standalone tasks are automatically suspended after the specified number of consecutive task runs either fail or time out.

The root task is automatically suspended after the run of any single task in a DAG fails or times out the specified number of times in consecutive runs.

The parameter can be set when creating a task (using CREATE TASK) or later (using ALTER TASK). The setting applies to tasks that rely on either Snowflake-managed compute resources (i.e. serverless compute model) or user-managed compute resources (i.e. a virtual warehouse).

The SUSPEND_TASK_AFTER_NUM_FAILURES parameter can also be set at the account, database, or schema level. The setting applies to all standalone or root tasks contained in the modified object. Note that explicitly setting the parameter at a lower (i.e. more granular) level overrides the parameter value set at a higher level.
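
A hedged sketch of setting the parameter on a task that already exists (the task name and threshold are hypothetical):

# Suspend the task automatically after 3 consecutive failed or timed-out runs.
session.sql(
    "ALTER TASK my_pipeline_root_task SET SUSPEND_TASK_AFTER_NUM_FAILURES = 3"
).collect()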

Mark the incorrect statement regarding Python UDF?

A. Python UDFs can contain both new code and calls to existing packages
B. For each row passed to a UDF, the UDF returns either a scalar (i.e. single) value or, if defined as a table function, a set of rows.
C. A UDF also gives you a way to encapsulate functionality so that you can call it repeatedly from multiple places in code
D. A scalar function (UDF) returns a tabular value for each input row
Suggested answer: D

Explanation:

A scalar function (UDF) returns one output row for each input row; the returned row consists of a single column/value.
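
A minimal Snowpark Python sketch of a scalar UDF, assuming an active session (the function name and logic are illustrative):

from snowflake.snowpark.functions import udf
from snowflake.snowpark.types import IntegerType

# A scalar UDF: exactly one output value per input row.
@udf(name="double_it", return_type=IntegerType(), input_types=[IntegerType()], replace=True)
def double_it(x: int) -> int:
    return x * 2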

A Data Scientist can query, process, and transform data in which of the following ways using Snowpark Python? [Select 2]

A. Query and process data with a DataFrame object.
B. Write a user-defined tabular function (UDTF) that processes data and returns data in a set of rows with one or more columns.
C. SnowPark currently do not support writing UDTF.
D. Transform Data using DataIKY tool with SnowPark API.
Suggested answer: A, B

Explanation:

Query and process data with a DataFrame object. Refer to Working with DataFrames in Snowpark Python.

Convert custom lambdas and functions to user-defined functions (UDFs) that you can call to process data.

Write a user-defined tabular function (UDTF) that processes data and returns data in a set of rows with one or more columns.

Write a stored procedure that you can call to process data, or automate with a task to build a data pipeline.
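
A short Snowpark Python sketch of the DataFrame approach (the table and column names are hypothetical):

from snowflake.snowpark.functions import col

# Query and process data lazily with a DataFrame object; execution happens
# in Snowflake when an action such as show() or collect() is called.
df = session.table("staging_events")
result = df.filter(col("event_type") == "purchase").group_by("country").count()
result.show()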

Which Python method can a Data Scientist use to remove duplicates?

A. remove_duplicates()
B. duplicates()
C. drop_duplicates()
D. clean_duplicates()
Suggested answer: C

Explanation:

The drop_duplicates() method removes duplicate rows.

dataframe.drop_duplicates(subset, keep, inplace, ignore_index)

Remove duplicate rows from the DataFrame:

import pandas as pd

data = {
    'name': ['Peter', 'Mary', 'John', 'Mary'],
    'age': [50, 40, 30, 40],
    'qualified': [True, False, False, False]
}

df = pd.DataFrame(data)
newdf = df.drop_duplicates()

Consider a data frame df with 10 rows and index [ 'r1', 'r2', 'r3', 'row4', 'row5', 'row6', 'r7', 'r8', 'r9', 'row10']. What does the aggregate method shown in the code below do?

g = df.groupby(df.index.str.len())

g.aggregate({'A':len, 'B':np.sum})

A. Computes Sum of column A values
B. Computes length of column A
C. Computes length of column A and Sum of Column B values of each group
D. Computes length of column A and Sum of Column B values
Suggested answer: C

Explanation:

For each group (rows grouped by the length of their index label), len computes the number of values in column A (the group size) and np.sum computes the sum of the values in column B.
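
A small pandas sketch illustrates this, using hypothetical values for columns A and B:

import numpy as np
import pandas as pd

# Index labels have lengths 2, 4 and 5, so groupby(df.index.str.len())
# produces three groups.
df = pd.DataFrame({'A': range(10), 'B': range(10)},
                  index=['r1', 'r2', 'r3', 'row4', 'row5', 'row6',
                         'r7', 'r8', 'r9', 'row10'])
g = df.groupby(df.index.str.len())
print(g.aggregate({'A': len, 'B': np.sum}))
# For each group, A holds the group size (len) and B holds that group's sum.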

Consider a data frame df with columns ['A', 'B', 'C', 'D'] and rows ['r1', 'r2', 'r3']. What does the expression df[lambda x : x.index.str.endswith('3')] do?

A. Returns the row name r3
B. Results in Error
C. Returns the third column
D. Filters the row labelled r3
Suggested answer: D

Explanation:

It filters the row labelled r3: the callable returns a boolean mask that is True only for rows whose index label ends with '3'.
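
A short pandas sketch with hypothetical values:

import pandas as pd

# Passing a callable to df[...] selects rows where the returned boolean
# mask is True -- here, only index labels ending with '3'.
df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],
                  columns=['A', 'B', 'C', 'D'],
                  index=['r1', 'r2', 'r3'])
print(df[lambda x: x.index.str.endswith('3')])   # only row r3 is returned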

Consider a data frame df with 10 rows and index [ 'r1', 'r2', 'r3', 'row4', 'row5', 'row6', 'r7', 'r8', 'r9', 'row10']. What does the expression g = df.groupby(df.index.str.len()) do?

A. Groups df based on index values
B. Groups df based on length of each index value
C. Groups df based on index strings
D. Data frames cannot be grouped by index values. Hence it results in Error.
Suggested answer: B

Explanation:

The expression groups df by the length of each index label (here, lengths 2, 4, and 5), exactly as used in the aggregate question above; grouping on a value derived from the index does not raise an error.
