
Snowflake DEA-C01 Practice Test - Questions Answers, Page 10


A Data Engineer wants to analyze query performance and is looking for profiling information. He opened the Query/Operator Details panel (also called the Profile Overview) of the Query Profile interface and is searching for statistics attributes around I/O. Which of the following pieces of information can he not get from there?

A. Percentage scanned from cache — the percentage of data scanned from the local disk cache.

B. Bytes written — bytes written (e.g. when loading into a table).

C. External bytes scanned — bytes read from an external object, e.g. a stage.

D. Bytes sent over the wireframe — amount of data sent over the wireframe.

E. Bytes read from result — bytes read from the result object.
Suggested answer: D

Explanation:

To help you analyze query performance, the Query/Operator Details panel (also called the Profile Overview panel) provides two classes of profiling information:

· Execution time, broken down into categories

· Detailed statistics

Apart from option D — "Bytes sent over the wireframe" (the actual statistic is "Bytes sent over the network", the amount of data sent over the network) — all of the listed statistics are provided by the Query/Operator Details panel in the Query Profile interface.

To know more about the Query/Operator Details options, please refer to the link:

https://docs.snowflake.com/en/user-guide/ui-query-profile#query-operator-details
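The same per-operator statistics shown in the panel can also be retrieved programmatically. A minimal sketch, assuming access to the GET_QUERY_OPERATOR_STATS table function and that the query being profiled was the previous statement in the session:

```sql
-- Run the query to be profiled first, then fetch its operator-level
-- statistics; LAST_QUERY_ID() returns the ID of the previous statement.
SELECT operator_id,
       operator_type,
       operator_statistics  -- includes I/O statistics such as bytes scanned
FROM TABLE(GET_QUERY_OPERATOR_STATS(LAST_QUERY_ID()));
```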

What are common query problems a Data Engineer can identify using the Query Profile?

A. "Exploding" Joins, i.e. joins resulting from a "Cartesian product"

B. Queries Too Large to Fit in Memory

C. Inefficient Pruning

D. Ineffective Data Sharing
Suggested answer: A, B, C

Explanation:

"Exploding" Joins

One of the common mistakes SQL users make is joining tables without providing a join condition (resulting in a "Cartesian product"), or providing a condition where records from one table match multiple records from another table. For such queries, the Join operator produces significantly (often by orders of magnitude) more tuples than it consumes.

This can be observed by looking at the number of records produced by a Join operator in the profile interface, and typically is also reflected in Join operator consuming a lot of time.

Queries Too Large to Fit in Memory

For some operations (e.g. duplicate elimination for a huge data set), the amount of memory available for the compute resources used to execute the operation might not be sufficient to hold intermediate results. As a result, the query processing engine will start spilling the data to local disk.

If the local disk space is not sufficient, the spilled data is then saved to remote disks.

This spilling can have a profound effect on query performance (especially if remote disk is used for spilling).

Spilling statistics can be checked in Query Profile Interface.

Inefficient Pruning

Snowflake collects rich statistics on data allowing it not to read unnecessary parts of a table based on the query filters. However, for this to have an effect, the data storage order needs to be correlated with the query filter attributes.

The efficiency of pruning can be observed by comparing Partitions scanned and Partitions total statistics in the TableScan operators. If the former is a small fraction of the latter, pruning is efficient. If not, the pruning did not have an effect.

Of course, pruning can only help for queries that actually filter out a significant amount of data. If the pruning statistics do not show data reduction, but there is a Filter operator above TableScan which filters out a number of records, this might signal that a different data organization might be beneficial for this query.
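Whether a table's storage order correlates with a given filter column can also be checked from SQL. A minimal sketch using the SYSTEM$CLUSTERING_INFORMATION function, with a hypothetical table sales and filter column order_date:

```sql
-- Returns a JSON summary of how micro-partitions overlap on the given
-- column; a high average depth suggests pruning on this column will be
-- inefficient for filtered queries.
SELECT SYSTEM$CLUSTERING_INFORMATION('sales', '(order_date)');
```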

Michael, a Data Engineer, ran a query to compute the union of data sets coming from multiple data sources. Later he noticed that the query was taking more time than expected, so he started analyzing its performance using the Query Profile interface. He discovered that he had used UNION when the UNION ALL semantics were sufficient.

Which extra data processing operator did Michael find during his Query Profile analysis that helped him identify this performance bottleneck?

A. Aggregate

B. UNION ALL

C. Flatten

D. Join

E. Filter
Suggested answer: A

Explanation:

In SQL, it is possible to combine two sets of data with either UNION or UNION ALL constructs. The difference between them is that UNION ALL simply concatenates inputs, while UNION does the same, but also performs duplicate elimination.

A common mistake is to use UNION when the UNION ALL semantics are sufficient. These queries show up in the Query Profile as a UnionAll operator with an extra Aggregate operator on top (which performs duplicate elimination).

To know more about data processing operators, please refer to:

https://docs.snowflake.com/en/user-guide/ui-query-profile#operator-types
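The difference is easy to see side by side. A minimal sketch with hypothetical tables source_a and source_b:

```sql
-- UNION performs duplicate elimination, which appears in the Query
-- Profile as an Aggregate operator on top of UnionAll:
SELECT id FROM source_a
UNION
SELECT id FROM source_b;

-- If duplicates are acceptable (or impossible), UNION ALL avoids the
-- extra Aggregate step and simply concatenates the inputs:
SELECT id FROM source_a
UNION ALL
SELECT id FROM source_b;
```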

Jonas, a Lead Performance Engineer, identified that an operation in his query, which removes duplicates from a huge data set, is spilling data to remote disk. How can he alleviate spilling to a remote disk for better query performance?

A. Jonas can recommend using a larger warehouse, which effectively increases the available memory/local disk space for the operations.

B. He can process data in smaller batches to manage the workload.

C. Spilling does not have a profound effect on query performance (especially if remote disk is used for spilling).

D. Data sharing can be helpful to improve query performance.
Suggested answer: A, B

Explanation:

For some operations (e.g. duplicate elimination for a huge data set), the amount of memory available for the compute resources used to execute the operation might not be sufficient to hold intermediate results. As a result, the query processing engine will start spilling the data to local disk.

If the local disk space is not sufficient, the spilled data is then saved to remote disks.

This spilling can have a profound effect on query performance (especially if remote disk is used for spilling). To alleviate this, it is recommended to:

· Use a larger warehouse (effectively increasing the available memory/local disk space for the operation), and/or

· Process data in smaller batches.
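Spilling can also be monitored across the account. A minimal sketch, assuming access to the SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY view, which exposes per-query spill columns:

```sql
-- Recent queries that spilled to remote storage — the worst case for
-- performance. Note that ACCOUNT_USAGE views lag real time somewhat.
SELECT query_id,
       warehouse_name,
       bytes_spilled_to_local_storage,
       bytes_spilled_to_remote_storage
FROM snowflake.account_usage.query_history
WHERE bytes_spilled_to_remote_storage > 0
  AND start_time > DATEADD('day', -7, CURRENT_TIMESTAMP())
ORDER BY bytes_spilled_to_remote_storage DESC;
```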

You, as a Data Engineer, might want to consider disabling auto-suspend for a warehouse if:

A. You have a low, fluctuating workload for the warehouse.

B. You have a heavy, steady workload for the warehouse.

C. You require the warehouse to be available with delay.

D. You require the warehouse to be available with no delay or lag time.
Suggested answer: B, D

Explanation:

Automating Warehouse Suspension

A Data Engineer might want to consider disabling auto-suspend for a warehouse if:

· They have a heavy, steady workload for the warehouse.

· They require the warehouse to be available with no delay or lag time. Warehouse provisioning is generally very fast (e.g. 1 or 2 seconds); however, depending on the size of the warehouse and the availability of compute resources to provision, it can take longer.

If they choose to disable auto-suspend, they must carefully consider the costs associated with running a warehouse continually, even when the warehouse is not processing queries. The costs can be significant, especially for larger warehouses (X-Large, 2X-Large, etc.).

To disable auto-suspend, the engineer must explicitly select Never in the web interface, or specify 0 or NULL in SQL.
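In SQL this is done with the AUTO_SUSPEND warehouse property (the warehouse name etl_wh is hypothetical):

```sql
-- Disable auto-suspend entirely (0 or NULL means "never suspend"):
ALTER WAREHOUSE etl_wh SET AUTO_SUSPEND = 0;

-- Re-enable it later with a timeout in seconds (here, 5 minutes):
ALTER WAREHOUSE etl_wh SET AUTO_SUSPEND = 300;
```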

Harry, using Snowflake Enterprise Edition, decided to scale the cluster in/out in automatic mode.

He needs to configure some warehouses in multi-cluster mode and some in Standard mode, as per his needs.

If Harry is using Snowflake Enterprise Edition (or a higher edition), all his warehouses should be configured as multi-cluster warehouses only.

A. TRUE

B. FALSE
Suggested answer: A

Explanation:

If you are using Snowflake Enterprise Edition (or a higher edition), all your warehouses should be configured as multi-cluster warehouses.

Ryan, a Data Engineer, wants to improve the performance of large, complex queries against large data sets. He decided to scale up the underlying warehouse/cluster. Which are correct Snowflake considerations while scaling up, so that he can achieve better performance results? [Select all that apply]

A. Resizing can help reduce the queuing that occurs if a warehouse does not have enough compute resources to process all the queries that are submitted concurrently.

B. Scaling up is not intended for handling concurrency issues; instead, use additional warehouses to handle the workload or use a multi-cluster warehouse (if this feature is available for your account).

C. Snowflake supports resizing a warehouse at any time, even while running.

D. Resizing a running warehouse does not impact queries that are already being processed by the warehouse; the additional compute resources, once fully provisioned, are only used for queued and new queries.

E. Resizing between a 5XL or 6XL warehouse to a 4XL or smaller warehouse results in a brief period during which the customer is charged for both the new warehouse and the old warehouse while the old warehouse is quiesced.
Suggested answer: A, B, C, D, E

Explanation:

Resizing a warehouse generally improves query performance, particularly for larger, more complex queries. It can also help reduce the queuing that occurs if a warehouse does not have enough compute resources to process all the queries that are submitted concurrently. Note that warehouse resizing is not intended for handling concurrency issues; instead, use additional warehouses to handle the workload or use a multi-cluster warehouse (if this feature is available for your account).

Snowflake supports resizing a warehouse at any time, even while running. If a query is running slowly and you have additional queries of similar size and complexity that you want to run on the same warehouse, you might choose to resize the warehouse while it is running; however, note the following:

· Larger warehouse size is not necessarily faster; for smaller, basic queries that are already executing quickly, you may not see any significant improvement after resizing.

· Resizing a running warehouse does not impact queries that are already being processed by the warehouse; the additional compute resources, once fully provisioned, are only used for queued and new queries.

Resizing between a 5XL or 6XL warehouse to a 4XL or smaller warehouse results in a brief period during which the customer is charged for both the new warehouse and the old warehouse while the old warehouse is quiesced.
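Resizing itself is a single statement (the warehouse name analytics_wh is hypothetical):

```sql
-- Resize while running: in-flight queries keep their current resources;
-- queued and new queries use the new size once it is fully provisioned.
ALTER WAREHOUSE analytics_wh SET WAREHOUSE_SIZE = 'XLARGE';
```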

While working with multi-cluster warehouses, select the incorrect understanding of a Data Engineer about their usage?

A. Multi-cluster warehouses are designed specifically for handling queuing and performance issues related to large numbers of concurrent users and/or queries.

B. Unless you have a specific requirement for running in Maximized mode, multi-cluster warehouses should be configured to run in Auto-scale mode, which enables Snowflake to automatically start and stop clusters as needed.

C. When choosing the minimum number of clusters for a multi-cluster warehouse, keep the default value of 1.

D. Multi-cluster warehouses generally improve query performance, particularly for larger, more complex queries.

E. When choosing the maximum number of clusters for a multi-cluster warehouse, set its value as large as possible.
Suggested answer: D

Explanation:

Resizing a warehouse generally improves query performance, particularly for larger, more complex queries.

Multi-cluster warehouses are designed specifically for handling queuing and performance issues related to large numbers of concurrent users and/or queries. In addition, multi-cluster warehouses can help automate this process if your number of users/queries tend to fluctuate.

The rest are correct understandings.
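A multi-cluster warehouse in Auto-scale mode is configured by giving different minimum and maximum cluster counts. A minimal sketch (the warehouse name reporting_wh is hypothetical):

```sql
-- Auto-scale mode: Snowflake starts and stops clusters between MIN and
-- MAX as the concurrent query load changes.
CREATE WAREHOUSE IF NOT EXISTS reporting_wh
  WAREHOUSE_SIZE = 'MEDIUM'
  MIN_CLUSTER_COUNT = 1        -- keep the default minimum of 1
  MAX_CLUSTER_COUNT = 4        -- upper bound for auto-scaling
  SCALING_POLICY = 'STANDARD'
  AUTO_SUSPEND = 300;
```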

Select the incorrect statement about working with warehouses?

A. Compute resources waiting to shut down are considered to be in "quiesce" mode.

B. Resizing a warehouse to a larger size is useful while loading and unloading significant amounts of data.

C. Resizing a warehouse will have an immediate impact on statements that are currently being executed by the warehouse.

D. Resizing a suspended warehouse does not provision any new compute resources for the warehouse.
Suggested answer: C

Explanation:

Resizing a warehouse doesn't have any impact on statements that are currently being executed by the warehouse. When resizing to a larger size, the new compute resources, once fully provisioned, are used only to execute statements that are already in the warehouse queue, as well as all future statements submitted to the warehouse.

Ira, a Data Engineer with TESLA IT systems, is looking to compare traditional partitioning with Snowflake micro-partitions for one of her Snowflake project implementations. Which one of the following is an incorrect understanding of Ira about micro-partitioning?

A. All data in Snowflake tables is automatically divided into micro-partitions, which are contiguous units of storage, compared to traditional partitioning where specialized DDL is required.

B. All DML operations (e.g. DELETE, UPDATE, MERGE) take advantage of the underlying micro-partition metadata to facilitate and simplify table maintenance.

C. Snowflake stores metadata about all rows stored in a micro-partition, including the number of distinct columns.

D. The micro-partition metadata maintained by Snowflake enables precise pruning of columns in micro-partitions at query run-time, including columns containing semi-structured data.

E. In Snowflake, as data is inserted/loaded into a table, clustering metadata is collected and recorded for each micro-partition created during the process.
Suggested answer: C

Explanation:

What are Micro-partitions?

All data in Snowflake tables is automatically divided into micro-partitions, which are contiguous units of storage. Each micro-partition contains between 50 MB and 500 MB of uncompressed data (note that the actual size in Snowflake is smaller because data is always stored compressed). Groups of rows in tables are mapped into individual micro-partitions, organized in a columnar fashion. This size and structure allow for extremely granular pruning of very large tables, which can be comprised of millions, or even hundreds of millions, of micro-partitions.

Snowflake stores metadata about all rows stored in a micro-partition, including:

· The range of values for each of the columns in the micro-partition.

· The number of distinct values.

· Additional properties used for both optimization and efficient query processing.

Snowflake never stores the number of distinct columns as part of this metadata; it stores the number of distinct values for each column.

The rest of the statements are correct.
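The clustering metadata recorded for a table's micro-partitions can be inspected directly. A minimal sketch using the SYSTEM$CLUSTERING_DEPTH function with a hypothetical table events and column event_date:

```sql
-- Average clustering depth of the table's micro-partitions for the
-- given column; lower values mean better-clustered (more prunable) data.
SELECT SYSTEM$CLUSTERING_DEPTH('events', '(event_date)');
```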
