Question 92 - DEA-C01 discussion

Which common query problems can a Data Engineer identify using the Query Profiler?

A. "Exploding" Joins, i.e. joins that result in a "Cartesian product"
B. Queries Too Large to Fit in Memory
C. Inefficient Pruning
D. Ineffective Data Sharing
Suggested answer: A, B, C

Explanation:

"Exploding" Joins

One of the common mistakes SQL users make is joining tables without providing a join condition (resulting in a "Cartesian product"), or providing a condition where records from one table match multiple records from another table. For such queries, the Join operator produces significantly (often by orders of magnitude) more tuples than it consumes.

This can be observed by looking at the number of records produced by a Join operator in the profile interface, and it is typically also reflected in the Join operator consuming a lot of time.
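The effect is easy to reproduce on a small scale. The sketch below uses Python's built-in sqlite3 module as a stand-in for Snowflake; the `orders` and `customers` tables and the function name are hypothetical and only serve to compare the rows a join consumes with the rows it produces.

```python
import sqlite3


def join_row_counts():
    """Return (input_rows, rows_with_condition, rows_without_condition).

    Illustration only: sqlite3 stands in for Snowflake, and the table
    names and sizes are made up for this example.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
        CREATE TABLE customers (customer_id INTEGER, name TEXT);
        """
    )
    # 100 orders spread evenly over 10 customers.
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(i, i % 10) for i in range(100)],
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(i, f"c{i}") for i in range(10)],
    )
    input_rows = 100 + 10  # rows consumed by the join from both sides

    # Join with a proper condition: each order matches exactly one customer.
    with_condition = conn.execute(
        "SELECT COUNT(*) FROM orders o "
        "JOIN customers c ON o.customer_id = c.customer_id"
    ).fetchone()[0]

    # Join with no condition: every order pairs with every customer.
    without_condition = conn.execute(
        "SELECT COUNT(*) FROM orders o CROSS JOIN customers c"
    ).fetchone()[0]

    conn.close()
    return input_rows, with_condition, without_condition
```

With these sizes the unconditioned join produces far more rows than it reads, which is the pattern to look for in the Join operator's record counts.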

Queries Too Large to Fit in Memory

For some operations (e.g. duplicate elimination for a huge data set), the amount of memory available for the compute resources used to execute the operation might not be sufficient to hold intermediate results. As a result, the query processing engine will start spilling the data to local disk.

If the local disk space is not sufficient, the spilled data is then saved to remote disks.

This spilling can have a profound effect on query performance (especially if remote disk is used for spilling).

Spilling statistics can be checked in the Query Profile interface.
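As a rough sketch of how those statistics might be interpreted, the helper below labels a query by the worst kind of spilling it shows. The function name and parameter names are hypothetical; the byte counts are assumed to have been read manually from the profile's spilling statistics.

```python
def classify_spilling(bytes_spilled_local: int, bytes_spilled_remote: int) -> str:
    """Label a query by the most severe kind of spilling observed.

    Both arguments are assumed to be byte counts copied from the query
    profile; the names are illustrative, not an official API.
    """
    if bytes_spilled_local < 0 or bytes_spilled_remote < 0:
        raise ValueError("byte counts cannot be negative")
    if bytes_spilled_remote > 0:
        # Remote spilling is the more expensive case described above.
        return "remote"
    if bytes_spilled_local > 0:
        return "local"
    return "none"
```

A result of "remote" marks the queries most worth investigating first, since remote spilling has the larger performance cost.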

Inefficient Pruning

Snowflake collects rich statistics on data allowing it not to read unnecessary parts of a table based on the query filters. However, for this to have an effect, the data storage order needs to be correlated with the query filter attributes.

The efficiency of pruning can be observed by comparing Partitions scanned and Partitions total statistics in the TableScan operators. If the former is a small fraction of the latter, pruning is efficient. If not, the pruning did not have an effect.

Of course, pruning can only help for queries that actually filter out a significant amount of data. If the pruning statistics do not show data reduction, but there is a Filter operator above TableScan which filters out a number of records, this might signal that a different data organization might be beneficial for this query.
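The comparison of Partitions scanned against Partitions total can be written as a simple ratio. This is a minimal sketch: the function names are hypothetical, and the 10% default threshold is an arbitrary assumption for illustration, not a value taken from Snowflake.

```python
def pruning_ratio(partitions_scanned: int, partitions_total: int) -> float:
    """Fraction of a table's partitions that the TableScan actually read."""
    if partitions_scanned < 0 or partitions_total < 0:
        raise ValueError("partition counts cannot be negative")
    if partitions_scanned > partitions_total:
        raise ValueError("scanned partitions cannot exceed total partitions")
    if partitions_total == 0:
        return 0.0  # empty table: nothing to scan
    return partitions_scanned / partitions_total


def pruning_is_efficient(
    partitions_scanned: int,
    partitions_total: int,
    threshold: float = 0.1,  # assumed cutoff, chosen only for this example
) -> bool:
    """True when only a small fraction of partitions was scanned."""
    return pruning_ratio(partitions_scanned, partitions_total) <= threshold
```

For example, scanning 5 of 100 partitions gives a ratio of 0.05 and counts as efficient under the assumed threshold, while scanning 95 of 100 does not.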

asked 23/09/2024
bijay ghimire