Question 172 - DAS-C01 discussion

A financial company uses Amazon Athena to query data from an Amazon S3 data lake. Files are stored in the S3 data lake in Apache ORC format. Data analysts recently introduced nested fields in the data lake ORC files, and noticed that queries are taking longer to run in Athena. A data analyst discovered that more data than required is being scanned for the queries.

What is the MOST operationally efficient solution to improve query performance?

A. Flatten nested data and create separate files for each nested dataset.
B. Use the Athena query engine V2 and push the query filter to the source ORC file.
C. Use Apache Parquet format instead of ORC format.
D. Recreate the data partition strategy and further narrow down the data filter criteria.

Suggested answer: B

Explanation:

This solution meets the requirement because:

The Athena query engine V2 is a newer version of the Athena query engine that introduces several improvements and new features, such as federated queries, geospatial functions, prepared statements, and schema evolution support [1].

One of the improvements of the Athena query engine V2 is that it supports predicate pushdown for nested fields in ORC files. Predicate pushdown is a technique that applies the query filter at the source, so data that cannot match is skipped before it is scanned and loaded into memory. This reduces the amount of data scanned and processed by Athena, which can improve query performance and reduce cost [1][2].

By using the Athena query engine V2 and pushing the query filter to the source ORC file, the data analysts can leverage the predicate pushdown feature for nested fields and avoid scanning more data than what is required for the queries. This can improve query performance without changing the data format or partitioning strategy.
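
The effect of predicate pushdown described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not how Athena or an ORC reader is actually implemented: the Stripe class, the nested field trade.amount, and the sample values are all hypothetical. It compares a full scan with a scan that first checks per-stripe maximum statistics and skips stripes that cannot contain a match.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Stripe:
    """Toy stand-in for an ORC stripe (hypothetical, for illustration only)."""
    rows: List[Dict]

    @property
    def max_amount(self) -> int:
        # Simulates a max statistic kept for the nested field trade.amount.
        return max(row["trade"]["amount"] for row in self.rows)


def scan_without_pushdown(stripes: List[Stripe], threshold: int) -> Tuple[List[Dict], int]:
    """Read every row, then filter. Returns (matches, rows_read)."""
    matches, rows_read = [], 0
    for stripe in stripes:
        for row in stripe.rows:
            rows_read += 1
            if row["trade"]["amount"] > threshold:
                matches.append(row)
    return matches, rows_read


def scan_with_pushdown(stripes: List[Stripe], threshold: int) -> Tuple[List[Dict], int]:
    """Skip stripes whose max statistic rules out any match, then filter."""
    matches, rows_read = [], 0
    for stripe in stripes:
        if stripe.max_amount <= threshold:
            continue  # no row in this stripe can satisfy amount > threshold
        for row in stripe.rows:
            rows_read += 1
            if row["trade"]["amount"] > threshold:
                matches.append(row)
    return matches, rows_read


def make_stripe(amounts: List[int]) -> Stripe:
    # Build rows with a nested "trade" struct, mirroring a nested ORC field.
    return Stripe(rows=[{"trade": {"amount": a}} for a in amounts])


# Hypothetical sample data: three stripes, only the last holds large amounts.
stripes = [
    make_stripe([10, 20, 30]),
    make_stripe([40, 50, 60]),
    make_stripe([500, 700, 900]),
]

full_matches, full_rows_read = scan_without_pushdown(stripes, threshold=100)
push_matches, push_rows_read = scan_with_pushdown(stripes, threshold=100)
print(len(full_matches), full_rows_read)
print(len(push_matches), push_rows_read)
```

Both scans return the same matching rows, but the pushdown variant reads rows from only one of the three stripes. The same idea is why a filter on a nested field scans less data when the engine can evaluate it against the file's statistics.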

asked 16/09/2024
Kristian Gutierrez