ExamGecko
Question 100 - MLS-C01 discussion


A monitoring service generates 1 TB of scale metrics record data every minute. A research team performs queries on this data using Amazon Athena. The queries run slowly due to the large volume of data, and the team requires better performance.

How should the records be stored in Amazon S3 to improve query performance?

A. CSV files
B. Parquet files
C. Compressed JSON
D. RecordIO
Suggested answer: B

Explanation:

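Parquet is a columnar storage format that stores data compactly and compresses well. It improves query performance by reducing the amount of data that must be scanned. Values are stored column by column, so Athena reads only the columns a query references. Parquet also supports predicate pushdown: each row group carries per-column statistics such as minimum and maximum values, so the engine can skip row groups that cannot satisfy a query's filter, further reducing the data that must be processed. Because Athena reads Parquet natively and charges by the amount of data scanned, these properties make queries both faster and cheaper. By contrast, CSV and JSON are row-oriented, so Athena must read whole records even when a query needs only a few fields. Compressing the JSON shrinks the files but does not avoid that full-row scan, and RecordIO is a record-packing format used mainly for SageMaker training input rather than for Athena queries. Therefore, the records should be stored in Parquet files in Amazon S3 to improve query performance.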
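To make the column-pruning and row-group-skipping argument concrete, here is a minimal, stdlib-only Python sketch. It is a toy model, not the Parquet format or Athena's engine. The record schema (ts, host, cpu, mem), the JSON encoding of each column chunk, and all function names are hypothetical, and compression is ignored. The sketch runs the same filter-and-average query against a row-oriented layout and a columnar layout with per-row-group min/max statistics, and counts the bytes each layout has to read.

```python
import json

# Hypothetical schema for the metric records (an assumption, not from the question).
COLUMNS = ["ts", "host", "cpu", "mem"]


def make_records(n):
    """Generate n deterministic metric records, ordered by timestamp."""
    return [
        {"ts": i, "host": f"host-{i % 4}", "cpu": (i * 7) % 100, "mem": (i * 13) % 100}
        for i in range(n)
    ]


def write_row_layout(records):
    """Row-oriented text, one JSON object per line (like JSON Lines or CSV)."""
    return "\n".join(json.dumps(r) for r in records).encode()


def write_columnar_layout(records, row_group_size):
    """Toy columnar layout: rows split into row groups, each column stored
    separately, with min/max statistics kept per column chunk."""
    groups = []
    for start in range(0, len(records), row_group_size):
        rows = records[start:start + row_group_size]
        columns = {}
        for name in COLUMNS:
            values = [r[name] for r in rows]
            columns[name] = {
                "data": json.dumps(values).encode(),
                "min": min(values),
                "max": max(values),
            }
        groups.append(columns)
    return groups


def query_row_layout(blob, ts_min):
    """SELECT avg(cpu) WHERE ts >= ts_min: every byte of every row is read."""
    rows = map(json.loads, blob.decode().splitlines())
    cpu = [r["cpu"] for r in rows if r["ts"] >= ts_min]
    return sum(cpu) / len(cpu), len(blob)


def query_columnar_layout(groups, ts_min):
    """Same query, reading only the ts and cpu chunks of row groups that may match."""
    bytes_read = 0
    cpu = []
    for columns in groups:
        # Predicate pushdown: the stats prove no row in this group can match.
        if columns["ts"]["max"] < ts_min:
            continue
        ts_chunk = columns["ts"]["data"]
        cpu_chunk = columns["cpu"]["data"]
        # Column pruning: the host and mem chunks are never read.
        bytes_read += len(ts_chunk) + len(cpu_chunk)
        for ts, c in zip(json.loads(ts_chunk), json.loads(cpu_chunk)):
            if ts >= ts_min:
                cpu.append(c)
    return sum(cpu) / len(cpu), bytes_read


records = make_records(10_000)
row_blob = write_row_layout(records)
groups = write_columnar_layout(records, row_group_size=1_000)

avg_row, scanned_row = query_row_layout(row_blob, ts_min=9_000)
avg_col, scanned_col = query_columnar_layout(groups, ts_min=9_000)
print(f"row layout:      avg={avg_row:.2f}, bytes scanned={scanned_row}")
print(f"columnar layout: avg={avg_col:.2f}, bytes scanned={scanned_col}")
```

Both layouts return the same average. The columnar one skips the first nine row groups using their ts statistics and reads only two of the four columns in the last group, so it scans a small fraction of the bytes. Real Parquet adds binary encodings and compression on top of this, which shrinks the scanned bytes further.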


asked 16/09/2024
Tudor Voicu