Question 182 - DAS-C01 discussion


A company analyzes historical data and needs to query data that is stored in Amazon S3. New data is generated daily as .csv files that are stored in Amazon S3. The company's data analysts are using Amazon Athena to perform SQL queries against a recent subset of the overall data.

The amount of data that is ingested into Amazon S3 has increased to 5 PB over time. The query latency also has increased. The company needs to segment the data to reduce the amount of data that is scanned.

Which solutions will improve query performance? (Select TWO.)

A. Use MySQL Workbench on an Amazon EC2 instance. Connect to Athena by using a JDBC connector. Run the query from MySQL Workbench instead of Athena directly.
B. Configure Athena to use S3 Select to load only the files of the data subset.
C. Create the data subset in Apache Parquet format each day by using the Athena CREATE TABLE AS SELECT (CTAS) statement. Query the Parquet data.
D. Run a daily AWS Glue ETL job to convert the data files to Apache Parquet format and to partition the converted files. Create a periodic AWS Glue crawler to automatically crawl the partitioned data each day.
E. Create an S3 gateway endpoint. Configure VPC routing to access Amazon S3 through the gateway endpoint.
Suggested answer: C, D

Explanation:

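C) Create the data subset in Apache Parquet format each day by using the Athena CREATE TABLE AS SELECT (CTAS) statement. Query the Parquet data.

This solution will improve query performance because: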

Apache Parquet is a columnar storage format that is optimized for analytics and supports compression [1]. Because Athena reads only the columns that a query references, Parquet files reduce the amount of data scanned and transferred, which improves performance and reduces cost [1].

The Athena CREATE TABLE AS SELECT (CTAS) statement allows you to create a new table from the results of a SELECT query [2]. You can use this statement to convert the CSV files to Parquet format and store them in a different location in S3 [2]. You can also specify partitioning keys for the new table, which can further improve query performance by filtering out irrelevant data [2].

Querying the Parquet data will be faster and cheaper than querying the CSV data, as Parquet files are more efficient for analytical queries [1].
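The CTAS approach described above can be sketched as a small Python helper that builds the statement for one day's subset. This is only an illustration: the table name `raw_csv_events`, the bucket `example-analytics-bucket`, and the columns `customer_id`, `amount`, and `event_date` are hypothetical, and the sketch assumes `event_date` is a DATE column in the source table. The `format`, `external_location`, and `partitioned_by` properties are CTAS table properties in Athena; partition columns must come last in the SELECT list.

```python
from datetime import date


def build_ctas(day: date,
               source_table: str = "raw_csv_events",            # hypothetical table
               target_bucket: str = "example-analytics-bucket"  # hypothetical bucket
               ) -> str:
    """Build an Athena CTAS statement that writes one day's subset as Parquet."""
    suffix = day.strftime("%Y_%m_%d")
    location = f"s3://{target_bucket}/parquet/subset_{suffix}/"
    return (
        f"CREATE TABLE subset_{suffix}\n"
        "WITH (\n"
        "    format = 'PARQUET',\n"
        f"    external_location = '{location}',\n"
        "    partitioned_by = ARRAY['event_date']\n"
        ")\n"
        # The partition column (event_date) is listed last, as CTAS requires.
        "AS SELECT customer_id, amount, event_date\n"
        f"FROM {source_table}\n"
        f"WHERE event_date = DATE '{day.isoformat()}'"
    )


print(build_ctas(date(2024, 9, 16)))
```

The resulting string could then be submitted to Athena by whatever client the analysts already use; subsequent queries would target the new Parquet table instead of the CSV data.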

D) Run a daily AWS Glue ETL job to convert the data files to Apache Parquet format and to partition the converted files. Create a periodic AWS Glue crawler to automatically crawl the partitioned data each day.

This solution will improve query performance because:

AWS Glue is a fully managed extract, transform, and load (ETL) service that can be used to prepare and load data for analytics [3]. You can use AWS Glue to create a job that copies the CSV files from the source S3 bucket to a new S3 bucket, and converts them to Apache Parquet format [3].
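To show why partitioning the converted files reduces the data scanned, here is a minimal sketch of a Hive-style `key=value` partition layout, which Glue crawlers can register as table partitions. It does not use the Glue API; the base path `s3://example-bucket/parquet` and the `year`/`month`/`day` keys are assumptions chosen for illustration. A query filtered to a date range only needs the prefixes for those days rather than the whole dataset.

```python
from datetime import date, timedelta


def partition_prefix(base: str, day: date) -> str:
    """Return the Hive-style S3 prefix for one day's partition."""
    return f"{base}/year={day.year}/month={day.month:02d}/day={day.day:02d}/"


def prefixes_for_range(base: str, start: date, end: date) -> list[str]:
    """Return the partition prefixes a query over [start, end] would need to read."""
    days = (end - start).days + 1
    return [partition_prefix(base, start + timedelta(days=i)) for i in range(days)]


base = "s3://example-bucket/parquet"  # hypothetical location of the converted files
for prefix in prefixes_for_range(base, date(2024, 9, 14), date(2024, 9, 16)):
    print(prefix)
```

With this layout, a query on the most recent three days touches three prefixes, regardless of how many years of history sit under the same base path.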

asked 16/09/2024
Juy Juy