A data engineer runs Amazon Athena queries on data that is in an Amazon S3 bucket. The Athena queries use AWS Glue Data Catalog as a metadata table.
The data engineer notices that the Athena query plans are experiencing a performance bottleneck. The data engineer determines that the cause of the performance bottleneck is the large number of partitions that are in the S3 bucket. The data engineer must resolve the performance bottleneck and reduce Athena query planning time.
Which solutions will meet these requirements? (Choose two.)

Arun Samuel · Accepted Answer

Create an AWS Glue partition index. Enable partition filtering.
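
As a rough illustration of this answer, the sketch below builds the two requests involved: a Glue `CreatePartitionIndex` call and an Athena DDL statement that sets the `partition_filtering.enabled` table property. The database, table, key, and index names (`sales_db`, `orders`, `year`, `month`, `year_month_idx`) are hypothetical placeholders, and the `apply` helper assumes `boto3` is installed and AWS credentials are configured; it is not called here.

```python
"""Sketch: add a Glue partition index and turn on Athena partition filtering."""


def partition_index_request(database, table, keys, index_name):
    """Build keyword arguments for Glue's CreatePartitionIndex API."""
    if not keys:
        raise ValueError("a partition index needs at least one partition key")
    return {
        "DatabaseName": database,
        "TableName": table,
        "PartitionIndex": {"Keys": list(keys), "IndexName": index_name},
    }


def partition_filtering_ddl(database, table):
    """Build the Athena DDL that sets partition_filtering.enabled on the table."""
    return (
        f"ALTER TABLE {database}.{table} "
        "SET TBLPROPERTIES ('partition_filtering.enabled' = 'true')"
    )


def apply(request, ddl, output_location):
    """Send both requests to AWS (assumes boto3 and valid credentials)."""
    import boto3  # imported lazily so the builders above work without it

    boto3.client("glue").create_partition_index(**request)
    boto3.client("athena").start_query_execution(
        QueryString=ddl,
        ResultConfiguration={"OutputLocation": output_location},
    )


# Hypothetical names, for illustration only.
request = partition_index_request(
    "sales_db", "orders", ["year", "month"], "year_month_idx"
)
ddl = partition_filtering_ddl("sales_db", "orders")
```

With the index and the table property in place, Athena can ask the Data Catalog for only the partitions that match the query's filter on the indexed keys, rather than retrieving every partition during planning.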

Arun Samuel · Accepted Answer

Use Athena partition projection based on the S3 bucket prefix.
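
A minimal sketch of what this answer could look like in practice, assuming a single date partition column laid out under the bucket prefix as `yyyy/MM/dd`. The bucket, prefix, column, and table names (`example-bucket`, `logs`, `dt`, `sales_db.events`) are hypothetical. The `resolve_location` helper only mimics, in plain Python, how a partition location follows from the template without a catalog lookup; it is not part of any AWS API.

```python
"""Sketch: table properties for Athena partition projection on a date column."""


def projection_properties(bucket, prefix, column, start_date):
    """Return table properties that enable date-based partition projection."""
    # Athena substitutes ${column} in the template with each projected value.
    template = f"s3://{bucket}/{prefix}/" + "${" + column + "}/"
    return {
        "projection.enabled": "true",
        f"projection.{column}.type": "date",
        f"projection.{column}.format": "yyyy/MM/dd",
        f"projection.{column}.range": f"{start_date},NOW",
        "storage.location.template": template,
    }


def projection_ddl(database, table, properties):
    """Build an ALTER TABLE statement that sets the projection properties."""
    pairs = ", ".join(f"'{key}' = '{value}'" for key, value in properties.items())
    return f"ALTER TABLE {database}.{table} SET TBLPROPERTIES ({pairs})"


def resolve_location(properties, column, value):
    """Illustrate how a partition location is computed from the template."""
    placeholder = "${" + column + "}"
    return properties["storage.location.template"].replace(placeholder, value)


# Hypothetical names, for illustration only.
props = projection_properties("example-bucket", "logs", "dt", "2020/01/01")
ddl = projection_ddl("sales_db", "events", props)
location = resolve_location(props, "dt", "2024/03/15")
```

Because the partition values and locations are calculated from these properties, Athena does not need to fetch partition metadata from the Data Catalog for the table, which is where the planning time was being spent.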

Arun Samuel · Answer

Bucket the data based on a column that the data have in common in a WHERE clause of the user query.

Arun Samuel · Answer

Transform the data that is in the S3 bucket to Apache Parquet format.

Arun Samuel · Answer

Use the Amazon EMR S3DistCP utility to combine smaller objects in the S3 bucket into larger objects.

Related questions

Question 48 - DEA-C01 discussion

Suggested answer: A, C
