A data engineer must ingest a source of structured data that is in .csv format into an Amazon S3 data lake. The .csv files contain 15 columns. Data analysts need to run Amazon Athena queries on one or two columns of the dataset. The data analysts rarely query the entire file.
Which solution will meet these requirements MOST cost-effectively?

Question

A data engineer must ingest a source of structured data that is in .csv format into an Amazon S3 data lake. The .csv files contain 15 columns. Data analysts need to run Amazon Athena queries on one or two columns of the dataset. The data analysts rarely query the entire file.

Which solution will meet these requirements MOST cost-effectively?

Elizaveta Kutuzova · Accepted Answer

Create an AWS Glue extract, transform, and load (ETL) job to read from the .csv structured data source. Configure the job to write the data into the data lake in Apache Parquet format.

Elizaveta Kutuzova · Answer

Use an AWS Glue PySpark job to ingest the source data into the data lake in .csv format.

Elizaveta Kutuzova · Answer

Create an AWS Glue extract, transform, and load (ETL) job to read from the .csv structured data source. Configure the job to ingest the data into the data lake in JSON format.

Elizaveta Kutuzova · Answer

Use an AWS Glue PySpark job to ingest the source data into the data lake in Apache Avro format.

Question list

List of questions

Question 1

(0)

Question 2

(0)

Question 3

(0)

Question 4

(0)

Question 5

(0)

Question 6

(0)

Question 7

(0)

Question 8

(0)

Question 9

(0)

Question 10

(0)

Related questions

Question 53 - DEA-C01 discussion

Suggested answer: D

0 comments