Question 85 - DEA-C01 discussion


A data engineer needs to build an extract, transform, and load (ETL) job. The ETL job will process daily incoming .csv files that users upload to an Amazon S3 bucket. The size of each S3 object is less than 100 MB.

Which solution will meet these requirements MOST cost-effectively?

A. Write a custom Python application. Host the application on an Amazon Elastic Kubernetes Service (Amazon EKS) cluster.

B. Write a PySpark ETL script. Host the script on an Amazon EMR cluster.

C. Write an AWS Glue PySpark job. Use Apache Spark to transform the data.

D. Write an AWS Glue Python shell job. Use pandas to transform the data.

Suggested answer: D

Explanation:

AWS Glue is a fully managed, serverless ETL service that can handle various data sources and formats, including .csv files in Amazon S3. AWS Glue offers two job types: PySpark jobs, which use Apache Spark to process large-scale data in parallel, and Python shell jobs, which run a Python script in a single execution environment for small-scale data. Because each S3 object is smaller than 100 MB, distributed processing is unnecessary, so a Python shell job is the more suitable and cost-effective choice. The job can use pandas, a popular Python library for data analysis, to transform the .csv data as needed.

The other options are less cost-effective. Writing a custom Python application and hosting it on an Amazon EKS cluster (option A) would require extra effort and resources to set up and manage the Kubernetes environment, in addition to the data ingestion and transformation logic itself. Writing a PySpark ETL script and hosting it on an Amazon EMR cluster (option B) would add the cost and complexity of provisioning and configuring the cluster, and Apache Spark is overkill for small files. An AWS Glue PySpark job (option C) would likewise incur unnecessary Spark overhead and charges for data of this size.
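To illustrate why a Python shell job fits this workload, here is a minimal sketch of such a job. The job parameters (input_bucket, input_key, output_bucket) and the transformation are hypothetical placeholders, not part of the exam question:

import io
import sys

import boto3
import pandas as pd
from awsglue.utils import getResolvedOptions  # available in the Glue job environment

# Hypothetical job parameters; pass them when starting the job, for example:
# --input_bucket my-uploads --input_key daily/users.csv --output_bucket my-curated
args = getResolvedOptions(sys.argv, ["input_bucket", "input_key", "output_bucket"])

s3 = boto3.client("s3")

# Each object is under 100 MB, so it fits comfortably in memory.
obj = s3.get_object(Bucket=args["input_bucket"], Key=args["input_key"])
df = pd.read_csv(io.BytesIO(obj["Body"].read()))

# Placeholder transformation: drop fully empty rows and normalize column names.
df = df.dropna(how="all")
df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]

# Write the transformed data back to S3 as .csv.
buf = io.StringIO()
df.to_csv(buf, index=False)
s3.put_object(
    Bucket=args["output_bucket"],
    Key="transformed/" + args["input_key"],
    Body=buf.getvalue().encode("utf-8"),
)

A script like this runs as a Glue job with command name "pythonshell" and can be allocated as little as 0.0625 DPU, which is what makes it cheaper than a Spark-based Glue job or an EMR cluster for files of this size.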

References:

AWS Glue

Working with Python Shell Jobs

pandas

AWS Certified Data Engineer - Associate DEA-C01 Complete Study Guide
