ExamGecko

Question 197 - DAS-C01 discussion

A company receives datasets from partners at various frequencies. The datasets include baseline data and incremental data. The company needs to merge and store all the datasets without reprocessing the data.

Which solution will meet these requirements with the LEAST development effort?

A. Use an AWS Glue job with a temporary table to process the datasets. Store the data in an Amazon RDS table.
B. Use an Apache Spark job in an Amazon EMR cluster to process the datasets. Store the data in EMR File System (EMRFS).
C. Use an AWS Glue job with job bookmarks enabled to process the datasets. Store the data in Amazon S3.
D. Use an AWS Lambda function to process the datasets. Store the data in Amazon S3.

Suggested answer: C

Explanation:

AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy to prepare and load data for analytics [1]. It can read from a range of data stores, such as Amazon S3, Amazon RDS, and other databases reachable over JDBC.

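AWS Glue job bookmarks are a feature that helps AWS Glue track data that has already been processed during a previous run of an ETL job. This prevents old data from being reprocessed and lets the job pick up only new data when it is rerun on a scheduled interval [2]. The first run processes the baseline data, and later runs process only the incremental data that has arrived since. Because bookmarks are a built-in job option, no custom tracking code is needed, which is what keeps the development effort lower than with a Spark job on Amazon EMR or an AWS Lambda function.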
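The bookmark idea can be illustrated with a toy model. This is only a conceptual sketch and not how AWS Glue stores bookmark state internally; the class name `BookmarkedJob`, the file names, and the numeric timestamps are all hypothetical. It assumes each source object carries a last-modified time and that a run should pick up only objects newer than the newest one already processed.

```python
from dataclasses import dataclass


@dataclass
class BookmarkedJob:
    """Toy model of a job bookmark (hypothetical, not the Glue implementation)."""

    # Last-modified time of the newest object processed so far.
    bookmark: float = 0.0

    def run(self, objects):
        """Process only objects newer than the bookmark.

        `objects` is an iterable of (key, last_modified) pairs describing
        everything currently present in the source location.
        Returns the keys processed in this run, oldest first.
        """
        new = sorted((ts, key) for key, ts in objects if ts > self.bookmark)
        if new:
            # Advance the bookmark so the next run skips these objects.
            self.bookmark = new[-1][0]
        return [key for _, key in new]


job = BookmarkedJob()
# First run: only the baseline file exists, so it is processed.
first = job.run([("baseline.csv", 100.0)])
# Second run: the baseline is still there, but only the new file is picked up.
second = job.run([("baseline.csv", 100.0), ("delta-1.csv", 200.0)])
```

In this sketch `first` contains only the baseline file and `second` contains only the incremental file, which mirrors the "no reprocessing" requirement in the question.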

Amazon S3 is a highly scalable, durable, and secure object storage service that can store any amount and type of data [3]. It can serve as a data lake for the merged, processed datasets that the AWS Glue job produces, and it integrates with other AWS services, such as Amazon Athena, Amazon Redshift Spectrum, and Amazon EMR, for further analysis and processing.
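Putting option C together, a job definition with bookmarks enabled might look roughly like the sketch below. The helper `glue_job_definition`, the job name, role ARN, bucket paths, and the custom `--output_path` argument are all hypothetical placeholders; `--job-bookmark-option` with the value `job-bookmark-enable` is the Glue job parameter that turns bookmarks on. The dictionary is shaped as keyword arguments for the boto3 Glue client's `create_job` call, which is left commented out because it needs AWS credentials.

```python
def glue_job_definition(name, role_arn, script_location, output_path):
    """Build create_job keyword arguments for a bookmark-enabled Glue ETL job.

    All argument values are supplied by the caller; nothing here is
    specific to the company in the question.
    """
    return {
        "Name": name,
        "Role": role_arn,
        "Command": {
            "Name": "glueetl",  # Spark-based Glue ETL job
            "ScriptLocation": script_location,
            "PythonVersion": "3",
        },
        "DefaultArguments": {
            # Built-in Glue parameter: track already-processed data.
            "--job-bookmark-option": "job-bookmark-enable",
            # Hypothetical custom argument the ETL script would read
            # to decide where in Amazon S3 to write its output.
            "--output_path": output_path,
        },
    }


definition = glue_job_definition(
    name="merge-partner-datasets",                      # hypothetical
    role_arn="arn:aws:iam::123456789012:role/GlueJob",  # hypothetical
    script_location="s3://example-bucket/scripts/merge.py",
    output_path="s3://example-bucket/merged/",
)

# With credentials configured, the job could then be created with:
# import boto3
# boto3.client("glue").create_job(**definition)
```

Enabling the option is not enough on its own: inside the ETL script, each source read needs a `transformation_ctx` value and the script has to call `job.commit()` at the end so that the bookmark state is saved for the next run.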

asked 16/09/2024
Eric Zarghami