Question 98 - DEA-C01 discussion


A data engineer configured an AWS Glue Data Catalog for data that is stored in Amazon S3 buckets. The data engineer needs to configure the Data Catalog to receive incremental updates.

The data engineer sets up event notifications for the S3 bucket and creates an Amazon Simple Queue Service (Amazon SQS) queue to receive the S3 events.

Which combination of steps should the data engineer take to meet these requirements with LEAST operational overhead? (Select TWO.)

A.

Create an S3 event-based AWS Glue crawler to consume events from the SQS queue.

B.

Define a time-based schedule to run the AWS Glue crawler, and perform incremental updates to the Data Catalog.

C.

Use an AWS Lambda function to directly update the Data Catalog based on S3 events that the SQS queue receives.

D.

Manually initiate the AWS Glue crawler to perform updates to the Data Catalog when there is a change in the S3 bucket.

E.

Use AWS Step Functions to orchestrate the process of updating the Data Catalog based on S3 events that the SQS queue receives.

Suggested answer: A, C

Explanation:

The requirement is to update the AWS Glue Data Catalog incrementally based on S3 events. Using an S3 event-based approach is the most automated and operationally efficient solution.

A. Create an S3 event-based AWS Glue crawler:

An S3 event-based Glue crawler reads the change events from the SQS queue on each run and recrawls only the affected paths instead of listing the entire bucket. This keeps Data Catalog updates incremental with minimal operational overhead.
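As a rough illustration of option A, the sketch below builds the request for a crawler in S3 event mode. The parameter names (`Targets`, `S3Targets`, `EventQueueArn`, `RecrawlPolicy`, `CRAWL_EVENT_MODE`, `Schedule`) follow the Glue `CreateCrawler` API as exposed by boto3; the helper name, crawler name, role ARN, database, bucket path, and queue ARN are placeholders assumed for the example, not values from the question.

```python
def build_event_crawler_request(name, role_arn, database, s3_path, queue_arn, schedule=None):
    """Build keyword arguments for the Glue CreateCrawler call (hypothetical helper)."""
    request = {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        # The S3 target points at the SQS queue that receives the bucket's events.
        "Targets": {"S3Targets": [{"Path": s3_path, "EventQueueArn": queue_arn}]},
        # Event mode: crawl only what the queued S3 events report as changed.
        "RecrawlPolicy": {"RecrawlBehavior": "CRAWL_EVENT_MODE"},
    }
    if schedule:
        # Optional cron expression, e.g. "cron(0 * * * ? *)" for an hourly run.
        request["Schedule"] = schedule
    return request


# Placeholder values, assumed for illustration only.
request = build_event_crawler_request(
    name="sales-event-crawler",
    role_arn="arn:aws:iam::123456789012:role/GlueCrawlerRole",
    database="sales_db",
    s3_path="s3://example-bucket/sales/",
    queue_arn="arn:aws:sqs:us-east-1:123456789012:s3-events",
)

# With boto3 installed and credentials configured, the request could be sent as:
#   import boto3
#   boto3.client("glue").create_crawler(**request)
```

The request is built as a plain dictionary so it can be inspected or unit tested before anything is created in an AWS account.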

C. Use an AWS Lambda function to directly update the Data Catalog:

Lambda can be triggered by S3 events delivered to the SQS queue and can directly update the Glue Data Catalog, ensuring that new data is reflected in near real-time without running a full crawler.
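A minimal sketch of the parsing step such a Lambda function might perform is shown below. It assumes S3 sends notifications directly to the SQS queue (no SNS wrapper), so each SQS message body is the S3 event JSON, and it assumes Hive-style `key=value` folder names for partitions. The function names and the sample key are hypothetical; the actual catalog update (for example through the Glue `BatchCreatePartition` API) is left out.

```python
import json
from urllib.parse import unquote_plus


def changed_objects(sqs_event):
    """Return (bucket, key) pairs from an SQS-triggered Lambda event carrying S3 notifications."""
    found = []
    for record in sqs_event.get("Records", []):
        body = json.loads(record["body"])
        # S3 test messages have no "Records" list, so default to an empty one.
        for s3_record in body.get("Records", []):
            bucket = s3_record["s3"]["bucket"]["name"]
            # Object keys arrive URL-encoded in S3 event notifications.
            key = unquote_plus(s3_record["s3"]["object"]["key"])
            found.append((bucket, key))
    return found


def partition_values(key):
    """Extract Hive-style partition values (assumed layout) from the key's folder segments."""
    return [seg.split("=", 1)[1] for seg in key.split("/")[:-1] if "=" in seg]


# Hand-built sample event, assumed for illustration only.
sample_event = {
    "Records": [
        {
            "body": json.dumps(
                {
                    "Records": [
                        {
                            "s3": {
                                "bucket": {"name": "example-bucket"},
                                "object": {"key": "sales/year%3D2024/month%3D10/part-0000.parquet"},
                            }
                        }
                    ]
                }
            )
        }
    ]
}

objects = changed_objects(sample_event)
values = partition_values(objects[0][1])
```

A handler built on this would pass the extracted partition values to the Glue API for the table it maintains; that mapping from key prefix to table is specific to each data layout.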

Alternatives Considered:

B (Time-based schedule): Scheduling a crawler to run periodically adds unnecessary latency and operational overhead.

D (Manual crawler initiation): Manually starting the crawler defeats the purpose of automation.

E (AWS Step Functions): Step Functions add complexity that is not needed when Lambda can handle the updates directly.

asked 29/10/2024
David Guest