ExamGecko
Question list
Search
Search

List of questions

Search

Related questions











Question 29 - DEA-C01 discussion

Report
Export

A data engineer is building a data pipeline on AWS by using AWS Glue extract, transform, and load (ETL) jobs. The data engineer needs to process data from Amazon RDS and MongoDB, perform transformations, and load the transformed data into Amazon Redshift for analytics. The data updates must occur every hour.

Which combination of tasks will meet these requirements with the LEAST operational overhead? (Choose two.)

A.

Configure AWS Glue triggers to run the ETL jobs even/ hour.

Answers
A.

Configure AWS Glue triggers to run the ETL jobs even/ hour.

B.

Use AWS Glue DataBrewto clean and prepare the data for analytics.

Answers
B.

Use AWS Glue DataBrewto clean and prepare the data for analytics.

C.

Use AWS Lambda functions to schedule and run the ETL jobs even/ hour.

Answers
C.

Use AWS Lambda functions to schedule and run the ETL jobs even/ hour.

D.

Use AWS Glue connections to establish connectivity between the data sources and Amazon Redshift.

Answers
D.

Use AWS Glue connections to establish connectivity between the data sources and Amazon Redshift.

E.

Use the Redshift Data API to load transformed data into Amazon Redshift.

Answers
E.

Use the Redshift Data API to load transformed data into Amazon Redshift.

Suggested answer: A, D

Explanation:

The correct answer is to configure AWS Glue triggers to run the ETL jobs every hour and use AWS Glue connections to establish connectivity between the data sources and Amazon Redshift. AWS Glue triggers are a way to schedule and orchestrate ETL jobs with the least operational overhead. AWS Glue connections are a way to securely connect to data sources and targets using JDBC or MongoDB drivers. AWS Glue DataBrew is a visual data preparation tool that does not support MongoDB as a data source. AWS Lambda functions are a serverless option to schedule and run ETL jobs, but they have a limit of 15 minutes for execution time, which may not be enough for complex transformations. The Redshift Data API is a way to run SQL commands on Amazon Redshift clusters without needing a persistent connection, but it does not support loading data from AWS Glue ETL jobs.Reference:

AWS Glue triggers

AWS Glue connections

AWS Glue DataBrew

[AWS Lambda functions]

[Redshift Data API]

asked 29/10/2024
Minoel Prendi
31 questions
User
Your answer:
0 comments
Sorted by

Leave a comment first