A company ingests data from multiple data sources and stores the data in an Amazon S3 bucket. An AWS Glue extract, transform, and load (ETL) job transforms the data and writes the transformed data to an Amazon S3 based data lake. The company uses Amazon Athena to query the data that is in the data lake.
The company needs to identify matching records even when the records do not have a common unique identifier.
Which solution will meet this requirement?

Artur Sierszen · Accepted Answer

Train and use the AWS Lake Formation FindMatches transform in the ETL job.
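To illustrate the kind of problem this answer addresses, here is a minimal sketch in plain Python. It is not the FindMatches transform, which is a trained machine learning transform that runs inside an AWS Glue job (to my knowledge it is applied through `FindMatches.apply` in `awsglueml.transforms`, using the ID of a transform you have already trained). The sketch swaps in simple string similarity from the standard library `difflib` module to show how records without a shared key can still be paired. The function names, sample records, field list, and the 0.85 threshold are all hypothetical.

```python
from difflib import SequenceMatcher


def normalize(record):
    """Lowercase each field and collapse whitespace so formatting noise is ignored."""
    return {key: " ".join(str(value).lower().split()) for key, value in record.items()}


def similarity(a, b, fields):
    """Average per-field string similarity, in the range [0, 1]."""
    a, b = normalize(a), normalize(b)
    scores = [SequenceMatcher(None, a[f], b[f]).ratio() for f in fields]
    return sum(scores) / len(scores)


def find_candidate_matches(left, right, fields, threshold=0.85):
    """Return (left_index, right_index, score) for every pair at or above the threshold.

    This compares all pairs, so it only suits small illustrative inputs.
    """
    matches = []
    for i, left_record in enumerate(left):
        for j, right_record in enumerate(right):
            score = similarity(left_record, right_record, fields)
            if score >= threshold:
                matches.append((i, j, round(score, 3)))
    return matches


# Hypothetical records from two sources that share no unique identifier.
crm = [
    {"name": "Jane Doe", "city": "Seattle"},
    {"name": "Robert Smith", "city": "Austin"},
]
billing = [
    {"name": "jane  doe", "city": "Seattle"},
    {"name": "Roberta Smythe", "city": "Boston"},
    {"name": "Rob Smith", "city": "Austin"},
]

for left_index, right_index, score in find_candidate_matches(crm, billing, ["name", "city"]):
    print(crm[left_index]["name"], "<->", billing[right_index]["name"], score)
```

With this sample data, "Jane Doe" pairs with "jane  doe" and "Robert Smith" pairs with "Rob Smith", while "Roberta Smythe" in Boston falls below the threshold. A hand-tuned threshold like this is brittle on real data, which is the motivation for a transform that learns from labeled examples instead.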

Artur Sierszen · Answer

Use Amazon Macie pattern matching as part of the ETL job.

Artur Sierszen · Answer

Train and use the AWS Glue PySpark Filter class in the ETL job.

Artur Sierszen · Answer

Partition tables and use the ETL job to partition the data on a unique identifier.

Related questions

Question 89 - DEA-C01 discussion

Suggested answer: D
