ExamGecko

Question 195 - MLS-C01 discussion

A data scientist needs to identify fraudulent user accounts for a company's ecommerce platform. The company wants the ability to determine if a newly created account is associated with a previously known fraudulent user. The data scientist is using AWS Glue to cleanse the company's application logs during ingestion.

Which strategy will allow the data scientist to identify fraudulent accounts?

A. Execute the built-in FindDuplicates Amazon Athena query.
B. Create a FindMatches machine learning transform in AWS Glue.
C. Create an AWS Glue crawler to infer duplicate accounts in the source data.
D. Search for duplicate accounts in the AWS Glue Data Catalog.

Suggested answer: B

Explanation:

The best strategy is to create a FindMatches machine learning transform in AWS Glue. FindMatches identifies duplicate or matching records in a dataset even when the records share no unique identifier and no fields match exactly. That is the situation here: a new account opened by a previously known fraudulent user may not reuse exactly the same details. You teach the transform your definition of a "duplicate" or a "match" by labeling example records, and it uses machine learning to find other likely matches in the dataset. Because the transform runs inside AWS Glue ETL jobs, it fits into the cleansing step the data scientist already performs during ingestion.
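As a rough illustration of what creating such a transform might look like, the sketch below assembles the arguments for the boto3 Glue client's `create_ml_transform` call. The transform name, database, table, role, primary key column, worker settings, and tuning value are all hypothetical placeholders rather than values from the question, and the call itself is kept in a separate function because it needs AWS credentials and an existing Data Catalog table.

```python
from typing import Any, Dict


def build_find_matches_request(
    name: str,
    database: str,
    table: str,
    role_arn: str,
    primary_key: str,
) -> Dict[str, Any]:
    """Assemble keyword arguments for boto3's glue.create_ml_transform."""
    return {
        "Name": name,
        # Data Catalog table holding the cleansed account records (hypothetical).
        "InputRecordTables": [{"DatabaseName": database, "TableName": table}],
        "Parameters": {
            "TransformType": "FIND_MATCHES",
            "FindMatchesParameters": {
                "PrimaryKeyColumnName": primary_key,
                # Assumed tuning choice: values near 1.0 favor precision,
                # values near 0.0 favor recall.
                "PrecisionRecallTradeoff": 0.9,
                "EnforceProvidedLabels": False,
            },
        },
        "Role": role_arn,
        # Assumed capacity settings; adjust for the real workload.
        "WorkerType": "G.1X",
        "NumberOfWorkers": 5,
    }


def create_transform(request: Dict[str, Any]) -> str:
    """Submit the request to AWS Glue and return the new transform's ID."""
    import boto3  # imported lazily so the builder works without AWS access

    glue = boto3.client("glue")
    return glue.create_ml_transform(**request)["TransformId"]


request = build_find_matches_request(
    name="fraud-account-matcher",          # hypothetical
    database="ecommerce_logs",             # hypothetical
    table="user_accounts",                 # hypothetical
    role_arn="arn:aws:iam::123456789012:role/GlueFindMatchesRole",  # hypothetical
    primary_key="account_id",              # hypothetical
)
```

Calling `create_transform(request)` only registers the transform. It still has to be taught with labeled examples before its matches are useful in an ETL job.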

Option A is incorrect because Amazon Athena has no built-in FindDuplicates query. Athena is an interactive query service for analyzing data in Amazon S3 with standard SQL. You would have to write your own SQL to find duplicates, and a query built on exact comparisons would miss accounts whose details differ slightly from those of a known fraudulent user.
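To see why exact comparison falls short, the sketch below contrasts an equality check with a simple string-similarity score from Python's standard `difflib` module. This is only a stand-in for the idea of approximate matching, not how FindMatches works internally. The records, the field names, and the 0.8 threshold are invented for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical records: one known fraudulent account and two new sign-ups.
known_fraud = {"name": "Jon A. Smith", "email": "jon.smith@example.com"}
new_accounts = [
    {"name": "John Smith", "email": "jon.smith+1@example.com"},
    {"name": "Maria Garcia", "email": "mgarcia@example.org"},
]

FIELDS = ("name", "email")
THRESHOLD = 0.8  # assumed cut-off, chosen only for this illustration


def exact_match(a: dict, b: dict) -> bool:
    """Mimic a SQL equality join: every field must be identical."""
    return all(a[f] == b[f] for f in FIELDS)


def similarity(a: dict, b: dict) -> float:
    """Average per-field similarity ratio, ignoring letter case."""
    scores = [
        SequenceMatcher(None, a[f].lower(), b[f].lower()).ratio()
        for f in FIELDS
    ]
    return sum(scores) / len(scores)


exact_hits = [acct for acct in new_accounts if exact_match(known_fraud, acct)]
fuzzy_hits = [
    acct for acct in new_accounts if similarity(known_fraud, acct) >= THRESHOLD
]

print("exact:", [a["name"] for a in exact_hits])
print("fuzzy:", [a["name"] for a in fuzzy_hits])
```

The equality check finds nothing, while the similarity score flags the account whose name and email differ only slightly from the known fraudulent one. A hand-picked threshold like this is brittle, which is the gap a transform trained on labeled examples is meant to close.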

Option C is incorrect because creating an AWS Glue crawler to infer duplicate accounts in the source data is not a valid strategy. An AWS Glue crawler is a program that connects to a data store, progresses through a prioritized list of classifiers to determine the schema for your data, and then creates metadata tables in the AWS Glue Data Catalog. A crawler does not perform any data cleansing or record matching tasks.

Option D is incorrect because searching for duplicate accounts in the AWS Glue Data Catalog is not a feasible strategy. The AWS Glue Data Catalog is a central repository to store structural and operational metadata for your data assets. The Data Catalog does not store the actual data, but rather the metadata that describes where the data is located, how it is formatted, and what it contains. Therefore, you cannot search for duplicate records in the Data Catalog.

References:

Record matching with AWS Lake Formation FindMatches - AWS Glue

Amazon Athena - Interactive SQL Queries for Data in Amazon S3

AWS Glue Crawlers - AWS Glue

AWS Glue Data Catalog - AWS Glue

asked 16/09/2024
Amir Trujillo