ExamGecko
Question 2 - BDS-C00 discussion


A new algorithm has been written in Python to identify SPAM e-mails. The algorithm analyzes the free text contained within a sample set of 1 million e-mails stored on Amazon S3. The algorithm must be scaled across a production dataset of 5 PB, which also resides in Amazon S3 storage. Which AWS service strategy is best for this use case?

A. Copy the data into Amazon ElastiCache to perform text analysis on the in-memory data and export the results of the model into Amazon Machine Learning.
B. Use Amazon EMR to parallelize the text analysis tasks across the cluster using a streaming program step.
C. Use Amazon Elasticsearch Service to store the text and then use the Python Elasticsearch Client to run analysis against the text index.
D. Initiate a Python job from AWS Data Pipeline to run directly against the Amazon S3 text files.
Suggested answer: C

Explanation:

Reference: https://aws.amazon.com/blogs/database/indexing-metadata-in-amazon-elasticsearch-service-using-aws-lambda-and-python/
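
For context on the "streaming program step" mentioned in option B: a Hadoop streaming step runs an executable as a mapper that reads records from standard input and writes tab-separated key/value lines to standard output. The following is only a rough sketch of what such a mapper could look like in Python. It assumes one e-mail per input line, and `looks_like_spam`, `SPAM_KEYWORDS`, and `map_lines` are hypothetical names; the keyword check is a stand-in, since the question does not describe the real algorithm.

```python
import sys

# Hypothetical stand-in for the real SPAM algorithm, which the question
# does not describe. A simple keyword match is used for illustration only.
SPAM_KEYWORDS = ("free money", "winner", "click here")


def looks_like_spam(text):
    """Return True if the text contains any of the illustrative keywords."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in SPAM_KEYWORDS)


def map_lines(lines):
    """Yield one 'label<TAB>1' record per non-empty input line.

    Assumes each input line holds the free text of a single e-mail.
    """
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip blank lines
        label = "spam" if looks_like_spam(text) else "ham"
        yield f"{label}\t1"


if __name__ == "__main__":
    # A streaming mapper reads from stdin and writes records to stdout.
    for record in map_lines(sys.stdin):
        print(record)
```

A reducer could then sum the counts per label. Because each mapper works only on the lines it is handed, the same script can run on many nodes at once without modification.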

asked 16/09/2024
Stan Nichols
32 questions