Question 2 - BDS-C00 discussion
A new algorithm has been written in Python to identify SPAM e-mails. The algorithm analyzes the free text contained within a sample set of 1 million e-mails stored on Amazon S3. The algorithm must be scaled across a production dataset of 5 PB, which also resides in Amazon S3 storage. Which AWS service strategy is best for this use case?
A. Copy the data into Amazon ElastiCache to perform text analysis on the in-memory data and export the results of the model into Amazon Machine Learning.
B. Use Amazon EMR to parallelize the text analysis tasks across the cluster using a streaming program step.
C. Use Amazon Elasticsearch Service to store the text and then use the Python Elasticsearch Client to run analysis against the text index.
D. Initiate a Python job from AWS Data Pipeline to run directly against the Amazon S3 text files.
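
For context on the "streaming program step" named in option B: on Amazon EMR this refers to Hadoop Streaming, where the mapper is any executable that reads records from standard input and writes tab-separated key/value lines to standard output. The sketch below shows roughly what such a mapper could look like in Python. It assumes one e-mail per input line, and `SPAM_HINTS`, `looks_like_spam`, `map_lines`, and `main` are hypothetical names standing in for the real algorithm, which the question does not describe.

```python
import sys
from typing import Iterable, Iterator, TextIO

# Hypothetical keyword list; a stand-in for the real spam algorithm.
SPAM_HINTS = ("free money", "winner", "click here")


def looks_like_spam(text: str) -> bool:
    """Placeholder classifier: flags text containing any hint phrase."""
    lowered = text.lower()
    return any(hint in lowered for hint in SPAM_HINTS)


def map_lines(lines: Iterable[str]) -> Iterator[str]:
    """Emit one 'label<TAB>1' record per non-empty input line."""
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip blank lines
        label = "spam" if looks_like_spam(text) else "ham"
        yield f"{label}\t1"


def main(stdin: TextIO = sys.stdin, stdout: TextIO = sys.stdout) -> None:
    """Read records from stdin and write mapper output to stdout."""
    for record in map_lines(stdin):
        stdout.write(record + "\n")

# A real mapper script would call main() here; it is left uncalled so the
# functions can be imported and tested without blocking on stdin.
```

Because the mapper only sees the lines handed to it, the framework can run many copies in parallel over different splits of the S3 input, and a reducer can then sum the counts per label.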