A new algorithm has been written in Python to identify spam e-mails. The algorithm analyzes the free text contained within a sample set of 1 million e-mails stored on Amazon S3. The algorithm must be scaled across a production dataset of 5 PB, which also resides in Amazon S3 storage. Which AWS service strategy is best for this use case?

Answer options (posted by Stan Nichols):

A. Use Amazon Elasticsearch Service to store the text and then use the Python Elasticsearch Client to run analysis against the text index.

B. Copy the data into Amazon ElastiCache to perform text analysis on the in-memory data and export the results of the model into Amazon Machine Learning.

C. Use Amazon EMR to parallelize the text analysis tasks across the cluster using a streaming program step.

D. Initiate a Python job from AWS Data Pipeline to run directly against the Amazon S3 text files.
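
The Amazon EMR option above mentions a streaming program step. As a rough illustration only, a streaming step lets an existing Python script act as mapper and reducer over lines of text. The sketch below assumes one e-mail per input line, and `looks_like_spam` with its keyword list is a hypothetical stand-in for the real algorithm, which the question does not describe.

```python
import io
from itertools import groupby

# Hypothetical stand-in for the real spam model; this keyword list is an assumption.
SPAM_HINTS = ("free money", "act now", "winner")


def looks_like_spam(text):
    """Return True if any assumed spam phrase appears in the text."""
    lowered = text.lower()
    return any(hint in lowered for hint in SPAM_HINTS)


def mapper(lines):
    """Emit one 'label<TAB>1' record per non-empty input line."""
    for line in lines:
        text = line.strip()
        if not text:
            continue
        label = "spam" if looks_like_spam(text) else "ham"
        yield f"{label}\t1"


def reducer(sorted_records):
    """Sum counts per label; records are expected to arrive sorted by key."""
    parsed = (record.rstrip("\n").split("\t", 1) for record in sorted_records)
    for label, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{label}\t{sum(int(count) for _, count in group)}"


# Local dry run: sorted() plays the role of the shuffle phase between map and reduce.
sample = io.StringIO(
    "You are a WINNER, act now\nMeeting moved to 3pm\nFree money inside\n"
)
counts = list(reducer(sorted(mapper(sample))))
print(counts)
```

On a cluster, the mapper and reducer would instead read from standard input and write to standard output, and the step would point Hadoop Streaming at them with its `-mapper` and `-reducer` options, using `s3://` locations for `-input` and `-output`. The cluster then splits the input across nodes, which is what lets the same script scale from the sample set to the full dataset.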

Related questions

Question 2 - BDS-C00 discussion

Suggested answer: C
