Question 101 - MLS-C01 discussion


A Machine Learning Specialist needs to create a data repository to hold a large amount of time-based training data for a new model. In the source system, new files are added every hour. Throughout a single 24-hour period, the volume of hourly updates will change significantly. The Specialist always wants to train on the last 24 hours of the data.

Which type of data repository is the MOST cost-effective solution?

A. An Amazon EBS-backed Amazon EC2 instance with hourly directories
B. An Amazon RDS database with hourly table partitions
C. An Amazon S3 data lake with hourly object prefixes
D. An Amazon EMR cluster with hourly Hive partitions on Amazon EBS volumes

Suggested answer: C

Explanation:

An Amazon S3 data lake is the most cost-effective of the four options for storing a large amount of time-based training data. Amazon S3 is a highly scalable, durable, and secure object storage service that can store any amount of data in any format, and it charges for the data actually stored and the requests made, so there is no capacity to provision when the hourly volume changes significantly. The other options all depend on provisioned resources (an EC2 instance with EBS volumes, an RDS instance, or an EMR cluster) that must be sized for the peak hour and are billed while they run, regardless of load. By using hourly object prefixes, the Machine Learning Specialist can organize the data into logical partitions based on the time of ingestion, so a training job only needs to read the 24 prefixes that cover the last 24 hours. The Specialist can also use Amazon S3 lifecycle policies to automatically delete data after a certain period of time, or to transition it to lower-cost storage classes such as S3 Standard-IA and S3 One Zone-IA if older data must be retained. Lifecycle rules are expressed in days and the IA classes carry a 30-day minimum storage duration charge, so those classes suit retained history rather than the 24-hour working set. This way, the Specialist can always train on the last 24 hours of the data and keep storage costs low.
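To make the hourly-prefix idea concrete, here is a minimal sketch of how a training job might work out which prefixes cover the last 24 hours. The `training-data/YYYY/MM/DD/HH/` layout, the root name, and the function name `last_24_hour_prefixes` are assumptions for illustration only; the question does not specify a key scheme.

```python
from datetime import datetime, timedelta, timezone


def last_24_hour_prefixes(now, root="training-data"):
    """Return 24 hourly prefixes, oldest first, ending with the hour containing `now`.

    The key layout <root>/YYYY/MM/DD/HH/ is a hypothetical convention.
    """
    # Truncate to the top of the current hour.
    top_of_hour = now.replace(minute=0, second=0, microsecond=0)
    # Walk back 23 hours so the list spans 24 hourly partitions in total.
    hours = [top_of_hour - timedelta(hours=offset) for offset in range(23, -1, -1)]
    return [f"{root}/{hour:%Y/%m/%d/%H}/" for hour in hours]


if __name__ == "__main__":
    for prefix in last_24_hour_prefixes(datetime.now(timezone.utc)):
        print(prefix)
```

Each returned prefix could then be passed to whatever S3 listing or training-input mechanism is in use, so only the relevant hours are read and older prefixes can be left to a lifecycle expiration rule.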

References:

What is a data lake? - Amazon Web Services

Amazon S3 Storage Classes - Amazon Simple Storage Service

Managing your storage lifecycle - Amazon Simple Storage Service

Best Practices Design Patterns: Optimizing Amazon S3 Performance

asked 16/09/2024
Steve Parnell