ExamGecko
Question list
Search
Search

List of questions

Search

Related questions











Question 304 - MLS-C01 discussion

Report
Export

An ecommerce company wants to train a large image classification model with 10.000 classes. The company runs multiple model training iterations and needs to minimize operational overhead and cost. The company also needs to avoid loss of work and model retraining.

Which solution will meet these requirements?

A.

Create the training jobs as AWS Batch jobs that use Amazon EC2 Spot Instances in a managed compute environment.

Answers
A.

Create the training jobs as AWS Batch jobs that use Amazon EC2 Spot Instances in a managed compute environment.

B.

Use Amazon EC2 Spot Instances to run the training jobs. Use a Spot Instance interruption notice to save a snapshot of the model to Amazon S3 before an instance is terminated.

Answers
B.

Use Amazon EC2 Spot Instances to run the training jobs. Use a Spot Instance interruption notice to save a snapshot of the model to Amazon S3 before an instance is terminated.

C.

Use AWS Lambda to run the training jobs. Save model weights to Amazon S3.

Answers
C.

Use AWS Lambda to run the training jobs. Save model weights to Amazon S3.

D.

Use managed spot training in Amazon SageMaker. Launch the training jobs with checkpointing enabled.

Answers
D.

Use managed spot training in Amazon SageMaker. Launch the training jobs with checkpointing enabled.

Suggested answer: D

Explanation:

Amazon SageMaker managed spot training allows for cost-effective training by utilizing Spot Instances, which are lower-cost EC2 instances that can be interrupted when demand is high. By enabling checkpointing in SageMaker, the company can save intermediate model states to Amazon S3, allowing training to resume from the last checkpoint if interrupted. This solution minimizes operational overhead by automating the checkpointing process and resuming work after interruptions, reducing the need for retraining from scratch.

This setup provides a reliable and cost-efficient approach to training large models with minimal operational overhead and risk of data loss.

asked 31/10/2024
Heritier kandolo
47 questions
User
Your answer:
0 comments
Sorted by

Leave a comment first