Question 304 - MLS-C01 discussion
An ecommerce company wants to train a large image classification model with 10,000 classes. The company runs multiple model training iterations and needs to minimize operational overhead and cost. The company also needs to avoid loss of work and model retraining.
Which solution will meet these requirements?
A. Create the training jobs as AWS Batch jobs that use Amazon EC2 Spot Instances in a managed compute environment.
B. Use Amazon EC2 Spot Instances to run the training jobs. Use a Spot Instance interruption notice to save a snapshot of the model to Amazon S3 before an instance is terminated.
C. Use AWS Lambda to run the training jobs. Save model weights to Amazon S3.
D. Use managed spot training in Amazon SageMaker. Launch the training jobs with checkpointing enabled.
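
For reference, the configuration described in the last option (managed spot training with checkpointing) might look roughly like the sketch below. It only builds the request body for the SageMaker `CreateTrainingJob` API and does not call AWS. The helper name `build_spot_training_request`, the instance type, the S3 prefixes, and the time limits are all illustrative assumptions, not values from the question.

```python
def build_spot_training_request(job_name, image_uri, role_arn, bucket,
                                max_run_s=3600, max_wait_s=7200):
    """Build a CreateTrainingJob request that uses managed spot training.

    Hypothetical helper: all argument values are placeholders chosen by the caller.
    """
    # The total wait time (including waiting for Spot capacity) must not be
    # shorter than the allowed training runtime.
    if max_wait_s < max_run_s:
        raise ValueError("max_wait_s must be >= max_run_s")

    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        "OutputDataConfig": {"S3OutputPath": f"s3://{bucket}/output/"},
        "ResourceConfig": {
            "InstanceType": "ml.p3.2xlarge",  # assumed instance type
            "InstanceCount": 1,
            "VolumeSizeInGB": 100,
        },
        # Ask SageMaker to run the job on Spot capacity.
        "EnableManagedSpotTraining": True,
        # Files written to LocalPath are synced to S3Uri and restored
        # to LocalPath when the job restarts after an interruption.
        "CheckpointConfig": {
            "S3Uri": f"s3://{bucket}/checkpoints/{job_name}/",
            "LocalPath": "/opt/ml/checkpoints",
        },
        "StoppingCondition": {
            "MaxRuntimeInSeconds": max_run_s,
            "MaxWaitTimeInSeconds": max_wait_s,
        },
    }
```

With AWS credentials configured, a request like this could be passed to `boto3.client("sagemaker").create_training_job(**request)`. The training script itself still has to write checkpoints to the local path and resume from them on restart; SageMaker only handles copying them to and from S3.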