Question 175 - MLS-C01 discussion

A technology startup is using complex deep neural networks and GPU compute to recommend the company's products to its existing customers based upon each customer's habits and interactions. The solution currently pulls each dataset from an Amazon S3 bucket before loading the data into a TensorFlow model pulled from the company's Git repository that runs locally. This job then runs for several hours while continually outputting its progress to the same S3 bucket. The job can be paused, restarted, and continued at any time in the event of a failure, and is run from a central queue.

Senior managers are concerned about the complexity of the solution's resource management and the costs involved in repeating the process regularly. They ask for the workload to be automated so it runs once a week, starting Monday and completing by the close of business Friday.

Which architecture should be used to scale the solution at the lowest cost?

A. Implement the solution using AWS Deep Learning Containers and run the container as a job using AWS Batch on a GPU-compatible Spot Instance
B. Implement the solution using a low-cost GPU-compatible Amazon EC2 instance and use the AWS Instance Scheduler to schedule the task
C. Implement the solution using AWS Deep Learning Containers, run the workload using AWS Fargate running on Spot Instances, and then schedule the task using the built-in task scheduler
D. Implement the solution using Amazon ECS running on Spot Instances and schedule the task using the ECS service scheduler
Suggested answer: A

Explanation:

The best architecture to scale the solution at the lowest cost is to package the workload with AWS Deep Learning Containers and run the container as a job on AWS Batch using GPU-compatible Spot Instances. This option has the following advantages:

AWS Deep Learning Containers: These are Docker images that come pre-installed and optimized with popular deep learning frameworks such as TensorFlow, PyTorch, and MXNet. They can be deployed on Amazon EC2, Amazon ECS, Amazon EKS, and AWS Fargate, and they integrate with AWS Batch to run containerized batch jobs. Using AWS Deep Learning Containers simplifies the setup and configuration of the deep learning environment and reduces the complexity of resource management.
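As an illustration only, the following sketch registers an AWS Batch job definition that points at a Deep Learning Containers TensorFlow GPU training image. The image tag, resource sizes, and train.py entry point are assumptions; look up the current image URI for your region in the Deep Learning Containers documentation.

```python
import boto3

batch = boto3.client("batch")

# Assumption: illustrative image tag and region; check the AWS Deep Learning
# Containers image list for the URI that matches your framework version.
DLC_IMAGE = (
    "763104351884.dkr.ecr.us-east-1.amazonaws.com/"
    "tensorflow-training:2.12.0-gpu-py310-cu118-ubuntu20.04-ec2"
)

batch.register_job_definition(
    jobDefinitionName="recommender-training",
    type="container",
    containerProperties={
        "image": DLC_IMAGE,
        "command": ["python", "train.py"],  # hypothetical training entry point
        "resourceRequirements": [
            {"type": "VCPU", "value": "8"},
            {"type": "MEMORY", "value": "61440"},
            {"type": "GPU", "value": "1"},
        ],
    },
    # Let AWS Batch re-run the container if the Spot Instance is reclaimed.
    retryStrategy={"attempts": 3},
)
```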

AWS Batch: This is a fully managed service for running batch computing workloads on AWS. You define compute environments, job queues, and job definitions, and AWS Batch automatically provisions compute resources based on the requirements of the submitted jobs. You can specify the type and quantity of compute resources, such as GPU instances, and the maximum price you are willing to pay for them. AWS Batch also monitors the status and progress of your jobs and can retry them after failures or interruptions.
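A minimal sketch of the corresponding AWS Batch setup is shown below, assuming placeholder subnet, security group, and IAM role identifiers that you would replace with your own:

```python
import boto3

batch = boto3.client("batch")

# Managed Spot compute environment that can launch GPU instance types.
batch.create_compute_environment(
    computeEnvironmentName="gpu-spot-ce",
    type="MANAGED",
    computeResources={
        "type": "SPOT",
        "allocationStrategy": "SPOT_CAPACITY_OPTIMIZED",
        "minvCpus": 0,  # scale to zero between the weekly runs
        "maxvCpus": 64,
        "instanceTypes": ["g4dn.xlarge", "p3.2xlarge"],
        "subnets": ["subnet-0123456789abcdef0"],        # placeholder
        "securityGroupIds": ["sg-0123456789abcdef0"],   # placeholder
        "instanceRole": "ecsInstanceRole",               # placeholder
        "spotIamFleetRole": "arn:aws:iam::111122223333:role/AmazonEC2SpotFleetRole",
    },
    serviceRole="arn:aws:iam::111122223333:role/AWSBatchServiceRole",  # placeholder
)

# Queue that feeds jobs into the Spot compute environment.
batch.create_job_queue(
    jobQueueName="recommender-queue",
    priority=1,
    computeEnvironmentOrder=[{"order": 1, "computeEnvironment": "gpu-spot-ce"}],
)

# Submit the weekly training job, e.g. from a scheduled trigger.
batch.submit_job(
    jobName="weekly-recommender-training",
    jobQueue="recommender-queue",
    jobDefinition="recommender-training",
)
```

Setting minvCpus to 0 lets the compute environment scale down to nothing between runs, so no instances are billed while the workload is idle.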

GPU-compatible Spot Instance: This is an Amazon EC2 instance that uses spare compute capacity offered at a discount to the On-Demand price. Spot Instances let you run deep learning training jobs at a lower cost, as long as you are flexible about when the instances run and can tolerate interruptions. With AWS Batch, Spot Instances are launched and terminated automatically based on the availability and price of Spot capacity. You can also attach Amazon EBS volumes to store datasets, checkpoints, and logs, so you can preserve your data and resume training even if an instance is interrupted.
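Because a Spot Instance can be reclaimed at any time, the training script should checkpoint regularly and resume from the latest checkpoint when AWS Batch retries the job. A minimal TensorFlow sketch of that pattern, assuming a hypothetical checkpoint directory on an attached EBS volume (the real job could equally write to the S3 bucket), could look like this:

```python
import os
import tensorflow as tf

# Assumption: /mnt/ebs/checkpoints is an EBS volume attached to the Spot Instance.
CHECKPOINT_DIR = "/mnt/ebs/checkpoints"
os.makedirs(CHECKPOINT_DIR, exist_ok=True)

# Tiny stand-in model; the real recommender network would go here.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(10,))])
model.compile(optimizer="adam", loss="mse")

# If a previous Spot Instance was interrupted, pick up from the newest checkpoint.
latest = tf.train.latest_checkpoint(CHECKPOINT_DIR)
if latest is not None:
    model.load_weights(latest)

# Save weights every epoch so an interruption loses at most one epoch of work.
checkpoint_cb = tf.keras.callbacks.ModelCheckpoint(
    filepath=os.path.join(CHECKPOINT_DIR, "ckpt-{epoch:04d}"),
    save_weights_only=True,
)
# model.fit(train_dataset, epochs=50, callbacks=[checkpoint_cb])
```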

References:

AWS Deep Learning Containers

AWS Batch

Amazon EC2 Spot Instances

Using Amazon EBS Volumes with Amazon EC2 Spot Instances
