Question 193 - MLS-C01 discussion


A machine learning specialist is running an Amazon SageMaker endpoint using the built-in object detection algorithm on a P3 instance for real-time predictions in a company's production application. When evaluating the model's resource utilization, the specialist notices that the model is using only a fraction of the GPU.

Which architecture changes would ensure that provisioned resources are being utilized effectively?

A. Redeploy the model as a batch transform job on an M5 instance.

B. Redeploy the model on an M5 instance. Attach Amazon Elastic Inference to the instance.

C. Redeploy the model on a P3dn instance.

D. Deploy the model onto an Amazon Elastic Container Service (Amazon ECS) cluster using a P3 instance.
Suggested answer: B

Explanation:

The best way to ensure that provisioned resources are being utilized effectively is to redeploy the model on an M5 instance and attach Amazon Elastic Inference to the instance. Amazon Elastic Inference allows you to attach low-cost GPU-powered acceleration to Amazon EC2 and Amazon SageMaker instances to reduce the cost of running deep learning inference by up to 75%. By using Amazon Elastic Inference, you can choose the instance type that is best suited to the overall CPU and memory needs of your application, and then separately configure the amount of inference acceleration that you need with no code changes. This way, you can avoid wasting GPU resources and pay only for what you use.
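As an illustration of how little changes in practice, here is a minimal sketch of the endpoint configuration this option implies, expressed as a boto3 `CreateEndpointConfig` request payload. The endpoint-config and model names, instance size, and accelerator size are assumptions chosen for the example, not values from the question:

```python
# Sketch: an endpoint configuration pairing a CPU instance (ml.m5.xlarge)
# with an Elastic Inference accelerator (ml.eia2.medium). The names below
# are hypothetical placeholders; in practice this dict would be passed to
# boto3's SageMaker client via create_endpoint_config(**endpoint_config).
endpoint_config = {
    "EndpointConfigName": "object-detection-m5-eia",  # hypothetical name
    "ProductionVariants": [
        {
            "VariantName": "AllTraffic",
            "ModelName": "object-detection-model",  # hypothetical name
            "InitialInstanceCount": 1,
            "InstanceType": "ml.m5.xlarge",      # CPU host sized for the app
            "AcceleratorType": "ml.eia2.medium", # attached inference acceleration
        }
    ],
}

# To actually create the endpoint config (requires AWS credentials):
# import boto3
# boto3.client("sagemaker").create_endpoint_config(**endpoint_config)
```

The key point is that the accelerator is a single `AcceleratorType` field on the production variant: the host instance type and the acceleration capacity are sized independently, which is exactly how this architecture avoids paying for an idle full GPU.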

Option A is incorrect because a batch transform job is not suitable for real-time predictions. Batch transform is a high-performance and cost-effective feature for generating inferences using your trained models. Batch transform manages all of the compute resources required to get inferences. Batch transform is ideal for scenarios where you're working with large batches of data, don't need sub-second latency, or need to process data that is stored in Amazon S3.
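For contrast, this is roughly what option A would look like as a boto3 `CreateTransformJob` request payload. Job name, model name, and S3 paths are placeholder assumptions; the sketch only shows why batch transform is an offline, S3-to-S3 workflow rather than a real-time endpoint:

```python
# Sketch: a batch transform job on an M5 instance. Inferences are read from
# and written to Amazon S3 asynchronously, so there is no real-time endpoint.
# All names and S3 URIs are hypothetical; the dict would be passed to boto3's
# SageMaker client via create_transform_job(**transform_job).
transform_job = {
    "TransformJobName": "object-detection-batch",  # hypothetical name
    "ModelName": "object-detection-model",         # hypothetical name
    "TransformInput": {
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": "s3://example-bucket/images/",  # placeholder input prefix
            }
        },
        "ContentType": "application/x-image",
    },
    "TransformOutput": {
        "S3OutputPath": "s3://example-bucket/predictions/",  # placeholder output
    },
    "TransformResources": {"InstanceType": "ml.m5.xlarge", "InstanceCount": 1},
}
```

Because input and output are S3 prefixes processed as a job, there is no way for the production application to get sub-second, per-request predictions from this setup.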

Option C is incorrect because redeploying the model on a P3dn instance would not improve resource utilization. P3dn instances are designed for distributed machine learning and high-performance computing applications that need high network throughput and packet-rate performance. They are not optimized for inference workloads; moving to an even larger GPU instance would leave more capacity idle, not less.

Option D is incorrect because deploying the model onto an Amazon ECS cluster using a P3 instance would not ensure that provisioned resources are being utilized effectively. Amazon ECS is a fully managed container orchestration service that allows you to run and scale containerized applications on AWS. However, using Amazon ECS would not address the issue of underutilized GPU resources. In fact, it might introduce additional overhead and complexity in managing the cluster.

References:

Amazon Elastic Inference - Amazon SageMaker

Batch Transform - Amazon SageMaker

Amazon EC2 P3 Instances

Amazon EC2 P3dn Instances

Amazon Elastic Container Service

asked 16/09/2024
Farrah Colson