Question 18 - MLS-C01 discussion

A company has an ecommerce website with a product recommendation engine built in TensorFlow. The recommendation engine endpoint is hosted by Amazon SageMaker. Three compute-optimized instances support the expected peak load of the website.

Response times on the product recommendation page are increasing at the beginning of each month. Some users are encountering errors. The website receives the majority of its traffic between 8 AM and 6 PM on weekdays in a single time zone.

Which of the following options are the MOST effective in solving the issue while keeping costs to a minimum? (Choose two.)

A. Configure the endpoint to use Amazon Elastic Inference (EI) accelerators.
B. Create a new endpoint configuration with two production variants.
C. Configure the endpoint to automatically scale with the InvocationsPerInstance metric.
D. Deploy a second instance pool to support a blue/green deployment of models.
E. Reconfigure the endpoint to use burstable instances.
Suggested answer: A, C

Explanation:

Options A and C are the most effective at resolving the issue while keeping costs to a minimum. They involve the following steps:

Configure the endpoint to use Amazon Elastic Inference (EI) accelerators. This reduces the cost and latency of running TensorFlow inference on SageMaker. Amazon EI provides GPU-powered acceleration for deep learning models without requiring full GPU instances; an accelerator can be attached to any SageMaker instance type and sized to provide the right amount of acceleration for the workload [1].
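As an illustration, the sketch below (using boto3, with hypothetical model, config, and accelerator choices) shows how an EI accelerator is attached to a variant in an endpoint configuration via the AcceleratorType field:

```python
import boto3

sm = boto3.client("sagemaker")

# Hypothetical names and sizes; the accelerator type would be chosen
# by benchmarking the TensorFlow model's actual throughput needs.
sm.create_endpoint_config(
    EndpointConfigName="recommender-ei-config",
    ProductionVariants=[
        {
            "VariantName": "AllTraffic",
            "ModelName": "tf-recommender-model",
            "InstanceType": "ml.c5.xlarge",       # compute-optimized CPU host
            "InitialInstanceCount": 3,
            "AcceleratorType": "ml.eia2.medium",  # Elastic Inference accelerator
        }
    ],
)
```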

Configure the endpoint to automatically scale with the InvocationsPerInstance metric. This lets the company adjust the number of instances to the demand and traffic patterns of the website. The InvocationsPerInstance metric measures the average number of requests each instance processes over a period of time. By tracking this metric, the endpoint scales out when load increases and scales in when it decreases, which improves the response time and availability of the product recommendation engine [2].
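A minimal sketch of wiring this up with Application Auto Scaling follows; the endpoint and variant names are hypothetical, and the target value and capacity bounds would come from load testing rather than these illustrative numbers:

```python
import boto3

autoscaling = boto3.client("application-autoscaling")
resource_id = "endpoint/recommender-endpoint/variant/AllTraffic"  # hypothetical names

# Allow the variant to scale between 1 and 6 instances.
autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=1,
    MaxCapacity=6,
)

# Target-tracking policy on the built-in per-instance invocations metric.
autoscaling.put_scaling_policy(
    PolicyName="recommender-invocations-policy",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 1000.0,  # illustrative invocations per instance per minute
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
        "ScaleInCooldown": 300,  # wait before removing instances
        "ScaleOutCooldown": 60,  # wait before adding instances
    },
)
```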

The other options are not suitable because:

Option B: Creating a new endpoint configuration with two production variants will not solve the issue of increasing response times and errors. Production variants split traffic between different models or versions of the same model, which is useful for testing, updating, or A/B testing. However, they provide no scaling or acceleration benefit for the inference workload [3].
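For reference, this is roughly what a two-variant configuration looks like (hypothetical names and weights); the weights only divide traffic between model versions, they do not add capacity:

```python
import boto3

sm = boto3.client("sagemaker")

# Two variants splitting traffic 70/30 between model versions.
sm.create_endpoint_config(
    EndpointConfigName="recommender-ab-config",
    ProductionVariants=[
        {
            "VariantName": "ModelV1",
            "ModelName": "tf-recommender-v1",
            "InstanceType": "ml.c5.xlarge",
            "InitialInstanceCount": 2,
            "InitialVariantWeight": 0.7,
        },
        {
            "VariantName": "ModelV2",
            "ModelName": "tf-recommender-v2",
            "InstanceType": "ml.c5.xlarge",
            "InitialInstanceCount": 1,
            "InitialVariantWeight": 0.3,
        },
    ],
)
```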

Option D: Deploying a second instance pool to support a blue/green deployment of models will not solve the issue either. Blue/green deployment is a technique for updating models without downtime or disruption: a new endpoint configuration is created with a different instance pool and model version, and traffic is shifted gradually from the old configuration to the new one. It provides no scaling or acceleration benefit, and maintaining a duplicate instance pool adds cost [4].
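If a gradual cutover were the goal, SageMaker can shift variant weights in place rather than running a second pool, along these lines (hypothetical names and weights):

```python
import boto3

sm = boto3.client("sagemaker")

# Gradually move traffic from the old variant to the new one in place.
sm.update_endpoint_weights_and_capacities(
    EndpointName="recommender-endpoint",  # hypothetical endpoint name
    DesiredWeightsAndCapacities=[
        {"VariantName": "ModelV1", "DesiredWeight": 0.1},
        {"VariantName": "ModelV2", "DesiredWeight": 0.9},
    ],
)
```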

Option E: Reconfiguring the endpoint to use burstable instances will not solve the issue. Burstable instances provide a baseline level of CPU performance with the ability to burst above it when needed, which suits workloads with moderate CPU utilization and occasional spikes. They are not appropriate for workloads with high, sustained utilization such as this recommendation engine, and they can incur additional charges once their CPU credits are exhausted [5].

References:

[1] Amazon Elastic Inference
[2] How to Scale Amazon SageMaker Endpoints
[3] Deploying Models to Amazon SageMaker Hosting Services
[4] Updating Models in Amazon SageMaker Hosting Services
[5] Burstable Performance Instances
