ExamGecko
Question 33 - MLS-C01 discussion


A company has set up and deployed its machine learning (ML) model into production with an endpoint using Amazon SageMaker hosting services. The ML team has configured automatic scaling for its SageMaker instances to support workload changes. During testing, the team notices that additional instances are being launched before the new instances are ready. This behavior needs to change as soon as possible.

How can the ML team solve this issue?

A. Decrease the cooldown period for the scale-in activity. Increase the configured maximum capacity of instances.

B. Replace the current endpoint with a multi-model endpoint using SageMaker.

C. Set up Amazon API Gateway and AWS Lambda to trigger the SageMaker inference endpoint.

D. Increase the cooldown period for the scale-out activity.
Suggested answer: D

Explanation:

The correct solution is to increase the cooldown period for the scale-out activity. A cooldown period is the amount of time, in seconds, after a scaling activity completes before another scaling activity can start. By increasing the scale-out cooldown, the ML team gives newly launched instances time to come into service before automatic scaling launches additional ones. This prevents over-scaling and reduces costs [1].
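For context, the scale-out cooldown is set on the target-tracking policy attached to the endpoint's production variant via Application Auto Scaling. Below is a minimal boto3 sketch; the endpoint name `my-endpoint`, variant name `AllTraffic`, capacities, target value, and the 600-second cooldown are illustrative assumptions, not values from the question:

```python
# Illustrative values -- substitute your own endpoint and variant names.
ENDPOINT_NAME = "my-endpoint"
VARIANT_NAME = "AllTraffic"
RESOURCE_ID = f"endpoint/{ENDPOINT_NAME}/variant/{VARIANT_NAME}"


def target_tracking_config(scale_out_cooldown=600, scale_in_cooldown=300):
    """Target-tracking configuration with a longer scale-out cooldown,
    so new instances are in service before more are launched."""
    return {
        "TargetValue": 70.0,  # invocations per instance per minute (assumed)
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
        "ScaleOutCooldown": scale_out_cooldown,  # seconds between scale-out activities
        "ScaleInCooldown": scale_in_cooldown,    # seconds between scale-in activities
    }


def apply_policy():
    """Register the variant as a scalable target and attach the policy."""
    import boto3  # imported lazily so the config helper works without AWS credentials

    client = boto3.client("application-autoscaling")
    client.register_scalable_target(
        ServiceNamespace="sagemaker",
        ResourceId=RESOURCE_ID,
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
        MinCapacity=1,
        MaxCapacity=4,
    )
    client.put_scaling_policy(
        PolicyName="scale-out-cooldown-policy",
        ServiceNamespace="sagemaker",
        ResourceId=RESOURCE_ID,
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration=target_tracking_config(),
    )
```

Raising only `ScaleOutCooldown` (option D) is the fix: it throttles how soon another scale-out can start, while leaving scale-in behavior untouched.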

The other options are incorrect because they either do not solve the issue or require unnecessary steps. For example:

Option A decreases the cooldown period for the scale-in activity and increases the configured maximum capacity of instances. Neither change addresses the problem of launching additional instances before the new ones are ready; a shorter scale-in cooldown also terminates instances sooner, which can cause under-scaling and performance degradation.

Option B replaces the current endpoint with a multi-model endpoint. A multi-model endpoint hosts multiple models behind a single endpoint, but it does not change the scaling behavior of the SageMaker instances. It would also require creating a new endpoint and updating the application code to use it [2].

Option C sets up Amazon API Gateway and AWS Lambda to invoke the SageMaker inference endpoint. API Gateway is a service for creating, publishing, maintaining, monitoring, and securing APIs; AWS Lambda runs code without provisioning or managing servers. Neither service changes the scaling behavior of the SageMaker instances, and both require creating and configuring additional resources [3][4].

References:

[1] Automatic Scaling - Amazon SageMaker

[2] Create a Multi-Model Endpoint - Amazon SageMaker

[3] Amazon API Gateway - Amazon Web Services

[4] AWS Lambda - Amazon Web Services

asked 16/09/2024
Srinivasan Krishnamoorthy