Question 157 - Professional Machine Learning Engineer discussion

You recently deployed a scikit-learn model to a Vertex AI endpoint. You are now testing the model on live production traffic. While monitoring the endpoint, you discover twice as many requests per hour as expected throughout the day. You want the endpoint to scale efficiently when demand increases in the future, to prevent users from experiencing high latency. What should you do?

A. Deploy two models to the same endpoint and distribute requests among them evenly.
B. Configure an appropriate minReplicaCount value based on expected baseline traffic.
C. Set the target utilization percentage in the autoscalingMetricSpecs configuration to a higher value.
D. Change the model's machine type to one that utilizes GPUs.
Suggested answer: B

Explanation:

The best option is to configure an appropriate minReplicaCount value based on expected baseline traffic. Vertex AI is a unified platform for building and deploying machine learning solutions on Google Cloud: it can deploy a trained model to an online prediction endpoint that returns low-latency predictions for individual instances, and it provides tools for data analysis, model development, deployment, monitoring, and governance. The minReplicaCount parameter specifies the minimum number of replicas the endpoint must always have, regardless of load, so setting it to match the expected baseline traffic ensures the endpoint has enough resources to serve that baseline without high latency or errors. You can set minReplicaCount when you deploy the model to the endpoint, or update it later. Vertex AI then automatically scales the number of replicas up or down between minReplicaCount and maxReplicaCount, based on the target utilization percentage and the autoscaling metric [1].
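As a minimal sketch, assuming the Vertex AI Python SDK (google-cloud-aiplatform) and placeholder project, region, model ID, and replica counts, a deployment sized for baseline traffic might look like this:

    from google.cloud import aiplatform

    # Placeholder project, region, and model ID -- substitute your own.
    aiplatform.init(project="my-project", location="us-central1")
    model = aiplatform.Model(model_name="1234567890")  # the scikit-learn model

    # min_replica_count covers the expected baseline traffic; Vertex AI then
    # autoscales between min and max replicas as demand rises and falls.
    endpoint = model.deploy(
        deployed_model_display_name="sklearn-model",
        machine_type="n1-standard-4",
        min_replica_count=2,   # assumed baseline: always-on capacity
        max_replica_count=10,  # headroom for future traffic growth
    )

With settings like these, the endpoint never drops below two replicas, so baseline traffic is always covered, while traffic spikes trigger scale-out toward the maximum.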

The other options are not as good as option B, for the following reasons:

Option A: Deploying two models to the same endpoint and distributing requests evenly between them does not make the endpoint scale with demand, and it adds complexity and cost. In Vertex AI, a model is a resource representing a machine learning model and can have one or more versions; an endpoint provides the service URL you use to request predictions; and a deployed model is an instance of a model version backed by physical resources. Splitting traffic evenly between two deployed models balances load across them, but the total serving capacity remains fixed, so it does nothing for future demand growth (see the sketch after this paragraph). You would also need to build, deploy, and maintain two models, and you would forgo Vertex AI's autoscaling feature, which automatically adjusts the number of replicas to the traffic pattern and offers optimal resource utilization, cost savings, and performance improvements [2].
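For contrast, a minimal sketch of an even split between two deployed models (placeholder endpoint and model IDs) shows that the split only redistributes traffic; each deployment still has its own fixed replica bounds:

    from google.cloud import aiplatform

    aiplatform.init(project="my-project", location="us-central1")
    endpoint = aiplatform.Endpoint(endpoint_name="1111111111")  # existing endpoint
    second_model = aiplatform.Model(model_name="2222222222")    # placeholder ID

    # traffic_percentage=50 routes half of the requests to the new deployment;
    # total capacity is still whatever the two deployments' replicas allow.
    second_model.deploy(
        endpoint=endpoint,
        machine_type="n1-standard-4",
        traffic_percentage=50,
        min_replica_count=1,
        max_replica_count=1,
    )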

Option C: Setting the target utilization percentage in the autoscalingMetricSpecs configuration to a higher value could cause errors or poor performance rather than efficient scaling. The target utilization percentage specifies the desired utilization level of each replica, and therefore how aggressively the autoscaler reacts. A higher target lets each replica run hotter before new replicas are added, which can save resources, but it also risks high latency, low throughput, or resource exhaustion under load, which is exactly what you want to prevent. It also does not ensure that the endpoint has enough replicas for the expected baseline traffic [1].
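In the Python SDK, this target corresponds to the autoscaling_target_cpu_utilization argument of Model.deploy (the default is 60); the sketch below, with placeholder values, raises it to 80, which delays scale-out instead of enabling it:

    from google.cloud import aiplatform

    aiplatform.init(project="my-project", location="us-central1")
    model = aiplatform.Model(model_name="1234567890")

    # A higher CPU target means each replica must be busier before the
    # autoscaler adds a new one, so replicas run closer to saturation.
    endpoint = model.deploy(
        machine_type="n1-standard-4",
        min_replica_count=1,
        max_replica_count=10,
        autoscaling_target_cpu_utilization=80,  # default is 60
    )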

Option D: Changing the model's machine type to one that utilizes GPUs changes the hardware behind each replica rather than how the endpoint scales. The machine type specifies the kind of virtual machine the prediction service uses for the deployed model, and a GPU-backed machine can accelerate prediction computation for models that can exploit it. However, it does not make the endpoint add capacity as demand grows, it increases cost and deployment complexity, and, like option A, it forgoes Vertex AI's autoscaling feature, which automatically adjusts the number of replicas to the traffic pattern [2].
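For reference, attaching a GPU is a per-deployment hardware choice, as in this hypothetical sketch; note that Vertex AI's prebuilt scikit-learn serving containers run on CPU, so a GPU would require a custom serving container, and it would not change the autoscaling behavior either way:

    from google.cloud import aiplatform

    aiplatform.init(project="my-project", location="us-central1")
    model = aiplatform.Model(model_name="1234567890")

    # Hypothetical GPU-backed deployment (assumes a custom serving container;
    # prebuilt scikit-learn containers are CPU-only). Faster per-request
    # compute, but replica count and autoscaling behavior are unchanged.
    endpoint = model.deploy(
        machine_type="n1-standard-8",
        accelerator_type="NVIDIA_TESLA_T4",
        accelerator_count=1,
        min_replica_count=1,
        max_replica_count=3,
    )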

[1] Configure compute resources for prediction | Vertex AI | Google Cloud

[2] Deploy a model to an endpoint | Vertex AI | Google Cloud

asked 18/09/2024
Timothy Luisterburg