Question 207 - Professional Machine Learning Engineer discussion


You have deployed a scikit-learn model to a Vertex AI endpoint using a custom model server. You enabled autoscaling; however, the deployed model fails to scale beyond one replica, which led to dropped requests. You notice that CPU utilization remains low even during periods of high load. What should you do?

A. Attach a GPU to the prediction nodes.
B. Increase the number of workers in your model server.
C. Schedule scaling of the nodes to match expected demand.
D. Increase the minReplicaCount in your DeployedModel configuration.
Suggested answer: B

Explanation:

Autoscaling is a feature that automatically adjusts the number of prediction nodes based on the traffic and load of your deployed model [1]. However, autoscaling is driven by the CPU utilization of your prediction nodes, that is, the percentage of CPU resources used by your model server [1]. If CPU utilization stays low even during periods of high load, your model server is not fully using the available CPU resources, so autoscaling will not add replicas [2].

One common reason for low CPU utilization is that the model server runs a single worker process [3]. A worker process is a subprocess that runs your model code and handles prediction requests; with only one worker, the server can handle only one request at a time, which leads to dropped requests under high traffic [3]. Increasing the number of worker processes lets the server handle multiple requests in parallel, raising both throughput and CPU utilization [3].

To increase the number of workers, modify your custom model server's startup command to specify the number of worker processes, for example with Gunicorn's --workers flag [3]. The following command starts a Gunicorn server with four worker processes:

gunicorn --bind :$PORT --workers 4 --threads 1 --timeout 60 main:app

By increasing the number of workers in your model server, you can increase the CPU utilization of your prediction nodes, and thus enable auto scaling to scale beyond one replica.
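To pick a concrete worker count, a widely used heuristic (not taken from the cited docs, so treat it as a rule of thumb) is roughly 2 x vCPUs + 1 for CPU-bound workloads such as scikit-learn inference. A minimal sketch; the machine-type example in the comment is illustrative:

```python
# Sketch of the common "2 * cores + 1" Gunicorn worker-sizing heuristic.
# This is a rule of thumb, not an official Vertex AI recommendation.
import multiprocessing


def recommended_workers(cpu_count=None):
    """Return a worker count that keeps all vCPUs busy under load."""
    cores = cpu_count or multiprocessing.cpu_count()
    return 2 * cores + 1


# e.g. on a 4-vCPU prediction node (such as an n1-standard-4):
print(recommended_workers(4))  # 9
```

You would then pass this value to Gunicorn's --workers flag in your container's startup command.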

The other options do not address the root cause. Attaching a GPU (A) does not help a CPU-bound scikit-learn server, and scheduled scaling (C) does not fix the low CPU utilization that prevents autoscaling from triggering. Increasing the minReplicaCount (D) sets a fixed minimum number of nodes that always run regardless of traffic; it masks the symptom at extra cost rather than making autoscaling work [1].
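For context, the CPU utilization target that drives autoscaling is set at deploy time. A hedged sketch of such a deployment with the gcloud CLI; the endpoint ID, model ID, region, and target value below are placeholders:

```shell
# Sketch: deploy a model with CPU-based autoscaling between 1 and 5 replicas.
# ENDPOINT_ID, MODEL_ID, and the region are placeholders for your own values.
gcloud ai endpoints deploy-model ENDPOINT_ID \
  --region=us-central1 \
  --model=MODEL_ID \
  --display-name=sklearn-model \
  --machine-type=n1-standard-4 \
  --min-replica-count=1 \
  --max-replica-count=5 \
  --autoscaling-metric-specs=cpu-usage=60
```

With a target like cpu-usage=60, Vertex AI adds replicas when average CPU utilization exceeds 60%, which is exactly why a single-worker server that never drives CPU that high fails to scale.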

References:
[1] Scaling prediction nodes | Vertex AI | Google Cloud
[2] Troubleshooting | Vertex AI | Google Cloud
[3] Using a custom prediction routine with online prediction | Vertex AI | Google Cloud

asked 18/09/2024 by Amir Arefi