ExamGecko
Question 93 - Professional Machine Learning Engineer discussion


You have deployed a model on Vertex AI for real-time inference. During an online prediction request, you get an "Out of Memory" error. What should you do?

A. Use batch prediction mode instead of online mode.
B. Send the request again with a smaller batch of instances.
C. Use base64 to encode your data before using it for prediction.
D. Apply for a quota increase for the number of prediction requests.
Suggested answer: B

Explanation:

Option A is incorrect because switching from online to batch prediction does not address the "Out of Memory" error; it changes the latency and throughput characteristics of the prediction service, and the model in the question is deployed for real-time inference. Batch prediction suits large-scale, asynchronous, non-urgent workloads, while online prediction suits low-latency, synchronous, real-time requests [1].

Option B is correct because resending the request with a smaller batch of instances reduces the memory the prediction service needs to handle it, which avoids the "Out of Memory" error. The batch size here is the number of instances processed together in one request. A smaller batch means less data loaded into memory at once [2].
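The idea behind option B can be sketched as splitting one large list of instances into several smaller requests. This is a minimal illustration, not Vertex AI's own API: `chunked`, `predict_in_batches`, and `predict_fn` are hypothetical names, and `predict_fn` stands in for whatever call actually sends one request to the endpoint.

```python
from typing import Any, Callable, Iterator, List, Sequence


def chunked(instances: Sequence[Any], batch_size: int) -> Iterator[Sequence[Any]]:
    """Yield consecutive slices of `instances`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(instances), batch_size):
        yield instances[start:start + batch_size]


def predict_in_batches(
    predict_fn: Callable[[Sequence[Any]], List[Any]],  # hypothetical: sends one request
    instances: Sequence[Any],
    batch_size: int,
) -> List[Any]:
    """Send several small requests instead of one large one and merge the results."""
    predictions: List[Any] = []
    for batch in chunked(instances, batch_size):
        # Each call carries fewer instances, so the server holds less data at once.
        predictions.extend(predict_fn(batch))
    return predictions
```

With the Vertex AI Python SDK, `predict_fn` might be a small wrapper such as `lambda batch: endpoint.predict(instances=list(batch)).predictions`, assuming `endpoint` is an already-created `aiplatform.Endpoint`. A suitable `batch_size` depends on the model and the machine type, so it would need to be found by experiment.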

Option C is incorrect because base64-encoding your data before prediction does not reduce the memory consumption of the prediction service; it increases it. Base64 represents binary data as ASCII characters, which makes the data about 33% larger [3]. Base64 encoding is only required for certain data types, such as images and audio, that cannot be represented directly as JSON or CSV [4].
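The 33% figure follows from base64 mapping every 3 input bytes to 4 output characters. A short sketch using Python's standard `base64` module shows the growth; the helper name `base64_size` and the 3000-byte payload are illustrative choices only.

```python
import base64


def base64_size(raw: bytes) -> int:
    """Return the length in bytes of the base64-encoded form of `raw`."""
    return len(base64.b64encode(raw))


# 3000 raw bytes stand in for a binary payload such as an image.
raw = bytes(3000)
encoded_len = base64_size(raw)

print(len(raw), encoded_len)   # 3000 4000
print(encoded_len / len(raw))  # ratio of 4/3, roughly 1.33
```

Encoding therefore makes the request larger, which is the opposite of what an out-of-memory situation calls for.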

Option D is incorrect because a quota increase for the number of prediction requests does not address the "Out of Memory" error; it only raises the number of requests that can be sent to the prediction service in a given period. Quotas are limits on the usage of Google Cloud resources, such as CPU, memory, disk, and network [5]. The quota named in this option governs how many requests are accepted, not how much memory is available to process a single request, so it affects the availability and cost of the service rather than its ability to handle a large request.

[1] Choosing between online and batch prediction
[2] Online prediction input data
[3] Base64 encoding
[4] Preparing data for prediction
[5] Quotas and limits

asked 18/09/2024
Jesse Serrano