Question 144 - Professional Machine Learning Engineer discussion

You have trained a DNN regressor with TensorFlow to predict housing prices using a set of predictive features. Your default precision is tf.float64, and you use a standard TensorFlow estimator:

estimator = tf.estimator.DNNRegressor(
    feature_columns=[YOUR_LIST_OF_FEATURES],
    hidden_units=[1024, 512, 256],
    dropout=None)

Your model performs well, but just before deploying it to production, you discover that your current serving latency is 10 ms at the 90th percentile, and you currently serve on CPUs. Your production requirements call for a model latency of 8 ms at the 90th percentile. You are willing to accept a small decrease in performance in order to reach the latency requirement. Therefore, your plan is to improve latency while evaluating how much the model's prediction performance decreases. What should you try first to quickly lower the serving latency?

A. Increase the dropout rate to 0.8 in _PREDICT mode by adjusting the TensorFlow Serving parameters.

B. Increase the dropout rate to 0.8 and retrain your model.

C. Switch from CPU to GPU serving.

D. Apply quantization to your SavedModel by reducing the floating-point precision to tf.float16.
Suggested answer: D

Explanation:

Quantization is a technique that reduces the numerical precision of the weights and activations of a neural network, which can improve the inference speed and reduce the memory footprint of the model [1].

Reducing the floating-point precision from tf.float64 to tf.float16 can potentially halve the latency and memory usage of the model while having minimal impact on accuracy [2].
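
For illustration, here is a minimal sketch of post-training float16 quantization using the TensorFlow Lite converter, one common route for quantizing a SavedModel. The path export_dir is hypothetical, and your serving setup may call for a different conversion workflow:

import tensorflow as tf

# Hypothetical path to the SavedModel exported from the estimator.
export_dir = '/path/to/saved_model'

converter = tf.lite.TFLiteConverter.from_saved_model(export_dir)

# Enable post-training quantization and target float16 precision:
# weights are stored as float16, roughly halving the model size.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]

tflite_model = converter.convert()

# Write the quantized model to disk for serving.
with open('model_f16.tflite', 'wb') as f:
    f.write(tflite_model)

After conversion, re-benchmark the 90th-percentile latency and re-evaluate prediction quality against a held-out set, since the question explicitly asks you to measure how much performance decreases.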

Increasing the dropout rate to 0.8 in either mode would not affect the latency but would likely degrade the model's performance significantly: dropout is a regularization technique that randomly drops out units during training to prevent overfitting, and it is disabled entirely at inference time [3].
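
A quick way to see this is a minimal sketch with tf.keras.layers.Dropout, which has the same train-time-only behavior as the estimator's dropout argument:

import tensorflow as tf

drop = tf.keras.layers.Dropout(rate=0.8)
x = tf.ones((1, 4))

# At training time, units are randomly zeroed and the survivors
# are scaled by 1 / (1 - 0.8).
print(drop(x, training=True))

# At inference time, dropout is an identity op: the input passes
# through unchanged, so changing the rate cannot reduce serving latency.
print(drop(x, training=False))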

Switching from CPU to GPU serving may or may not improve the latency, depending on the hardware specifications and the model's complexity, and it would also incur additional cost and deployment complexity [4].

asked 18/09/2024 by Angélica González