Question 128 - Professional Machine Learning Engineer discussion
You are an ML engineer at a mobile gaming company. A data scientist on your team recently trained a TensorFlow model, and you are responsible for deploying this model into a mobile application. You discover that the inference latency of the current model doesn't meet production requirements. You need to reduce the inference time by 50%, and you are willing to accept a small decrease in model accuracy in order to reach the latency requirement. Without training a new model, which model optimization technique for reducing latency should you try first?
A. Weight pruning
B. Dynamic range quantization
C. Model distillation
D. Dimensionality reduction
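
To make option B concrete, here is a minimal sketch of the idea behind post-training quantization of weights: already-trained floating-point weights are mapped to 8-bit integers with a single scale factor, with no retraining. This is a pure-Python illustration under simplifying assumptions (symmetric, per-tensor scaling), not TensorFlow Lite's actual implementation. The function names `quantize_int8` and `dequantize` and the sample weights are hypothetical.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization of float weights to int8 values.

    Hypothetical helper for illustration; real converters differ in detail.
    """
    max_abs = max(abs(w) for w in weights)
    # One scale for the whole tensor; guard against an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range used here.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map int8 values back to approximate float weights."""
    return [v * scale for v in q]


# Made-up weights standing in for one layer of an already-trained model.
weights = [0.5, -1.27, 0.003, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Rounding error per weight is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

Each weight is stored in 8 bits instead of 32, and the rounding error per weight stays within half a quantization step, which is the source of the small accuracy loss the question mentions.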