Question 247 - Professional Machine Learning Engineer discussion


While running a model training pipeline on Vertex AI, you discover that the evaluation step is failing because of an out-of-memory error. You are currently using TensorFlow Model Analysis (TFMA) with a standard Evaluator TensorFlow Extended (TFX) pipeline component for the evaluation step. You want to stabilize the pipeline without downgrading the evaluation quality while minimizing infrastructure overhead. What should you do?

A. Add tfma.MetricsSpec() to limit the number of metrics in the evaluation step.
B. Migrate your pipeline to Kubeflow hosted on Google Kubernetes Engine, and specify the appropriate node parameters for the evaluation step.
C. Include the flag --runner=DataflowRunner in beam_pipeline_args to run the evaluation step on Dataflow.
D. Move the evaluation step out of your pipeline and run it on custom Compute Engine VMs with sufficient memory.
Suggested answer: C

Explanation:

The best way to stabilize the pipeline without downgrading the evaluation quality, while minimizing infrastructure overhead, is to use Dataflow as the runner for the evaluation step. Dataflow is a fully managed service for executing Apache Beam pipelines that scales up and down with the workload. It can handle large-scale, distributed data-processing tasks such as model evaluation, and it integrates with Vertex AI Pipelines and TensorFlow Extended (TFX). By passing the flag --runner=DataflowRunner in beam_pipeline_args, you instruct the Evaluator component to run the evaluation step on Dataflow instead of the default DirectRunner, which executes locally and can cause out-of-memory errors.

Option A is incorrect because adding tfma.MetricsSpec() to limit the number of metrics may downgrade the evaluation quality, as important metrics could be omitted. Moreover, reducing the number of metrics may not resolve the out-of-memory error, since the evaluation step can still consume a great deal of memory depending on the size and complexity of the data and the model.

Option B is incorrect because migrating the pipeline to Kubeflow hosted on Google Kubernetes Engine (GKE) increases the infrastructure overhead, as you need to provision, manage, and monitor the GKE cluster yourself. You also need to specify appropriate node parameters for the evaluation step, which may require trial and error to find the optimal configuration.

Option D is incorrect because moving the evaluation step out of the pipeline and running it on custom Compute Engine VMs also increases the infrastructure overhead, as you need to create, configure, and delete the VMs yourself. You likewise need to ensure the VMs have sufficient memory for the evaluation step, which may require trial and error to find the optimal machine type.
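As a minimal sketch of this configuration (assuming TFX 1.x; my-gcp-project, us-central1, and gs://my-bucket/tmp are placeholder values, and example_gen and trainer stand in for the pipeline's existing upstream components), the flag is supplied through the component's Beam pipeline arguments. The same list can alternatively be passed pipeline-wide via tfx.dsl.Pipeline(..., beam_pipeline_args=...).

    import tensorflow_model_analysis as tfma
    from tfx import v1 as tfx

    # Keep the full metric set: no evaluation quality is sacrificed.
    eval_config = tfma.EvalConfig(
        model_specs=[tfma.ModelSpec(label_key='label')],  # placeholder label key
        metrics_specs=[tfma.MetricsSpec(metrics=[
            tfma.MetricConfig(class_name='AUC'),
            tfma.MetricConfig(class_name='BinaryAccuracy'),
        ])],
        slicing_specs=[tfma.SlicingSpec()],
    )

    evaluator = tfx.components.Evaluator(
        examples=example_gen.outputs['examples'],  # existing upstream component
        model=trainer.outputs['model'],            # existing upstream component
        eval_config=eval_config,
    )

    # Run this component's Beam job on Dataflow instead of the default
    # DirectRunner, which executes locally and is bounded by local memory.
    evaluator = evaluator.with_beam_pipeline_args([
        '--runner=DataflowRunner',
        '--project=my-gcp-project',            # placeholder
        '--region=us-central1',                # placeholder
        '--temp_location=gs://my-bucket/tmp',  # placeholder
    ])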

Reference:

Dataflow documentation
Using DataflowRunner
Evaluator component documentation
Configuring the Evaluator component
