You are training an object detection machine learning model on a dataset that consists of three million X-ray images, each roughly 2GB in size. You are using Vertex AI Training to run a custom training application on a Compute Engine instance with 32 cores, 128GB of RAM, and 1 NVIDIA P100 GPU. You notice that model training is taking a very long time. You want to decrease training time without sacrificing model performance. What should you do?

Daniel Yontz · Accepted Answer

Use the tf.distribute.Strategy API and run a distributed training job.
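
As a rough illustration of what this answer involves, here is a minimal sketch using `tf.distribute.MirroredStrategy`, which replicates the model across the GPUs of a single machine. The tiny model, the synthetic 64x64 images, the 4-value box targets, and the `PER_REPLICA_BATCH` value are all hypothetical stand-ins, not the actual X-ray pipeline. For a job spread over several machines, `tf.distribute.MultiWorkerMirroredStrategy` would likely be the closer fit; it is not shown here because it depends on the cluster configuration.

```python
import tensorflow as tf

# MirroredStrategy replicates the model on every GPU visible to one machine.
# With no GPU available it falls back to a single replica, so this still runs.
strategy = tf.distribute.MirroredStrategy()

PER_REPLICA_BATCH = 4  # hypothetical value, tune for the real model
# The global batch is split evenly across replicas on each training step.
global_batch = PER_REPLICA_BATCH * strategy.num_replicas_in_sync


def make_dataset(num_examples=32):
    # Synthetic stand-in for the real input pipeline (hypothetical shapes).
    images = tf.random.uniform((num_examples, 64, 64, 1))
    boxes = tf.random.uniform((num_examples, 4))
    ds = tf.data.Dataset.from_tensor_slices((images, boxes))
    return ds.batch(global_batch).prefetch(tf.data.AUTOTUNE)


with strategy.scope():
    # Variables created inside the scope are mirrored across replicas.
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(64, 64, 1)),
        tf.keras.layers.Conv2D(8, 3, activation="relu"),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(4),
    ])
    model.compile(optimizer="adam", loss="mse")

# Keras distributes each global batch across the replicas during fit().
history = model.fit(make_dataset(), epochs=1, verbose=0)
```

The design point is that the training code stays almost unchanged: only model construction moves inside `strategy.scope()`, and the batch size is scaled by the number of replicas so each device keeps the same per-step workload.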

Daniel Yontz · Answer

Increase the instance memory to 512GB and increase the batch size.

Daniel Yontz · Answer

Replace the NVIDIA P100 GPU with a v3-32 TPU in the training job.

Daniel Yontz · Answer

Enable early stopping in your Vertex AI Training job.

Related questions

Question 113 - Professional Machine Learning Engineer discussion

Suggested answer: D
