ExamGecko

Question 92 - Professional Machine Learning Engineer discussion

You lead a data science team at a large international corporation. Most of the models your team trains are large-scale models using high-level TensorFlow APIs on AI Platform with GPUs. Your team usually takes a few weeks or months to iterate on a new version of a model. You were recently asked to review your team's spending. How should you reduce your Google Cloud compute costs without impacting the model's performance?

A.
Use AI Platform to run distributed training jobs with checkpoints.
B.
Use AI Platform to run distributed training jobs without checkpoints.
C.
Migrate to training with Kubeflow on Google Kubernetes Engine, and use preemptible VMs with checkpoints.
D.
Migrate to training with Kubeflow on Google Kubernetes Engine, and use preemptible VMs without checkpoints.
Suggested answer: C

Explanation:

Option A is incorrect because running distributed training jobs with checkpoints on AI Platform does not reduce compute costs: distributed training consumes more resources, and storing the checkpoints adds storage costs on top.

Option B is incorrect because using AI Platform to run distributed training jobs without checkpoints may reduce the compute costs, but it also risks losing the progress of the training if the job fails or is interrupted.

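Option C is correct because migrating to Kubeflow on Google Kubernetes Engine and training on preemptible VMs can reduce compute costs significantly: preemptible VMs are much cheaper than standard VMs, and GKE lets the cluster scale with the workload. Compute Engine can stop (preempt) a preemptible VM at any time and never lets one run for more than 24 hours, so checkpoints are what preserve the training state and let an interrupted job resume from its last checkpoint instead of starting over.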
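
A minimal, framework-agnostic sketch of the checkpoint-and-resume pattern behind option C, assuming a toy JSON file as the checkpoint and a simulated preemption (the `train` function, its parameters, and the checkpoint format are hypothetical). On restart, the job reloads the last saved step instead of beginning at step 0. In TensorFlow the same idea is usually built with `tf.train.Checkpoint` and `tf.train.CheckpointManager`, restoring from `manager.latest_checkpoint` at startup.

```python
import json
import os
import tempfile


def train(total_steps, ckpt_path, preempt_at=None):
    """Toy training loop (hypothetical) that saves a checkpoint after every step.

    Returns (state, steps_run_in_this_call). If `preempt_at` is given, raises
    RuntimeError when the global step reaches it, to simulate a preemption.
    """
    # Resume from the latest checkpoint if one exists; otherwise start fresh.
    if os.path.exists(ckpt_path):
        with open(ckpt_path) as f:
            state = json.load(f)
    else:
        state = {"step": 0, "weight": 0.0}

    steps_run = 0
    while state["step"] < total_steps:
        if preempt_at is not None and state["step"] == preempt_at:
            raise RuntimeError("preempted")  # stand-in for the VM being reclaimed
        state["weight"] += 0.1  # stand-in for one optimizer update
        state["step"] += 1
        steps_run += 1
        # Write to a temp file, then rename, so an interruption mid-write
        # cannot leave a half-written checkpoint behind.
        tmp_path = ckpt_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, ckpt_path)
    return state, steps_run


if __name__ == "__main__":
    ckpt = os.path.join(tempfile.mkdtemp(), "ckpt.json")
    try:
        train(100, ckpt, preempt_at=30)  # first attempt is preempted at step 30
    except RuntimeError:
        pass  # a restarted job would simply rerun the same entry point
    state, steps_run = train(100, ckpt)
    print(state["step"], steps_run)  # 100 70
```

The second call only runs the 70 remaining steps. Real jobs usually checkpoint every N steps or minutes rather than every step, trading a little lost work per preemption for less checkpoint I/O.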

Option D is incorrect because, although preemptible VMs without checkpoints lower the hourly cost, every preemption forces training to restart from the beginning; for models that take weeks to train, the repeated work can erase the savings.
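
To make the contrast between options C and D concrete, here is a toy cost model under stated assumptions: compute is counted in training steps, restart overhead is ignored, and the preemption schedule is hypothetical. `billed_steps` (a hypothetical helper) counts how many steps you pay for before the job reaches its target progress.

```python
def billed_steps(total_steps, preempt_times, ckpt_every=None):
    """Count compute steps paid for until `total_steps` of progress is reached.

    preempt_times: sorted, hypothetical wall-clock step indices at which the VM
    is reclaimed. ckpt_every=None means no checkpoints, so progress restarts
    from 0 after every preemption (option D).
    """
    progress = 0  # steps of training the model currently has
    saved = 0  # progress captured by the most recent checkpoint
    clock = 0  # total steps executed (and billed) so far
    pending = list(preempt_times)
    while progress < total_steps:
        if pending and clock == pending[0]:
            pending.pop(0)
            progress = saved  # work since the last checkpoint is lost
            continue
        progress += 1
        clock += 1
        if ckpt_every and progress % ckpt_every == 0:
            saved = progress
    return clock


if __name__ == "__main__":
    schedule = [60, 130]  # hypothetical preemption times
    print(billed_steps(100, schedule))  # no checkpoints: 230
    print(billed_steps(100, schedule, ckpt_every=25))  # 110
    print(billed_steps(100, schedule, ckpt_every=10))  # 100
```

Without checkpoints the job pays for 230 steps to get 100 steps of progress; with a checkpoint every 25 steps it loses only the 10 steps since the last save.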


asked 18/09/2024
Marie Joyce Candice Dancel