Question 122 - Professional Machine Learning Engineer discussion


You manage a team of data scientists who use a cloud-based backend system to submit training jobs. This system has become very difficult to administer, and you want to use a managed service instead. The data scientists you work with use many different frameworks, including Keras, PyTorch, Theano, scikit-learn, and custom libraries. What should you do?

A.
Use Vertex AI Training to submit training jobs using any framework.
B.
Configure Kubeflow to run on Google Kubernetes Engine and submit training jobs through TFJob.
C.
Create a library of VM images on Compute Engine, and publish these images on a centralized repository.
D.
Set up Slurm workload manager to receive jobs that can be scheduled to run on your cloud infrastructure.
Suggested answer: A

Explanation:

The best option for replacing a hard-to-administer backend with a managed service that supports many frameworks is Vertex AI Training. Vertex AI Training is a fully managed service for training custom models on Google Cloud with any framework, such as TensorFlow, PyTorch, scikit-learn, or XGBoost, and you can supply custom containers to run your own libraries and dependencies. It handles infrastructure provisioning, scaling, and monitoring for you, so the team can focus on model development and optimization, and it integrates with other Vertex AI services such as Vertex AI Pipelines, Vertex AI Experiments, and Vertex AI Prediction.
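
As an illustration, here is a minimal sketch of submitting such a job with the Vertex AI SDK for Python (google-cloud-aiplatform). The project, region, staging bucket, container image, and machine shape are placeholder assumptions, not values given in the question:

    # Hypothetical values: project, region, bucket, and image URI are placeholders.
    from google.cloud import aiplatform

    aiplatform.init(
        project="my-project",
        location="us-central1",
        staging_bucket="gs://my-staging-bucket",
    )

    # The training code is packaged in a container you build, so the same call
    # works for Keras, PyTorch, Theano, scikit-learn, or custom libraries.
    job = aiplatform.CustomContainerTrainingJob(
        display_name="pytorch-custom-train",
        container_uri="us-docker.pkg.dev/my-project/trainers/pytorch-trainer:latest",
    )

    # Vertex AI provisions the machines, runs the container, and tears the
    # resources down when the job finishes.
    job.run(
        replica_count=1,
        machine_type="n1-standard-8",
        accelerator_type="NVIDIA_TESLA_T4",
        accelerator_count=1,
    )

Because the training logic lives in a container you build yourself, this same submission pattern covers Keras, PyTorch, Theano, scikit-learn, and in-house libraries alike, which is why option A fits a multi-framework team.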

The other options are not as suitable for submitting training jobs with many different frameworks through a managed service:

Configuring Kubeflow to run on Google Kubernetes Engine and submitting training jobs through TFJob would require more infrastructure maintenance: Kubeflow is not a fully managed service, so you would have to provision and operate your own Kubernetes cluster, and you would pay for the cluster resources regardless of training-job usage. TFJob is also designed primarily for TensorFlow models and does not cover other frameworks as broadly as Vertex AI Training.
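
For comparison, a rough sketch of what submitting a TFJob from Python might look like is shown below, assuming a GKE cluster with the Kubeflow training operator already installed; the namespace, job name, and image are hypothetical:

    # Hypothetical sketch: assumes a GKE cluster with the Kubeflow training
    # operator installed; the namespace, job name, and image are placeholders.
    from kubernetes import client, config

    config.load_kube_config()

    tfjob = {
        "apiVersion": "kubeflow.org/v1",
        "kind": "TFJob",
        "metadata": {"name": "mnist-train", "namespace": "kubeflow"},
        "spec": {
            "tfReplicaSpecs": {
                "Worker": {
                    "replicas": 2,
                    "restartPolicy": "OnFailure",
                    "template": {
                        "spec": {
                            "containers": [{
                                # The operator expects this container to be named
                                # "tensorflow" -- one sign of how TF-centric TFJob is.
                                "name": "tensorflow",
                                "image": "gcr.io/my-project/mnist-train:latest",
                                "command": ["python", "train.py"],
                            }]
                        }
                    },
                }
            }
        },
    }

    client.CustomObjectsApi().create_namespaced_custom_object(
        group="kubeflow.org", version="v1", namespace="kubeflow",
        plural="tfjobs", body=tfjob,
    )

Even in this sketch you remain responsible for the GKE cluster and the training operator itself, and the TFJob abstraction (replica specs, the expected "tensorflow" container name) is built around TensorFlow's distribution model rather than arbitrary frameworks.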

Creating a library of VM images on Compute Engine and publishing them in a centralized repository would require more development time and effort: you would have to build and maintain separate images for each framework and library, manually configure and launch VMs for every training job, and handle scaling and monitoring yourself. This forgoes the benefits of a managed service such as Vertex AI Training.

Setting up the Slurm workload manager to receive jobs scheduled onto your cloud infrastructure would require more configuration and administration: Slurm is not a native Google Cloud service, so you would have to install and operate it on your own VMs or clusters. Slurm is also a general-purpose workload manager and lacks the level of integration and optimization for ML frameworks and libraries that Vertex AI Training provides.

Reference:

Vertex AI Training | Google Cloud

Kubeflow on Google Cloud | Google Cloud

TFJob for training TensorFlow models with Kubernetes | Kubeflow

Compute Engine | Google Cloud

Slurm Workload Manager
