Question 25 - Professional Machine Learning Engineer discussion


You recently designed and built a custom neural network that uses critical dependencies specific to your organization's framework. You need to train the model using a managed training service on Google Cloud. However, the ML framework and related dependencies are not supported by AI Platform Training. Also, both your model and your data are too large to fit in memory on a single machine. Your ML framework of choice uses the scheduler, workers, and servers distribution structure. What should you do?

A. Use a built-in model available on AI Platform Training
B. Build your custom container to run jobs on AI Platform Training
C. Build your custom containers to run distributed training jobs on AI Platform Training
D. Reconfigure your code to an ML framework with dependencies that are supported by AI Platform Training
Suggested answer: C

Explanation:

AI Platform Training is a managed service for running machine learning training jobs on Google Cloud. You can use it to scale up training jobs, run distributed training, and access specialized hardware such as GPUs and TPUs [1]. AI Platform Training provides several pre-built containers that package ML frameworks and their dependencies, such as TensorFlow, PyTorch, scikit-learn, and XGBoost [2]. However, if the ML framework and dependencies that you need are not covered by the pre-built containers, you can build your own custom containers and use them to run your training jobs on AI Platform Training [3].

Custom containers are Docker images that you create to run your training application. By using custom containers, you can specify and pre-install all the dependencies your application needs, and keep full control over the training code and its runtime environment [4]. Custom containers can also run distributed training jobs on AI Platform Training, which helps when a model or dataset is too large for a single machine [5]. Distributed training splits the training data and computation across multiple machines and coordinates them to update the model parameters. Two common architectures are parameter server and collective all-reduce. In the parameter server architecture, a set of workers performs the computation and a set of servers stores and updates the model parameters. In the collective all-reduce architecture there are no separate servers: the workers perform the computation and synchronize the model parameters among themselves. Frameworks that use the scheduler, workers, and servers structure follow the parameter server pattern and add a scheduler that coordinates the workers and servers.
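To illustrate how a custom container could assign itself a role in the scheduler, workers, and servers structure, here is a minimal Python sketch. It assumes the service exposes the cluster layout to each replica as a JSON string (for example in a `CLUSTER_SPEC` environment variable) with a `task` entry holding `type` and `index`. The mapping from task types (`master`, `worker`, `ps`) to framework roles and the helper name `resolve_role` are illustrative assumptions, not something stated in the question.

```python
import json

# Assumed mapping from the service's task types to the framework's roles.
# The right-hand names are whatever the organization's framework expects.
ROLE_BY_TASK_TYPE = {
    "master": "scheduler",
    "worker": "worker",
    "ps": "server",
}


def resolve_role(cluster_spec_json):
    """Return (framework_role, replica_index) for this replica.

    cluster_spec_json is assumed to be a JSON document shaped like
    {"cluster": {...}, "task": {"type": "worker", "index": 1}}.
    """
    spec = json.loads(cluster_spec_json)
    task = spec.get("task", {})
    task_type = task.get("type")
    if task_type not in ROLE_BY_TASK_TYPE:
        raise ValueError(f"unexpected task type: {task_type!r}")
    return ROLE_BY_TASK_TYPE[task_type], int(task.get("index", 0))
```

Inside the container's entry point, the string would typically come from the environment, for example `resolve_role(os.environ["CLUSTER_SPEC"])`, and the returned role would decide which framework process to start.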

For this use case, the best option is to build custom containers and run distributed training jobs on AI Platform Training. Custom containers let you use the ML framework and dependencies of your choice, and distributed training spreads the model and data across multiple machines without requiring you to manage the infrastructure. The other options each miss a requirement: a built-in model (A) or a supported framework (D) means giving up the organization-specific dependencies, and a custom container job on its own (B) does not address a model and dataset that are too large for a single machine. Since your ML framework of choice uses the scheduler, workers, and servers distribution structure, you can use the parameter server architecture to run your distributed training job. You specify the number and type of machines, the custom container image, and the training application arguments when you submit the job.
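The job submission described above (number and type of machines plus a custom container image for each role) might be sketched as the following Python helper, which builds the training input for a job request. The field names follow the AI Platform Training `TrainingInput` structure as best understood here and should be checked against the current API reference. The helper name `build_training_input`, the default machine type, and the image URI in the usage note are hypothetical.

```python
def build_training_input(image_uri, worker_count, server_count,
                         machine_type="n1-highmem-16"):
    """Build a training input dict for a parameter-server style job.

    The master replica is intended to act as the framework's scheduler,
    and the parameter server replicas as its servers (an assumed mapping).
    """
    if worker_count < 1 or server_count < 1:
        raise ValueError("need at least one worker and one server")
    return {
        "scaleTier": "CUSTOM",  # custom cluster shape instead of a preset tier
        "masterType": machine_type,
        "masterConfig": {"imageUri": image_uri},
        "workerType": machine_type,
        "workerCount": worker_count,
        "workerConfig": {"imageUri": image_uri},
        "parameterServerType": machine_type,
        "parameterServerCount": server_count,
        "parameterServerConfig": {"imageUri": image_uri},
    }
```

For example, `build_training_input("gcr.io/my-project/my-trainer:latest", 4, 2)` describes one scheduler, four workers, and two servers that all run the same image. The same image is reused for every role here for simplicity; separate images per role would also fit this structure.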

References:

[1] AI Platform Training documentation
[2] Pre-built containers for training
[3] Custom containers for training
[4] Custom containers overview | Vertex AI | Google Cloud
[5] Distributed training overview
[6] Types of distributed training
[7] Distributed training architectures
[8] Using custom containers for training with the parameter server architecture

asked 18/09/2024
Mustafa Hussien