Question 142 - MLS-C01 discussion

A company's Machine Learning Specialist needs to improve the training speed of a time-series forecasting model using TensorFlow. The training is currently implemented on a single-GPU machine and takes approximately 23 hours to complete. The training needs to be run daily.

The model accuracy is acceptable, but the company anticipates a continuous increase in the size of the training data and a need to update the model on an hourly, rather than a daily, basis. The company also wants to minimize coding effort and infrastructure changes.

What should the Machine Learning Specialist do to the training solution to allow it to scale for future demand?

A. Do not change the TensorFlow code. Change the machine to one with a more powerful GPU to speed up the training.
B. Change the TensorFlow code to implement a Horovod distributed framework supported by Amazon SageMaker. Parallelize the training to as many machines as needed to achieve the business goals.
C. Switch to using a built-in AWS SageMaker DeepAR model. Parallelize the training to as many machines as needed to achieve the business goals.
D. Move the training to Amazon EMR and distribute the workload to as many machines as needed to achieve the business goals.

Suggested answer: B

Explanation:

To improve the training speed of a time-series forecasting model built with TensorFlow, the Machine Learning Specialist should change the TensorFlow code to use the Horovod distributed training framework, which Amazon SageMaker supports. Horovod is a free and open-source framework for distributed deep learning training with TensorFlow, Keras, PyTorch, and Apache MXNet [1]. It can scale to hundreds of GPUs with upwards of 90% scaling efficiency [2]. It is easy to adopt, requiring only a few lines of Python to modify an existing training script [2]. It is also portable: it runs the same way for TensorFlow, Keras, PyTorch, and MXNet, whether on premises, in the cloud, or on Apache Spark [2].
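The idea behind this kind of distributed training is data parallelism: each worker computes gradients on its own shard of the batch, and the gradients are averaged across workers before the weights are updated. The sketch below simulates that averaging step in plain Python and does not use Horovod itself. The toy model `y = w * x`, the data, and all function names are assumptions made up for illustration. In a real Keras script the equivalent changes would typically be `hvd.init()`, wrapping the optimizer in `hvd.DistributedOptimizer`, and broadcasting the initial weights from rank 0.

```python
# Simulation of data-parallel gradient averaging (the role allreduce plays
# in Horovod). Everything here is illustrative; no Horovod calls are made.

def mse_gradient(w, xs, ys):
    """Gradient of the mean squared error with respect to w for y = w * x."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def shard(data, num_workers):
    """Give worker r the items at positions r, r + num_workers, ..."""
    return [data[r::num_workers] for r in range(num_workers)]


def allreduce_mean(values):
    """Average one value per worker, as an allreduce-average would."""
    return sum(values) / len(values)


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0 * x for x in xs]   # toy targets, true weight is 2.0
w = 0.5                      # current (wrong) weight
num_workers = 4              # hypothetical worker count

# Single-machine gradient over the full batch.
full_grad = mse_gradient(w, xs, ys)

# Each simulated worker computes a gradient on its shard only.
worker_grads = [
    mse_gradient(w, sx, sy)
    for sx, sy in zip(shard(xs, num_workers), shard(ys, num_workers))
]

# Averaging the per-worker gradients recovers the full-batch gradient
# when all shards have the same size.
avg_grad = allreduce_mean(worker_grads)
print(full_grad, avg_grad)
```

Because the shards are equal in size, the averaged gradient matches the single-machine gradient, which is why the training logic itself needs little modification when workers are added.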

Amazon SageMaker is a fully managed service that lets developers and data scientists build, train, and deploy machine learning models quickly [3]. SageMaker supports Horovod as a distributed training framework in its TensorFlow containers, so the Machine Learning Specialist does not need to install or configure Horovod separately [4]. SageMaker also provisions the training cluster for each job and provides debugging, profiling, and monitoring tools that simplify distributed training [4]. By using Amazon SageMaker, the Machine Learning Specialist can parallelize the training across as many machines as needed to achieve the business goals, while minimizing coding effort and infrastructure changes.
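As a rough illustration of what "as many machines as needed" could mean here, the sketch below estimates a GPU count for cutting the 23-hour run to about one hour and builds the keyword arguments one might pass to the SageMaker Python SDK's `sagemaker.tensorflow.TensorFlow` estimator. It rests on several assumptions: scaling efficiency stays at a constant 90%, the data size does not grow, the script name `train.py` is hypothetical, and `ml.p3.8xlarge` (4 GPUs per instance) is just one possible instance type. The helper names are invented for this example, and the estimator is not actually created.

```python
import math


def workers_needed(current_hours, target_hours, scaling_efficiency):
    """Smallest n with current_hours / (n * efficiency) <= target_hours.

    Assumes a constant scaling efficiency, which is a simplification.
    """
    return math.ceil(current_hours / (target_hours * scaling_efficiency))


def horovod_estimator_kwargs(num_gpus, gpus_per_instance):
    """Build illustrative kwargs for a Horovod (MPI) training job."""
    instance_count = math.ceil(num_gpus / gpus_per_instance)
    return {
        "entry_point": "train.py",         # hypothetical training script
        "instance_count": instance_count,
        "instance_type": "ml.p3.8xlarge",  # assumed choice, 4 GPUs each
        # Horovod runs over MPI, with one process per GPU on each host.
        "distribution": {
            "mpi": {"enabled": True, "processes_per_host": gpus_per_instance}
        },
    }


gpus = workers_needed(current_hours=23, target_hours=1, scaling_efficiency=0.9)
estimator_kwargs = horovod_estimator_kwargs(gpus, gpus_per_instance=4)
print(gpus, estimator_kwargs["instance_count"])  # 26 GPUs, 7 instances
```

A real job would also need values such as `role`, `framework_version`, and `py_version` before calling `TensorFlow(**estimator_kwargs).fit(...)`. Since the training data is expected to keep growing, the instance count would have to be revisited over time.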

References:

1: Horovod (machine learning) - Wikipedia

2: Home - Horovod

3: Amazon SageMaker -- Machine Learning Service -- AWS

4: Use Horovod with Amazon SageMaker - Amazon SageMaker

asked 16/09/2024
Grégory CALIX