Question 41 - Professional Machine Learning Engineer discussion

You have been asked to develop an input pipeline for an ML training model that processes images from disparate sources at a low latency. You discover that your input data does not fit in memory. How should you create a dataset following Google-recommended best practices?

A.
Create a tf.data.Dataset.prefetch transformation
B.
Convert the images to tf.Tensor objects, and then run tf.data.Dataset.from_tensor_slices()
C.
Convert the images to tf.Tensor objects, and then run tf.data.Dataset.from_tensors()
D.
Convert the images into TFRecords, store the images in Cloud Storage, and then use the tf.data API to read the images for training
Suggested answer: D

Explanation:

An input pipeline is a way to prepare and feed data to a machine learning model for training or inference. It typically consists of several steps, such as reading, parsing, transforming, batching, and prefetching the data. A well-designed input pipeline can improve the performance and efficiency of the model, as it can handle large and complex datasets, optimize data processing, and reduce latency and memory usage [1].
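
As an illustration of these stages, a minimal tf.data pipeline might look like the sketch below. The file name, feature schema, and image size are hypothetical placeholders, not part of the question.

```python
import tensorflow as tf

# Hypothetical feature schema, for illustration only.
FEATURES = {
    "image_raw": tf.io.FixedLenFeature([], tf.string),
    "label": tf.io.FixedLenFeature([], tf.int64),
}

def parse_example(serialized):
    # Parse one serialized tf.train.Example into an (image, label) pair.
    parsed = tf.io.parse_single_example(serialized, FEATURES)
    image = tf.io.decode_jpeg(parsed["image_raw"], channels=3)
    image = tf.image.resize(image, [224, 224]) / 255.0
    return image, parsed["label"]

dataset = (
    tf.data.TFRecordDataset(["train-00000.tfrecord"])          # read
    .map(parse_example, num_parallel_calls=tf.data.AUTOTUNE)   # parse/transform
    .shuffle(1_000)
    .batch(32)                                                  # batch
    .prefetch(tf.data.AUTOTUNE)                                 # overlap input with training
)
```

Because the data is streamed from files rather than loaded with from_tensor_slices or from_tensors, the dataset never has to fit in memory.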

For the use case of developing an input pipeline for an ML training model that processes images from disparate sources at a low latency, the best option is to convert the images into TFRecords, store the images in Cloud Storage, and then use the tf.data API to read the images for training. This option involves using the following components and techniques:

TFRecords: TFRecord is a binary file format that can store a sequence of data records, such as images, text, or audio. TFRecords can help to compress, serialize, and store the data efficiently, reducing data loading and parsing time. TFRecord files can also be sharded and interleaved, which improves data throughput and parallelism [2].
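
The sketch below shows one common way to pack image files into a TFRecord shard; the shard name, file paths, and labels are hypothetical examples.

```python
import tensorflow as tf

def image_example(image_path, label):
    # Pack one image file and its integer label into a tf.train.Example.
    image_bytes = tf.io.read_file(image_path).numpy()
    feature = {
        "image_raw": tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[image_bytes])),
        "label": tf.train.Feature(
            int64_list=tf.train.Int64List(value=[label])),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))

# Hypothetical shard name and image list, for illustration only.
with tf.io.TFRecordWriter("train-00000-of-00010.tfrecord") as writer:
    for path, label in [("images/cat_001.jpg", 0), ("images/dog_042.jpg", 1)]:
        writer.write(image_example(path, label).SerializeToString())
```

Writing many such shards (rather than one large file) is what later allows the reader to interleave them in parallel.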

Cloud Storage: Cloud Storage is a service for storing and accessing data on Google Cloud. It can hold large, distributed datasets, such as images from different sources, and provides high availability, durability, and scalability [3]. Cloud Storage also integrates with other Google Cloud services, such as Compute Engine, AI Platform, and Dataflow.

tf.data API: The tf.data API is a set of tools and methods for creating and manipulating data pipelines in TensorFlow. It can read, transform, batch, and prefetch data efficiently, and it optimizes data processing for performance and memory. The tf.data API supports various data sources and formats, such as TFRecords, CSV, JSON, and images.
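
Putting the pieces together, a sketch of option D might read sharded TFRecords directly from a gs:// path with the tf.data API. The bucket name and file pattern below are hypothetical, and the parsing function mirrors the schema used in the writing sketch above.

```python
import tensorflow as tf

# Hypothetical bucket and shard pattern in Cloud Storage.
file_pattern = "gs://my-training-bucket/images/train-*.tfrecord"
files = tf.data.Dataset.list_files(file_pattern, shuffle=True)

def parse_example(serialized):
    # Same Example schema as in the writing sketch above.
    parsed = tf.io.parse_single_example(serialized, {
        "image_raw": tf.io.FixedLenFeature([], tf.string),
        "label": tf.io.FixedLenFeature([], tf.int64),
    })
    return tf.io.decode_jpeg(parsed["image_raw"], channels=3), parsed["label"]

dataset = (
    files.interleave(                      # read several shards in parallel
        tf.data.TFRecordDataset,
        cycle_length=4,
        num_parallel_calls=tf.data.AUTOTUNE,
    )
    .map(parse_example, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(32)
    .prefetch(tf.data.AUTOTUNE)
)
```

Interleaving shards and prefetching batches keeps the accelerator fed at low latency even though the full dataset never fits in memory.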

By using these components and techniques, the input pipeline can process large datasets of images from disparate sources that do not fit in memory, and provide low latency and high performance for the ML training model. Therefore, converting the images into TFRecords, storing the images in Cloud Storage, and using the tf.data API to read the images for training is the best option for this use case.

[1] tf.data: Build TensorFlow input pipelines | TensorFlow Core

[2] TFRecord and tf.Example | TensorFlow Core

[3] Cloud Storage documentation | Google Cloud
