Question 99 - Professional Machine Learning Engineer discussion

You are profiling the training time of your TensorFlow model and notice a performance issue caused by inefficiencies in the input data pipeline for a single 5-terabyte CSV file dataset on Cloud Storage. You need to optimize the input pipeline performance. Which action should you try first to increase the efficiency of your pipeline?

A. Preprocess the input CSV file into a TFRecord file.
B. Randomly select a 10 gigabyte subset of the data to train your model.
C. Split into multiple CSV files and use a parallel interleave transformation.
D. Set the reshuffle_each_iteration parameter to true in the tf.data.Dataset.shuffle method.
Suggested answer: A

Explanation:

TFRecord is the recommended format for storing large amounts of training data efficiently and is the first optimization to try for this pipeline. It is a binary format that can be serialized and compressed, which reduces the I/O overhead and memory footprint compared with parsing CSV text at training time, and the tf.data API provides tools for creating and reading TFRecord files.
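
For illustration, here is a minimal sketch of option A. The Cloud Storage paths and the column schema (two float features plus an integer label) are hypothetical, and a real 5 TB conversion would normally be done with a distributed tool such as Dataflow rather than a single-process loop; the sketch only shows the mechanics of writing TFRecords and reading them back with tf.data:

```python
import tensorflow as tf

# Hypothetical paths and schema: two float feature columns and an int label.
CSV_PATH = "gs://my-bucket/data.csv"
TFRECORD_PATH = "gs://my-bucket/data.tfrecord"

def to_example(features, label):
    """Serialize one row into a tf.train.Example proto."""
    return tf.train.Example(features=tf.train.Features(feature={
        "features": tf.train.Feature(
            float_list=tf.train.FloatList(value=features)),
        "label": tf.train.Feature(
            int64_list=tf.train.Int64List(value=[label])),
    })).SerializeToString()

# One-time preprocessing step: stream the CSV and write TFRecords.
csv_ds = tf.data.experimental.CsvDataset(
    CSV_PATH, record_defaults=[tf.float32, tf.float32, tf.int64], header=True)
with tf.io.TFRecordWriter(TFRECORD_PATH) as writer:
    for f1, f2, label in csv_ds:
        writer.write(to_example([float(f1), float(f2)], int(label)))

# Training-time pipeline: binary reads, parallel parsing, prefetching.
feature_spec = {
    "features": tf.io.FixedLenFeature([2], tf.float32),
    "label": tf.io.FixedLenFeature([1], tf.int64),
}
ds = (tf.data.TFRecordDataset(TFRECORD_PATH)
      .map(lambda rec: tf.io.parse_single_example(rec, feature_spec),
           num_parallel_calls=tf.data.AUTOTUNE)
      .batch(256)
      .prefetch(tf.data.AUTOTUNE))
```

In practice the output would also be sharded into many TFRecord files so that the reads themselves can be parallelized.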

The other options are less effective than option A. Option B would reduce the amount of data available for training and could hurt model accuracy. Option C would parallelize reads across file shards, but each record would still be stored as text CSV, which is slower to parse than a binary format. Option D only controls whether the shuffle order is regenerated on each epoch; it does not speed up reading the data.
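
For contrast, a minimal sketch of option C, assuming the large CSV has already been split into shards matching a hypothetical filename pattern:

```python
import tensorflow as tf

# Hypothetical shard pattern for the pre-split CSV files.
files = tf.data.Dataset.list_files("gs://my-bucket/shards/part-*.csv")

# interleave() with num_parallel_calls reads several shards concurrently,
# overlapping I/O across files instead of streaming one file serially.
ds = files.interleave(
    lambda path: tf.data.TextLineDataset(path).skip(1),  # skip header row
    cycle_length=8,                       # number of files read at once
    num_parallel_calls=tf.data.AUTOTUNE,
).prefetch(tf.data.AUTOTUNE)
```

Even with parallel reads, every record still has to be parsed from text at training time, which is why converting to TFRecord is the better first step.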

asked 18/09/2024
Rahul Chugh