Google Professional Machine Learning Engineer Practice Test - Questions Answers, Page 20

You work at a leading healthcare firm developing state-of-the-art algorithms for various use cases. You have unstructured textual data with custom labels. You need to extract and classify various medical phrases with these labels. What should you do?

A. Use the Healthcare Natural Language API to extract medical entities.
B. Use a BERT-based model to fine-tune a medical entity extraction model.
C. Use AutoML Entity Extraction to train a medical entity extraction model.
D. Use TensorFlow to build a custom medical entity extraction model.
Suggested answer: B

Explanation:

Medical entity extraction is a task that involves identifying and classifying medical terms or concepts in unstructured textual data, such as electronic health records, clinical notes, or research papers. Medical entity extraction can support various use cases, such as information retrieval, knowledge discovery, decision support, and data analysis [1].

One possible approach to medical entity extraction is to fine-tune a BERT-based model. BERT (Bidirectional Encoder Representations from Transformers) is a pre-trained language model that captures contextual information from both the left and right sides of a given token [2]. BERT can be fine-tuned on a specific downstream task, such as medical entity extraction, by adding a task-specific layer on top of the pre-trained model and updating the model parameters with a small amount of labeled data [3].
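For illustration only, a minimal fine-tuning sketch using the Hugging Face transformers library (not named in the exam question); the label set, checkpoint, and datasets are hypothetical placeholders:

```python
# Minimal sketch: fine-tuning a BERT checkpoint for medical entity
# extraction framed as token classification. The custom labels and the
# train/eval datasets are hypothetical placeholders.
from transformers import (AutoModelForTokenClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

labels = ["O", "B-DRUG", "I-DRUG", "B-SYMPTOM", "I-SYMPTOM"]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(labels))

args = TrainingArguments(output_dir="medical-ner",
                         num_train_epochs=3,
                         per_device_train_batch_size=16)

# train_ds / eval_ds are assumed to be tokenized datasets whose token-level
# labels were already aligned to the wordpieces produced by the tokenizer.
trainer = Trainer(model=model, args=args,
                  train_dataset=train_ds, eval_dataset=eval_ds)
trainer.train()
```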

A BERT-based model can achieve high performance on medical entity extraction by leveraging large-scale pre-training on general-domain corpora and fine-tuning on domain-specific data. For example, Nesterov and Umerenkov [4] proposed a novel method of performing medical entity extraction from electronic health records as a single-step multi-label classification task by fine-tuning a transformer model pre-trained on a large EHR dataset. They showed that their model can achieve human-level quality for the most frequent entities.

[1]: Medical Named Entity Recognition from Un-labelled Medical Records based on Pre-trained Language Models and Domain Dictionary | Data Intelligence | MIT Press

[2]: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

[3]: Fine-tuning BERT for Medical Entity Extraction

[4]: Distantly supervised end-to-end medical entity extraction from electronic health records with human-level quality

You developed a custom model by using Vertex AI to predict your application's user churn rate. You are using Vertex AI Model Monitoring for skew detection. The training data stored in BigQuery contains two sets of features: demographic and behavioral. You later discover that two separate models trained on each set perform better than the original model.

You need to configure a new model monitoring pipeline that splits traffic among the two models. You want to use the same prediction-sampling-rate and monitoring-frequency for each model. You also want to minimize management effort. What should you do?

A. Keep the training dataset as is. Deploy the models to two separate endpoints, and submit two Vertex AI Model Monitoring jobs with appropriately selected feature-thresholds parameters.
B. Keep the training dataset as is. Deploy both models to the same endpoint, and submit a Vertex AI Model Monitoring job with a monitoring-config-from parameter that accounts for the model IDs and feature selections.
C. Separate the training dataset into two tables based on demographic and behavioral features. Deploy the models to two separate endpoints, and submit two Vertex AI Model Monitoring jobs.
D. Separate the training dataset into two tables based on demographic and behavioral features. Deploy both models to the same endpoint, and submit a Vertex AI Model Monitoring job with a monitoring-config-from parameter that accounts for the model IDs and training datasets.
Suggested answer: D

Explanation:

Option A is incorrect because it does not separate the training dataset into two tables based on the feature sets, which is necessary to train the two models separately and accurately.

Option B is incorrect because it does not separate the training dataset into two tables based on the feature sets, and because its monitoring-config-from parameter accounts only for the model IDs and feature selections, not for the different training datasets.

Option C is incorrect because it deploys the models to two separate endpoints, which would increase the management effort and complexity of the pipeline.

Option D is correct because it separates the training dataset into two tables based on the feature sets, which enables the two models to be trained separately and accurately. It deploys both models to the same endpoint, which simplifies the pipeline and reduces management effort, and it submits a single Vertex AI Model Monitoring job with a monitoring-config-from parameter that accounts for the model IDs and training datasets, so skew detection works properly for each model.
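The exam option names the gcloud monitoring-config-from parameter; a hedged sketch of an equivalent setup via the google-cloud-aiplatform Python SDK (an assumption to verify against your SDK version, with hypothetical project, endpoint, table, and model IDs) might look like:

```python
# Hedged sketch: one Vertex AI Model Monitoring job for two models that
# share an endpoint, each checked against its own training table.
from google.cloud import aiplatform
from google.cloud.aiplatform import model_monitoring

aiplatform.init(project="my-project", location="us-central1")

demographic_cfg = model_monitoring.ObjectiveConfig(
    skew_detection_config=model_monitoring.SkewDetectionConfig(
        data_source="bq://my-project.churn.demographic_training",
        target_field="churned"))
behavioral_cfg = model_monitoring.ObjectiveConfig(
    skew_detection_config=model_monitoring.SkewDetectionConfig(
        data_source="bq://my-project.churn.behavioral_training",
        target_field="churned"))

aiplatform.ModelDeploymentMonitoringJob.create(
    display_name="churn-skew-monitoring",
    endpoint="projects/my-project/locations/us-central1/endpoints/123",
    # The same sampling rate and monitoring frequency apply to both models.
    logging_sampling_strategy=model_monitoring.RandomSampleConfig(sample_rate=0.8),
    schedule_config=model_monitoring.ScheduleConfig(monitor_interval=24),  # hours
    alert_config=model_monitoring.EmailAlertConfig(user_emails=["ml-ops@example.com"]),
    # Keyed by deployed model ID so each model gets its own objective config.
    objective_configs={"demographic-model-id": demographic_cfg,
                       "behavioral-model-id": behavioral_cfg},
)
```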

You work for a pharmaceutical company based in Canada. Your team developed a BigQuery ML model to predict the number of flu infections for the next month in Canada. Weather data is published weekly and flu infection statistics are published monthly. You need to configure a model retraining policy that minimizes cost. What should you do?

A. Download the weather and flu data each week. Configure Cloud Scheduler to execute a Vertex AI pipeline to retrain the model weekly.
B. Download the weather and flu data each month. Configure Cloud Scheduler to execute a Vertex AI pipeline to retrain the model monthly.
C. Download the weather and flu data each week. Configure Cloud Scheduler to execute a Vertex AI pipeline to retrain the model every month.
D. Download the weather data each week, and download the flu data each month. Deploy the model to a Vertex AI endpoint with feature drift monitoring, and retrain the model if a monitoring alert is detected.
Suggested answer: D

Explanation:

To configure a model retraining policy that minimizes cost, you should follow these steps:

Download the weather data each week, and download the flu data each month. This way, you can keep your data up to date with the latest information available, without downloading unnecessary or redundant data.

Deploy the model to a Vertex AI endpoint with feature drift monitoring. This feature allows you to detect when the distribution of the input data changes significantly from the training data, which could affect model performance [1].

Retrain the model if a monitoring alert is detected. This way, you can update your model only when needed, instead of retraining it on a fixed schedule, which could incur more cost and time.
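For illustration, a hedged sketch of setting up that drift monitoring with the google-cloud-aiplatform SDK; the endpoint path, feature names, and thresholds are hypothetical placeholders:

```python
# Hedged sketch: feature drift monitoring on the deployed endpoint, so a
# retraining pipeline runs only when an alert fires.
from google.cloud import aiplatform
from google.cloud.aiplatform import model_monitoring

aiplatform.init(project="my-project", location="us-central1")

drift_cfg = model_monitoring.ObjectiveConfig(
    drift_detection_config=model_monitoring.DriftDetectionConfig(
        drift_thresholds={"avg_weekly_temp": 0.3, "monthly_flu_cases": 0.3}))

aiplatform.ModelDeploymentMonitoringJob.create(
    display_name="flu-forecast-drift-monitoring",
    endpoint="projects/my-project/locations/us-central1/endpoints/456",
    logging_sampling_strategy=model_monitoring.RandomSampleConfig(sample_rate=1.0),
    schedule_config=model_monitoring.ScheduleConfig(monitor_interval=24),  # hours
    alert_config=model_monitoring.EmailAlertConfig(user_emails=["ml-ops@example.com"]),
    objective_configs=drift_cfg,
)
```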

[1]: Monitor models for feature drift | Vertex AI | Google Cloud

You are building an MLOps platform to automate your company's ML experiments and model retraining. You need to organize the artifacts for dozens of pipelines. How should you store the pipelines' artifacts?

A. Store parameters in Cloud SQL, and store the models' source code and binaries in GitHub.
B. Store parameters in Cloud SQL, store the models' source code in GitHub, and store the models' binaries in Cloud Storage.
C. Store parameters in Vertex ML Metadata, store the models' source code in GitHub, and store the models' binaries in Cloud Storage.
D. Store parameters in Vertex ML Metadata, and store the models' source code and binaries in GitHub.
Suggested answer: C

Explanation:

To organize the artifacts for dozens of pipelines, you should store the parameters in Vertex ML Metadata, store the models' source code in GitHub, and store the models' binaries in Cloud Storage. This option has the following advantages:

Vertex ML Metadata is a service that helps you track and manage the metadata of your ML workflows, such as datasets, models, metrics, and parameters [1]. It can also help you with data lineage, model versioning, and model performance monitoring [2].

GitHub is a popular platform for hosting and collaborating on code repositories. It can help you manage the source code of your models, as well as the configuration files, scripts, and notebooks that are part of your ML pipelines [3].

Cloud Storage is a scalable and durable object storage service that can store any type of data, including model binaries [4]. It can also integrate with other services, such as Vertex AI, Cloud Functions, and Cloud Run, to enable easy deployment and serving of your models [5].
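As a brief, hedged sketch of this split (project, experiment, and bucket names are hypothetical), parameters can be logged to Vertex ML Metadata through Vertex AI Experiments while binaries live in Cloud Storage:

```python
# Hedged sketch: parameters tracked in Vertex ML Metadata (via Vertex AI
# Experiments), binaries in Cloud Storage, source code in GitHub.
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1",
                experiment="retraining-pipelines")

aiplatform.start_run("pipeline-run-42")
aiplatform.log_params({"learning_rate": 0.01, "batch_size": 64})
aiplatform.log_metrics({"val_auc": 0.91})
aiplatform.end_run()

# The corresponding binary would be written to Cloud Storage, e.g.
# gs://my-bucket/models/pipeline-run-42/model/, while the pipeline's
# source code is versioned in GitHub.
```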

[1]: Introduction to Vertex ML Metadata | Vertex AI | Google Cloud

[2]: Manage metadata for ML workflows | Vertex AI | Google Cloud

[3]: GitHub - Where the world builds software

[4]: Cloud Storage | Google Cloud

[5]: Deploying models | Vertex AI | Google Cloud

You work for a telecommunications company. You are building a model to predict which customers may fail to pay their next phone bill. The purpose of this model is to proactively offer at-risk customers assistance such as service discounts and bill deadline extensions. The data is stored in BigQuery, and the predictive features that are available for model training include:

- Customer_id
- Age
- Salary (measured in local currency)
- Sex
- Average bill value (measured in local currency)
- Number of phone calls in the last month (integer)
- Average duration of phone calls (measured in minutes)

You need to investigate and mitigate potential bias against disadvantaged groups while preserving model accuracy. What should you do?

A. Determine whether there is a meaningful correlation between the sensitive features and the other features. Train a BigQuery ML boosted trees classification model, and exclude the sensitive features and any meaningfully correlated features.
B. Train a BigQuery ML boosted trees classification model with all features. Use the ML.GLOBAL_EXPLAIN method to calculate the global attribution values for each feature of the model. If the feature importance value for any of the sensitive features exceeds a threshold, discard the model and train it again without this feature.
C. Train a BigQuery ML boosted trees classification model with all features. Use the ML.EXPLAIN_PREDICT method to calculate the attribution values for each feature for each customer in a test set. If for any individual customer the importance value for any feature exceeds a predefined threshold, discard the model and train the model again without this feature.
D. Define a fairness metric that is represented by accuracy across the sensitive features. Train a BigQuery ML boosted trees classification model with all features. Use the trained model to make predictions on a test set. Join the data back with the sensitive features, and calculate a fairness metric to investigate whether it meets your requirements.
Suggested answer: D

Explanation:

A fairness metric is a way to measure how well a machine learning model treats different groups of customers, such as by sex or age. A common fairness metric is accuracy, which is the proportion of correct predictions among all predictions. Accuracy across the sensitive features means calculating the accuracy for each group separately, and then comparing them. For example, if the model has 90% accuracy for male customers and 80% accuracy for female customers, there is a 10% accuracy gap that indicates potential bias against female customers.

To investigate and mitigate potential bias, it is important to define a fairness metric and evaluate it on a test set. A test set is a subset of the data that is not used for training the model, but only for evaluating its performance. By joining the test set predictions with the sensitive features, you can calculate the fairness metric and see if it meets your requirements. For example, you may require that the accuracy gap between any two groups is less than 5%. If the fairness metric does not meet your requirements, you may need to adjust the model or the data to reduce bias.
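For concreteness, a small illustrative calculation with pandas (toy data and hypothetical column names, not from the exam source):

```python
# Toy illustration: per-group accuracy as the fairness metric, computed by
# joining test-set predictions back with the sensitive feature.
import pandas as pd

test = pd.DataFrame({
    "sex":       ["F", "F", "M", "M", "M"],
    "label":     [1, 0, 1, 0, 1],
    "predicted": [1, 0, 0, 0, 1],
})

per_group_acc = (test.assign(correct=test["label"] == test["predicted"])
                     .groupby("sex")["correct"].mean())
print(per_group_acc)          # F: 1.00, M: 0.67

gap = per_group_acc.max() - per_group_acc.min()
if gap > 0.05:                # e.g., require at most a 5-point accuracy gap
    print(f"Accuracy gap {gap:.1%} exceeds the fairness requirement")
```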

Option A is not the best answer because excluding the sensitive features and any meaningfully correlated features may not eliminate bias. For example, if salary is correlated with sex, and salary is also a predictive feature for the target variable, excluding both features may reduce the model accuracy and still leave some residual bias. Moreover, excluding features based on correlation may not capture the complex interactions and dependencies among the features that may affect bias.

Option B is not the best answer because using the global attribution values for each feature of the model may not reflect the individual-level impact of the features on the predictions. Global attribution values are calculated by averaging the attribution values across all the data points, and they indicate how important each feature is for the overall model performance. However, they do not show how each feature affects each customer's prediction, which may vary depending on the values of the other features. For example, sex may have a low global attribution value, but it may have a high impact on some customers' predictions, especially if it interacts with other features such as salary or age.

Option C is not the best answer because discarding the model and training the model again without a feature based on a single customer's attribution value may not be a robust or scalable way to mitigate bias. Attribution values are calculated by measuring how much each feature contributes to the prediction for a given data point, and they indicate how sensitive the prediction is to the feature value. However, they do not show how the feature affects the overall fairness metric or the model accuracy. For example, sex may have a high attribution value for a customer, but it may not affect the accuracy gap between the groups. Moreover, discarding and retraining the model based on a single customer's attribution value may not be feasible if there are many customers with high attribution values for different features.

You recently trained an XGBoost model that you plan to deploy to production for online inference. Before sending a predict request to your model's binary, you need to perform a simple data preprocessing step. This step exposes a REST API that accepts requests in your internal VPC Service Controls and returns predictions. You want to configure this preprocessing step while minimizing cost and effort. What should you do?

A. Store a pickled model in Cloud Storage. Build a Flask-based app, package the app in a custom container image, and deploy the model to Vertex AI Endpoints.
B. Build a Flask-based app, package the app and a pickled model in a custom container image, and deploy the model to Vertex AI Endpoints.
C. Build a custom predictor class based on XGBoost Predictor from the Vertex AI SDK, package it and a pickled model in a custom container image based on a Vertex built-in image, and deploy the model to Vertex AI Endpoints.
D. Build a custom predictor class based on XGBoost Predictor from the Vertex AI SDK, and package the handler in a custom container image based on a Vertex built-in container image. Store a pickled model in Cloud Storage, and deploy the model to Vertex AI Endpoints.
Suggested answer: D

Explanation:

Option A is not the best answer because it requires storing the pickled model in Cloud Storage, which may incur additional cost and latency for loading the model. It also requires building a Flask-based app, which may not be necessary for a simple data preprocessing step.

Option B is not the best answer because it requires building a Flask-based app, which may not be necessary for a simple data preprocessing step. It also requires packaging the app and the pickled model in a custom container image, which may increase the size and complexity of the image.

Option C is not the best answer because it requires packaging the pickled model in a custom container image, which may increase the size and complexity of the image. It also does not leverage the Vertex built-in container image, which may provide some optimizations and integrations for XGBoost models.

Option D is the best answer because it leverages a Vertex built-in container image, which may provide optimizations and integrations for XGBoost models. Storing the pickled model in Cloud Storage keeps the container image small and simple, and building a custom predictor class based on XGBoost Predictor from the Vertex AI SDK simplifies both the data preprocessing step and the prediction logic.
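A hedged sketch of the custom predictor in option D, following the Custom Prediction Routines pattern in the Vertex AI SDK (the import path and the preprocessing logic are assumptions to verify against your SDK version):

```python
# Hedged sketch: a custom predictor built on the Vertex AI SDK's XGBoost
# predictor. The model binary stays pickled in Cloud Storage and is loaded
# by the base class; only preprocessing is customized. The scaling step is
# a hypothetical placeholder.
import numpy as np
from google.cloud.aiplatform.prediction.xgboost.predictor import XgboostPredictor


class PreprocessingXgboostPredictor(XgboostPredictor):
    def preprocess(self, prediction_input: dict) -> np.ndarray:
        instances = np.asarray(prediction_input["instances"], dtype=float)
        # Simple data preprocessing step applied before the booster
        # receives the request (placeholder transformation).
        return instances / 100.0
```

The container image would then be built on a Vertex built-in base image (for example, with the SDK's LocalModel.build_cpr_model helper) and deployed to Vertex AI Endpoints, while the pickled model remains in Cloud Storage.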

You work at a bank. You need to develop a credit risk model to support loan application decisions. You decide to implement the model by using a neural network in TensorFlow. Due to regulatory requirements, you need to be able to explain the model's predictions based on its features. When the model is deployed, you also want to monitor the model's performance over time. You decided to use Vertex AI for both model development and deployment. What should you do?

A. Use Vertex Explainable AI with the sampled Shapley method, and enable Vertex AI Model Monitoring to check for feature distribution drift.
B. Use Vertex Explainable AI with the sampled Shapley method, and enable Vertex AI Model Monitoring to check for feature distribution skew.
C. Use Vertex Explainable AI with the XRAI method, and enable Vertex AI Model Monitoring to check for feature distribution drift.
D. Use Vertex Explainable AI with the XRAI method, and enable Vertex AI Model Monitoring to check for feature distribution skew.
Suggested answer: A

Explanation:

To develop a credit risk model that meets the regulatory requirements and can be monitored over time, you should follow these steps:

Use Vertex Explainable AI with the sampled Shapley method.Vertex Explainable AI is a service that provides feature attributions for machine learning models, which can help you understand how each feature contributes to the prediction1.The sampled Shapley method is a technique that estimates the Shapley values for each feature, which are based on the marginal contribution of each feature to the prediction across all possible feature subsets2.The sampled Shapley method is suitable for neural networks and other complex models, as it can capture the non-linear and interaction effects of the features3.

Enable Vertex AI Model Monitoring to check for feature distribution drift. Vertex AI Model Monitoring is a service that helps you track and manage the performance and quality of your deployed models over time [4]. Feature distribution drift is a type of data drift that occurs when the distribution of the input features changes significantly from the training data, which can affect the model's accuracy and reliability [5]. By checking for feature distribution drift, you can detect when your model needs to be retrained or updated with new data.
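A hedged sketch of wiring the explanations in at upload time (the container image, bucket, tensor names, and path count are hypothetical, and the explanation-spec fields should be checked against your SDK version):

```python
# Hedged sketch: upload the TensorFlow model with a sampled Shapley
# explanation spec, deploy it, and request explanations.
from google.cloud import aiplatform
from google.cloud.aiplatform.explain import ExplanationMetadata, ExplanationParameters

params = ExplanationParameters(
    {"sampled_shapley_attribution": {"path_count": 10}})
metadata = ExplanationMetadata(
    inputs={"features": {}},        # model input tensor (placeholder)
    outputs={"credit_risk": {}})    # model output tensor (placeholder)

model = aiplatform.Model.upload(
    display_name="credit-risk-nn",
    artifact_uri="gs://my-bucket/credit-risk/model/",
    serving_container_image_uri=(
        "us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-12:latest"),
    explanation_parameters=params,
    explanation_metadata=metadata,
)

endpoint = model.deploy(machine_type="n1-standard-4")
response = endpoint.explain(instances=[[0.2, 0.7, 0.1]])
print(response.explanations)  # per-feature sampled Shapley attributions
```

Drift monitoring would then be enabled on the endpoint separately, as in the earlier monitoring sketches.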

[1]: Introduction to Vertex Explainable AI | Vertex AI | Google Cloud

[2]: Shapley value - Wikipedia

[3]: Explainable AI: Interpreting, Explaining and Visualizing Deep Learning

[4]: Introduction to Vertex AI Model Monitoring | Vertex AI | Google Cloud

[5]: Monitor models for data drift | Vertex AI | Google Cloud

You are investigating the root cause of a misclassification error made by one of your models. You used Vertex AI Pipelines to train and deploy the model. The pipeline reads data from BigQuery, creates a copy of the data in Cloud Storage in TFRecord format, trains the model in Vertex AI Training on that copy, and deploys the model to a Vertex AI endpoint. You have identified the specific version of that model that misclassified, and you need to recover the data this model was trained on. How should you find that copy of the data?

A. Use Vertex AI Feature Store. Modify the pipeline to use the feature store, and ensure that all training data is stored in it. Search the feature store for the data used for the training.
B. Use the lineage feature of Vertex AI Metadata to find the model artifact. Determine the version of the model, identify the step that creates the data copy, and search in the metadata for its location.
C. Use the logging features in the Vertex AI endpoint to determine the timestamp of the model's deployment. Find the pipeline run at that timestamp, identify the step that creates the data copy, and search in the logs for its location.
D. Find the job ID in Vertex AI Training corresponding to the training for the model. Search in the logs of that job for the data used for the training.
Suggested answer: B

Explanation:

Option A is not the best answer because it requires modifying the pipeline to use Vertex AI Feature Store, which may not be feasible or necessary for recovering the data that the model was trained on. Vertex AI Feature Store is a service that helps you manage, store, and serve feature values for your machine learning models [1], but it is not designed for storing the raw data or the TFRecord files.

Option B is the best answer because it leverages the lineage feature of Vertex AI Metadata, which is a service that helps you track and manage the metadata of your machine learning workflows, such as datasets, models, metrics, and parameters [2]. The lineage feature allows you to view the relationships and dependencies among the artifacts and executions in your pipeline, and to trace back the origin and history of any artifact [3]. By using the lineage feature, you can find the model artifact, determine the version of the model, identify the step that creates the data copy, and search in the metadata for its location.

Option C is not the best answer because it relies on the logging features of the Vertex AI endpoint, which may not be accurate or reliable for finding the data copy. The logging features of the Vertex AI endpoint help you monitor and troubleshoot the online predictions made by your deployed models, but they do not provide information about the training data or the pipeline steps [4]. Moreover, the timestamp of the model deployment may not match the timestamp of the pipeline run, as there may be delays or errors in the deployment process.

Option D is not the best answer because it requires finding the job ID in Vertex AI Training, which may not be easy or straightforward. Vertex AI Training is a service that helps you train your custom models on Google Cloud, but it does not provide a direct way to link the training job to the model version or the pipeline run [5]. Moreover, searching in the logs of the job may not reveal the location of the data copy, as the logs may only contain information about the training process and the metrics.
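A hedged sketch of the lineage lookup in option B with the google-cloud-aiplatform SDK (the filter syntax and metadata keys depend on what the pipeline logged, so treat them as assumptions):

```python
# Hedged sketch: find the model artifact for the misclassifying version in
# Vertex ML Metadata, then follow its lineage to the step that wrote the
# TFRecord copy. Display names and filters are hypothetical placeholders.
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

model_artifacts = aiplatform.Artifact.list(
    filter='display_name="churn-model" AND metadata.version.string_value="v7"')
model_artifact = model_artifacts[0]

# The lineage graph links the model back through the training step to the
# dataset artifact; that artifact's URI is the Cloud Storage location of
# the TFRecord copy the model was trained on.
print(model_artifact.lineage_console_uri)
```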

[1]: Introduction to Vertex AI Feature Store | Vertex AI | Google Cloud

[2]: Introduction to Vertex AI Metadata | Vertex AI | Google Cloud

[3]: View lineage for ML workflows | Vertex AI | Google Cloud

[4]: Monitor online predictions | Vertex AI | Google Cloud

[5]: Train custom models | Vertex AI | Google Cloud

You work for a manufacturing company. You need to train a custom image classification model to detect product defects at the end of an assembly line. Although your model is performing well, some images in your holdout set are consistently mislabeled with high confidence. You want to use Vertex AI to understand your model's results. What should you do?

A.
B.
C.
D.
Suggested answer: C

Explanation:

Vertex Explainable AI is a set of tools and frameworks to help you understand and interpret predictions made by your machine learning models, natively integrated with a number of Google's products and services [1]. With Vertex Explainable AI, you can generate feature-based explanations that show how much each input feature contributed to the model's prediction [2]. This can help you debug and improve your model's performance, and build confidence in your model's behavior. Feature-based explanations are supported for custom image classification models deployed on Vertex AI Prediction [3].

Reference:

[1]: Explainable AI | Google Cloud

[2]: Introduction to Vertex Explainable AI | Vertex AI | Google Cloud

[3]: Supported model types for feature-based explanations | Vertex AI | Google Cloud
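As a hedged sketch of requesting such explanations (the endpoint path, file name, and payload key are hypothetical, and the model must have been uploaded with an explanation spec):

```python
# Hedged sketch: request feature-based explanations for a consistently
# mislabeled holdout image from a deployed Vertex AI endpoint.
import base64
from google.cloud import aiplatform

endpoint = aiplatform.Endpoint(
    "projects/my-project/locations/us-central1/endpoints/789")

with open("mislabeled_part_042.jpg", "rb") as f:
    instance = {"bytes_inputs": {"b64": base64.b64encode(f.read()).decode()}}

response = endpoint.explain(instances=[instance])
for explanation in response.explanations:
    for attribution in explanation.attributions:
        # The attributions show which image regions drove the confident
        # (but wrong) prediction.
        print(attribution.output_display_name, attribution.output_index)
```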

You are using Keras and TensorFlow to develop a fraud detection model. Records of customer transactions are stored in a large table in BigQuery. You need to preprocess these records in a cost-effective and efficient way before you use them to train the model. The trained model will be used to perform batch inference in BigQuery. How should you implement the preprocessing workflow?

A. Implement a preprocessing pipeline by using Apache Spark, and run the pipeline on Dataproc. Save the preprocessed data as CSV files in a Cloud Storage bucket.
B. Load the data into a pandas DataFrame. Implement the preprocessing steps using pandas transformations, and train the model directly on the DataFrame.
C. Perform preprocessing in BigQuery by using SQL. Use the BigQueryClient in TensorFlow to read the data directly from BigQuery.
D. Implement a preprocessing pipeline by using Apache Beam, and run the pipeline on Dataflow. Save the preprocessed data as CSV files in a Cloud Storage bucket.
Suggested answer: C

Explanation:

Option A is not the best answer because it requires using Apache Spark and Dataproc, which may incur additional cost and complexity for running and managing the cluster. It also requires saving the preprocessed data as CSV files in a Cloud Storage bucket, which may increase the storage cost and the data transfer latency.

Option B is not the best answer because it requires loading the data into a pandas DataFrame, which may not be scalable or efficient for large datasets. It also requires training the model directly on the DataFrame, which may not leverage the distributed computing capabilities of BigQuery.

Option C is the best answer because it performs the preprocessing in BigQuery by using SQL, which is a cost-effective and efficient way to manipulate large datasets. It also uses the BigQueryClient in TensorFlow to read the data directly from BigQuery, which is a convenient and fast way to access the data for training the model [1].

Option D is not the best answer because it requires using Apache Beam and Dataflow, which may incur additional cost and complexity for running and managing the pipeline. It also requires saving the preprocessed data as CSV files in a Cloud Storage bucket, which may increase the storage cost and the data transfer latency.

[1]: Read data from BigQuery | TensorFlow I/O
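To make option C concrete, a hedged sketch (table, column, and project names are hypothetical; the reader API comes from the tensorflow-io package and its parameters should be verified against [1]):

```python
# Hedged sketch of option C: preprocess with SQL inside BigQuery, then read
# the prepared table directly into a tf.data pipeline with the BigQuery
# reader from tensorflow-io.
import tensorflow as tf
from google.cloud import bigquery
from tensorflow_io.bigquery import BigQueryClient

# 1) Preprocess where the data lives: cheap, scalable SQL in BigQuery.
bigquery.Client(project="my-project").query("""
    CREATE OR REPLACE TABLE fraud.transactions_prepped AS
    SELECT SAFE_DIVIDE(amount, 1000) AS amount_scaled,
           IF(country = 'CA', 1, 0) AS is_domestic,
           is_fraud
    FROM fraud.transactions
""").result()

# 2) Stream the prepared rows straight into TensorFlow for training.
session = BigQueryClient().read_session(
    parent="projects/my-project",
    project_id="my-project",
    dataset_id="fraud",
    table_id="transactions_prepped",
    selected_fields=["amount_scaled", "is_domestic", "is_fraud"],
    output_types=[tf.float64, tf.int64, tf.int64],
)
dataset = session.parallel_read_rows().batch(256)
```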
