Google Professional Machine Learning Engineer Practice Test - Questions Answers, Page 18


You developed a Vertex AI pipeline that trains a classification model on data stored in a large BigQuery table. The pipeline has four steps, where each step is created by a Python function that uses the Kubeflow Pipelines SDK v2 API. The components have the following names:

You launch your Vertex AI pipeline as follows:

You perform many model iterations by adjusting the code and parameters of the training step. You observe high costs associated with the development, particularly the data export and preprocessing steps. You need to reduce model development costs. What should you do?

A.
B.
C.
D.
Suggested answer: A

Explanation:

According to the official exam guide, one of the skills assessed in the exam is to "automate and orchestrate ML pipelines using Cloud Composer." Vertex AI Pipelines is a service that lets you orchestrate your ML workflows using the Kubeflow Pipelines SDK v2 or TensorFlow Extended. Vertex AI Pipelines supports execution caching: if a pipeline run reaches a component that has already been executed with the same inputs and parameters, the component does not run again; instead, its output from the previous run is reused. This saves time and resources while you iterate on a pipeline. Therefore, option A is the best way to reduce model development costs, because it enables execution caching for the data export and preprocessing steps, which are likely to be identical across model iterations. The other options are not relevant or optimal for this scenario.

Reference:

Professional ML Engineer Exam Guide

Vertex AI Pipelines

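As an illustration, here is a minimal sketch of how execution caching can be enabled when the pipeline is compiled with the Kubeflow Pipelines SDK v2 and submitted with the Vertex AI SDK. The component bodies, names, and paths below are hypothetical placeholders, since the actual component definitions are not shown in the question.

```python
from kfp import compiler, dsl
from google.cloud import aiplatform

# Hypothetical stand-ins for the four pipeline steps described in the question.
@dsl.component
def export_data() -> str:
    return "gs://my-bucket/exported"          # placeholder export location

@dsl.component
def preprocess_data(data_path: str) -> str:
    return data_path + "/preprocessed"        # placeholder preprocessing output

@dsl.component
def train_model(data_path: str) -> str:
    return data_path + "/model"               # the step that changes each iteration

@dsl.component
def evaluate_model(model_path: str):
    pass                                      # placeholder evaluation step

@dsl.pipeline(name="classification-training-pipeline")
def training_pipeline():
    exported = export_data()
    prepped = preprocess_data(data_path=exported.output)
    trained = train_model(data_path=prepped.output)
    evaluate_model(model_path=trained.output)

compiler.Compiler().compile(training_pipeline, "pipeline.json")

job = aiplatform.PipelineJob(
    display_name="classification-training-pipeline",
    template_path="pipeline.json",
    pipeline_root="gs://my-bucket/pipeline-root",
    enable_caching=True,  # unchanged steps (export, preprocessing) reuse cached outputs
)
job.run()
```

With caching enabled, only the modified training and downstream steps are re-executed on each iteration.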

You work for a startup that has multiple data science workloads. Your compute infrastructure is currently on-premises, and the data science workloads are native to PySpark. Your team plans to migrate their data science workloads to Google Cloud. You need to build a proof of concept to migrate one data science job to Google Cloud, and you want to propose a migration process that requires minimal cost and effort. What should you do first?

A. Create an n2-standard-4 VM instance and install Java, Scala, and Apache Spark dependencies on it.
B. Create a Google Kubernetes Engine cluster with a basic node pool configuration, and install Java, Scala, and Apache Spark dependencies on it.
C. Create a Standard (1 master, 3 workers) Dataproc cluster, and run a Vertex AI Workbench notebook instance on it.
D. Create a Vertex AI Workbench notebook with instance type n2-standard-4.
Suggested answer: C

Explanation:

According to the official exam guide, one of the skills assessed in the exam is to "design, build, and productionalize ML models to solve business challenges using Google Cloud technologies." Dataproc is a fully managed, fast, and easy-to-use service for running Apache Spark and Apache Hadoop clusters on Google Cloud. Dataproc supports PySpark workloads and provides a simple way to migrate existing Spark jobs to the cloud: you can create a Dataproc cluster with a few clicks or commands and run your PySpark jobs on it. You can also use Vertex AI Workbench, a managed notebook service, to create and run PySpark notebooks against Dataproc clusters, which lets you interactively develop and test your PySpark code in the cloud. Therefore, option C is the best way to build a proof of concept to migrate one data science job to Google Cloud with minimal cost and effort. The other options are not relevant or optimal for this scenario.

Reference:

Professional ML Engineer Exam Guide

Dataproc

Vertex AI Workbench

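For illustration, a minimal PySpark job of this kind typically needs little or no change to run on Dataproc. The Cloud Storage paths, cluster name, and region below are hypothetical placeholders.

```python
# pyspark_job.py -- a PySpark job that can run largely unchanged on a Dataproc cluster.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("poc-migration").getOrCreate()

# Read the input data that was staged in Cloud Storage for the proof of concept.
df = spark.read.parquet("gs://my-bucket/input-data/")

# Representative transformation from the existing on-premises workload.
result = df.groupBy("label").count()

result.write.mode("overwrite").parquet("gs://my-bucket/output-data/")
spark.stop()

# The job can then be submitted to the Dataproc cluster, for example:
#   gcloud dataproc jobs submit pyspark pyspark_job.py \
#       --cluster=poc-cluster --region=us-central1
```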

You work for a bank. You have been asked to develop an ML model that will support loan application decisions. You need to determine which Vertex AI services to include in the workflow. You want to track the model's training parameters and the metrics per training epoch, and you plan to compare the performance of each version of the model to determine the best model based on your chosen metrics. Which Vertex AI services should you use?

A. Vertex ML Metadata, Vertex AI Feature Store, and Vertex AI Vizier
B. Vertex AI Pipelines, Vertex AI Experiments, and Vertex AI Vizier
C. Vertex ML Metadata, Vertex AI Experiments, and Vertex AI TensorBoard
D. Vertex AI Pipelines, Vertex AI Feature Store, and Vertex AI TensorBoard
Suggested answer: C

Explanation:

According to the official exam guide, one of the skills assessed in the exam is to "track the lineage of pipeline artifacts." Vertex ML Metadata is a service that allows you to store, query, and visualize metadata associated with your ML workflows, such as datasets, models, metrics, and executions; it helps you track the provenance and lineage of your ML artifacts and understand the relationships between them. Vertex AI Experiments is a service that allows you to track and compare the results of your model training runs; it logs metadata such as hyperparameters, metrics, and artifacts for each training run, and it works with custom models built in TensorFlow, PyTorch, XGBoost, or scikit-learn. Vertex AI TensorBoard is a managed version of TensorBoard, an open source tool for ML visualization; it lets you track the model's training parameters and the metrics per training epoch and compare the performance of each version of the model. Therefore, option C is the best combination of Vertex AI services for this use case. The other options are not relevant or optimal for this scenario.

Reference:

Professional ML Engineer Exam Guide

Vertex ML Metadata

Vertex AI Experiments

Vertex AI TensorBoard

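A minimal sketch of this tracking workflow with the Vertex AI SDK is shown below. The project, location, experiment name, TensorBoard resource name, and metric values are hypothetical placeholders.

```python
from google.cloud import aiplatform

aiplatform.init(
    project="my-project",
    location="us-central1",
    experiment="loan-approval-models",
    # Hypothetical TensorBoard resource; backs the per-epoch time series metrics.
    experiment_tensorboard="projects/my-project/locations/us-central1/tensorboards/1234567890",
)

aiplatform.start_run("wide-deep-v1")
aiplatform.log_params({"learning_rate": 0.01, "batch_size": 64, "epochs": 10})

for epoch in range(10):
    # Placeholder values; in practice these come from the training loop.
    loss, accuracy = 1.0 / (epoch + 1), 0.80 + 0.01 * epoch
    aiplatform.log_time_series_metrics({"loss": loss, "accuracy": accuracy}, step=epoch)

aiplatform.log_metrics({"final_accuracy": 0.89})
aiplatform.end_run()
```

Each run ("wide-deep-v1", "wide-deep-v2", ...) can then be compared side by side in the Vertex AI Experiments UI.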

You work for an auto insurance company. You are preparing a proof-of-concept ML application that uses images of damaged vehicles to infer damaged parts. Your team has assembled a set of annotated images from damage claim documents in the company's database. The annotations associated with each image consist of a bounding box for each identified damaged part and the part name. You have been given a sufficient budget to train models on Google Cloud. You need to quickly create an initial model. What should you do?

A. Download a pre-trained object detection model from TensorFlow Hub. Fine-tune the model in Vertex AI Workbench by using the annotated image data.
B. Train an object detection model in AutoML by using the annotated image data.
C. Create a pipeline in Vertex AI Pipelines, and configure the AutoMLTrainingJobRunOp component to train a custom object detection model by using the annotated image data.
D. Train an object detection model in Vertex AI custom training by using the annotated image data.
Suggested answer: B

Explanation:

According to the official exam guide, one of the skills assessed in the exam is to "design, build, and productionalize ML models to solve business challenges using Google Cloud technologies." AutoML Vision is a service that allows you to train and deploy custom vision models for image classification and object detection. AutoML simplifies the model development process by providing a graphical user interface and a no-code approach. You can use AutoML to train an object detection model on the annotated image data and evaluate its performance with metrics such as mean average precision (mAP) and intersection over union (IoU). Therefore, option B is the fastest way to create an initial model for this use case. The other options are not relevant or optimal for this scenario.

Reference:

Professional ML Engineer Exam Guide

AutoML Vision

Object detection evaluation

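A minimal sketch of the AutoML approach using the Vertex AI SDK is shown below. The bucket path, display names, and training budget are hypothetical, and the import file is assumed to follow the Vertex AI image object detection JSONL schema.

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# Import the annotated images (bounding boxes + damaged part labels) from a JSONL file.
dataset = aiplatform.ImageDataset.create(
    display_name="damaged-parts",
    gcs_source="gs://my-bucket/annotations/import.jsonl",
    import_schema_uri=aiplatform.schema.dataset.ioformat.image.bounding_box,
)

job = aiplatform.AutoMLImageTrainingJob(
    display_name="damaged-parts-detection",
    prediction_type="object_detection",
)

model = job.run(
    dataset=dataset,
    model_display_name="damaged-parts-detection-v1",
    budget_milli_node_hours=20000,  # training budget; adjust to what is available
)
```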

You are analyzing customer data, stored in Cloud Storage, for a healthcare organization. The data contains personally identifiable information (PII). You need to perform data exploration and preprocessing while ensuring the security and privacy of sensitive fields. What should you do?

A. Use the Cloud Data Loss Prevention (DLP) API to de-identify the PII before performing data exploration and preprocessing.
B. Use customer-managed encryption keys (CMEK) to encrypt the PII data at rest, and decrypt the PII data during data exploration and preprocessing.
C. Use a VM inside a VPC Service Controls security perimeter to perform data exploration and preprocessing.
D. Use Google-managed encryption keys to encrypt the PII data at rest, and decrypt the PII data during data exploration and preprocessing.
Suggested answer: A

Explanation:

According to the official exam guide, one of the skills assessed in the exam is to "design, build, and productionalize ML models to solve business challenges using Google Cloud technologies." The Cloud Data Loss Prevention (DLP) API provides programmatic access to a powerful detection engine for personally identifiable information and other privacy-sensitive data in unstructured data streams, such as text blocks and images. The Cloud DLP API helps you discover, classify, and protect your sensitive data by using techniques such as de-identification, masking, tokenization, and bucketing. You can use the Cloud DLP API to de-identify the PII before performing data exploration and preprocessing while retaining the data's utility for ML purposes. Therefore, option A is the best way to perform data exploration and preprocessing while ensuring the security and privacy of sensitive fields. The other options are not relevant or optimal for this scenario.

Reference:

Professional ML Engineer Exam Guide

Cloud Data Loss Prevention (DLP) API

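A minimal sketch of de-identifying text with the Cloud DLP client library before exploration is shown below. The project ID, info types, and sample text are illustrative.

```python
from google.cloud import dlp_v2

client = dlp_v2.DlpServiceClient()
parent = "projects/my-project/locations/global"

item = {"value": "Patient John Doe, phone 555-0100, was admitted on 2023-04-01."}

# Replace each detected finding with the name of its info type.
deidentify_config = {
    "info_type_transformations": {
        "transformations": [
            {"primitive_transformation": {"replace_with_info_type_config": {}}}
        ]
    }
}

inspect_config = {
    "info_types": [{"name": "PERSON_NAME"}, {"name": "PHONE_NUMBER"}]
}

response = client.deidentify_content(
    request={
        "parent": parent,
        "deidentify_config": deidentify_config,
        "inspect_config": inspect_config,
        "item": item,
    }
)
print(response.item.value)  # e.g. "Patient [PERSON_NAME], phone [PHONE_NUMBER], was admitted ..."
```

The de-identified output can then be used safely in notebooks and preprocessing pipelines.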

You are building a predictive maintenance model to preemptively detect part defects in bridges. You plan to use high-definition images of the bridges as model inputs. You need to explain the output of the model to the relevant stakeholders so they can take appropriate action. How should you build the model?

A. Use scikit-learn to build a tree-based model, and use SHAP values to explain the model output.
B. Use scikit-learn to build a tree-based model, and use partial dependence plots (PDP) to explain the model output.
C. Use TensorFlow to create a deep learning-based model, and use Integrated Gradients to explain the model output.
D. Use TensorFlow to create a deep learning-based model, and use the sampled Shapley method to explain the model output.
Suggested answer: C

Explanation:

According to the official exam guide, one of the skills assessed in the exam is to "explain the predictions of a trained model." TensorFlow is an open source framework for developing and deploying machine learning and deep learning models. TensorFlow supports several model explainability methods, including Integrated Gradients, a technique that assigns an importance score to each input feature by approximating the integral of the model's gradients along the path from a baseline input to the actual input. Integrated Gradients can help explain the output of a deep learning-based image model by highlighting the most influential pixels or regions in the input images. Therefore, option C is the best way to build the model for this use case. The other options are not relevant or optimal for this scenario.

Reference:

Professional ML Engineer Exam Guide

TensorFlow

Integrated Gradients

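A minimal sketch of Integrated Gradients computed directly with TensorFlow is shown below (Vertex Explainable AI can also provide this attribution method for deployed models). Here `model` is assumed to be a trained image classifier and `image` a preprocessed tensor of shape (H, W, 3).

```python
import tensorflow as tf

def integrated_gradients(model, image, target_class, steps=50):
    baseline = tf.zeros_like(image)  # a black image as the baseline input
    # Interpolate between the baseline and the actual input along a straight path.
    alphas = tf.linspace(0.0, 1.0, steps + 1)
    interpolated = baseline[tf.newaxis] + alphas[:, tf.newaxis, tf.newaxis, tf.newaxis] * (
        image - baseline
    )[tf.newaxis]

    with tf.GradientTape() as tape:
        tape.watch(interpolated)
        predictions = model(interpolated)
        outputs = predictions[:, target_class]
    grads = tape.gradient(outputs, interpolated)

    # Approximate the path integral with the trapezoidal rule, then scale by (input - baseline).
    avg_grads = tf.reduce_mean((grads[:-1] + grads[1:]) / 2.0, axis=0)
    return (image - baseline) * avg_grads

# Example usage (hypothetical): attributions = integrated_gradients(model, image, target_class=3)
```

The resulting attribution map can be overlaid on the bridge image to show stakeholders which regions drove the prediction.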

You work for a hospital that wants to optimize how it schedules operations. You need to create a model that uses the relationship between the number of surgeries scheduled and the number of beds used. You want to predict, in advance, how many beds will be needed for patients each day based on the scheduled surgeries. You have one year of data for the hospital, organized in 365 rows.

The data includes the following variables for each day:

* Number of scheduled surgeries

* Number of beds occupied

* Date

You want to maximize the speed of model development and testing. What should you do?

A. Create a BigQuery table. Use BigQuery ML to build a regression model, with the number of beds as the target variable and the number of scheduled surgeries and date features (such as day of week) as the predictors.
B. Create a BigQuery table. Use BigQuery ML to build an ARIMA model, with the number of beds as the target variable and the date as the time variable.
C. Create a Vertex AI tabular dataset. Train an AutoML regression model, with the number of beds as the target variable and the number of scheduled minor surgeries and date features (such as day of the week) as the predictors.
D. Create a Vertex AI tabular dataset. Train a Vertex AI AutoML Forecasting model, with the number of beds as the target variable, the number of scheduled surgeries as a covariate, and the date as the time variable.
Suggested answer: D

Explanation:

According to the official exam guide, one of the skills assessed in the exam is to "design, build, and productionalize ML models to solve business challenges using Google Cloud technologies." Vertex AI AutoML Forecasting is a service that allows you to train and deploy custom time-series forecasting models for batch prediction. It simplifies the model development process by providing a graphical user interface and a no-code approach: you supply the tabular data and specify the target variable, the covariates, and the time variable, and AutoML Forecasting handles the feature engineering, model selection, and hyperparameter tuning automatically. Therefore, option D is the best way to maximize the speed of model development and testing for this use case. The other options are not relevant or optimal for this scenario.

Reference:

Professional ML Engineer Exam Guide

Vertex AI AutoML Forecasting

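A minimal sketch of the AutoML Forecasting approach with the Vertex AI SDK is shown below. The BigQuery table, column names (including the time-series identifier), and forecast horizon are hypothetical, and the exact run() parameters should be verified against the current SDK documentation.

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

dataset = aiplatform.TimeSeriesDataset.create(
    display_name="bed-usage",
    bq_source="bq://my-project.hospital.daily_bed_usage",
)

job = aiplatform.AutoMLForecastingTrainingJob(
    display_name="bed-demand-forecast",
    optimization_objective="minimize-rmse",
)

model = job.run(
    dataset=dataset,
    target_column="beds_occupied",
    time_column="date",
    time_series_identifier_column="hospital_id",   # hypothetical single-series identifier
    available_at_forecast_columns=["date", "scheduled_surgeries"],
    unavailable_at_forecast_columns=[],
    forecast_horizon=7,            # predict the next 7 days
    data_granularity_unit="day",
    data_granularity_count=1,
)
```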

You recently developed a wide and deep model in TensorFlow. You generated training datasets using a SQL script that preprocessed raw data in BigQuery by performing instance-level transformations of the data. You need to create a training pipeline to retrain the model on a weekly basis. The trained model will be used to generate daily recommendations. You want to minimize model development and training time. How should you develop the training pipeline?

A. Use the Kubeflow Pipelines SDK to implement the pipeline. Use the BigQueryJobOp component to run the preprocessing script and the CustomTrainingJobOp component to launch a Vertex AI training job.
B. Use the Kubeflow Pipelines SDK to implement the pipeline. Use the DataflowPythonJobOp component to preprocess the data and the CustomTrainingJobOp component to launch a Vertex AI training job.
C. Use the TensorFlow Extended SDK to implement the pipeline. Use the ExampleGen component with the BigQuery executor to ingest the data, the Transform component to preprocess the data, and the Trainer component to launch a Vertex AI training job.
D. Use the TensorFlow Extended SDK to implement the pipeline. Implement the preprocessing steps as part of the input_fn of the model. Use the ExampleGen component with the BigQuery executor to ingest the data and the Trainer component to launch a Vertex AI training job.
Suggested answer: C

Explanation:

Option C is correct because the TensorFlow Extended (TFX) SDK provides ready-made components for exactly this workflow: the ExampleGen component with the BigQuery executor ingests the data directly from BigQuery, the Transform component applies the instance-level preprocessing inside the pipeline so that the same transformation graph can be reused for both training and serving, and the Trainer component launches a Vertex AI training job. This minimizes custom development work for a weekly retraining pipeline.

Why not A: Using the Kubeflow Pipelines SDK to implement the pipeline is a valid option, but using the BigQueryJobOp component to run the preprocessing script is not optimal. It would require writing and maintaining a separate SQL script for the data transformation, which could introduce inconsistencies and errors, and it would make it harder to reuse the same preprocessing logic for both training and serving.

Why not B: Using the Kubeflow Pipelines SDK to implement the pipeline is a valid option, but using the DataflowPythonJobOp component to preprocess the data is not optimal. It would require writing and maintaining a separate Python script for the data transformation, which could introduce inconsistencies and errors, and it would make it harder to reuse the same preprocessing logic for both training and serving.

Why not D: Using the TensorFlow Extended SDK to implement the pipeline is a valid option, but implementing the preprocessing steps as part of the input_fn of the model is not optimal. It would couple the preprocessing logic tightly to the model code, which reduces modularity and flexibility, and it would make it harder to reuse the same preprocessing logic for both training and serving.
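
A minimal sketch of the TFX pipeline described in option C is shown below. The query, module file, and pipeline root are hypothetical placeholders, and the generic Trainer shown here can be swapped for the Google Cloud AI Platform/Vertex AI Trainer extension to run the training step as a Vertex AI training job.

```python
from tfx import v1 as tfx

def create_pipeline(pipeline_name, pipeline_root, query, module_file):
    # Ingest the training data directly from BigQuery.
    example_gen = tfx.extensions.google_cloud_big_query.BigQueryExampleGen(query=query)

    statistics_gen = tfx.components.StatisticsGen(examples=example_gen.outputs["examples"])
    schema_gen = tfx.components.SchemaGen(statistics=statistics_gen.outputs["statistics"])

    # The instance-level transformations live in preprocessing_fn inside module_file,
    # so the same transform graph is reused at training and serving time.
    transform = tfx.components.Transform(
        examples=example_gen.outputs["examples"],
        schema=schema_gen.outputs["schema"],
        module_file=module_file,
    )

    # A generic Trainer; the google_cloud_ai_platform Trainer extension can be used
    # instead so that this step runs as a Vertex AI training job.
    trainer = tfx.components.Trainer(
        module_file=module_file,
        examples=transform.outputs["transformed_examples"],
        transform_graph=transform.outputs["transform_graph"],
        schema=schema_gen.outputs["schema"],
        train_args=tfx.proto.TrainArgs(num_steps=1000),
        eval_args=tfx.proto.EvalArgs(num_steps=100),
    )

    return tfx.dsl.Pipeline(
        pipeline_name=pipeline_name,
        pipeline_root=pipeline_root,
        components=[example_gen, statistics_gen, schema_gen, transform, trainer],
    )
```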

You are training a custom language model for your company using a large dataset. You plan to use the Reduction Server strategy on Vertex AI. You need to configure the worker pools of the distributed training job. What should you do?

A. Configure the machines of the first two worker pools to have GPUs and to use a container image where your training code runs. Configure the third worker pool to have GPUs, and use the reduction server container image.
B. Configure the machines of the first two worker pools to have GPUs and to use a container image where your training code runs. Configure the third worker pool to use the reduction server container image without accelerators, and choose a machine type that prioritizes bandwidth.
C. Configure the machines of the first two worker pools to have TPUs and to use a container image where your training code runs. Configure the third worker pool to use the reduction server container image without accelerators, and choose a machine type that prioritizes bandwidth.
D. Configure the machines of the first two worker pools to have TPUs and to use a container image where your training code runs. Configure the third worker pool to have TPUs, and use the reduction server container image.
Suggested answer: B

Explanation:

Reduction Server is a faster all-reduce algorithm developed at Google for multi-node GPU training that uses a dedicated set of reducers to aggregate gradients from the workers. Reducers are lightweight CPU VM instances that are significantly cheaper than GPU VMs. Therefore, the third worker pool should not have any accelerators and should use a machine type with high network bandwidth to optimize the communication between workers and reducers. TPUs are not supported by Reduction Server, so the first two worker pools should have GPUs and use a container image that contains the training code. The reduction server container image is provided by Google and should be used for the third worker pool.
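
A minimal sketch of the corresponding worker pool configuration with the Vertex AI SDK is shown below. The machine types, replica counts, bucket, and image URIs (including the reduction server image path) are illustrative and should be checked against the current documentation.

```python
from google.cloud import aiplatform

aiplatform.init(
    project="my-project",
    location="us-central1",
    staging_bucket="gs://my-bucket/staging",
)

worker_pool_specs = [
    {   # Worker pool 0: chief worker, GPU machine running the training container.
        "machine_spec": {
            "machine_type": "n1-standard-16",
            "accelerator_type": "NVIDIA_TESLA_V100",
            "accelerator_count": 2,
        },
        "replica_count": 1,
        "container_spec": {"image_uri": "gcr.io/my-project/trainer:latest"},
    },
    {   # Worker pool 1: remaining GPU workers, same training container.
        "machine_spec": {
            "machine_type": "n1-standard-16",
            "accelerator_type": "NVIDIA_TESLA_V100",
            "accelerator_count": 2,
        },
        "replica_count": 3,
        "container_spec": {"image_uri": "gcr.io/my-project/trainer:latest"},
    },
    {   # Worker pool 2: reducers, CPU-only machines chosen for network bandwidth,
        # running Google's reduction server image (illustrative URI).
        "machine_spec": {"machine_type": "n1-highcpu-16"},
        "replica_count": 4,
        "container_spec": {
            "image_uri": "us-docker.pkg.dev/vertex-ai-restricted/training/reductionserver:latest"
        },
    },
]

job = aiplatform.CustomJob(
    display_name="language-model-training",
    worker_pool_specs=worker_pool_specs,
)
job.run()
```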

You have trained a model by using data that was preprocessed in a batch Dataflow pipeline. Your use case requires real-time inference. You want to ensure that the data preprocessing logic is applied consistently between training and serving. What should you do?

A. Perform data validation to ensure that the input data to the pipeline is in the same format as the input data to the endpoint.
B. Refactor the transformation code in the batch data pipeline so that it can be used outside of the pipeline. Use the same code in the endpoint.
C. Refactor the transformation code in the batch data pipeline so that it can be used outside of the pipeline. Share this code with the end users of the endpoint.
D. Batch the real-time requests by using a time window, and then use the Dataflow pipeline to preprocess the batched requests. Send the preprocessed requests to the endpoint.
Suggested answer: B

Explanation:

According to the official exam guide, one of the skills assessed in the exam is to "design, build, and productionalize ML models to solve business challenges using Google Cloud technologies." Dataflow is a fully managed service for running Apache Beam data processing pipelines, and it supports both batch and streaming pipelines. However, because your use case requires real-time inference, you need to ensure that the data preprocessing logic is applied consistently between training and serving. One way to achieve this is to refactor the transformation code in the batch data pipeline so that it can be used outside of the pipeline, and use the same code in the endpoint. This avoids the training/serving skew and drift issues that can arise from using different preprocessing implementations for training and serving. Therefore, option B is the best way to ensure that the data preprocessing logic is applied consistently between training and serving. The other options are not relevant or optimal for this scenario.

Reference:

Professional ML Engineer Exam Guide

Dataflow

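A minimal sketch of option B: the transformation logic lives in one shared module that both the Apache Beam/Dataflow batch pipeline and the online serving path import. The module, field names, and wrapper code are hypothetical.

```python
# preprocessing.py -- single source of truth for the instance-level transformations.
import math

def transform_instance(instance: dict) -> dict:
    """Applies the same feature transformations used to build the training data."""
    return {
        "amount_log": math.log1p(float(instance["amount"])),
        "country": instance["country"].strip().upper(),
    }

# In the batch Dataflow (Apache Beam) pipeline that builds the training data:
#
#   import apache_beam as beam
#   from preprocessing import transform_instance
#   transformed = raw_rows | "Preprocess" >> beam.Map(transform_instance)
#
# In the real-time serving path (for example, a thin wrapper in front of the
# Vertex AI endpoint or a custom prediction routine):
#
#   from preprocessing import transform_instance
#   features = transform_instance(request_instance)
#   prediction = endpoint.predict(instances=[features])
```

Because both paths call the same function, any change to the preprocessing logic is picked up by training and serving together.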
