Google Professional Machine Learning Engineer Practice Test - Questions & Answers, Page 18
Question 171
You developed a Vertex AI pipeline that trains a classification model on data stored in a large BigQuery table. The pipeline has four steps, where each step is created by a Python function that uses the Kubeflow Pipelines SDK v2. The components have the following names:
You launch your Vertex AI pipeline as follows:
You perform many model iterations by adjusting the code and parameters of the training step. You observe high costs associated with the development, particularly for the data export and preprocessing steps. You need to reduce model development costs. What should you do?
Explanation:
According to the official exam guide [1], one of the skills assessed in the exam is to "automate and orchestrate ML pipelines using Cloud Composer." Vertex AI Pipelines [2] is a service that lets you orchestrate your ML workflows using the Kubeflow Pipelines SDK v2 or TensorFlow Extended. Vertex AI Pipelines supports execution caching: when a run reaches a component that has already been executed with the same inputs and parameters, the component is not run again and its output from the previous run is reused. This can save time and resources while you are iterating on a pipeline. Therefore, option A is the best way to reduce model development costs, as it enables execution caching for the data export and preprocessing steps, which are likely to be identical across model iterations. The other options are not relevant or optimal for this scenario.
Reference:
[1] Professional ML Engineer Exam Guide
[2] Vertex AI Pipelines
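As a minimal sketch (the project, bucket, template path, and parameter names are hypothetical), execution caching is enabled when the compiled pipeline is launched with the Vertex AI SDK:

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

job = aiplatform.PipelineJob(
    display_name="classification-training",
    template_path="pipeline.json",              # compiled KFP v2 pipeline spec
    pipeline_root="gs://my-bucket/pipeline-root",
    parameter_values={"learning_rate": 0.01},   # hypothetical training parameter
    enable_caching=True,  # steps re-run only when their inputs or code change
)
job.run()
```

With caching enabled, editing only the training step leaves the cached outputs of the data export and preprocessing steps intact, so those steps (and their costs) are skipped on each iteration. Caching can also be toggled per step in the KFP v2 SDK with task.set_caching_options().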
Question 172
You work for a startup that has multiple data science workloads. Your compute infrastructure is currently on-premises, and the data science workloads are native to PySpark. Your team plans to migrate the data science workloads to Google Cloud. You need to build a proof of concept to migrate one data science job to Google Cloud. You want to propose a migration process that requires minimal cost and effort. What should you do first?
Explanation:
According to the official exam guide [1], one of the skills assessed in the exam is to "design, build, and productionalize ML models to solve business challenges using Google Cloud technologies." Dataproc [2] is a fully managed, fast, and easy-to-use service for running Apache Spark and Apache Hadoop clusters on Google Cloud. Dataproc supports PySpark workloads and provides a simple way to migrate your existing Spark jobs to the cloud: you can create a Dataproc cluster with a few clicks or commands and run your PySpark jobs on it. You can also use Vertex AI Workbench [3], a managed notebook service, to create and run PySpark notebooks on Dataproc clusters, which lets you interactively develop and test PySpark code on the cloud. Therefore, option C is the best way to build a proof of concept to migrate one data science job to Google Cloud with minimal cost and effort. The other options are not relevant or optimal for this scenario.
Reference:
[1] Professional ML Engineer Exam Guide
[2] Dataproc
[3] Vertex AI Workbench
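As a sketch of the proof of concept, an existing PySpark script can be submitted unchanged to a Dataproc cluster with the google-cloud-dataproc client; the project, region, cluster name, and script path below are hypothetical:

```python
from google.cloud import dataproc_v1

project_id = "my-project"
region = "us-central1"
cluster_name = "poc-pyspark-cluster"

# The job client must point at the regional Dataproc endpoint.
job_client = dataproc_v1.JobControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

job = {
    "placement": {"cluster_name": cluster_name},
    "pyspark_job": {"main_python_file_uri": "gs://my-bucket/jobs/etl_job.py"},
}

operation = job_client.submit_job_as_operation(
    request={"project_id": project_id, "region": region, "job": job}
)
response = operation.result()  # blocks until the job completes
print(f"Job finished: {response.reference.job_id}")
```

Because Dataproc runs standard Apache Spark, the job script itself usually needs no changes beyond pointing file paths at Cloud Storage.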
Question 173
You work for a bank. You have been asked to develop an ML model that will support loan application decisions. You need to determine which Vertex AI services to include in the workflow. You want to track the model's training parameters and the metrics per training epoch. You plan to compare the performance of each version of the model to determine the best model based on your chosen metrics. Which Vertex AI services should you use?
Explanation:
According to the official exam guide [1], one of the skills assessed in the exam is to "track the lineage of pipeline artifacts." Vertex ML Metadata [2] is a service that allows you to store, query, and visualize metadata associated with your ML workflows, such as datasets, models, metrics, and executions, and to track the provenance and lineage of your ML artifacts and the relationships between them. Vertex AI Experiments [3] is a service that allows you to track and compare the results of your model training runs; it logs metadata such as hyperparameters, metrics, and artifacts for each run, and works with custom models built in TensorFlow, PyTorch, XGBoost, or scikit-learn. Vertex AI TensorBoard [4] is a managed version of TensorBoard, the open source ML visualization tool; it helps you track the model's training parameters and the per-epoch metrics, and compare the performance of each version of the model. Therefore, option C is the best combination of Vertex AI services for this use case. The other options are not relevant or optimal for this scenario.
Reference:
[1] Professional ML Engineer Exam Guide
[2] Vertex ML Metadata
[3] Vertex AI Experiments
[4] Vertex AI TensorBoard
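A minimal sketch of run tracking with the Vertex AI SDK follows; the experiment name, run name, and metric values are hypothetical, and logging per-epoch time-series metrics assumes a Vertex AI TensorBoard instance is associated with the experiment:

```python
from google.cloud import aiplatform

aiplatform.init(
    project="my-project",
    location="us-central1",
    experiment="loan-approval-models",  # hypothetical experiment name
)

aiplatform.start_run("xgboost-v3")      # one run per model version
aiplatform.log_params({"learning_rate": 0.05, "max_depth": 6})

for epoch in range(1, 11):
    # Stand-ins for values produced by the real training loop.
    train_loss, val_auc = 0.5 / epoch, 0.80 + 0.01 * epoch
    aiplatform.log_time_series_metrics(
        {"train_loss": train_loss, "val_auc": val_auc}
    )

aiplatform.log_metrics({"final_val_auc": val_auc})  # summary metric for comparison
aiplatform.end_run()
```

Runs logged this way can then be compared side by side in the Vertex AI Experiments UI to pick the best model version.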
Question 174
You work for an auto insurance company. You are preparing a proof-of-concept ML application that uses images of damaged vehicles to infer the damaged parts. Your team has assembled a set of annotated images from damage claim documents in the company's database. The annotations associated with each image consist of a bounding box for each identified damaged part and the part name. You have been given a sufficient budget to train models on Google Cloud. You need to quickly create an initial model. What should you do?
Explanation:
According to the official exam guide [1], one of the skills assessed in the exam is to "design, build, and productionalize ML models to solve business challenges using Google Cloud technologies." AutoML Vision [2] is a service that allows you to train and deploy custom vision models for image classification and object detection. AutoML Vision simplifies model development by providing a graphical user interface and a no-code approach. You can use AutoML Vision to train an object detection model on the annotated image data and evaluate model performance using metrics such as mean average precision (mAP) and intersection over union (IoU) [3]. Therefore, option B is the best way to quickly create an initial model for this use case. The other options are not relevant or optimal for this scenario.
Reference:
[1] Professional ML Engineer Exam Guide
[2] AutoML Vision
[3] Object detection evaluation
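A sketch of that workflow with the Vertex AI SDK follows; the dataset name and import-file path are hypothetical, and the import file is assumed to map each image to its bounding boxes and part names:

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# Create an image dataset from the annotated claim images.
dataset = aiplatform.ImageDataset.create(
    display_name="damaged-parts",
    gcs_source="gs://my-bucket/annotations/import_file.jsonl",
    import_schema_uri=aiplatform.schema.dataset.ioformat.image.bounding_box,
)

# Train an AutoML object detection model on the dataset.
job = aiplatform.AutoMLImageTrainingJob(
    display_name="damaged-parts-detector",
    prediction_type="object_detection",
)

model = job.run(
    dataset=dataset,
    budget_milli_node_hours=20000,  # illustrative budget (about 20 node hours)
)
```

The training budget shown is illustrative; AutoML reports mAP and IoU-based evaluation metrics once training completes.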
Question 175
You are analyzing customer data for a healthcare organization that is stored in Cloud Storage. The data contains personally identifiable information (PII). You need to perform data exploration and preprocessing while ensuring the security and privacy of sensitive fields. What should you do?
Explanation:
According to the official exam guide [1], one of the skills assessed in the exam is to "design, build, and productionalize ML models to solve business challenges using Google Cloud technologies." The Cloud Data Loss Prevention (DLP) API [2] provides programmatic access to a powerful detection engine for personally identifiable information and other privacy-sensitive data in unstructured data streams, such as text blocks and images. Cloud DLP helps you discover, classify, and protect sensitive data using techniques such as de-identification, masking, tokenization, and bucketing. You can use the DLP API to de-identify the PII before performing data exploration and preprocessing, while retaining the data's utility for ML. Therefore, option A is the best way to perform data exploration and preprocessing while ensuring the security and privacy of the sensitive fields. The other options are not relevant or optimal for this scenario.
Reference:
[1] Professional ML Engineer Exam Guide
[2] Cloud Data Loss Prevention (DLP) API
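For illustration, a minimal DLP de-identification request that replaces detected values with their infoType names might look like the following; the project ID and sample text are hypothetical:

```python
from google.cloud import dlp_v2

dlp = dlp_v2.DlpServiceClient()
parent = "projects/my-project/locations/global"  # hypothetical project

item = {"value": "Patient John Doe, SSN 222-22-2222, phone (555) 253-0000."}

inspect_config = {
    "info_types": [
        {"name": "PERSON_NAME"},
        {"name": "US_SOCIAL_SECURITY_NUMBER"},
        {"name": "PHONE_NUMBER"},
    ],
}

# Replace each detected sensitive value with its infoType name.
deidentify_config = {
    "info_type_transformations": {
        "transformations": [
            {"primitive_transformation": {"replace_with_info_type_config": {}}}
        ]
    }
}

response = dlp.deidentify_content(
    request={
        "parent": parent,
        "inspect_config": inspect_config,
        "deidentify_config": deidentify_config,
        "item": item,
    }
)
print(response.item.value)
# e.g. "Patient [PERSON_NAME], SSN [US_SOCIAL_SECURITY_NUMBER], phone [PHONE_NUMBER]."
```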
Question 176
You are building a predictive maintenance model to preemptively detect part defects in bridges. You plan to use high-definition images of the bridges as model inputs. You need to explain the output of the model to the relevant stakeholders so they can take appropriate action. How should you build the model?
Explanation:
According to the official exam guide [1], one of the skills assessed in the exam is to "explain the predictions of a trained model." TensorFlow [2] is an open source framework for developing and deploying machine learning and deep learning models. TensorFlow supports various explainability methods, including Integrated Gradients [3], a technique that assigns an importance score to each input feature by approximating the integral of the model's gradients along the path from a baseline input to the actual input. Integrated Gradients can explain the output of a deep learning model by highlighting the most influential regions of the input images. Therefore, option C is the best way to build the model for this use case. The other options are not relevant or optimal for this scenario.
Reference:
[1] Professional ML Engineer Exam Guide
[2] TensorFlow
[3] Integrated Gradients
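A minimal TensorFlow sketch of Integrated Gradients, assuming a classifier that accepts a batch of images and a baseline such as an all-black image:

```python
import tensorflow as tf

def integrated_gradients(model, baseline, image, target_class, steps=50):
    """Approximate the path integral of gradients from baseline to image.

    baseline and image are unbatched (H, W, C) tensors; the interpolation
    axis doubles as the batch axis when calling the model.
    """
    # Interpolate between the baseline and the actual input.
    alphas = tf.linspace(0.0, 1.0, steps + 1)
    interpolated = baseline + alphas[:, None, None, None] * (image - baseline)

    with tf.GradientTape() as tape:
        tape.watch(interpolated)
        predictions = model(interpolated)
        scores = predictions[:, target_class]  # score of the class to explain

    grads = tape.gradient(scores, interpolated)
    # Trapezoidal-rule approximation of the integral of the gradients.
    avg_grads = tf.reduce_mean((grads[:-1] + grads[1:]) / 2.0, axis=0)
    return (image - baseline) * avg_grads  # per-pixel attribution map
```

The resulting attribution map can be overlaid on the bridge image to show stakeholders which regions drove a defect prediction.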
Question 177
You work for a hospital that wants to optimize how it schedules operations. You need to create a model that uses the relationship between the number of surgeries scheduled and the number of beds used. You want to predict, in advance, how many beds will be needed for patients each day based on the scheduled surgeries. You have one year of data for the hospital, organized in 365 rows.
The data includes the following variables for each day:
* Number of scheduled surgeries
* Number of beds occupied
* Date
You want to maximize the speed of model development and testing. What should you do?
Explanation:
According to the official exam guide [1], one of the skills assessed in the exam is to "design, build, and productionalize ML models to solve business challenges using Google Cloud technologies." Vertex AI AutoML Forecasting [2] is a service that allows you to train and deploy custom time-series forecasting models for batch prediction. It simplifies model development by providing a graphical user interface and a no-code approach: you supply tabular data and specify the target variable, the covariates, and the time variable, and AutoML Forecasting automatically handles feature engineering, model selection, and hyperparameter tuning. Therefore, option D is the best way to maximize the speed of model development and testing for this use case. The other options are not relevant or optimal for this scenario.
Reference:
[1] Professional ML Engineer Exam Guide
[2] Vertex AI AutoML Forecasting
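A hedged sketch with the Vertex AI SDK follows; the BigQuery table, column names, and forecast horizon are hypothetical:

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# Hypothetical BigQuery table holding the 365 daily rows.
dataset = aiplatform.TimeSeriesDataset.create(
    display_name="bed-demand",
    bq_source="bq://my-project.hospital.daily_stats",
)

job = aiplatform.AutoMLForecastingTrainingJob(
    display_name="bed-demand-forecast",
    optimization_objective="minimize-rmse",
)

model = job.run(
    dataset=dataset,
    target_column="beds_occupied",
    time_column="date",
    time_series_identifier_column="hospital_id",  # constant column for one hospital
    available_at_forecast_columns=["date", "scheduled_surgeries"],
    unavailable_at_forecast_columns=[],
    forecast_horizon=7,                # predict a week of daily bed demand
    data_granularity_unit="day",
    data_granularity_count=1,
)
```

Because the number of scheduled surgeries is known ahead of time, it is listed as available at forecast, which lets the model use it as a covariate for future days.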
Question 178
You recently developed a wide and deep model in TensorFlow. You generated training datasets using a SQL script that preprocessed raw data in BigQuery by performing instance-level transformations of the data. You need to create a training pipeline to retrain the model on a weekly basis. The trained model will be used to generate daily recommendations. You want to minimize model development and training time. How should you develop the training pipeline?
Explanation:
By elimination, option C is the recommended approach: implement the pipeline with the TensorFlow Extended (TFX) SDK and express the instance-level transformations in the pipeline's Transform component, so the same preprocessing logic is reused for both weekly training and daily serving with minimal development effort (a sketch follows the list below).
Why not A: Using the Kubeflow Pipelines SDK to implement the pipeline is a valid option, but using the BigQueryJobOp component to run the preprocessing script is not optimal. This would require writing and maintaining a separate SQL script for data transformation, which could introduce inconsistencies and errors. It would also make it harder to reuse the same preprocessing logic for both training and serving.
Why not B: Using the Kubeflow Pipelines SDK to implement the pipeline is a valid option, but using the DataflowPythonJobOp component to preprocess the data is not optimal. This would require writing and maintaining a separate Python script for data transformation, which could introduce inconsistencies and errors. It would also make it harder to reuse the same preprocessing logic for both training and serving.
Why not D: Using the TensorFlow Extended SDK to implement the pipeline is a valid option, but implementing the preprocessing steps as part of the input_fn of the model is not optimal. This would make the preprocessing logic tightly coupled with the model code, which could reduce modularity and flexibility. It would also make it harder to reuse the same preprocessing logic for both training and serving.
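For illustration, a minimal tf.Transform preprocessing_fn of the kind the Transform component runs might look like this; the feature names are hypothetical stand-ins for the original SQL transformations:

```python
import tensorflow as tf
import tensorflow_transform as tft

def preprocessing_fn(inputs):
    """Transformations applied identically at training and serving time."""
    outputs = {}
    # Instance-level transformation, mirroring the original SQL logic.
    outputs["price_log"] = tf.math.log1p(inputs["price"])
    # Full-pass transformation; Transform computes the statistics once
    # during training and embeds them in the serving graph.
    outputs["age_zscore"] = tft.scale_to_z_score(inputs["age"])
    outputs["label"] = inputs["label"]
    return outputs
```

Because the Transform component emits a TensorFlow graph of these operations, the identical transformations are applied to the daily recommendation requests without maintaining a second preprocessing codebase.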
Question 179
You are training a custom language model for your company using a large dataset. You plan to use the Reduction Server strategy on Vertex AI. You need to configure the worker pools of the distributed training job. What should you do?
Explanation:
Reduction Server [1] [2] is a faster GPU all-reduce algorithm developed at Google that uses a dedicated set of reducers to aggregate gradients from the workers. Reducers are lightweight CPU VM instances that are significantly cheaper than GPU VMs [2]. The third worker pool should therefore have no accelerators and should use a machine type with high network bandwidth to optimize communication between workers and reducers [2]. TPUs are not supported by Reduction Server, so the first two worker pools should use GPUs and a container image that contains the training code [1] [2]. The Google-provided reduction-server container image should be used for the third worker pool [2].
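A sketch of the corresponding worker_pool_specs for a Vertex AI custom job follows; machine types, replica counts, and the training image are hypothetical, while the reduction-server image URI is the Google-provided one:

```python
from google.cloud import aiplatform

worker_pool_specs = [
    {   # pool 0: chief worker, GPU-equipped, runs the training code
        "machine_spec": {
            "machine_type": "n1-standard-16",
            "accelerator_type": "NVIDIA_TESLA_V100",
            "accelerator_count": 2,
        },
        "replica_count": 1,
        "container_spec": {"image_uri": "gcr.io/my-project/trainer:latest"},
    },
    {   # pool 1: remaining GPU workers, same training image
        "machine_spec": {
            "machine_type": "n1-standard-16",
            "accelerator_type": "NVIDIA_TESLA_V100",
            "accelerator_count": 2,
        },
        "replica_count": 3,
        "container_spec": {"image_uri": "gcr.io/my-project/trainer:latest"},
    },
    {   # pool 2: CPU-only reducers on high-bandwidth machines, no accelerators
        "machine_spec": {"machine_type": "n1-highcpu-16"},
        "replica_count": 4,
        "container_spec": {
            "image_uri": "us-docker.pkg.dev/vertex-ai-restricted/training/reductionserver:latest"
        },
    },
]

job = aiplatform.CustomJob(
    display_name="reduction-server-training",
    worker_pool_specs=worker_pool_specs,
    staging_bucket="gs://my-bucket/staging",
)
job.run()
```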
Question 180
You have trained a model by using data that was preprocessed in a batch Dataflow pipeline. Your use case requires real-time inference. You want to ensure that the data preprocessing logic is applied consistently between training and serving. What should you do?
Explanation:
According to the official exam guide [1], one of the skills assessed in the exam is to "design, build, and productionalize ML models to solve business challenges using Google Cloud technologies." Dataflow [2] is a fully managed service for executing Apache Beam data processing pipelines, and it supports both batch and streaming pipelines. However, when a use case requires real-time inference, you must ensure that the data preprocessing logic is applied consistently between training and serving. One way to achieve this is to refactor the transformation code in the batch pipeline so that it can be used outside of the pipeline, and then call the same code in the prediction endpoint. This avoids the training-serving skew that can arise when different preprocessing implementations are used for training and serving. Therefore, option B is the best way to ensure the data preprocessing logic is applied consistently between training and serving. The other options are not relevant or optimal for this scenario.
Reference:
[1] Professional ML Engineer Exam Guide
[2] Dataflow
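A minimal sketch of the refactoring idea, assuming hypothetical feature names: the instance-level transformations live in one plain Python function that both the Beam pipeline and the online endpoint import.

```python
import math

def preprocess(record: dict) -> dict:
    """Instance-level transformations shared by training and serving."""
    return {
        "amount_log": math.log1p(record["amount"]),
        "is_weekend": int(record["day_of_week"] in (5, 6)),
    }

# In the batch Dataflow (Apache Beam) training pipeline:
#     records | beam.Map(preprocess)
#
# In the real-time prediction endpoint, before calling the model:
#     features = preprocess(request_payload)
#     prediction = model.predict(features)
```

Keeping the function free of Beam-specific code is the key design choice: it can then be packaged once and imported by both the pipeline and the endpoint, so the two paths cannot drift apart.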