Google Professional Machine Learning Engineer Practice Test - Questions Answers

You are training a TensorFlow model on a structured data set with 100 billion records stored in several CSV files. You need to improve the input/output execution performance. What should you do?

A. Load the data into BigQuery and read the data from BigQuery.
B. Load the data into Cloud Bigtable, and read the data from Bigtable.
C. Convert the CSV files into shards of TFRecords, and store the data in Cloud Storage.
D. Convert the CSV files into shards of TFRecords, and store the data in the Hadoop Distributed File System (HDFS).
Suggested answer: C

Explanation:

The input/output execution performance of a TensorFlow model depends on how efficiently the model can read and process the data from the data source. Reading and processing data from CSV files can be slow and inefficient, especially if the data is large and distributed. Therefore, to improve the input/output execution performance, one should use a more suitable data format and storage system.

One of the best options for improving the input/output execution performance is to convert the CSV files into shards of TFRecords, and store the data in Cloud Storage. TFRecord is a binary data format that can store a sequence of serialized TensorFlow examples. TFRecord has several advantages over CSV, such as:

Faster data loading: TFRecord can be read and processed faster than CSV, as it avoids the overhead of parsing and decoding the text data. TFRecord also supports compression and checksums, which can reduce the data size and ensure data integrity.

Better performance: TFRecord can improve the performance of the model, as it allows the model to access the data in a sequential and streaming manner, and leverage the tf.data API to build efficient data pipelines. TFRecord also supports sharding and interleaving, which can increase the parallelism and throughput of the data processing.

Easier integration: TFRecord integrates seamlessly with TensorFlow, as it is the native data format for TensorFlow. TFRecord also supports various types of data, such as images, text, audio, and video, and can store the data schema and metadata along with the data.
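
As a rough, illustrative sketch (not part of the original explanation), the snippet below shows one way to write rows into compressed TFRecord shards in Cloud Storage and read them back through a parallel tf.data pipeline; the bucket path, feature names, and shard size are hypothetical.

```python
import tensorflow as tf

# Hypothetical locations and schema; replace with your own bucket and columns.
OUTPUT_PATTERN = "gs://my-bucket/train/shard-{:05d}.tfrecord"

def rows_to_tfrecord_shards(rows, rows_per_shard=100_000):
    """Write (feature, label) rows into compressed TFRecord shards."""
    options = tf.io.TFRecordOptions(compression_type="GZIP")
    writer, shard = None, 0
    for i, (feature, label) in enumerate(rows):
        if i % rows_per_shard == 0:
            if writer:
                writer.close()
            writer = tf.io.TFRecordWriter(OUTPUT_PATTERN.format(shard), options)
            shard += 1
        example = tf.train.Example(features=tf.train.Features(feature={
            "feature": tf.train.Feature(float_list=tf.train.FloatList(value=[feature])),
            "label": tf.train.Feature(float_list=tf.train.FloatList(value=[label])),
        }))
        writer.write(example.SerializeToString())
    if writer:
        writer.close()

def parse_example(serialized):
    spec = {"feature": tf.io.FixedLenFeature([], tf.float32),
            "label": tf.io.FixedLenFeature([], tf.float32)}
    parsed = tf.io.parse_single_example(serialized, spec)
    return parsed["feature"], parsed["label"]

def make_dataset():
    # Interleave many shards in parallel and prefetch to keep the accelerators fed.
    files = tf.data.Dataset.list_files("gs://my-bucket/train/shard-*.tfrecord")
    return (files
            .interleave(lambda f: tf.data.TFRecordDataset(f, compression_type="GZIP"),
                        num_parallel_calls=tf.data.AUTOTUNE)
            .map(parse_example, num_parallel_calls=tf.data.AUTOTUNE)
            .batch(1024)
            .prefetch(tf.data.AUTOTUNE))
```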

Cloud Storage is a scalable and reliable object storage service that can store any amount of data. Cloud Storage has several advantages over other storage systems, such as:

High availability: Cloud Storage can provide high availability and durability for the data, as it replicates the data across multiple regions and zones, and supports versioning and lifecycle management. Cloud Storage also offers various storage classes, such as Standard, Nearline, Coldline, and Archive, to meet different performance and cost requirements.

Low latency: Cloud Storage can provide low latency and high bandwidth for the data, as it supports HTTP and HTTPS protocols, and integrates with other Google Cloud services, such as AI Platform, Dataflow, and BigQuery. Cloud Storage also supports resumable uploads and downloads, and parallel composite uploads, which can improve the data transfer speed and reliability.

Easy access: Cloud Storage can provide easy access and management for the data, as it supports various tools and libraries, such as gsutil, Cloud Console, and Cloud Storage Client Libraries. Cloud Storage also supports fine-grained access control and encryption, which can ensure the data security and privacy.

The other options are not as effective or feasible. Loading the data into BigQuery and reading the data from BigQuery is not recommended, as BigQuery is optimized for analytical SQL queries on large-scale data rather than for the high-throughput sequential reads that a TensorFlow input pipeline needs. Loading the data into Cloud Bigtable and reading the data from Bigtable is not ideal, as Cloud Bigtable is mainly designed for low-latency and high-throughput key-value operations on sparse and wide tables, and does not support complex data types or schemas. Converting the CSV files into shards of TFRecords and storing the data in the Hadoop Distributed File System (HDFS) is not optimal, as HDFS is not as well integrated with TensorFlow on Google Cloud and requires additional configuration and infrastructure, such as a Hadoop or Spark cluster, that you would have to provision and maintain.

You have deployed multiple versions of an image classification model on AI Platform. You want to monitor the performance of the model versions over time. How should you perform this comparison?

A. Compare the loss performance for each model on a held-out dataset.
B. Compare the loss performance for each model on the validation data.
C. Compare the receiver operating characteristic (ROC) curve for each model using the What-If Tool.
D. Compare the mean average precision across the models using the Continuous Evaluation feature.
Suggested answer: D

Explanation:

The performance of an image classification model can be measured by various metrics, such as accuracy, precision, recall, F1-score, and mean average precision (mAP). These metrics are calculated by comparing the predicted labels with the true labels of the images, for example via the confusion matrix.

One of the best ways to monitor the performance of multiple versions of an image classification model on AI Platform is to compare the mean average precision across the models using the Continuous Evaluation feature. Mean average precision is a metric that summarizes the precision and recall of a model across different confidence thresholds and classes. Mean average precision is especially useful for multi-class and multi-label image classification problems, where the model has to assign one or more labels to each image from a set of possible labels. Mean average precision can range from 0 to 1, where a higher value indicates a better performance.
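
To make the metric concrete, here is a small illustrative sketch (not taken from the exam material) that computes a macro-averaged mean average precision with scikit-learn; the labels and scores are invented for the example.

```python
import numpy as np
from sklearn.metrics import average_precision_score

# Hypothetical ground truth (one-hot per class) and predicted scores for 4 images, 3 classes.
y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1],
                   [1, 0, 0]])
y_score = np.array([[0.9, 0.05, 0.05],
                    [0.2, 0.7, 0.1],
                    [0.1, 0.3, 0.6],
                    [0.6, 0.3, 0.1]])

# Average precision per class, then the mean across classes (macro mAP).
per_class_ap = [average_precision_score(y_true[:, c], y_score[:, c])
                for c in range(y_true.shape[1])]
mean_average_precision = float(np.mean(per_class_ap))
print(f"mAP = {mean_average_precision:.3f}")
```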

Continuous Evaluation is a feature of AI Platform that allows you to automatically evaluate the performance of your deployed models using online prediction requests and responses. Continuous Evaluation can help you monitor the quality and consistency of your models over time, and detect any issues or anomalies that may affect the model performance. Continuous Evaluation can also provide various evaluation metrics and visualizations, such as accuracy, precision, recall, F1-score, ROC curve, and confusion matrix, for different types of models, such as classification, regression, and object detection.

To compare the mean average precision across the models using the Continuous Evaluation feature, you need to complete the following steps:

Enable the online prediction logging for each model version that you want to evaluate. This will allow AI Platform to collect the prediction requests and responses from your models and store them in BigQuery.

Create an evaluation job for each model version that you want to evaluate. This will allow AI Platform to compare the predicted labels and the true labels of the images, and calculate the evaluation metrics, such as mean average precision. You need to specify the BigQuery table that contains the prediction logs, the data schema, the label column, and the evaluation interval.

View the evaluation results for each model version on the AI Platform Models page in the Google Cloud console. You can see the mean average precision and other metrics for each model version over time, and compare them using charts and tables. You can also filter the results by different classes and confidence thresholds.

The other options are not as effective or feasible. Comparing the loss performance for each model on a held-out dataset or on the validation data is not a good idea, as the loss function may not reflect the actual performance of the model on the online prediction data, and may vary depending on the choice of the loss function and the optimization algorithm. Comparing the receiver operating characteristic (ROC) curve for each model using the What-If Tool is not possible, as the What-If Tool does not support image data or multi-class classification problems.

Your team trained and tested a DNN regression model with good results. Six months after deployment, the model is performing poorly due to a change in the distribution of the input data. How should you address the input differences in production?

A. Create alerts to monitor for skew, and retrain the model.
B. Perform feature selection on the model, and retrain the model with fewer features.
C. Retrain the model, and select an L2 regularization parameter with a hyperparameter tuning service.
D. Perform feature selection on the model, and retrain the model on a monthly basis with fewer features.
Suggested answer: A

Explanation:

The performance of a DNN regression model can degrade over time due to a change in the distribution of the input data. This phenomenon is known as data drift or concept drift, and it can affect the accuracy and reliability of the model predictions. Data drift can be caused by various factors, such as seasonal changes, population shifts, market trends, or external events.

To address the input differences in production, one should create alerts to monitor for skew, and retrain the model. Skew is a measure of how much the input data in production differs from the input data used for training the model. Skew can be detected by comparing the statistics and distributions of the input features in the training and production data, such as mean, standard deviation, histogram, or quantiles. Alerts can be set up to notify the model developers or operators when the skew exceeds a certain threshold, indicating a significant change in the input data.
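
As an illustrative sketch only (not part of the original answer), training/serving skew on a single numeric feature could be flagged with a two-sample Kolmogorov-Smirnov test; the feature values and alert threshold below are hypothetical.

```python
import numpy as np
from scipy.stats import ks_2samp

# Hypothetical samples of one input feature from training data and recent production traffic.
train_feature = np.random.normal(loc=0.0, scale=1.0, size=10_000)
prod_feature = np.random.normal(loc=0.4, scale=1.2, size=10_000)  # distribution has shifted

statistic, p_value = ks_2samp(train_feature, prod_feature)

SKEW_THRESHOLD = 0.1  # chosen per feature from historical variation
if statistic > SKEW_THRESHOLD:
    # In production this would raise an alert (e.g., via Cloud Monitoring) and kick off retraining.
    print(f"Skew detected: KS statistic={statistic:.3f}, p={p_value:.3g} -- trigger retraining")
```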

When an alert is triggered, the model should be retrained with the latest data that reflects the current distribution of the input features. Retraining the model can help the model adapt to the new data and improve its performance. Retraining the model can be done manually or automatically, depending on the frequency and severity of the data drift. Retraining the model can also involve updating the model architecture, hyperparameters, or optimization algorithm, if necessary.

The other options are not as effective or feasible. Performing feature selection on the model and retraining the model with fewer features is not a good idea, as it may reduce the expressiveness and complexity of the model, and ignore some important features that may affect the output. Retraining the model and selecting an L2 regularization parameter with a hyperparameter tuning service is not relevant, as L2 regularization is a technique to prevent overfitting, not data drift. Retraining the model on a monthly basis with fewer features is not optimal, as it may not capture the timely changes in the input data, and may compromise the model performance.

You manage a team of data scientists who use a cloud-based backend system to submit training jobs. This system has become very difficult to administer, and you want to use a managed service instead. The data scientists you work with use many different frameworks, including Keras, PyTorch, Theano, scikit-learn, and custom libraries. What should you do?

A. Use the AI Platform custom containers feature to receive training jobs using any framework.
B. Configure Kubeflow to run on Google Kubernetes Engine and receive training jobs through TFJob.
C. Create a library of VM images on Compute Engine, and publish these images on a centralized repository.
D. Set up Slurm workload manager to receive jobs that can be scheduled to run on your cloud infrastructure.
Suggested answer: A

Explanation:

A cloud-based backend system is a system that runs on a cloud platform and provides services or resources to other applications or users. A cloud-based backend system can be used to submit training jobs, which are tasks that involve training a machine learning model on a given dataset using a specific framework and configuration.

However, a cloud-based backend system can also have some drawbacks, such as:

High maintenance: A cloud-based backend system may require a lot of administration and management, such as provisioning, scaling, monitoring, and troubleshooting the cloud resources and services. This can be time-consuming and costly, and may distract from the core business objectives.

Low flexibility: A cloud-based backend system may not support all the frameworks and libraries that the data scientists need to use for their training jobs. This can limit the choices and capabilities of the data scientists, and affect the quality and performance of their models.

Poor integration: A cloud-based backend system may not integrate well with other cloud services or tools that the data scientists need to use for their machine learning workflows, such as data processing, model deployment, or model monitoring. This can create compatibility and interoperability issues, and reduce the efficiency and productivity of the data scientists.

Therefore, it may be better to use a managed service instead of a cloud-based backend system to submit training jobs. A managed service is a service that is provided and operated by a third-party provider, and offers various benefits, such as:

Low maintenance: A managed service handles the administration and management of the cloud resources and services, and abstracts away the complexity and details of the underlying infrastructure. This can save time and money, and allow the data scientists to focus on their core tasks.

High flexibility: A managed service can support multiple frameworks and libraries that the data scientists need to use for their training jobs, and allow them to customize and configure their training environments and parameters. This can enhance the choices and capabilities of the data scientists, and improve the quality and performance of their models.

Easy integration: A managed service can integrate seamlessly with other cloud services or tools that the data scientists need to use for their machine learning workflows, and provide a unified and consistent interface and experience. This can solve the compatibility and interoperability issues, and increase the efficiency and productivity of the data scientists.

One of the best options for using a managed service to submit training jobs is to use the AI Platform custom containers feature to receive training jobs using any framework. AI Platform is a Google Cloud service that provides a platform for building, deploying, and managing machine learning models. AI Platform supports various machine learning frameworks, such as TensorFlow, PyTorch, scikit-learn, and XGBoost, and provides various features, such as hyperparameter tuning, distributed training, online prediction, and model monitoring.

The AI Platform custom containers feature allows the data scientists to use any framework or library that they want for their training jobs, and package their training application and dependencies as a Docker container image. The data scientists can then submit their training jobs to AI Platform, and specify the container image and the training parameters. AI Platform will run the training jobs on the cloud infrastructure, and handle the scaling, logging, and monitoring of the training jobs. The data scientists can also use the AI Platform features to optimize, deploy, and manage their models.
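
As a hedged sketch of what this can look like in practice (the project ID, container image, and arguments below are hypothetical), a training job that uses a custom container can be submitted to AI Platform Training from Python via the discovery-based API client:

```python
from googleapiclient import discovery

PROJECT_ID = "my-project"                                 # hypothetical project
IMAGE_URI = "gcr.io/my-project/pytorch-trainer:latest"    # container built by the data scientist

ml = discovery.build("ml", "v1")
job = {
    "jobId": "custom_container_training_001",
    "trainingInput": {
        "region": "us-central1",
        "scaleTier": "CUSTOM",
        "masterType": "n1-standard-8",
        # The custom container holds the framework of choice (PyTorch, Theano, scikit-learn, ...).
        "masterConfig": {"imageUri": IMAGE_URI},
        "args": ["--epochs", "10", "--batch-size", "64"],
    },
}

request = ml.projects().jobs().create(parent=f"projects/{PROJECT_ID}", body=job)
response = request.execute()
print(response)
```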

The other options are not as suitable or feasible. Configuring Kubeflow to run on Google Kubernetes Engine and receiving training jobs through TFJob is not ideal, as TFJob is specific to TensorFlow training jobs and would not cover the other frameworks and libraries the team uses, and running Kubeflow on GKE still leaves you administering the cluster. Creating a library of VM images on Compute Engine and publishing these images on a centralized repository is not optimal, as Compute Engine is a low-level service that requires a lot of administration and management, and does not provide the features and integrations of AI Platform. Setting up Slurm workload manager to receive jobs that can be scheduled to run on your cloud infrastructure is not relevant, as Slurm is a tool for managing and scheduling jobs on a cluster of nodes, and does not provide a managed service for training jobs.

You are developing a Kubeflow pipeline on Google Kubernetes Engine. The first step in the pipeline is to issue a query against BigQuery. You plan to use the results of that query as the input to the next step in your pipeline. You want to achieve this in the easiest way possible. What should you do?

A. Use the BigQuery console to execute your query and then save the query results into a new BigQuery table.
B. Write a Python script that uses the BigQuery API to execute queries against BigQuery. Execute this script as the first step in your Kubeflow pipeline.
C. Use the Kubeflow Pipelines domain-specific language to create a custom component that uses the Python BigQuery client library to execute queries.
D. Locate the Kubeflow Pipelines repository on GitHub, find the BigQuery Query Component, copy that component's URL, and use it to load the component into your pipeline. Use the component to execute queries against BigQuery.
Suggested answer: D

Explanation:

Kubeflow is an open source platform for developing, orchestrating, deploying, and running scalable and portable machine learning workflows on Kubernetes. Kubeflow Pipelines is a component of Kubeflow that allows you to build and manage end-to-end machine learning pipelines using a graphical user interface or a Python-based domain-specific language (DSL). Kubeflow Pipelines can help you automate and orchestrate your machine learning workflows, and integrate with various Google Cloud services and tools.

One of the Google Cloud services that you can use with Kubeflow Pipelines is BigQuery, which is a serverless, scalable, and cost-effective data warehouse that allows you to run fast and complex queries on large-scale data. BigQuery can help you analyze and prepare your data for machine learning, and store and manage your machine learning models.

To execute a query against BigQuery as the first step in your Kubeflow pipeline, and use the results of that query as the input to the next step in your pipeline, the easiest way to do that is to use the BigQuery Query Component, which is a pre-built component that you can find in the Kubeflow Pipelines repository on GitHub. The BigQuery Query Component allows you to run a SQL query on BigQuery, and output the results as a table or a file. You can use the component's URL to load the component into your pipeline, and specify the query and the output parameters. You can then use the output of the component as the input to the next step in your pipeline, such as a data processing or a model training step.
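
A minimal sketch of how this can look with the Kubeflow Pipelines SDK is shown below; the component URL placeholder, project ID, query, and parameter names are illustrative and should be checked against the component.yaml of the release you actually load.

```python
import kfp
from kfp import components, dsl

# Placeholder URL: point this at the BigQuery Query component.yaml in the
# kubeflow/pipelines GitHub repository for the release you are using.
BQ_COMPONENT_URL = "https://raw.githubusercontent.com/kubeflow/pipelines/<release>/components/gcp/bigquery/query/component.yaml"
bigquery_query_op = components.load_component_from_url(BQ_COMPONENT_URL)

@dsl.pipeline(name="bq-first-step", description="Query BigQuery, then consume the result downstream.")
def pipeline(project_id: str = "my-project"):
    # Step 1: run the SQL query; the component writes the result to an output location.
    # Parameter names vary by component version -- check the component.yaml you loaded.
    query_task = bigquery_query_op(
        query="SELECT * FROM `my-project.my_dataset.training_data`",
        project_id=project_id,
        output_gcs_path="gs://my-bucket/bq-output/data.csv",
    )
    # Step 2 (hypothetical downstream component) would consume the query output, e.g.:
    # train_task = train_op(input_path=query_task.outputs["output_gcs_path"])

if __name__ == "__main__":
    kfp.compiler.Compiler().compile(pipeline, "bq_pipeline.yaml")
```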

The other options are not as easy or feasible. Using the BigQuery console to execute your query and then save the query results into a new BigQuery table is not a good idea, as it does not integrate with your Kubeflow pipeline, and requires manual intervention and duplication of data. Writing a Python script that uses the BigQuery API to execute queries against BigQuery is not ideal, as it requires writing custom code and handling authentication and error handling. Using the Kubeflow Pipelines DSL to create a custom component that uses the Python BigQuery client library to execute queries is not optimal, as it requires creating and packaging a Docker container image for the component, and testing and debugging the component.

You are developing ML models with AI Platform for image segmentation on CT scans. You frequently update your model architectures based on the newest available research papers, and have to rerun training on the same dataset to benchmark their performance. You want to minimize computation costs and manual intervention while having version control for your code. What should you do?

A. Use Cloud Functions to identify changes to your code in Cloud Storage and trigger a retraining job.
B. Use the gcloud command-line tool to submit training jobs on AI Platform when you update your code.
C. Use Cloud Build linked with Cloud Source Repositories to trigger retraining when new code is pushed to the repository.
D. Create an automated workflow in Cloud Composer that runs daily and looks for changes in code in Cloud Storage using a sensor.
Suggested answer: C

Explanation:

Developing ML models with AI Platform for image segmentation on CT scans requires a lot of computation and experimentation, as image segmentation is a complex and challenging task that involves assigning a label to each pixel in an image. Image segmentation can be used for various medical applications, such as tumor detection, organ segmentation, or lesion localization.

To minimize the computation costs and manual intervention while having version control for the code, one should use Cloud Build linked with Cloud Source Repositories to trigger retraining when new code is pushed to the repository. Cloud Build is a service that executes your builds on Google Cloud Platform infrastructure. Cloud Build can import source code from Cloud Source Repositories, Cloud Storage, GitHub, or Bitbucket, execute a build to your specifications, and produce artifacts such as Docker containers or Java archives.

Cloud Build allows you to set up automated triggers that start a build when changes are pushed to a source code repository. You can configure triggers to filter the changes based on the branch, tag, or file path.

Cloud Source Repositories is a service that provides fully managed private Git repositories on Google Cloud Platform. Cloud Source Repositories allows you to store, manage, and track your code using the Git version control system. You can also use Cloud Source Repositories to connect to other Google Cloud services, such as Cloud Build, Cloud Functions, or Cloud Run.

To use Cloud Build linked with Cloud Source Repositories to trigger retraining when new code is pushed to the repository, you need to do the following steps:

Create a Cloud Source Repository for your code, and push your code to the repository. You can use the Cloud SDK, Cloud Console, or Cloud Source Repositories API to create and manage your repository.

Create a Cloud Build trigger for your repository, and specify the build configuration and the trigger settings. You can use the Cloud SDK, Cloud Console, or Cloud Build API to create and manage your trigger.

Specify the steps of the build in a YAML or JSON file, such as installing the dependencies, running the tests, building the container image, and submitting the training job to AI Platform; a sketch of such a configuration follows these steps. You can also use the Cloud Build predefined or custom build steps to simplify your build configuration.

Push your new code to the repository, and the trigger will start the build automatically. You can monitor the status and logs of the build using the Cloud SDK, Cloud Console, or Cloud Build API.
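
For illustration only, the build configuration mentioned in step 3 could look roughly like the following, shown here as the Python/JSON equivalent of a cloudbuild.yaml file; the images, staging bucket, package path, and job naming are hypothetical placeholders.

```python
# Hypothetical Cloud Build configuration, equivalent to a cloudbuild.yaml file.
# Each step runs in a builder container; the last step submits an AI Platform training job.
build_config = {
    "steps": [
        {   # Install dependencies and run unit tests before training.
            "name": "python:3.7",
            "entrypoint": "bash",
            "args": ["-c", "pip install -r requirements.txt && pytest tests/"],
        },
        {   # Submit the retraining job with the newly pushed code.
            "name": "gcr.io/cloud-builders/gcloud",
            "args": [
                "ai-platform", "jobs", "submit", "training",
                "segmentation_train_$SHORT_SHA",   # $SHORT_SHA is populated for triggered builds
                "--region", "us-central1",
                "--package-path", "trainer",
                "--module-name", "trainer.task",
                "--staging-bucket", "gs://my-staging-bucket",
            ],
        },
    ],
    "timeout": "3600s",
}
```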

The other options are not as easy or feasible. Using Cloud Functions to identify changes to your code in Cloud Storage and trigger a retraining job is not ideal, as Cloud Functions has limitations on the memory, CPU, and execution time, and does not provide a user interface for managing and tracking your builds. Using the gcloud command-line tool to submit training jobs on AI Platform when you update your code is not optimal, as it requires manual intervention and does not leverage the benefits of Cloud Build and its integration with Cloud Source Repositories. Creating an automated workflow in Cloud Composer that runs daily and looks for changes in code in Cloud Storage using a sensor is not relevant, as Cloud Composer is mainly designed for orchestrating complex workflows across multiple systems, and does not provide a version control system for your code.

Your organization's call center has asked you to develop a model that analyzes customer sentiments in each call. The call center receives over one million calls daily, and data is stored in Cloud Storage. The data collected must not leave the region in which the call originated, and no Personally Identifiable Information (PII) can be stored or analyzed. The data science team has a third-party tool for visualization and access which requires a SQL ANSI-2011 compliant interface. You need to select components for data processing and for analytics. How should the data pipeline be designed?

A. 1 = Dataflow, 2 = BigQuery
B. 1 = Pub/Sub, 2 = Datastore
C. 1 = Dataflow, 2 = Cloud SQL
D. 1 = Cloud Function, 2 = Cloud SQL
Suggested answer: A

Explanation:

A data pipeline is a set of steps or processes that move data from one or more sources to one or more destinations, usually for the purpose of analysis, transformation, or storage. A data pipeline can be designed using various components, such as data sources, data processing tools, data storage systems, and data analytics tools.

To design a data pipeline for analyzing customer sentiments in each call, one should consider the following requirements and constraints:

The call center receives over one million calls daily, and data is stored in Cloud Storage. This implies that the data is large, unstructured, and distributed, and requires a scalable and efficient data processing tool that can handle various types of data formats, such as audio, text, or image.

The data collected must not leave the region in which the call originated, and no Personally Identifiable Information (PII) can be stored or analyzed. This implies that the data is sensitive and subject to data privacy and compliance regulations, and requires a secure and reliable data storage system that can enforce data encryption, access control, and regional policies.

The data science team has a third-party tool for visualization and access which requires a SQL ANSI-2011 compliant interface. This implies that the data analytics tool is external and independent of the data pipeline, and requires a standard and compatible data interface that can support SQL queries and operations.

One of the best options for selecting components for data processing and for analytics is to use Dataflow for data processing and BigQuery for analytics. Dataflow is a fully managed service for executing Apache Beam pipelines for data processing, such as batch or stream processing, extract-transform-load (ETL), or data integration. BigQuery is a serverless, scalable, and cost-effective data warehouse that allows you to run fast and complex queries on large-scale data.

Using Dataflow and BigQuery has several advantages for this use case:

Dataflow can process large and unstructured data from Cloud Storage in a parallel and distributed manner, and apply various transformations, such as converting audio to text, extracting sentiment scores, or anonymizing PII. Dataflow can also handle both batch and stream processing, which can enable real-time or near-real-time analysis of the call data.

BigQuery can store and analyze the processed data from Dataflow in a secure and reliable way, and enforce data encryption, access control, and regional policies. BigQuery can also support SQL ANSI-2011 compliant interface, which can enable the data science team to use their third-party tool for visualization and access. BigQuery can also integrate with various Google Cloud services and tools, such as AI Platform, Data Studio, or Looker.

Dataflow and BigQuery can work seamlessly together, as they are both part of the Google Cloud ecosystem, and support various data formats, such as CSV, JSON, Avro, or Parquet. Dataflow and BigQuery can also leverage the benefits of Google Cloud infrastructure, such as scalability, performance, and cost-effectiveness.
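
As a rough, non-authoritative sketch of such a pipeline (the bucket, table, schema, and the two transform helpers are hypothetical placeholders), an Apache Beam job run on Dataflow could read call transcripts from Cloud Storage, redact PII, score sentiment, and write the results to BigQuery:

```python
import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def redact_pii(record):
    # Placeholder: in practice this step could call Cloud DLP to strip PII before storage.
    record["transcript"] = record["transcript"].replace("<PII>", "[REDACTED]")
    return record

def score_sentiment(record):
    # Placeholder: in practice this could call the Natural Language API or a custom model.
    record["sentiment"] = 0.0
    return record

options = PipelineOptions(
    runner="DataflowRunner",
    project="my-project",
    region="us-central1",               # keep processing in the region the calls originated
    temp_location="gs://my-bucket/tmp",
)

with beam.Pipeline(options=options) as p:
    (p
     | "ReadTranscripts" >> beam.io.ReadFromText("gs://my-bucket/transcripts/*.jsonl")
     | "ParseJson" >> beam.Map(json.loads)
     | "RedactPII" >> beam.Map(redact_pii)
     | "ScoreSentiment" >> beam.Map(score_sentiment)
     | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
           "my-project:call_center.sentiments",
           schema="call_id:STRING,sentiment:FLOAT,transcript:STRING",
           write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
           create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED))
```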

The other options are not as suitable or feasible. Using Pub/Sub for data processing and Datastore for analytics is not ideal, as Pub/Sub is mainly designed for event-driven and asynchronous messaging, not data processing, and Datastore is mainly designed for low-latency and high-throughput key-value operations, not analytics. Using Cloud Functions for data processing and Cloud SQL for analytics is not optimal, as Cloud Functions has limitations on memory, CPU, and execution time and does not support complex data processing, and Cloud SQL is a relational database service that may not scale well for analytics on this volume of call data. Pairing Dataflow with Cloud SQL has the same problem on the analytics side: Cloud SQL is not designed for large-scale analytical workloads, whereas BigQuery provides the required SQL ANSI-2011 compliant interface at scale.

You work for an online retail company that is creating a visual search engine. You have set up an end-to-end ML pipeline on Google Cloud to classify whether an image contains your company's product. Expecting the release of new products in the near future, you configured a retraining functionality in the pipeline so that new data can be fed into your ML models. You also want to use AI Platform's continuous evaluation service to ensure that the models have high accuracy on your test data set. What should you do?

A. Keep the original test dataset unchanged even if newer products are incorporated into retraining.
B. Extend your test dataset with images of the newer products when they are introduced to retraining.
C. Replace your test dataset with images of the newer products when they are introduced to retraining.
D. Update your test dataset with images of the newer products when your evaluation metrics drop below a pre-decided threshold.
Suggested answer: B

Explanation:

The test dataset is used to evaluate the performance of the ML model on unseen data. It should reflect the distribution of the data that the model will encounter in production. Therefore, if the retraining data includes new products, the test dataset should also be extended with images of those products to ensure that the model can generalize well to them. Keeping the original test dataset unchanged or replacing it entirely with images of the new products would not capture the diversity of the data that the model needs to handle. Updating the test dataset only when the evaluation metrics drop below a threshold would be reactive rather than proactive, and might result in poor user experience if the model fails to recognize the new products. Reference:

Continuous evaluation documentation

Preparing and using test sets

You are responsible for building a unified analytics environment across a variety of on-premises data marts. Your company is experiencing data quality and security challenges when integrating data across the servers, caused by the use of a wide range of disconnected tools and temporary solutions. You need a fully managed, cloud-native data integration service that will lower the total cost of work and reduce repetitive work. Some members on your team prefer a codeless interface for building Extract, Transform, Load (ETL) processes. Which service should you use?

A. Dataflow
B. Dataprep
C. Apache Flink
D. Cloud Data Fusion
Suggested answer: D

Explanation:

Cloud Data Fusion is a fully managed, cloud-native data integration service that helps users efficiently build and manage ETL/ELT data pipelines. It provides a graphical interface to increase time efficiency and reduce complexity, and allows users to easily create and explore data pipelines using a code-free, point-and-click visual interface. Cloud Data Fusion also supports a broad range of data sources and formats, including on-premises data marts, and ensures data quality and security by using built-in transformation capabilities and Cloud Data Loss Prevention. Cloud Data Fusion lowers the total cost of ownership by handling performance, scalability, availability, security, and compliance needs automatically. Reference:

Cloud Data Fusion documentation

Cloud Data Fusion overview

You want to rebuild your ML pipeline for structured data on Google Cloud. You are using PySpark to conduct data transformations at scale, but your pipelines are taking over 12 hours to run. To speed up development and pipeline run time, you want to use a serverless tool and SQL syntax. You have already moved your raw data into Cloud Storage. How should you build the pipeline on Google Cloud while meeting the speed and processing requirements?

A. Use Data Fusion's GUI to build the transformation pipelines, and then write the data into BigQuery.
B. Convert your PySpark code into SparkSQL queries to transform the data, and then run your pipeline on Dataproc to write the data into BigQuery.
C. Ingest your data into Cloud SQL, convert your PySpark commands into SQL queries to transform the data, and then use federated queries from BigQuery for machine learning.
D. Ingest your data into BigQuery using BigQuery Load, convert your PySpark commands into BigQuery SQL queries to transform the data, and then write the transformations to a new table.
Suggested answer: D

Explanation:

BigQuery is a serverless, scalable, and cost-effective data warehouse that allows users to run SQL queries on large volumes of data. BigQuery Load is a tool that can ingest data from Cloud Storage into BigQuery tables. BigQuery SQL is a dialect of SQL that supports many of the same functions and operations as PySpark, such as window functions, aggregate functions, joins, and subqueries. By using BigQuery Load and BigQuery SQL, you can rebuild your ML pipeline for structured data on Google Cloud without having to manage any servers or clusters, and with faster performance and lower cost than using PySpark on Dataproc. You can also use BigQuery ML to create and evaluate ML models using SQL commands. Reference:

BigQuery documentation

BigQuery Load documentation

BigQuery SQL reference

BigQuery ML documentation
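
To make the suggested approach for this question concrete, here is a hedged sketch using the BigQuery Python client; the project, bucket, table names, and columns are hypothetical, and the SQL stands in for whatever transformations the PySpark job performed.

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # hypothetical project

# 1. Ingest the raw CSV files from Cloud Storage into a BigQuery table (BigQuery Load).
load_job = client.load_table_from_uri(
    "gs://my-bucket/raw/*.csv",
    "my-project.analytics.raw_events",
    job_config=bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        autodetect=True,
    ),
)
load_job.result()  # wait for the load to finish

# 2. Re-express the PySpark transformations as a SQL query and write the result to a new table.
transform_sql = """
SELECT
  user_id,
  DATE(event_timestamp) AS event_date,
  COUNT(*) AS events,
  AVG(purchase_value) AS avg_purchase_value
FROM `my-project.analytics.raw_events`
GROUP BY user_id, event_date
"""
query_job = client.query(
    transform_sql,
    job_config=bigquery.QueryJobConfig(
        destination="my-project.analytics.features",
        write_disposition="WRITE_TRUNCATE",
    ),
)
query_job.result()
```

A BigQuery ML model could then be trained directly on the resulting table with a CREATE MODEL statement, keeping the whole pipeline serverless and SQL-based.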
