Google Professional Machine Learning Engineer Practice Test - Questions and Answers, Page 2
List of questions
Question 11
You are building a real-time prediction engine that streams files that may contain Personally Identifiable Information (PII) to Google Cloud. You want to use the Cloud Data Loss Prevention (DLP) API to scan the files. How should you ensure that the PII is not accessible by unauthorized individuals?
Explanation:
The Cloud DLP API is a service that allows users to inspect, classify, and de-identify sensitive data. It can be used to scan data in Cloud Storage, BigQuery, Cloud Datastore, and Cloud Pub/Sub. The best way to ensure that the PII is not accessible by unauthorized individuals is to use a quarantine bucket to store the data before scanning it with the DLP API. This way, the data is isolated from other applications and users until it is classified and moved to the appropriate bucket. The other options are not as secure or efficient, as they either expose the data to BigQuery before scanning, or scan the data after writing it to a non-sensitive bucket.
Reference:
Cloud DLP documentation
Scanning and classifying Cloud Storage files
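As an illustration of the quarantine pattern described above, here is a minimal sketch using the google-cloud-dlp Python client; the project ID, bucket path, and Pub/Sub topic are placeholder assumptions, not values from the question.

```python
# Sketch: scan files in a quarantine bucket with a Cloud DLP inspection job.
# Project, bucket, and topic names below are placeholders.
from google.cloud import dlp_v2

project = "my-project"                          # assumed project ID
quarantine_uri = "gs://quarantine-bucket/**"    # files land here before scanning

dlp = dlp_v2.DlpServiceClient()

inspect_job = {
    "storage_config": {
        "cloud_storage_options": {"file_set": {"url": quarantine_uri}}
    },
    "inspect_config": {
        "info_types": [{"name": "EMAIL_ADDRESS"}, {"name": "PHONE_NUMBER"}],
        "min_likelihood": dlp_v2.Likelihood.POSSIBLE,
    },
    # Notify a Pub/Sub topic when the scan finishes; a downstream function can
    # then move each file to a sensitive or non-sensitive bucket.
    "actions": [{"pub_sub": {"topic": f"projects/{project}/topics/dlp-scan-done"}}],
}

job = dlp.create_dlp_job(
    request={"parent": f"projects/{project}", "inspect_job": inspect_job}
)
print("Started DLP inspection job:", job.name)
```

Until the scan result arrives, only the scanning service account needs access to the quarantine bucket, which is what keeps the PII away from unauthorized users.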
Question 12
As the lead ML Engineer for your company, you are responsible for building ML models to digitize scanned customer forms. You have developed a TensorFlow model that converts the scanned images into text and stores them in Cloud Storage. You need to use your ML model on the aggregated data collected at the end of each day with minimal manual intervention. What should you do?
Explanation:
Batch prediction is the process of using an ML model to make predictions on a large set of data points. Batch prediction is suitable for scenarios where the predictions are not time-sensitive and can be done in batches, such as digitizing scanned customer forms at the end of each day. Batch prediction can also handle large volumes of data and scale up or down the resources as needed. AI Platform provides a batch prediction service that allows users to submit a job with their TensorFlow model and input data stored in Cloud Storage, and receive the output predictions in Cloud Storage as well. This service requires minimal manual intervention and can be automated with Cloud Scheduler or Cloud Functions. Therefore, using the batch prediction functionality of AI Platform is the best option for this use case.
Batch prediction overview
Using batch prediction
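As a hedged sketch of how such a daily job could be submitted programmatically (for example from a Cloud Function triggered by Cloud Scheduler), the snippet below calls the AI Platform REST API through the Google API client; the project, model, and bucket names are assumptions.

```python
# Sketch: submit a batch prediction job over the day's aggregated scans.
# Project, model, and bucket names are placeholders.
from datetime import date
from googleapiclient import discovery

project = "my-project"
job_id = f"scanned_forms_{date.today():%Y%m%d}"

body = {
    "jobId": job_id,
    "predictionInput": {
        "dataFormat": "TF_RECORD",  # or JSON, depending on how the scans are exported
        "inputPaths": [f"gs://my-bucket/scanned-forms/{date.today():%Y%m%d}/*"],
        "outputPath": f"gs://my-bucket/predictions/{job_id}",
        "region": "us-central1",
        "modelName": f"projects/{project}/models/form_digitizer",
    },
}

ml = discovery.build("ml", "v1")
response = ml.projects().jobs().create(parent=f"projects/{project}", body=body).execute()
print("Submitted batch prediction job:", response["jobId"])
```

Because the job reads its input from Cloud Storage and writes predictions back to Cloud Storage, no manual steps remain once the daily trigger is scheduled.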
Question 13
You work for a global footwear retailer and need to predict when an item will be out of stock based on historical inventory data. Customer behavior is highly dynamic since footwear demand is influenced by many different factors. You want to serve models that are trained on all available data, but track your performance on specific subsets of data before pushing to production. What is the most streamlined and reliable way to perform this validation?
Explanation:
TFX ModelValidator is a tool that allows you to compare new models against a baseline model and evaluate their performance on different metrics and data slices. You can use this tool to validate your models before deploying them to production and ensure that they meet your expectations and requirements; a minimal slicing configuration is sketched after these option notes.
k-fold cross-validation is a technique that splits the data into k subsets and trains the model on k-1 subsets while testing it on the remaining subset. This is repeated k times and the average performance is reported. This technique is useful for estimating the generalization error of a model, but it does not account for the dynamic nature of customer behavior or potential changes in the data distribution over time.
Using the last relevant week of data as a validation set is a simple way to check the model's performance on recent data, but it may not be representative of the entire dataset or capture long-term trends and patterns. It also does not allow you to compare the model with a baseline or evaluate it on different data slices.
Using the entire dataset and treating the AUC ROC as the main metric is not a good practice because it does not leave any data for validation or testing. It also assumes that the AUC ROC is the only metric that matters, which may not be true for your business problem. You may want to consider other metrics such as precision, recall, or revenue.
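The sketch below shows a slice-aware evaluation config using TensorFlow Model Analysis (in current TFX releases the ModelValidator functionality lives in the Evaluator component); the label key, slicing features, and threshold values are illustrative assumptions.

```python
# Sketch: evaluate a candidate model against a baseline on specific data slices.
# Label key, slicing features, and threshold values are assumptions.
import tensorflow_model_analysis as tfma

eval_config = tfma.EvalConfig(
    model_specs=[tfma.ModelSpec(label_key="out_of_stock")],
    slicing_specs=[
        tfma.SlicingSpec(),                           # overall performance
        tfma.SlicingSpec(feature_keys=["country"]),   # track specific subsets
        tfma.SlicingSpec(feature_keys=["shoe_category"]),
    ],
    metrics_specs=[
        tfma.MetricsSpec(metrics=[
            tfma.MetricConfig(
                class_name="AUC",
                # Block promotion if AUC drops relative to the baseline model.
                threshold=tfma.MetricThreshold(
                    change_threshold=tfma.GenericChangeThreshold(
                        direction=tfma.MetricDirection.HIGHER_IS_BETTER,
                        absolute={"value": -1e-3},
                    )
                ),
            )
        ])
    ],
)
# In a TFX pipeline this config is passed to the Evaluator component together with
# a baseline model (e.g. the latest blessed model) before pushing to production.
```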
Question 14
You work on a growing team of more than 50 data scientists who all use AI Platform. You are designing a strategy to organize your jobs, models, and versions in a clean and scalable way. Which strategy should you choose?
Explanation:
Labels are key-value pairs that can be attached to any AI Platform resource, such as jobs, models, versions, or endpoints. Labels can help you organize your resources into descriptive categories, such as project, team, environment, or purpose. You can use labels to filter the results when you list or monitor your resources, or to group them for billing or quota purposes. Using labels is a simple and scalable way to manage your AI Platform resources without creating unnecessary complexity or overhead. Therefore, using labels to organize resources is the best strategy for this use case.
Using labels
Filtering and grouping by labels
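The sketch below shows one way to attach and use labels through the AI Platform REST API from Python; the project, model name, and label values are placeholder assumptions.

```python
# Sketch: create a model with team/phase/owner labels, then group models by label.
# Project, model name, and label values are placeholders.
from googleapiclient import discovery

project = "my-project"
ml = discovery.build("ml", "v1")

model_body = {
    "name": "churn_classifier",
    "regions": ["us-central1"],
    "labels": {"team": "growth", "phase": "experimental", "owner": "alice"},
}
ml.projects().models().create(parent=f"projects/{project}", body=model_body).execute()

# List all models and group them by the "team" label.
models = ml.projects().models().list(parent=f"projects/{project}").execute()
by_team = {}
for m in models.get("models", []):
    team = m.get("labels", {}).get("team", "unlabeled")
    by_team.setdefault(team, []).append(m["name"])
print(by_team)
```

The same labels field is accepted when submitting training jobs and creating versions, so one consistent labeling scheme can cover all three resource types.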
Question 15
During batch training of a neural network, you notice that there is an oscillation in the loss. How should you adjust your model to ensure that it converges?
Explanation:
Oscillation in the loss during batch training of a neural network means that the model is overshooting the optimal point of the loss function and bouncing back and forth. This can prevent the model from converging to the minimum loss value. One of the main reasons for this phenomenon is that the learning rate hyperparameter, which controls the size of the steps that the model takes along the gradient, is too high. Therefore, decreasing the learning rate hyperparameter can help the model take smaller and more precise steps and avoid oscillation. This is a common technique to improve the stability and performance of neural network training.
Interpreting Loss Curves
Is learning rate the only reason for training loss oscillation after few epochs?
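For instance, in Keras this is a one-line change when compiling the model; the architecture and the specific learning-rate values below are only illustrative.

```python
# Sketch: recompile with a smaller learning rate when the loss oscillates.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(1),
])

# If the loss oscillates with the Adam default of 1e-3, try a smaller step size,
# e.g. 1e-4, so each gradient update overshoots the minimum less.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
    loss="mse",
)
```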
Question 16
You are building a linear model with over 100 input features, all with values between -1 and 1. You suspect that many features are non-informative. You want to remove the non-informative features from your model while keeping the informative ones in their original form. Which technique should you use?
Explanation:
L1 regularization, also known as Lasso regularization, adds the sum of the absolute values of the model's coefficients to the loss function. It encourages sparsity in the model by shrinking some coefficients to precisely zero. This way, L1 regularization can perform feature selection and remove the non-informative features from the model while keeping the informative ones in their original form. Therefore, using L1 regularization is the best technique for this use case.
Regularization in Machine Learning - GeeksforGeeks
Regularization in Machine Learning (with Code Examples) - Dataquest
L1 And L2 Regularization Explained & Practical How To Examples
L1 and L2 as Regularization for a Linear Model
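A minimal sketch of the exact-zero behavior is shown below using an L1-penalized logistic regression on synthetic data; the dataset, the choice of scikit-learn, and the regularization strength C are all assumptions for illustration.

```python
# Sketch: L1-penalized logistic regression; the penalty drives the coefficients
# of non-informative features to exactly zero, acting as feature selection.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(5000, 100))         # 100 features in [-1, 1]
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)    # only 2 features are informative

clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)  # C is an assumption
clf.fit(X, y)

kept = np.flatnonzero(clf.coef_[0])
print("Features kept by the L1 penalty:", kept)
```

The surviving non-zero coefficients identify the informative features, which keep their original form; everything else is dropped from the model.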
Question 17
Your team has been tasked with creating an ML solution in Google Cloud to classify support requests for one of your platforms. You analyzed the requirements and decided to use TensorFlow to build the classifier so that you have full control of the model's code, serving, and deployment. You will use Kubeflow pipelines for the ML platform. To save time, you want to build on existing resources and use managed services instead of building a completely new model. How should you build the classifier?
Explanation:
Transfer learning is a technique that leverages the knowledge and weights of a pre-trained model and adapts them to a new task or domain. Transfer learning can save time and resources by avoiding training a model from scratch, and can also improve the performance and generalization of the model by using a larger and more diverse dataset. AI Platform provides several established text classification models that can be used for transfer learning, such as BERT, ALBERT, or XLNet. These models are based on state-of-the-art natural language processing techniques and can handle various text classification tasks, such as sentiment analysis, topic classification, or spam detection. By using one of these models on AI Platform, you can customize the model's code, serving, and deployment, and use Kubeflow pipelines for the ML platform. Therefore, using an established text classification model on AI Platform to perform transfer learning is the best option for this use case.
Transfer Learning - Machine Learning's Next Frontier
A Comprehensive Hands-on Guide to Transfer Learning with Real-World Applications in Deep Learning
Text classification models
Text Classification with Pre-trained Models in TensorFlow
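As a hedged sketch of the transfer-learning step itself, the snippet below reuses a pre-trained text encoder from TensorFlow Hub and trains only a small classification head; the specific Hub module URL and the number of support-request categories are assumptions, not requirements from the question.

```python
# Sketch: transfer learning for support-request classification in TensorFlow.
# The Hub module URL and number of classes are assumptions.
import tensorflow as tf
import tensorflow_hub as hub

num_classes = 5  # assumed number of support-request categories

# Pre-trained sentence embedding; its weights are frozen, only the head trains.
encoder = hub.KerasLayer(
    "https://tfhub.dev/google/nnlm-en-dim50/2",
    input_shape=[], dtype=tf.string, trainable=False,
)

model = tf.keras.Sequential([
    encoder,
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
# model.fit(...) on the labeled support requests, then package the trained model
# for serving and wire the training/serving steps into a Kubeflow pipeline.
```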
Question 18
Your team is working on an NLP research project to predict political affiliation of authors based on articles they have written. You have a large training dataset that is structured like this:
You followed the standard 80%-10%-10% data distribution across the training, testing, and evaluation subsets. How should you distribute the training examples across the train-test-eval subsets while maintaining the 80-10-10 proportion?
A)
B)
C)
D)
Explanation:
The best way to distribute the training examples across the train-test-eval subsets while maintaining the 80-10-10 proportion is to use option C. This option ensures that each subset contains a balanced and representative sample of the different classes (Democrat and Republican) and the different authors. This way, the model can learn from a diverse and comprehensive set of articles and avoid overfitting or underfitting. Option C also avoids the problem of data leakage, which occurs when the same author appears in more than one subset, potentially biasing the model and inflating its performance. Therefore, option C is the most suitable technique for this use case.
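One way to implement such an author-disjoint 80/10/10 split is sketched below with scikit-learn's GroupShuffleSplit; the DataFrame here is synthetic stand-in data, since the real dataset structure is not reproduced in this page.

```python
# Sketch: split articles 80/10/10 so that each author lands in exactly one subset,
# which prevents author-level data leakage between train, test, and eval.
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

# Synthetic stand-in for the real dataset (assumed columns: text, party, author).
df = pd.DataFrame({
    "text": [f"article {i}" for i in range(1000)],
    "party": ["Democrat", "Republican"] * 500,
    "author": [f"author_{i % 100}" for i in range(1000)],
})

# 80% of the articles go to train, 20% to a holdout, grouped by author.
gss = GroupShuffleSplit(n_splits=1, train_size=0.8, random_state=42)
train_idx, holdout_idx = next(gss.split(df, groups=df["author"]))
train_df, holdout = df.iloc[train_idx], df.iloc[holdout_idx]

# Split the holdout in half by author: 10% test and 10% eval overall.
gss2 = GroupShuffleSplit(n_splits=1, train_size=0.5, random_state=42)
test_idx, eval_idx = next(gss2.split(holdout, groups=holdout["author"]))
test_df, eval_df = holdout.iloc[test_idx], holdout.iloc[eval_idx]

assert set(train_df["author"]).isdisjoint(test_df["author"])
assert set(train_df["author"]).isdisjoint(eval_df["author"])
```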
Question 19
Your data science team needs to rapidly experiment with various features, model architectures, and hyperparameters. They need to track the accuracy metrics for various experiments and use an API to query the metrics over time. What should they use to track and report their experiments while minimizing manual effort?
Explanation:
AI Platform Training is a service that allows you to run your machine learning experiments on Google Cloud using various features, model architectures, and hyperparameters. You can use AI Platform Training to scale up your experiments, leverage distributed training, and access specialized hardware such as GPUs and TPUs. Cloud Monitoring is a service that collects and analyzes metrics, logs, and traces from Google Cloud, AWS, and other sources. You can use Cloud Monitoring to create dashboards, alerts, and reports based on your data. The Monitoring API is an interface that allows you to programmatically access and manipulate your monitoring data.
By using AI Platform Training and Cloud Monitoring, you can track and report your experiments while minimizing manual effort. You can write the accuracy metrics from your training code to Cloud Monitoring as custom metrics. You can then query the results using the Monitoring API and compare the performance of different experiments. You can also visualize the metrics in the Cloud Console or create custom dashboards and alerts. Therefore, using AI Platform Training and Cloud Monitoring is the best option for this use case.
AI Platform Training documentation
Cloud Monitoring documentation
Monitoring API overview
Using Cloud Monitoring with AI Platform Training
Viewing evaluation metrics
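A hedged sketch of writing and querying a custom accuracy metric with the google-cloud-monitoring client is shown below; the project ID, metric type name, and experiment label are placeholder assumptions.

```python
# Sketch: write one experiment's accuracy as a custom metric, then query it back.
# Project ID, metric type, and experiment label are placeholders.
import time
from google.cloud import monitoring_v3

project_name = "projects/my-project"
client = monitoring_v3.MetricServiceClient()

# Write the accuracy value, tagged with the experiment ID.
series = monitoring_v3.TimeSeries()
series.metric.type = "custom.googleapis.com/ml/experiment_accuracy"
series.metric.labels["experiment_id"] = "exp_042"
series.resource.type = "global"

now = time.time()
interval = monitoring_v3.TimeInterval(
    {"end_time": {"seconds": int(now), "nanos": int((now % 1) * 1e9)}}
)
series.points = [monitoring_v3.Point({"interval": interval, "value": {"double_value": 0.87}})]
client.create_time_series(name=project_name, time_series=[series])

# Query the metric over the last 7 days with the Monitoring API.
results = client.list_time_series(request={
    "name": project_name,
    "filter": 'metric.type = "custom.googleapis.com/ml/experiment_accuracy"',
    "interval": monitoring_v3.TimeInterval(
        {"end_time": {"seconds": int(now)},
         "start_time": {"seconds": int(now) - 7 * 24 * 3600}}
    ),
    "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
})
for ts in results:
    print(ts.metric.labels["experiment_id"], [p.value.double_value for p in ts.points])
```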
Question 20
You are an ML engineer at a bank that has a mobile application. Management has asked you to build an ML-based biometric authentication for the app that verifies a customer's identity based on their fingerprint. Fingerprints are considered highly sensitive personal information and cannot be downloaded and stored into the bank databases. Which learning strategy should you recommend to train and deploy this ML model?
Explanation:
Federated learning is a machine learning technique that enables organizations to train AI models on decentralized data without centralizing or sharing it. It allows data privacy, continual learning, and better performance on end-user devices. Federated learning works by sending the model parameters to the devices, where they are updated locally on the device's data, and then aggregating the updated parameters on a central server to form a global model. This way, the data never leaves the device and the model can learn from a large and diverse dataset.
Federated learning is suitable for the use case of building an ML-based biometric authentication for the bank's mobile app that verifies a customer's identity based on their fingerprint. Fingerprints are considered highly sensitive personal information and cannot be downloaded and stored into the bank databases. By using federated learning, the bank can train and deploy an ML model that can recognize fingerprints without compromising the data privacy of the customers. The model can also adapt to the variations and changes in the fingerprints over time and improve its accuracy and reliability. Therefore, federated learning is the best learning strategy for this use case.
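To make the mechanism concrete, the following is a minimal NumPy sketch of federated averaging (not the production framework one would actually use, such as TensorFlow Federated): each simulated device takes a local gradient step on its own data, and only the updated weights, never the raw fingerprint features, are averaged on the server.

```python
# Sketch: federated averaging with simulated on-device data (NumPy only).
# Raw features stay on each "device"; only model weights are exchanged.
import numpy as np

rng = np.random.default_rng(0)

def local_update(weights, features, labels, lr=0.1):
    """One logistic-regression gradient step on a single device's local data."""
    preds = 1.0 / (1.0 + np.exp(-(features @ weights)))
    grad = features.T @ (preds - labels) / len(labels)
    return weights - lr * grad

# Five simulated devices, each with its own private dataset of 50 examples.
devices = [
    (rng.normal(size=(50, 16)), rng.integers(0, 2, size=50).astype(float))
    for _ in range(5)
]

global_weights = np.zeros(16)
for round_num in range(20):
    # Server broadcasts the global weights; each device trains locally.
    client_weights = [local_update(global_weights, X, y) for X, y in devices]
    # Server aggregates only the weight updates into the new global model.
    global_weights = np.mean(client_weights, axis=0)
```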