Google Professional Machine Learning Engineer Practice Test - Questions Answers, Page 3

You are building a linear regression model on BigQuery ML to predict a customer's likelihood of purchasing your company's products. Your model uses a city name variable as a key predictive component. In order to train and serve the model, your data must be organized in columns. You want to prepare your data using the least amount of coding while maintaining the predictive variables. What should you do?

A. Create a new view with BigQuery that does not include a column with city information.
B. Use Dataprep to transform the state column using a one-hot encoding method, and make each city a column with binary values.
C. Use Cloud Data Fusion to assign each city to a region labeled as 1, 2, 3, 4, or 5, and then use that number to represent the city in the model.
D. Use TensorFlow to create a categorical variable with a vocabulary list. Create the vocabulary file, and upload it as part of your model to BigQuery ML.

Suggested answer: B

Explanation:

One-hot encoding is a technique that converts categorical variables into numerical variables by creating dummy variables for each possible category. Each dummy variable has a value of 1 if the original variable belongs to that category, and 0 otherwise [1]. One-hot encoding can help linear regression models capture the effect of different categories on the target variable without imposing any ordinal relationship among them [2]. Dataprep is a service that allows you to explore, clean, and transform your data for analysis and machine learning. You can use Dataprep to apply one-hot encoding to your city name variable and make each city a column with binary values [3]. This way, you can prepare your data using the least amount of coding while maintaining the predictive variables. Therefore, using Dataprep to transform the state column using a one-hot encoding method is the best option for this use case.

[1] One Hot Encoding: A Beginner's Guide

[2] One-Hot Encoding in Linear Regression Models

[3] Dataprep documentation
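
To make the transformation concrete, here is a minimal pandas sketch (illustrative only; in the exam scenario the transformation is done in Dataprep rather than in code) showing what one-hot encoding a city column produces. The table values are made up:

    import pandas as pd

    # Toy customer table; in the scenario this data lives in BigQuery and is transformed with Dataprep.
    df = pd.DataFrame({
        "customer_id": [1, 2, 3],
        "city": ["Austin", "Boston", "Austin"],
    })

    # One-hot encoding: each city becomes its own column with binary (0/1) values.
    encoded = pd.get_dummies(df, columns=["city"], prefix="city")
    print(encoded)

The result has one column per city (city_Austin, city_Boston) containing 1 where the row belongs to that city and 0 otherwise, which is exactly the column-organized layout BigQuery ML expects.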

You work for a toy manufacturer that has been experiencing a large increase in demand. You need to build an ML model to reduce the amount of time spent by quality control inspectors checking for product defects. Faster defect detection is a priority. The factory does not have reliable Wi-Fi. Your company wants to implement the new ML model as soon as possible. Which model should you use?

A. AutoML Vision model
B. AutoML Vision Edge mobile-versatile-1 model
C. AutoML Vision Edge mobile-low-latency-1 model
D. AutoML Vision Edge mobile-high-accuracy-1 model

Suggested answer: C

Explanation:

AutoML Vision Edge is a service that allows you to create custom image classification and object detection models that can run on edge devices, such as mobile phones, tablets, or IoT devices [1]. AutoML Vision Edge offers four types of models that vary in size, accuracy, and latency: mobile-versatile-1, mobile-low-latency-1, mobile-high-accuracy-1, and mobile-core-ml-low-latency-1 [2]. Each model has its own trade-offs and use cases, depending on the device specifications and the application requirements.

For the use case of building an ML model to reduce the amount of time spent by quality control inspectors checking for product defects, the best model to use is the AutoML Vision Edge mobile-low-latency-1 model. This model is optimized for fast inference on mobile devices, with a latency of less than 50 milliseconds on a Pixel 1 phone [2]. Faster defect detection is a priority for the toy manufacturer, and the factory does not have reliable Wi-Fi, so a low-latency model that can run on the device without an internet connection is ideal. The mobile-low-latency-1 model also has a small size of less than 4 MB, which makes it easy to deploy and update [2]. The mobile-low-latency-1 model has slightly lower accuracy than the mobile-high-accuracy-1 model, but it is still suitable for most image classification tasks [2]. Therefore, the AutoML Vision Edge mobile-low-latency-1 model is the best option for this use case.

[1] AutoML Vision Edge documentation

[2] AutoML Vision Edge model types
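
To illustrate why an Edge model fits the unreliable-Wi-Fi constraint, the sketch below loads an exported TensorFlow Lite model and runs inference entirely on the device. The file name and input image are placeholders, and the TensorFlow Lite export format is an assumption about how the Edge model would be packaged:

    import numpy as np
    import tensorflow as tf

    # Placeholder path to an AutoML Vision Edge model exported in TensorFlow Lite format.
    interpreter = tf.lite.Interpreter(model_path="defect_detector.tflite")
    interpreter.allocate_tensors()

    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()

    # Placeholder image tensor shaped and typed to match the model's expected input.
    image = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])

    interpreter.set_tensor(input_details[0]["index"], image)
    interpreter.invoke()  # runs locally, no network call
    scores = interpreter.get_tensor(output_details[0]["index"])
    print("Defect class scores:", scores)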

You are going to train a DNN regression model with Keras APIs using this code (the model definition appears as an image in the original question and is not reproduced here):

How many trainable weights does your model have? (The arithmetic below is correct.)

A. 501*256 + 257*128 + 2 = 161154
B. 500*256 + 256*128 + 128*2 = 161024
C. 501*256 + 257*128 + 128*2 = 161408
D. 500*256*0.25 + 256*128*0.25 + 128*2 = 40448

Suggested answer: B

Explanation:

In a Keras dense layer, the number of kernel weights equals the number of input units multiplied by the number of output units, and each output unit normally also adds one bias term [1]. The answer options in this question count only the kernel weights, so bias terms are left out of the totals. Based on the formula in option B, the model consists of a dense layer with 256 units fed by a 500-feature input, a dropout layer with rate 0.25, a dense layer with 128 units, and an output dense layer with 2 units. The number of trainable weights is therefore:

For the first dense layer: 500 input units * 256 output units = 128000

For the dropout layer: no trainable weights. Dropout only randomly sets a fraction of its inputs to zero during training to prevent overfitting [2]; the 0.25 rate does not reduce the number of weights, which is why option D is wrong.

For the second dense layer: 256 input units * 128 output units = 32768

For the output layer: 128 input units * 2 output units = 256

The total number of trainable weights is 128000 + 32768 + 256 = 161024. Therefore, the correct answer is B.

[1] How to calculate the number of parameters for a Convolutional Neural Network?

[2] Dropout (keras.io)
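
Because the original code image is not reproduced above, the following Keras sketch is only a reconstruction consistent with the layer sizes implied by option B (500 -> 256 -> 128 -> 2 with dropout 0.25); the use_bias=False arguments are an assumption added so that the parameter count matches 161024 exactly:

    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(256, activation="relu", use_bias=False, input_shape=(500,)),
        tf.keras.layers.Dropout(0.25),                        # contributes no trainable weights
        tf.keras.layers.Dense(128, activation="relu", use_bias=False),
        tf.keras.layers.Dense(2, use_bias=False),             # two-output regression head
    ])
    model.summary()  # Total params: 500*256 + 256*128 + 128*2 = 161,024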

You recently joined a machine learning team that will soon release a new project. As a lead on the project, you are asked to determine the production readiness of the ML components. The team has already tested features and data, model development, and infrastructure. Which additional readiness check should you recommend to the team?

A. Ensure that training is reproducible
B. Ensure that all hyperparameters are tuned
C. Ensure that model performance is monitored
D. Ensure that feature expectations are captured in the schema

Suggested answer: C

Explanation:

Monitoring model performance is an essential part of production readiness, as it allows the team to detect and address any issues that may arise after deployment, such as data drift, model degradation, or errors.

Other Options:

A) Ensuring that training is reproducible is important for model development, but not necessarily for production readiness. Reproducibility helps the team to track and compare different experiments, but it does not guarantee that the model will perform well in production.

B) Ensuring that all hyperparameters are tuned is also important for model development, but not sufficient for production readiness. Hyperparameter tuning helps the team to find the optimal configuration for the model, but it does not account for the dynamic and changing nature of the production environment.

D) Ensuring that feature expectations are captured in the schema is a part of testing features and data, which the team has already done. The schema defines the expected format, type, and range of the features, and helps the team to validate and preprocess the data.

You recently designed and built a custom neural network that uses critical dependencies specific to your organization's framework. You need to train the model using a managed training service on Google Cloud. However, the ML framework and related dependencies are not supported by AI Platform Training. Also, both your model and your data are too large to fit in memory on a single machine. Your ML framework of choice uses the scheduler, workers, and servers distribution structure. What should you do?

A. Use a built-in model available on AI Platform Training
B. Build your custom container to run jobs on AI Platform Training
C. Build your custom containers to run distributed training jobs on AI Platform Training
D. Reconfigure your code to an ML framework with dependencies that are supported by AI Platform Training

Suggested answer: C

Explanation:

AI Platform Training is a service that allows you to run your machine learning training jobs on Google Cloud using various features, model architectures, and hyperparameters. You can use AI Platform Training to scale up your training jobs, leverage distributed training, and access specialized hardware such as GPUs and TPUs [1]. AI Platform Training supports several pre-built containers that provide different ML frameworks and dependencies, such as TensorFlow, PyTorch, scikit-learn, and XGBoost [2]. However, if the ML framework and related dependencies that you need are not supported by the pre-built containers, you can build your own custom containers and use them to run your training jobs on AI Platform Training [3].

Custom containers are Docker images that you create to run your training application. By using custom containers, you can specify and pre-install all the dependencies needed for your application, and have full control over the code, serving, and deployment of your model [4]. Custom containers also enable you to run distributed training jobs on AI Platform Training, which can help you train large-scale and complex models faster and more efficiently [5]. Distributed training is a technique that splits the training data and computation across multiple machines, and coordinates them to update the model parameters. AI Platform Training supports two types of distributed training: parameter server and collective all-reduce. The parameter server architecture consists of a set of workers that perform the computation, and a set of servers that store and update the model parameters. The collective all-reduce architecture consists of a set of workers that perform the computation and synchronize the model parameters among themselves. Both architectures also have a scheduler that coordinates the workers and servers.

For the use case of training a custom neural network that uses critical dependencies specific to your organization's framework, the best option is to build your custom containers to run distributed training jobs on AI Platform Training. This option allows you to use the ML framework and dependencies of your choice, and train your model on multiple machines without having to manage the infrastructure. Since your ML framework of choice uses the scheduler, workers, and servers distribution structure, you can use the parameter server architecture to run your distributed training job on AI Platform Training. You can specify the number and type of machines, the custom container image, and the training application arguments when you submit your training job. Therefore, building your custom containers to run distributed training jobs on AI Platform Training is the best option for this use case.

[1] AI Platform Training documentation

[2] Pre-built containers for training

[3] Custom containers for training

[4] Custom containers overview | Vertex AI | Google Cloud

[5] Distributed training overview

[6] Types of distributed training

[7] Distributed training architectures

[8] Using custom containers for training with the parameter server architecture
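
As a rough sketch of how a custom container could wire the framework's scheduler/workers/servers structure to the cluster layout that AI Platform Training exposes, the code below reads the TF_CONFIG environment variable that the service sets for each replica of a distributed job. The role mapping and print statements are hypothetical stand-ins for the organization's own framework entry points:

    import json
    import os

    # AI Platform Training injects TF_CONFIG into every replica of a distributed
    # custom-container job, describing the cluster and this replica's role.
    tf_config = json.loads(os.environ.get("TF_CONFIG", "{}"))
    cluster = tf_config.get("cluster", {})   # e.g. {"master": [...], "worker": [...], "ps": [...]}
    task = tf_config.get("task", {})         # e.g. {"type": "worker", "index": 3}

    role = task.get("type", "master")
    index = task.get("index", 0)

    if role == "master":
        print(f"Starting scheduler on master replica {index}")   # hypothetical scheduler entry point
    elif role == "ps":
        print(f"Starting parameter server {index} of {len(cluster.get('ps', []))}")
    else:
        print(f"Starting worker {index} of {len(cluster.get('worker', []))}")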

You are an ML engineer in the contact center of a large enterprise. You need to build a sentiment analysis tool that predicts customer sentiment from recorded phone conversations. You need to identify the best approach to building a model while ensuring that the gender, age, and cultural differences of the customers who called the contact center do not impact any stage of the model development pipeline and results. What should you do?

A. Extract sentiment directly from the voice recordings
B. Convert the speech to text and build a model based on the words
C. Convert the speech to text and extract sentiments based on the sentences
D. Convert the speech to text and extract sentiment using syntactical analysis

Suggested answer: C

Explanation:

Sentiment analysis is the process of identifying and extracting the emotions, opinions, and attitudes expressed in a text or speech. Sentiment analysis can help businesses understand their customers' feedback, satisfaction, and preferences. There are different approaches to building a sentiment analysis tool, depending on the input data and the output format. Some of the common approaches are:

Extracting sentiment directly from the voice recordings: This approach involves using acoustic features, such as pitch, intensity, and prosody, to infer the sentiment of the speaker. This approach can capture the nuances and subtleties of the vocal expression, but it also requires a large and diverse dataset of labeled voice recordings, which may not be easily available or accessible. Moreover, this approach may not account for the semantic and contextual information of the speech, which can also affect the sentiment.

Converting the speech to text and building a model based on the words: This approach involves using automatic speech recognition (ASR) to transcribe the voice recordings into text, and then using lexical features, such as word frequency, polarity, and valence, to infer the sentiment of the text. This approach can leverage the existing text-based sentiment analysis models and tools, but it also introduces some challenges, such as the accuracy and reliability of the ASR system, the ambiguity and variability of the natural language, and the loss of the acoustic information of the speech.

Converting the speech to text and extracting sentiments based on the sentences: This approach involves using ASR to transcribe the voice recordings into text, and then using syntactic and semantic features, such as sentence structure, word order, and meaning, to infer the sentiment of the text. This approach can capture the higher-level and complex aspects of the natural language, such as negation, sarcasm, and irony, which can affect the sentiment. However, this approach also requires more sophisticated and advanced natural language processing techniques, such as parsing, dependency analysis, and semantic role labeling, which may not be readily available or easy to implement.

Converting the speech to text and extracting sentiment using syntactical analysis: This approach involves using ASR to transcribe the voice recordings into text, and then using syntactical analysis, such as part-of-speech tagging, phrase chunking, and constituency parsing, to infer the sentiment of the text. This approach can identify the grammatical and structural elements of the natural language, such as nouns, verbs, adjectives, and clauses, which can indicate the sentiment. However, this approach may not account for the pragmatic and contextual information of the speech, such as the speaker's intention, tone, and situation, which can also influence the sentiment.

For the use case of building a sentiment analysis tool that predicts customer sentiment from recorded phone conversations, the best approach is to convert the speech to text and extract sentiments based on the sentences. This approach can balance the trade-offs between the accuracy, complexity, and feasibility of the sentiment analysis tool, while ensuring that the gender, age, and cultural differences of the customers who called the contact center do not impact any stage of the model development pipeline and results. This approach can also handle different types and levels of sentiment, such as polarity (positive, negative, or neutral), intensity (strong or weak), and emotion (anger, joy, sadness, etc.). Therefore, converting the speech to text and extracting sentiments based on the sentences is the best approach for this use case.
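
As a minimal sketch of the sentence-level approach in option C, and assuming the phone recording has already been transcribed with a speech-to-text step, the Cloud Natural Language API can return a sentiment score per sentence as well as for the whole document. The transcript string below is a placeholder:

    from google.cloud import language_v1

    # Placeholder transcript; in practice this text would come from Speech-to-Text.
    transcript = "Thanks for the quick help. The billing issue is still not fixed, though."

    client = language_v1.LanguageServiceClient()
    document = language_v1.Document(
        content=transcript,
        type_=language_v1.Document.Type.PLAIN_TEXT,
    )

    # analyze_sentiment returns document-level and sentence-level sentiment.
    response = client.analyze_sentiment(request={"document": document})

    print("Overall sentiment score:", response.document_sentiment.score)
    for sentence in response.sentences:
        print(sentence.text.content, "->", sentence.sentiment.score)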

You work for an advertising company and want to understand the effectiveness of your company's latest advertising campaign. You have streamed 500 MB of campaign data into BigQuery. You want to query the table, and then manipulate the results of that query with a pandas dataframe in an AI Platform notebook. What should you do?

A. Use AI Platform Notebooks' BigQuery cell magic to query the data, and ingest the results as a pandas dataframe.
B. Export your table as a CSV file from BigQuery to Google Drive, and use the Google Drive API to ingest the file into your notebook instance.
C. Download your table from BigQuery as a local CSV file, and upload it to your AI Platform notebook instance. Use pandas.read_csv to ingest the file as a pandas dataframe.
D. From a bash cell in your AI Platform notebook, use the bq extract command to export the table as a CSV file to Cloud Storage, and then use gsutil cp to copy the data into the notebook. Use pandas.read_csv to ingest the file as a pandas dataframe.

Suggested answer: A

Explanation:

AI Platform Notebooks is a service that provides managed Jupyter notebooks for data science and machine learning. You can use AI Platform Notebooks to create, run, and share your code and analysis in a collaborative and interactive environment [1]. BigQuery is a service that allows you to analyze large-scale and complex data using SQL queries. You can use BigQuery to stream, store, and query your data in a fast and cost-effective way [2]. pandas is a popular Python library that provides data structures and tools for data analysis and manipulation. You can use pandas to create, manipulate, and visualize dataframes, which are tabular data structures with rows and columns [3].

AI Platform Notebooks provides a cell magic, %%bigquery, that allows you to run SQL queries on BigQuery data and ingest the results as a pandas dataframe. A cell magic is a special command that applies to the whole cell in a Jupyter notebook. The %%bigquery cell magic can take various arguments, such as the name of the destination dataframe, the name of the destination table in BigQuery, the project ID, and the query parameters [4]. By using the %%bigquery cell magic, you can query the data in BigQuery with minimal code and manipulate the results with pandas in AI Platform Notebooks. This is the most convenient and efficient way to achieve your goal.

The other options are not as good as option A, because they involve more steps, more code, and more manual effort. Option B requires you to export your table as a CSV file from BigQuery to Google Drive, and then use the Google Drive API to ingest the file into your notebook instance. This option is cumbersome and time-consuming, as it involves moving the data across different services and formats. Option C requires you to download your table from BigQuery as a local CSV file, and then upload it to your AI Platform notebook instance. This option is also inefficient and impractical, as it involves downloading and uploading large files, which can take a long time and consume a lot of bandwidth. Option D requires you to use a bash cell in your AI Platform notebook to export the table as a CSV file to Cloud Storage, and then copy the data into the notebook. This option is also complex and unnecessary, as it involves using different commands and tools to move the data around. Therefore, option A is the best option for this use case.

[1] AI Platform Notebooks documentation

[2] BigQuery documentation

[3] pandas documentation

[4] Using Jupyter magics to query BigQuery data
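
A minimal notebook cell using the %%bigquery cell magic (after loading the extension with %load_ext google.cloud.bigquery in an earlier cell) might look like the following; the project, dataset, and column names are placeholders:

    %%bigquery campaign_df
    SELECT
      campaign_id,
      COUNT(*) AS impressions,
      AVG(click_through_rate) AS avg_ctr
    FROM `my-project.my_dataset.campaign_data`
    GROUP BY campaign_id

In the next cell, campaign_df is an ordinary pandas dataframe, so standard pandas operations such as campaign_df.describe() or campaign_df.sort_values("avg_ctr") work directly.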

You have trained a model on a dataset that required computationally expensive preprocessing operations. You need to execute the same preprocessing at prediction time. You deployed the model on AI Platform for high-throughput online prediction. Which architecture should you use?

A. * Validate the accuracy of the model that you trained on preprocessed data. * Create a new model that uses the raw data and is available in real time. * Deploy the new model onto AI Platform for online prediction.
B. * Send incoming prediction requests to a Pub/Sub topic. * Transform the incoming data using a Dataflow job. * Submit a prediction request to AI Platform using the transformed data. * Write the predictions to an outbound Pub/Sub queue.
C. * Stream incoming prediction request data into Cloud Spanner. * Create a view to abstract your preprocessing logic. * Query the view every second for new records. * Submit a prediction request to AI Platform using the transformed data. * Write the predictions to an outbound Pub/Sub queue.
D. * Send incoming prediction requests to a Pub/Sub topic. * Set up a Cloud Function that is triggered when messages are published to the Pub/Sub topic. * Implement your preprocessing logic in the Cloud Function. * Submit a prediction request to AI Platform using the transformed data. * Write the predictions to an outbound Pub/Sub queue.

Suggested answer: D

Explanation:

Option A is incorrect because creating a new model that uses the raw data and is available in real time would require retraining the model and deploying it again, which is not efficient or scalable.

Option B is incorrect because using a Dataflow job to transform the incoming data would introduce unnecessary latency and complexity for online prediction, which requires fast and simple processing.

Option C is incorrect because using Cloud Spanner to stream and query the incoming data would incur high costs and overhead for online prediction, which does not need a relational database.

Option D is correct because using a Cloud Function to preprocess the data and submit a prediction request to AI Platform is a simple and scalable solution for online prediction, which leverages the serverless and event-driven features of Cloud Functions.
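
A rough sketch of option D, assuming a Pub/Sub-triggered background Cloud Function and the AI Platform online prediction REST API; preprocess_features, the project and model names, and the message format are placeholders:

    import base64
    import json

    import googleapiclient.discovery

    PROJECT = "my-project"   # placeholder
    MODEL = "my-model"       # placeholder

    ml_service = googleapiclient.discovery.build("ml", "v1")


    def preprocess_features(raw):
        """Placeholder for the same computationally expensive preprocessing used at training time."""
        return raw


    def handle_prediction_request(event, context):
        """Triggered by a message published to the inbound Pub/Sub topic."""
        raw = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
        instance = preprocess_features(raw)

        name = f"projects/{PROJECT}/models/{MODEL}"
        response = ml_service.projects().predict(
            name=name, body={"instances": [instance]}
        ).execute()

        # In the full architecture these predictions would be published to an outbound Pub/Sub topic.
        print(response.get("predictions"))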

You are building a model to predict daily temperatures. You split the data randomly and then transformed the training and test datasets. Temperature data for model training is uploaded hourly. During testing, your model performed with 97% accuracy; however, after deploying to production, the model's accuracy dropped to 66%. How can you make your production model more accurate?

A. Normalize the data for the training and test datasets as two separate steps.
B. Split the training and test data based on time rather than a random split to avoid leakage.
C. Add more data to your test set to ensure that you have a fair distribution and sample for testing.
D. Apply data transformations before splitting, and cross-validate to make sure that the transformations are applied to both the training and test sets.

Suggested answer: B

Explanation:

When building a model to predict daily temperatures, it is important to split the training and test data based on time rather than a random split. This is because temperature data is likely to have temporal dependencies and patterns, such as seasonality, trends, and cycles. If the data is split randomly, there is a risk of data leakage, which occurs when information from the future is used to train or validate the model. Data leakage can lead to overfitting and unrealistic performance estimates, as the model may learn from data that it should not have access to. By splitting the data based on time, such as using the most recent data as the test set and the older data as the training set, the model can be evaluated on how well it can forecast future temperatures based on past data, which is the realistic scenario in production. Therefore, splitting the data based on time rather than a random split is the best way to make the production model more accurate.
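
For example, a time-based split of the hourly temperature data (a minimal pandas sketch; the file name and column names are placeholders) keeps all test rows strictly later than the training rows:

    import pandas as pd

    # Assumed columns: 'timestamp' and the target 'temperature'.
    df = pd.read_csv("temperatures.csv", parse_dates=["timestamp"]).sort_values("timestamp")

    # Train on the oldest 80% of rows, test on the most recent 20%,
    # so no future information leaks into training.
    split_index = int(len(df) * 0.8)
    train_df = df.iloc[:split_index]
    test_df = df.iloc[split_index:]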

You have a demand forecasting pipeline in production that uses Dataflow to preprocess raw data prior to model training and prediction. During preprocessing, you employ Z-score normalization on data stored in BigQuery and write it back to BigQuery. New training data is added every week. You want to make the process more efficient by minimizing computation time and manual intervention. What should you do?

A. Normalize the data using Google Kubernetes Engine
B. Translate the normalization algorithm into SQL for use with BigQuery
C. Use the normalizer_fn argument in TensorFlow's Feature Column API
D. Normalize the data with Apache Spark using the Dataproc connector for BigQuery

Suggested answer: B

Explanation:

Z-score normalization is a technique that transforms the values of a numeric variable into standardized units, such that the mean is zero and the standard deviation is one. Z-score normalization can help to compare variables with different scales and ranges, and to reduce the effect of outliers and skewness. The formula for z-score normalization is:

z = (x - mu) / sigma

where x is the original value, mu is the mean of the variable, and sigma is the standard deviation of the variable.

Dataflow is a service that allows you to create and run data processing pipelines on Google Cloud. You can use Dataflow to preprocess raw data prior to model training and prediction, such as applying z-score normalization on data stored in BigQuery. However, using Dataflow for this task may not be the most efficient option, as it involves reading and writing data from and to BigQuery, which can be time-consuming and costly. Moreover, using Dataflow requires manual intervention to update the pipeline whenever new training data is added.

A more efficient way to perform z-score normalization on data stored in BigQuery is to translate the normalization algorithm into SQL and use it with BigQuery. BigQuery is a service that allows you to analyze large-scale and complex data using SQL queries. You can use BigQuery to perform z-score normalization on your data using SQL functions such as AVG(), STDDEV_POP(), and OVER(). For example, the following SQL query can normalize the values of a column called temperature in a table called weather:

    SELECT
      (temperature - AVG(temperature) OVER ())
        / STDDEV_POP(temperature) OVER () AS normalized_temperature
    FROM weather;

By using SQL to perform z-score normalization on BigQuery, you can make the process more efficient by minimizing computation time and manual intervention. You can also leverage the scalability and performance of BigQuery to handle large and complex datasets. Therefore, translating the normalization algorithm into SQL for use with BigQuery is the best option for this use case.
