Google Professional Data Engineer Practice Test - Questions Answers, Page 16

You are developing an application on Google Cloud that will automatically generate subject labels for users' blog posts. You are under competitive pressure to add this feature quickly, and you have no additional developer resources. No one on your team has experience with machine learning. What should you do?

A. Call the Cloud Natural Language API from your application. Process the generated Entity Analysis as labels.
B. Call the Cloud Natural Language API from your application. Process the generated Sentiment Analysis as labels.
C. Build and train a text classification model using TensorFlow. Deploy the model using Cloud Machine Learning Engine. Call the model from your application and process the results as labels.
D. Build and train a text classification model using TensorFlow. Deploy the model using a Kubernetes Engine cluster. Call the model from your application and process the results as labels.
Suggested answer: A
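
Entity analysis returns the people, places, and topics a text mentions, which map directly onto subject labels; sentiment analysis returns only polarity and magnitude scores. A minimal sketch of option A using the google-cloud-language client (the sample post text and the salience cutoff are illustrative assumptions, not part of the question):

    # Sketch: derive subject labels from a blog post via the Cloud Natural
    # Language API's entity analysis. Requires: pip install google-cloud-language
    from google.cloud import language_v1

    def subject_labels(post_text: str, min_salience: float = 0.01) -> list[str]:
        client = language_v1.LanguageServiceClient()
        document = language_v1.Document(
            content=post_text, type_=language_v1.Document.Type.PLAIN_TEXT
        )
        response = client.analyze_entities(request={"document": document})
        # Keep the more salient entities as labels; the cutoff is arbitrary.
        return [e.name for e in response.entities if e.salience >= min_salience]

    print(subject_labels("Serverless data pipelines on Google Cloud with Dataflow."))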

You are designing storage for 20 TB of text files as part of deploying a data pipeline on Google Cloud.

Your input data is in CSV format. You want to minimize the cost of querying aggregate values for multiple users who will query the data in Cloud Storage with multiple engines. Which storage service and schema design should you use?

A. Use Cloud Bigtable for storage. Install the HBase shell on a Compute Engine instance to query the Cloud Bigtable data.
B. Use Cloud Bigtable for storage. Link as permanent tables in BigQuery for query.
C. Use Cloud Storage for storage. Link as permanent tables in BigQuery for query.
D. Use Cloud Storage for storage. Link as temporary tables in BigQuery for query.
Suggested answer: C
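
The Bigtable options would move the data out of Cloud Storage, while the question requires it to stay there for the other engines; a permanent external table lets BigQuery query the CSV files in place, and, unlike a temporary table, the definition is created once and reused by every user. A minimal sketch with the google-cloud-bigquery client (project, dataset, bucket, and column names are assumptions):

    # Sketch: a permanent BigQuery external table over CSV files that stay
    # in Cloud Storage, queryable by BigQuery and by other engines alike.
    from google.cloud import bigquery

    client = bigquery.Client()
    table = bigquery.Table("my-project.analytics.users_csv")  # hypothetical ID

    external_config = bigquery.ExternalConfig("CSV")
    external_config.source_uris = ["gs://my-bucket/input/*.csv"]
    external_config.autodetect = True               # infer schema from the files
    external_config.options.skip_leading_rows = 1   # skip the CSV header row
    table.external_data_configuration = external_config
    client.create_table(table, exists_ok=True)

    # Any user with access to the dataset can now aggregate in place:
    query = "SELECT user_id, COUNT(*) AS n FROM analytics.users_csv GROUP BY user_id"
    for row in client.query(query).result():
        print(row)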

You are designing storage for two relational tables that are part of a 10-TB database on Google Cloud.

You want to support transactions that scale horizontally. You also want to optimize data for range queries on non-key columns. What should you do?

A. Use Cloud SQL for storage. Add secondary indexes to support query patterns.
B. Use Cloud SQL for storage. Use Cloud Dataflow to transform data to support query patterns.
C. Use Cloud Spanner for storage. Add secondary indexes to support query patterns.
D. Use Cloud Spanner for storage. Use Cloud Dataflow to transform data to support query patterns.
Suggested answer: C

Explanation:

Reference: https://cloud.google.com/solutions/data-lifecycle-cloud-platform
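
Cloud SQL does not scale write transactions horizontally, which rules out options A and B; Cloud Spanner does, and a secondary index is the documented way to serve range queries on non-key columns without full table scans. A minimal sketch with the google-cloud-spanner client (instance, database, table, and column names are assumptions):

    # Sketch: add a secondary index on a non-key Cloud Spanner column and
    # run a range query against it.
    from google.cloud import spanner

    client = spanner.Client()
    database = client.instance("my-instance").database("my-database")

    # Schema changes are asynchronous; block until the index is ready.
    database.update_ddl(["CREATE INDEX OrdersByAmount ON Orders(Amount)"]).result()

    with database.snapshot() as snapshot:
        rows = snapshot.execute_sql(
            "SELECT OrderId, Amount FROM Orders@{FORCE_INDEX=OrdersByAmount} "
            "WHERE Amount BETWEEN 100 AND 200"
        )
        for row in rows:
            print(row)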

Your financial services company is moving to cloud technology and wants to store 50 TB of financial time-series data in the cloud. This data is updated frequently and new data will be streaming in all the time. Your company also wants to move their existing Apache Hadoop jobs to the cloud to get insights into this data.

Which product should they use to store the data?

A. Cloud Bigtable
B. Google BigQuery
C. Google Cloud Storage
D. Google Cloud Datastore
Suggested answer: A

Explanation:

Reference: https://cloud.google.com/bigtable/docs/schema-design-time-series
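
The referenced schema-design guide centers on the row key: putting the series identifier first and a timestamp component after it makes a time window a single contiguous row-range scan, which suits frequent updates and streaming ingest. A minimal sketch with the google-cloud-bigtable client (project, instance, table, and column-family names are assumptions):

    # Sketch: write one time-series measurement to Cloud Bigtable using a
    # "series id + timestamp" row key.
    import datetime
    from google.cloud import bigtable

    client = bigtable.Client(project="my-project", admin=False)
    table = client.instance("my-instance").table("market-data")

    ts = datetime.datetime(2024, 1, 2, 9, 30, 0)
    row_key = f"EURUSD#{ts:%Y%m%d%H%M%S}".encode()  # series id first, then time

    row = table.direct_row(row_key)
    row.set_cell("quotes", b"price", b"1.0932", timestamp=ts)
    row.commit()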

An organization maintains a Google BigQuery dataset that contains tables with user-level data. They want to expose aggregates of this data to other Google Cloud projects, while still controlling access to the user-level data. Additionally, they need to minimize their overall storage cost and ensure the analysis cost for other projects is assigned to those projects. What should they do?

A. Create and share an authorized view that provides the aggregate results.
B. Create and share a new dataset and view that provides the aggregate results.
C. Create and share a new dataset and table that contains the aggregate results.
D. Create dataViewer Identity and Access Management (IAM) roles on the dataset to enable sharing.

Suggested answer: A

Explanation:

Reference: https://cloud.google.com/bigquery/docs/access-control
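
An authorized view satisfies all three constraints: consumers query the view from their own projects (so analysis cost is billed to them), the aggregate is computed on read (no duplicated storage), and only the view, not the end users, is granted access to the user-level tables. A minimal sketch of the documented pattern (project, dataset, and table names are assumptions):

    # Sketch: create an aggregate view in a shareable dataset, then
    # authorize that view to read the private user-level dataset.
    from google.cloud import bigquery

    client = bigquery.Client()

    view = bigquery.Table("my-project.shared_views.daily_aggregates")
    view.view_query = """
        SELECT country, DATE(event_time) AS day, COUNT(*) AS events
        FROM `my-project.private_data.user_events`
        GROUP BY country, day
    """
    view = client.create_table(view)

    # Grant the view itself (not end users) read access to the source dataset.
    source = client.get_dataset("my-project.private_data")
    entries = list(source.access_entries)
    entries.append(bigquery.AccessEntry(None, "view", view.reference.to_api_repr()))
    source.access_entries = entries
    client.update_dataset(source, ["access_entries"])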

Government regulations in your industry mandate that you have to maintain an auditable record of access to certain types of data. Assuming that all expiring logs will be archived correctly, where should you store data that is subject to that mandate?

A. Encrypted on Cloud Storage with user-supplied encryption keys. A separate decryption key will be given to each authorized user.
B. In a BigQuery dataset that is viewable only by authorized personnel, with the Data Access log used to provide the auditability.
C. In Cloud SQL, with separate database user names for each user. The Cloud SQL Admin activity logs will be used to provide the auditability.
D. In a bucket on Cloud Storage that is accessible only by an App Engine service that collects user information and logs the access before providing a link to the bucket.

Suggested answer: B
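
BigQuery's Data Access audit logs record every read of the dataset automatically, which is what makes option B auditable without any custom logging code. As a hedged sketch, those entries can be pulled with the google-cloud-logging client (the project name and filter string are illustrative assumptions):

    # Sketch: list BigQuery Data Access audit log entries, which record who
    # read the regulated data and when.
    from google.cloud import logging

    client = logging.Client()
    log_filter = (
        'logName="projects/my-project/logs/cloudaudit.googleapis.com%2Fdata_access" '
        'AND protoPayload.serviceName="bigquery.googleapis.com"'
    )
    for entry in client.list_entries(filter_=log_filter, page_size=50):
        payload = entry.payload  # parsed AuditLog payload (dict-like)
        print(entry.timestamp, payload.get("authenticationInfo", {}).get("principalEmail"))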

Your neural network model is taking days to train. You want to increase the training speed. What can you do?

A. Subsample your test dataset.
B. Subsample your training dataset.
C. Increase the number of input features to your model.
D. Increase the number of layers in your neural network.
Suggested answer: B

Explanation:

Reference: https://towardsdatascience.com/how-to-increase-the-accuracy-of-a-neural-network-9f5d1c6f407d
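
Of the four options, only shrinking the training set reduces the work done per epoch; adding layers or input features increases it, and the test set plays no part in training time. Subsampling trades some accuracy for speed. A minimal numpy sketch (array shapes and the 20% fraction are illustrative assumptions):

    # Sketch: uniformly subsample a training set to cut per-epoch cost.
    import numpy as np

    rng = np.random.default_rng(seed=42)

    def subsample(X_train, y_train, fraction=0.2):
        n = X_train.shape[0]
        idx = rng.choice(n, size=int(n * fraction), replace=False)
        return X_train[idx], y_train[idx]

    X_small, y_small = subsample(np.zeros((100_000, 64)), np.zeros(100_000))
    print(X_small.shape)  # (20000, 64): 5x fewer examples per epoch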

You are responsible for writing your company's ETL pipelines to run on an Apache Hadoop cluster.

The pipeline will require some checkpointing and splitting pipelines. Which method should you use to write the pipelines?

A. PigLatin using Pig
B. HiveQL using Hive
C. Java using MapReduce
D. Python using MapReduce
Suggested answer: A
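
Pig Latin fits because it has a built-in SPLIT operator for branching a pipeline and STORE for checkpointing intermediate relations, with far less code than hand-written MapReduce. As a hedged sketch, such a job could be submitted to a Dataproc cluster from Python (project, region, cluster, and bucket names are assumptions):

    # Sketch: submit a Pig job to Cloud Dataproc. The embedded Pig Latin
    # shows SPLIT (branching) and STORE (checkpointing a relation).
    from google.cloud import dataproc_v1

    pig_query = """
    raw = LOAD 'gs://my-bucket/input/*.csv' USING PigStorage(',')
          AS (id:int, amount:double);
    SPLIT raw INTO big IF amount >= 1000.0, small IF amount < 1000.0;
    STORE big INTO 'gs://my-bucket/checkpoints/big';    -- checkpoint one branch
    STORE small INTO 'gs://my-bucket/checkpoints/small';
    """

    client = dataproc_v1.JobControllerClient(
        client_options={"api_endpoint": "us-central1-dataproc.googleapis.com:443"}
    )
    job = {
        "placement": {"cluster_name": "etl-cluster"},
        "pig_job": {"query_list": {"queries": [pig_query]}},
    }
    operation = client.submit_job_as_operation(
        request={"project_id": "my-project", "region": "us-central1", "job": job}
    )
    print(operation.result().reference.job_id)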

Your company maintains a hybrid deployment with GCP, where analytics are performed on your anonymized customer data. The data are imported to Cloud Storage from your data center through parallel uploads to a data transfer server running on GCP. Management informs you that the daily transfers take too long and has asked you to fix the problem. You want to maximize transfer speeds. Which action should you take?

A. Increase the CPU size on your server.
B. Increase the size of the Google Persistent Disk on your server.
C. Increase your network bandwidth from your datacenter to GCP.
D. Increase your network bandwidth from Compute Engine to Cloud Storage.
Suggested answer: C

After migrating ETL jobs to run on BigQuery, you need to verify that the output of the migrated jobs is the same as the output of the original. You've loaded a table containing the output of the original job and want to compare the contents with output from the migrated job to show that they are identical. The tables do not contain a primary key column that would enable you to join them together for comparison.

What should you do?

A. Select random samples from the tables using the RAND() function and compare the samples.
B. Select random samples from the tables using the HASH() function and compare the samples.
C. Use a Dataproc cluster and the BigQuery Hadoop connector to read the data from each table and calculate a hash from non-timestamp columns of the table after sorting. Compare the hashes of each table.
D. Create stratified random samples using the OVER() function and compare equivalent samples from each table.
Suggested answer: C
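
Random samples cannot prove two tables identical, RAND() is non-deterministic, and BigQuery has no HASH() function; a deterministic hash over the complete contents, as in option C, can. The same idea can also be sketched directly in BigQuery SQL from Python, using FARM_FINGERPRINT with an order-independent BIT_XOR aggregate in place of the sorted hash from the answer (table names are assumptions):

    # Sketch: an order-independent whole-table fingerprint in BigQuery.
    # Note: BIT_XOR cancels out pairs of identical duplicate rows, so the
    # sorted-hash approach from the answer is stricter for multisets.
    from google.cloud import bigquery

    client = bigquery.Client()

    def table_fingerprint(table_id: str) -> int:
        sql = f"""
            SELECT BIT_XOR(FARM_FINGERPRINT(TO_JSON_STRING(t))) AS fp
            FROM `{table_id}` AS t
        """
        return list(client.query(sql).result())[0].fp

    original = table_fingerprint("my-project.etl.output_original")
    migrated = table_fingerprint("my-project.etl.output_migrated")
    print("identical" if original == migrated else "different")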