Google Professional Data Engineer Practice Test - Questions Answers, Page 21

You have a petabyte of analytics data and need to design a storage and processing platform for it. You must be able to perform data warehouse-style analytics on the data in Google Cloud and expose the dataset as files for batch analysis tools in other cloud providers. What should you do?

A. Store and process the entire dataset in BigQuery.
B. Store and process the entire dataset in Cloud Bigtable.
C. Store the full dataset in BigQuery, and store a compressed copy of the data in a Cloud Storage bucket.
D. Store the warm data as files in Cloud Storage, and store the active data in BigQuery. Keep this ratio as 80% warm and 20% active.
Suggested answer: C
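
Note: the compressed Cloud Storage copy in option C can be produced with a regular BigQuery extract (export) job. The Python sketch below is illustrative only; the project, dataset, table, and bucket names, and the Avro/Snappy format choice, are placeholders rather than details from the question.

from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project

# Compressed export; SNAPPY applies to Avro, use GZIP for CSV/JSON exports.
job_config = bigquery.ExtractJobConfig()
job_config.destination_format = bigquery.DestinationFormat.AVRO
job_config.compression = bigquery.Compression.SNAPPY

extract_job = client.extract_table(
    "my-project.analytics.events",                      # source table (placeholder)
    "gs://my-analytics-bucket/export/events-*.avro",    # wildcard shards large exports
    job_config=job_config,
)
extract_job.result()  # block until the export job finishes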

You work for a manufacturing company that sources up to 750 different components, each from a different supplier. You've collected a labeled dataset that has on average 1000 examples for each unique component. Your team wants to implement an app to help warehouse workers recognize incoming components based on a photo of the component. You want to implement the first working version of this app (as Proof-Of-Concept) within a few working days. What should you do?

A. Use Cloud Vision AutoML with the existing dataset.
B. Use Cloud Vision AutoML, but reduce your dataset twice.
C. Use Cloud Vision API by providing custom labels as recognition hints.
D. Train your own image recognition model leveraging transfer learning techniques.
Suggested answer: A

You are working on a niche product in the image recognition domain. Your team has developed a model that is dominated by custom C++ TensorFlow ops that your team has implemented. These ops are used inside your main training loop and perform bulky matrix multiplications. It currently takes up to several days to train a model. You want to decrease this time significantly and keep the cost low by using an accelerator on Google Cloud. What should you do?

A. Use Cloud TPUs without any additional adjustment to your code.
B. Use Cloud TPUs after implementing GPU kernel support for your custom ops.
C. Use Cloud GPUs after implementing GPU kernel support for your custom ops.
D. Stay on CPUs, and increase the size of the cluster you're training your model on.
Suggested answer: B

You work on a regression problem in a natural language processing domain, and you have 100M labeled examples in your dataset. You have randomly shuffled your data and split your dataset into train and test samples (in a 90/10 ratio).

After you trained the neural network and evaluated your model on a test set, you discover that the root-mean-squared error (RMSE) of your model is twice as high on the train set as on the test set. How should you improve the performance of your model?

A. Increase the share of the test sample in the train-test split.
B. Try to collect more data and increase the size of your dataset.
C. Try out regularization techniques (e.g., dropout or batch normalization) to avoid overfitting.
D. Increase the complexity of your model by, e.g., introducing an additional layer or increasing the size of the vocabularies or n-grams used.
Suggested answer: D
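
Note: because the error is high on the training set itself (even higher than on the test set), the model is underfitting, so more data or regularization is unlikely to help; adding capacity (option D) can. A minimal Keras sketch of adding a hidden layer, with purely illustrative feature dimensions and layer sizes:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(512,)),              # e.g. n-gram or embedding features
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(256, activation="relu"),    # extra layer: more model capacity
    tf.keras.layers.Dense(1),                         # regression output
])
model.compile(
    optimizer="adam",
    loss="mse",
    metrics=[tf.keras.metrics.RootMeanSquaredError()],
)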

You use BigQuery as your centralized analytics platform. New data is loaded every day, and an ETL pipeline modifies the original data and prepares it for the final users. This ETL pipeline is regularly modified and can generate errors, but sometimes the errors are detected only after 2 weeks. You need to provide a method to recover from these errors, and your backups should be optimized for storage costs. How should you organize your data in BigQuery and store your backups?

A. Organize your data in a single table, and export, compress, and store the BigQuery data in Cloud Storage.
B. Organize your data in separate tables for each month, and export, compress, and store the data in Cloud Storage.
C. Organize your data in separate tables for each month, and duplicate your data on a separate dataset in BigQuery.
D. Organize your data in separate tables for each month, and use snapshot decorators to restore the table to a time prior to the corruption.
Suggested answer: D
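
Note: option D depends on BigQuery's point-in-time access to a table. The Python sketch below restores one monthly table with FOR SYSTEM_TIME AS OF, the standard-SQL counterpart of a legacy snapshot decorator, and works only within BigQuery's time-travel window; the project, table names, and timestamp are placeholders.

from google.cloud import bigquery

client = bigquery.Client()

restore_sql = """
CREATE OR REPLACE TABLE `my-project.warehouse.sales_202401_restored` AS
SELECT *
FROM `my-project.warehouse.sales_202401`
  FOR SYSTEM_TIME AS OF TIMESTAMP '2024-01-31 00:00:00+00'
"""
client.query(restore_sql).result()  # wait for the restore query to finish

# Legacy equivalent using a snapshot decorator (milliseconds since epoch):
#   bq cp my-project:warehouse.sales_202401@1706659200000 \
#         my-project:warehouse.sales_202401_restored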

The marketing team at your organization provides regular updates of a segment of your customer dataset. The marketing team has given you a CSV with 1 million records that must be updated in BigQuery. When you use the UPDATE statement in BigQuery, you receive a quotaExceeded error.

What should you do?

A. Reduce the number of records updated each day to stay within the BigQuery UPDATE DML statement limit.
B. Increase the BigQuery UPDATE DML statement limit in the Quota management section of the Google Cloud Platform Console.
C. Split the source CSV file into smaller CSV files in Cloud Storage to reduce the number of BigQuery UPDATE DML statements per BigQuery job.
D. Import the new records from the CSV file into a new BigQuery table. Create a BigQuery job that merges the new records with the existing records and writes the results to a new BigQuery table.
Suggested answer: D
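
Note: the merge step in option D can be written as a single MERGE statement, which counts as one DML job regardless of how many rows it touches, instead of one UPDATE per record. A sketch, assuming the CSV has already been loaded into a staging table and that customer_id and segment are the relevant columns (all names are placeholders):

from google.cloud import bigquery

client = bigquery.Client()

merge_sql = """
MERGE `my-project.crm.customers` AS target
USING `my-project.crm.customer_updates` AS updates   -- staging table loaded from the CSV
ON target.customer_id = updates.customer_id
WHEN MATCHED THEN
  UPDATE SET target.segment = updates.segment
WHEN NOT MATCHED THEN
  INSERT (customer_id, segment) VALUES (updates.customer_id, updates.segment)
"""
client.query(merge_sql).result()  # one DML job for all 1 million records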

As your organization expands its usage of GCP, many teams have started to create their own projects.

Projects are further multiplied to accommodate different stages of deployments and target audiences. Each project requires unique access control configurations. The central IT team needs to have access to all projects. Furthermore, data from Cloud Storage buckets and BigQuery datasets must be shared for use in other projects in an ad hoc way. You want to simplify access control management by minimizing the number of policies. Which two steps should you take?

Choose 2 answers.

A. Use Cloud Deployment Manager to automate access provision.
B. Introduce resource hierarchy to leverage access control policy inheritance.
C. Create distinct groups for various teams, and specify groups in Cloud IAM policies.
D. Only use service accounts when sharing data for Cloud Storage buckets and BigQuery datasets.
E. For each Cloud Storage bucket or BigQuery dataset, decide which projects need access. Find all the active members who have access to these projects, and create a Cloud IAM policy to grant access to all these users.
Suggested answer: A, C

Your United States-based company has created an application for assessing and responding to user actions. The primary table's data volume grows by 250,000 records per second. Many third parties use your application's APIs to build the functionality into their own frontend applications. Your application's APIs should comply with the following requirements:

Single global endpoint

ANSI SQL support

Consistent access to the most up-to-date data

What should you do?

A. Implement BigQuery with no region selected for storage or processing.
B. Implement Cloud Spanner with the leader in North America and read-only replicas in Asia and Europe.
C. Implement Cloud SQL for PostgreSQL with the master in North America and read replicas in Asia and Europe.
D. Implement Cloud Bigtable with the primary cluster in North America and secondary clusters in Asia and Europe.
Suggested answer: B
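
Note: Cloud Spanner meets the "most up-to-date data" requirement through strong reads, which are the default when a snapshot is taken without a staleness bound. A Python sketch of such a read over SQL; the instance, database, table, and column names are placeholders.

from google.cloud import spanner

client = spanner.Client()
database = client.instance("global-instance").database("user-actions")

# A snapshot without a staleness bound is a strong read: it reflects every
# transaction committed before the read started.
with database.snapshot() as snapshot:
    rows = snapshot.execute_sql(
        "SELECT user_id, action, created_at FROM user_actions WHERE user_id = @uid",
        params={"uid": "user-123"},
        param_types={"uid": spanner.param_types.STRING},
    )
    for row in rows:
        print(row)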

A data scientist has created a BigQuery ML model and asks you to create an ML pipeline to serve predictions. You have a REST API application with the requirement to serve predictions for an individual user ID with latency under 100 milliseconds. You use the following query to generate predictions:

SELECT predicted_label, user_id FROM ML.PREDICT (MODEL 'dataset.model', table user_features)

How should you create the ML pipeline?

A. Add a WHERE clause to the query, and grant the BigQuery Data Viewer role to the application service account.
B. Create an Authorized View with the provided query. Share the dataset that contains the view with the application service account.
C. Create a Cloud Dataflow pipeline using BigQueryIO to read results from the query. Grant the Dataflow Worker role to the application service account.
D. Create a Cloud Dataflow pipeline using BigQueryIO to read predictions for all users from the query. Write the results to Cloud Bigtable using BigtableIO. Grant the Bigtable Reader role to the application service account so that the application can read predictions for individual users from Cloud Bigtable.
Suggested answer: D
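
Note: in option D the API never queries BigQuery at request time; it does a single-row point read of a precomputed prediction in Cloud Bigtable. A sketch of that serving-side lookup, with placeholder instance, table, column family, and qualifier names:

from google.cloud import bigtable

client = bigtable.Client(project="my-project")
table = client.instance("serving-instance").table("user_predictions")

def get_prediction(user_id: str):
    # Row key = user_id; a single-row read is typically well under 100 ms.
    row = table.read_row(user_id.encode("utf-8"))
    if row is None:
        return None
    cell = row.cells["predictions"][b"predicted_label"][0]
    return cell.value.decode("utf-8")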

You are building an application to share financial market data with consumers, who will receive data feeds. Data is collected from the markets in real time. Consumers will receive the data in the following ways:

Real-time event stream

ANSI SQL access to real-time stream and historical data

Batch historical exports

Which solution should you use?

A. Cloud Dataflow, Cloud SQL, Cloud Spanner
B. Cloud Pub/Sub, Cloud Storage, BigQuery
C. Cloud Dataproc, Cloud Dataflow, BigQuery
D. Cloud Pub/Sub, Cloud Dataproc, Cloud SQL
Suggested answer: A