
Google Professional Data Engineer Practice Test - Questions Answers, Page 16

Question 151

You are developing an application on Google Cloud that will automatically generate subject labels for users' blog posts. You are under competitive pressure to add this feature quickly, and you have no additional developer resources. No one on your team has experience with machine learning. What should you do?

A. Call the Cloud Natural Language API from your application. Process the generated Entity Analysis as labels.
B. Call the Cloud Natural Language API from your application. Process the generated Sentiment Analysis as labels.
C. Build and train a text classification model using TensorFlow. Deploy the model using Cloud Machine Learning Engine. Call the model from your application and process the results as labels.
D. Build and train a text classification model using TensorFlow. Deploy the model using a Kubernetes Engine cluster. Call the model from your application and process the results as labels.
Suggested answer: A

Explanation:

The Cloud Natural Language API needs no ML expertise or model-building effort, and Entity Analysis returns the people, places, and topics mentioned in a post, which map directly to subject labels; Sentiment Analysis only returns a positive/negative score.
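
For option A, a minimal sketch of calling entity analysis from Python and treating the returned entity names as labels; the `min_salience` threshold and sample text are illustrative, and application default credentials are assumed:

```python
# Minimal sketch: label a blog post with entities from the Cloud Natural Language API.
from google.cloud import language_v1

def suggest_labels(post_text: str, min_salience: float = 0.1) -> list[str]:
    client = language_v1.LanguageServiceClient()
    document = language_v1.Document(
        content=post_text, type_=language_v1.Document.Type.PLAIN_TEXT
    )
    response = client.analyze_entities(request={"document": document})
    # Keep only reasonably prominent entities and use their names as labels.
    return [e.name for e in response.entities if e.salience >= min_salience]

print(suggest_labels("Cloud Dataflow and BigQuery make streaming analytics simpler."))
```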

Question 152

You are designing storage for 20 TB of text files as part of deploying a data pipeline on Google Cloud.

Your input data is in CSV format. You want to minimize the cost of querying aggregate values for multiple users who will query the data in Cloud Storage with multiple engines. Which storage service and schema design should you use?

A. Use Cloud Bigtable for storage. Install the HBase shell on a Compute Engine instance to query the Cloud Bigtable data.
B. Use Cloud Bigtable for storage. Link as permanent tables in BigQuery for query.
C. Use Cloud Storage for storage. Link as permanent tables in BigQuery for query.
D. Use Cloud Storage for storage. Link as temporary tables in BigQuery for query.
Suggested answer: C

Explanation:

Keeping the CSV files in Cloud Storage lets multiple engines read them directly, and registering them as permanent external tables in BigQuery gives all users one shared, reusable table definition for their aggregate queries without loading a second copy of the data.
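
A minimal sketch of option C with the Python BigQuery client; bucket, project, dataset, table, and column names are placeholders, and application default credentials are assumed:

```python
# Register the CSV files in Cloud Storage as a permanent external table in BigQuery.
from google.cloud import bigquery

client = bigquery.Client()

external_config = bigquery.ExternalConfig("CSV")
external_config.source_uris = ["gs://example-bucket/input/*.csv"]
external_config.autodetect = True               # infer the schema from the files
external_config.options.skip_leading_rows = 1   # skip the CSV header row

table = bigquery.Table("example-project.analytics.events")
table.external_data_configuration = external_config
client.create_table(table, exists_ok=True)

# The files stay in Cloud Storage, readable by other engines, while BigQuery
# queries them in place through the shared permanent table definition.
query = """
    SELECT user_id, SUM(value) AS total
    FROM `example-project.analytics.events`
    GROUP BY user_id
"""
for row in client.query(query).result():
    print(row.user_id, row.total)
```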

Question 153

You are designing storage for two relational tables that are part of a 10-TB database on Google Cloud.

You want to support transactions that scale horizontally. You also want to optimize data for range queries on nonkey columns. What should you do?

A. Use Cloud SQL for storage. Add secondary indexes to support query patterns.
B. Use Cloud SQL for storage. Use Cloud Dataflow to transform data to support query patterns.
C. Use Cloud Spanner for storage. Add secondary indexes to support query patterns.
D. Use Cloud Spanner for storage. Use Cloud Dataflow to transform data to support query patterns.
Suggested answer: C

Explanation:

Cloud Spanner provides horizontally scalable relational transactions, and secondary indexes are its built-in mechanism for making range queries on non-key columns efficient; no Dataflow transformation is required.

Reference: https://cloud.google.com/solutions/data-lifecycle-cloud-platform
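
A minimal sketch of adding a secondary index with the Python Spanner client; the instance, database, table, and column names are placeholders:

```python
# Add a secondary index so range scans on a non-key column are efficient.
from google.cloud import spanner

spanner_client = spanner.Client()
instance = spanner_client.instance("example-instance")
database = instance.database("example-db")

# Schema changes are applied as a long-running operation.
operation = database.update_ddl(
    ["CREATE INDEX OrdersByOrderDate ON Orders(OrderDate)"]
)
operation.result(300)  # wait up to 5 minutes for the index backfill

# A range query on the non-key column can now use the index
# (optionally forced with a FORCE_INDEX hint).
with database.snapshot() as snapshot:
    rows = snapshot.execute_sql(
        "SELECT OrderId, OrderDate "
        "FROM Orders@{FORCE_INDEX=OrdersByOrderDate} "
        "WHERE OrderDate BETWEEN DATE '2024-01-01' AND DATE '2024-03-31'"
    )
    for row in rows:
        print(row)
```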


Question 154

Your financial services company is moving to cloud technology and wants to store 50 TB of financial timeseries data in the cloud. This data is updated frequently and new data will be streaming in all the time. Your company also wants to move their existing Apache Hadoop jobs to the cloud to get insights into this data.

Which product should they use to store the data?

A. Cloud Bigtable
B. Google BigQuery
C. Google Cloud Storage
D. Google Cloud Datastore
Suggested answer: A

Explanation:

Reference: https://cloud.google.com/bigtable/docs/schema-design-time-series
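
A minimal sketch of writing one time-series point to Cloud Bigtable with the Python client; the instance, table, "prices" column family, instrument, and values are illustrative, and the table and column family are assumed to already exist:

```python
# Write a single time-series measurement to Cloud Bigtable.
import datetime
from google.cloud import bigtable

client = bigtable.Client(project="example-project")
instance = client.instance("example-instance")
table = instance.table("market-data")

ts = datetime.datetime.now(datetime.timezone.utc)
# Typical time-series row key: entity id plus a timestamp component, so rows for one
# series are contiguous while writes spread across instruments to avoid hotspots.
row_key = f"EURUSD#{ts:%Y%m%d%H%M%S}".encode()

row = table.direct_row(row_key)
row.set_cell("prices", b"bid", b"1.0842", timestamp=ts)
row.set_cell("prices", b"ask", b"1.0844", timestamp=ts)
row.commit()
```

Existing Hadoop jobs, once moved to Dataproc, can read the same table through the Cloud Bigtable HBase client.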


Question 155

An organization maintains a Google BigQuery dataset that contains tables with user-level data.

They want to expose aggregates of this data to other Google Cloud projects, while still controlling access to the user-level data. Additionally, they need to minimize their overall storage cost and ensure the analysis cost for other projects is assigned to those projects. What should they do?
A. Create and share an authorized view that provides the aggregate results.
B. Create and share a new dataset and view that provides the aggregate results.
C. Create and share a new dataset and table that contains the aggregate results.
D. Create dataViewer Identity and Access Management (IAM) roles on the dataset to enable sharing.
Suggested answer: A

Explanation:

An authorized view exposes only the aggregate results while the underlying user-level tables stay private, adds no extra table storage, and lets each consuming project run (and pay for) its own queries; granting dataViewer on the dataset would expose the user-level data itself.

Reference: https://cloud.google.com/bigquery/docs/access-control
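
A minimal sketch of option A with the Python BigQuery client, using placeholder project, dataset, table, and column names: the aggregate view lives in a separate, shareable dataset and is then authorized against the private dataset so consumers never need access to the user-level tables.

```python
# Create an aggregate view in a shareable dataset and authorize it on the private dataset.
from google.cloud import bigquery

client = bigquery.Client()

# 1. A view that exposes only aggregates, stored in a dataset other projects can use.
view = bigquery.Table("example-project.shared_reports.daily_totals")
view.view_query = """
    SELECT country, DATE(event_time) AS day, COUNT(*) AS events
    FROM `example-project.private_data.user_events`
    GROUP BY country, day
"""
view = client.create_table(view)

# 2. Authorize the view to read the private dataset, so consumers of the view never
#    need (or receive) access to the underlying user-level tables.
source_dataset = client.get_dataset("example-project.private_data")
entries = list(source_dataset.access_entries)
entries.append(bigquery.AccessEntry(None, "view", view.reference.to_api_repr()))
source_dataset.access_entries = entries
client.update_dataset(source_dataset, ["access_entries"])
```

Because consuming projects run their own queries against the view, on-demand query charges are billed to those projects, and no additional copy of the data is stored.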


Question 156

Government regulations in your industry mandate that you have to maintain an auditable record of access to certain types of data.

Assuming that all expiring logs will be archived correctly, where should you store data that is subject to that mandate?
A. Encrypted on Cloud Storage with user-supplied encryption keys. A separate decryption key will be given to each authorized user.
B. In a BigQuery dataset that is viewable only by authorized personnel, with the Data Access log used to provide the auditability.
C. In Cloud SQL, with separate database user names for each user. The Cloud SQL Admin activity logs will be used to provide the auditability.
D. In a bucket on Cloud Storage that is accessible only by an App Engine service that collects user information and logs the access before providing a link to the bucket.
Suggested answer: B

Question 157

Your neural network model is taking days to train. You want to increase the training speed. What can you do?

A. Subsample your test dataset.
B. Subsample your training dataset.
C. Increase the number of input features to your model.
D. Increase the number of layers in your neural network.
Suggested answer: B

Explanation:

Training time scales with the amount of training data, so subsampling the training set directly shortens training; adding layers or input features increases the work per step, and subsampling the test set does not affect training at all.

Reference: https://towardsdatascience.com/how-to-increase-the-accuracy-of-a-neural-network-9f5d1c6f407d
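
A minimal sketch of subsampling the training set with `tf.data`, assuming `features`, `labels`, `num_examples`, and a compiled Keras `model` already exist; the 10% fraction and batch size are illustrative:

```python
import tensorflow as tf

# Full training set as a tf.data pipeline.
full_train = tf.data.Dataset.from_tensor_slices((features, labels))

# Train on a 10% random subsample to shorten each epoch; the test set is left
# untouched so evaluation numbers stay comparable.
subsample_size = int(0.10 * num_examples)
train_subset = (
    full_train.shuffle(buffer_size=num_examples, seed=42)
    .take(subsample_size)
    .batch(128)
)
model.fit(train_subset, epochs=5)
```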


Question 158

You are responsible for writing your company's ETL pipelines to run on an Apache Hadoop cluster.

The pipeline will require some checkpointing and splitting pipelines. Which method should you use to write the pipelines?

A. PigLatin using Pig
B. HiveQL using Hive
C. Java using MapReduce
D. Python using MapReduce
Suggested answer: A

Explanation:

Pig Latin is designed for expressing multi-stage ETL data flows on Hadoop and natively supports splitting pipelines (SPLIT) and checkpointing intermediate results (STORE), which is cumbersome to hand-code in raw MapReduce.

Question 159

Your company maintains a hybrid deployment with GCP, where analytics are performed on your anonymized customer data. The data is imported to Cloud Storage from your data center through parallel uploads to a data transfer server running on GCP. Management informs you that the daily transfers take too long and has asked you to fix the problem. You want to maximize transfer speeds. Which action should you take?

A. Increase the CPU size on your server.
B. Increase the size of the Google Persistent Disk on your server.
C. Increase your network bandwidth from your datacenter to GCP.
D. Increase your network bandwidth from Compute Engine to Cloud Storage.
Suggested answer: C

Question 160

After migrating ETL jobs to run on BigQuery, you need to verify that the output of the migrated jobs is the same as the output of the original. You've loaded a table containing the output of the original job and want to compare the contents with output from the migrated job to show that they are identical. The tables do not contain a primary key column that would enable you to join them together for comparison.

What should you do?

A. Select random samples from the tables using the RAND() function and compare the samples.
B. Select random samples from the tables using the HASH() function and compare the samples.
C. Use a Dataproc cluster and the BigQuery Hadoop connector to read the data from each table and calculate a hash from non-timestamp columns of the table after sorting. Compare the hashes of each table.
D. Create stratified random samples using the OVER() function and compare equivalent samples from each table.
Suggested answer: C

Explanation:

Without a join key, random samples from the two tables cannot be matched row for row, and BigQuery has no HASH() function; reading both tables from a Dataproc cluster, hashing their contents (excluding volatile timestamp columns), and comparing the resulting digests verifies that the outputs are identical.
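
A minimal PySpark sketch of option C for a Dataproc cluster, assuming the spark-bigquery connector is available; the table names and `COMPARE_COLS` list are placeholders. Instead of globally sorting before hashing, this simplification hashes each row's non-timestamp columns and compares an order-independent aggregate of the row hashes:

```python
# Compare two BigQuery tables without a join key by fingerprinting their contents.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("compare-etl-outputs").getOrCreate()
COMPARE_COLS = ["user_id", "country", "amount"]  # non-timestamp columns only

def table_fingerprint(table: str) -> int:
    df = spark.read.format("bigquery").option("table", table).load()
    # One deterministic hash per row, then an order-independent sum, so the two
    # tables can be compared without a join key and without a global sort.
    row_hash = F.crc32(
        F.concat_ws("|", *[F.col(c).cast("string") for c in COMPARE_COLS])
    )
    return df.select(F.sum(row_hash).alias("fp")).collect()[0]["fp"]

original = table_fingerprint("example_dataset.original_output")
migrated = table_fingerprint("example_dataset.migrated_output")
print("identical" if original == migrated else "different")
```

A summed CRC32 is only a lightweight fingerprint; for stronger guarantees the per-row hashes could be compared directly, for example with an anti-join between the two hashed DataFrames.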