Google Professional Data Engineer Practice Test - Questions Answers, Page 16
List of questions
You are developing an application on Google Cloud that will automatically generate subject labels for users' blog posts. You are under competitive pressure to add this feature quickly, and you have no additional developer resources. No one on your team has experience with machine learning. What should you do?
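For scenarios like this, it helps to know what calling a pretrained API looks like compared with building a model. The sketch below is an illustration only, not the answer key: it sends a blog post to the Cloud Natural Language API's classify_text method and returns the suggested content categories. The post text is a placeholder.

```python
# Minimal sketch: label a blog post with the pretrained Cloud Natural Language API.
# Requires the google-cloud-language client library and application credentials.
from google.cloud import language_v1

def classify_post(text: str) -> list[str]:
    client = language_v1.LanguageServiceClient()
    document = language_v1.Document(
        content=text,
        type_=language_v1.Document.Type.PLAIN_TEXT,
    )
    response = client.classify_text(request={"document": document})
    # Each category has a name like "/Arts & Entertainment/Music" plus a confidence score.
    return [category.name for category in response.categories]

if __name__ == "__main__":
    print(classify_post(
        "Our team shipped a new feature for streaming analytics on Google Cloud, "
        "covering windowing, watermarks, and late data handling in Dataflow."
    ))
```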
You are designing storage for 20 TB of text files as part of deploying a data pipeline on Google Cloud.
Your input data is in CSV format. You want to minimize the cost of querying aggregate values for multiple users who will query the data in Cloud Storage with multiple engines. Which storage service and schema design should you use?
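To make the trade-off concrete (this is an illustration of one option, not a statement of the correct answer), the sketch below defines a BigQuery external table over CSV files that stay in Cloud Storage, so the same files remain readable by other engines while BigQuery answers the aggregate queries. Bucket, dataset, and table names are assumptions.

```python
# Sketch: expose CSV files in Cloud Storage to BigQuery as an external table,
# leaving the files in place for other query engines. Names are hypothetical.
from google.cloud import bigquery

client = bigquery.Client()

external_config = bigquery.ExternalConfig("CSV")
external_config.source_uris = ["gs://my-pipeline-bucket/text-files/*.csv"]
external_config.autodetect = True               # infer the schema from the files
external_config.options.skip_leading_rows = 1   # assume a header row

table = bigquery.Table("my-project.my_dataset.user_events_external")
table.external_data_configuration = external_config
client.create_table(table, exists_ok=True)

# Aggregate query against the external table; BigQuery scans the CSVs in place.
query = """
    SELECT user_id, COUNT(*) AS events
    FROM `my-project.my_dataset.user_events_external`
    GROUP BY user_id
"""
for row in client.query(query).result():
    print(row.user_id, row.events)
```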
You are designing storage for two relational tables that are part of a 10-TB database on Google Cloud.
You want to support transactions that scale horizontally. You also want to optimize data for range queries on non-key columns. What should you do?
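As one way to picture "range queries on non-key columns", the sketch below adds a secondary index in Cloud Spanner with the Python client and then runs a range scan that forces that index. It is a hedged example of the technique, not the graded answer; the instance, database, table, and column names are assumptions.

```python
# Sketch: add a secondary index in Cloud Spanner to support range queries
# on a non-key column. Instance, database, and schema names are hypothetical.
from google.cloud import spanner

client = spanner.Client()
instance = client.instance("my-instance")
database = instance.database("orders-db")

# Secondary index on a non-key column so range scans do not rely on the primary key.
operation = database.update_ddl(
    ["CREATE INDEX OrdersByOrderDate ON Orders(OrderDate)"]
)
operation.result()  # block until the schema change finishes

# Range query that uses the index via a FORCE_INDEX hint.
with database.snapshot() as snapshot:
    rows = snapshot.execute_sql(
        "SELECT OrderId, OrderDate "
        "FROM Orders@{FORCE_INDEX=OrdersByOrderDate} "
        "WHERE OrderDate BETWEEN DATE '2024-01-01' AND DATE '2024-01-31'"
    )
    for row in rows:
        print(row)
```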
Your financial services company is moving to cloud technology and wants to store 50 TB of financial time-series data in the cloud. This data is updated frequently, and new data will be streaming in all the time. Your company also wants to move their existing Apache Hadoop jobs to the cloud to get insights into this data.
Which product should they use to store the data?
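For context on what a heavy-write time-series store looks like in code (an illustration of one candidate product, not the answer key), the sketch below writes and reads a single cell with the Cloud Bigtable Python client. The project, instance, table, column family, and row-key scheme are all assumptions.

```python
# Sketch: write and read one time-series cell in Cloud Bigtable.
# Project, instance, table, and column-family names are hypothetical.
import datetime
from google.cloud import bigtable

client = bigtable.Client(project="my-project", admin=False)
instance = client.instance("ts-instance")
table = instance.table("market_data")

# Row keys starting with the instrument and ending with a timestamp keep related
# ticks together while spreading writes across instruments.
row_key = b"EURUSD#20240115093000"
row = table.direct_row(row_key)
row.set_cell(
    "quotes",                  # column family
    b"bid",                    # column qualifier
    b"1.0954",
    timestamp=datetime.datetime.now(datetime.timezone.utc),
)
row.commit()

read_back = table.read_row(row_key)
print(read_back.cells["quotes"][b"bid"][0].value)
```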
An organization maintains a Google BigQuery dataset that contains tables with user-level data.
Government regulations in your industry mandate that you have to maintain an auditable record of access to certain types of data.
Your neural network model is taking days to train. You want to increase the training speed. What can you do?
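One common family of fixes for slow training is parallelizing across accelerators. The sketch below is a hedged example of that idea only (the model, dataset, and hyperparameters are made up): it uses tf.distribute.MirroredStrategy to mirror a Keras model across all local GPUs and scales the batch size with the number of replicas.

```python
# Sketch: speed up training by mirroring a Keras model across local GPUs.
# The model, dataset, and hyperparameters are placeholders.
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()  # one replica per visible GPU
print("Replicas in sync:", strategy.num_replicas_in_sync)

# Scale the batch size with the number of replicas so each GPU stays busy.
global_batch_size = 128 * strategy.num_replicas_in_sync

with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(256, activation="relu", input_shape=(100,)),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

# Synthetic data stands in for the real training set.
features = tf.random.normal([100_000, 100])
labels = tf.random.uniform([100_000], maxval=10, dtype=tf.int32)
dataset = (tf.data.Dataset.from_tensor_slices((features, labels))
           .shuffle(10_000).batch(global_batch_size).prefetch(tf.data.AUTOTUNE))

model.fit(dataset, epochs=5)
```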
You are responsible for writing your company's ETL pipelines to run on an Apache Hadoop cluster.
The pipelines will require some checkpointing and splitting of pipelines. Which method should you use to write the pipelines?
Your company maintains a hybrid deployment with GCP, where analytics are performed on your anonymized customer data. The data are imported to Cloud Storage from your data center through parallel uploads to a data transfer server running on GCP. Management informs you that the daily transfers take too long and has asked you to fix the problem. You want to maximize transfer speeds. Which action should you take?
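To make the upload side of this scenario concrete (an illustration, not the answer key, since the real bottleneck may be the network path between the data center and GCP), the sketch below uploads a batch of files to Cloud Storage in parallel with the transfer_manager helper from google-cloud-storage 2.7+. The bucket name and file paths are assumptions.

```python
# Sketch: parallel uploads to Cloud Storage with the transfer_manager helper
# (google-cloud-storage >= 2.7). Bucket and file paths are hypothetical.
from pathlib import Path
from google.cloud.storage import Client, transfer_manager

client = Client()
bucket = client.bucket("anonymized-customer-data")

source_dir = Path("/exports/daily")
filenames = [p.name for p in source_dir.glob("*.csv")]

# Upload the batch with several workers instead of one sequential stream.
results = transfer_manager.upload_many_from_filenames(
    bucket,
    filenames,
    source_directory=str(source_dir),
    max_workers=8,
)

for name, result in zip(filenames, results):
    status = "ok" if not isinstance(result, Exception) else f"failed: {result}"
    print(name, status)
```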
After migrating ETL jobs to run on BigQuery, you need to verify that the output of the migrated jobs is the same as the output of the original. You've loaded a table containing the output of the original job and want to compare the contents with output from the migrated job to show that they are identical. The tables do not contain a primary key column that would enable you to join them together for comparison.
What should you do?
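A common way to compare tables that lack a join key is an order-independent fingerprint over whole rows. The sketch below (project, dataset, and table names are assumptions) computes one fingerprint per table in BigQuery by hashing each row's JSON form and XOR-aggregating the hashes, then compares the two tables; it is offered as one possible technique, not as the exam's intended answer.

```python
# Sketch: compare two BigQuery tables that have no primary key by computing an
# order-independent fingerprint of every row. Table names are hypothetical.
from google.cloud import bigquery

client = bigquery.Client()

# BIT_XOR is order-independent; rows duplicated an even number of times cancel
# out under XOR, so the row count is checked alongside the fingerprint.
FINGERPRINT_SQL = """
SELECT
  COUNT(*) AS row_count,
  BIT_XOR(FARM_FINGERPRINT(TO_JSON_STRING(t))) AS fingerprint
FROM `my-project.etl_validation.{table}` AS t
"""

def fingerprint(table: str):
    row = next(iter(client.query(FINGERPRINT_SQL.format(table=table)).result()))
    return row.row_count, row.fingerprint

original = fingerprint("original_output")
migrated = fingerprint("migrated_output")
print("original:", original, "migrated:", migrated)
print("identical" if original == migrated else "outputs differ")
```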