Google Professional Data Engineer Practice Test - Questions and Answers, Page 35

You migrated your on-premises Apache Hadoop Distributed File System (HDFS) data lake to Cloud Storage. The data scientist team needs to process the data by using Apache Spark and SQL. Security policies need to be enforced at the column level. You need a cost-effective solution that can scale into a data mesh. What should you do?

A.
1. Deploy a long-living Dataproc cluster with Apache Hive and Ranger enabled. 2. Configure Ranger for column-level security. 3. Process with Dataproc Spark or Hive SQL.
B.
1. Define a BigLake table. 2. Create a taxonomy of policy tags in Data Catalog. 3. Add policy tags to columns. 4. Process with the Spark-BigQuery connector or BigQuery SQL.
C.
1. Load the data to BigQuery tables. 2. Create a taxonomy of policy tags in Data Catalog. 3. Add policy tags to columns. 4. Process with the Spark-BigQuery connector or BigQuery SQL.
D.
1. Apply an Identity and Access Management (IAM) policy at the file level in Cloud Storage. 2. Define a BigQuery external table for SQL processing. 3. Use Dataproc Spark to process the Cloud Storage files.
Suggested answer: B

Explanation:

A BigLake table lets the data stay in Cloud Storage while being governed through BigQuery, which is exactly what is needed to enforce column-level security without another migration.

BigLake Tables:

A BigLake table is defined over the existing Cloud Storage objects, so the data that was just migrated does not have to be copied or reloaded into BigQuery-managed storage.

Because access is brokered by BigQuery, fine-grained controls such as column-level and row-level security can be enforced on data that physically remains in Cloud Storage.

Column-Level Security with Policy Tags:

Create a taxonomy of policy tags in Data Catalog and attach the tags to the sensitive columns of the BigLake table.

Only principals granted access on a policy tag (the Fine-Grained Reader role) can read the tagged columns, which enforces the security policy at the column level.

Processing with Spark and SQL:

The data science team can process the data with the Spark-BigQuery connector from Dataproc (including Dataproc Serverless) and with BigQuery SQL; both access paths respect the policy tags.

Cost Effectiveness and Data Mesh:

There is no long-running cluster to keep alive and no duplicate copy of the data, which keeps costs low, and BigLake tables can be governed with Dataplex as the design grows into a data mesh.

Why the other options fall short: a long-living Dataproc cluster with Hive and Ranger (option A) is costly to operate and manage, loading everything into native BigQuery tables (option C) duplicates the data that was just migrated to Cloud Storage, and file-level IAM policies (option D) cannot enforce security at the column level.

Google Data Engineer

Reference:

BigLake Documentation

Introduction to Column-Level Access Control in BigQuery
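
To make the processing path concrete, here is a minimal PySpark sketch of reading such a table through the Spark-BigQuery connector; the project, dataset, and table names are hypothetical, and the connector is assumed to be available on the Dataproc image or supplied as a package.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("read-biglake-table").getOrCreate()

# Read the BigLake table through the Spark-BigQuery connector. Columns that are
# protected by policy tags are only readable if the caller holds the
# Fine-Grained Reader role on the corresponding tag.
df = (
    spark.read.format("bigquery")
    .option("table", "my-project.analytics.clickstream_biglake")  # hypothetical name
    .load()
)

df.select("event_id", "event_timestamp").show(10)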

You are administering shared BigQuery datasets that contain views used by multiple teams in your organization. The marketing team is concerned about the variability of their monthly BigQuery analytics spend using the on-demand billing model. You need to help the marketing team establish a consistent BigQuery analytics spend each month. What should you do?

A.
Create a BigQuery Standard pay-as-you-go reservation with a baseline of 0 slots and autoscaling set to 500 for the marketing team, and bill them back accordingly.
B.
Create a BigQuery reservation with a baseline of 500 slots with no autoscaling for the marketing team, and bill them back accordingly.
C.
Establish a BigQuery quota for the marketing team, and limit the maximum number of bytes scanned each day.
D.
Create a BigQuery Enterprise reservation with a baseline of 250 slots and autoscaling set to 500 for the marketing team, and bill them back accordingly.
Suggested answer: B

Explanation:

To help the marketing team establish a consistent BigQuery analytics spend each month, you can use BigQuery reservations to allocate dedicated slots for their queries. This provides predictable costs by reserving a fixed amount of compute resources.

BigQuery Reservations:

BigQuery Reservations allow you to purchase dedicated query processing capacity in the form of slots.

By reserving slots, you can control costs and ensure that the marketing team has the necessary resources for their queries without unexpected increases in spending.

Baseline Slots:

Setting a baseline of 500 slots without autoscaling ensures a consistent allocation of resources.

This provides a predictable monthly cost, as the marketing team will be billed for the reserved slots regardless of actual usage.

Billing Back:

The marketing team's usage can be billed back based on the fixed reservation cost, ensuring budget predictability.

This approach avoids the variability associated with on-demand billing, where costs can fluctuate based on query volume and complexity.

No Autoscaling:

By not enabling autoscaling, you prevent additional costs from being incurred due to temporary increases in query demand.

This fixed reservation ensures that the marketing team only uses the allocated 500 slots, maintaining a consistent monthly spend.

Google Data Engineer

Reference:

BigQuery Reservations Documentation

BigQuery Slot Reservations

Managing BigQuery Costs

Using a fixed reservation of 500 slots provides the marketing team with predictable costs and the necessary resources for their queries without unexpected billing variability.
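
As a rough illustration of how such a reservation might be created programmatically, here is a minimal sketch using the google-cloud-bigquery-reservation Python client; the admin project, location, and reservation ID are hypothetical, and edition and commitment details are omitted.

from google.cloud import bigquery_reservation_v1

client = bigquery_reservation_v1.ReservationServiceClient()
parent = "projects/billing-admin-project/locations/US"  # hypothetical admin project

# A fixed 500-slot baseline with no autoscaling yields a flat, predictable cost
# that can be charged back to the marketing team.
reservation = bigquery_reservation_v1.Reservation(
    slot_capacity=500,
    ignore_idle_slots=False,
)
client.create_reservation(
    parent=parent,
    reservation_id="marketing",
    reservation=reservation,
)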

You need to create a SQL pipeline. The pipeline runs an aggregate SQL transformation on a BigQuery table every two hours and appends the result to another existing BigQuery table. You need to configure the pipeline to retry if errors occur. You want the pipeline to send an email notification after three consecutive failures. What should you do?

A.
Create a BigQuery scheduled query to run the SQL transformation with schedule options that repeats every two hours, and enable email notifications.
B.
Use the BigQueryUpsertTableOperator in Cloud Composer, set the retry parameter to three, and set the email_on_failure parameter to true.
C.
Use the BigQueryInsertJobOperator in Cloud Composer, set the retry parameter to three, and set the email_on_failure parameter to true.
D.
Create a BigQuery scheduled query to run the SQL transformation with schedule options that repeats every two hours, and enable notifications to a Pub/Sub topic. Use Pub/Sub and Cloud Functions to send an email after three failed executions.
Suggested answer: D

Explanation:

To create a robust and resilient SQL pipeline in BigQuery that handles retries and failure notifications, consider the following:

BigQuery Scheduled Queries: This feature allows you to schedule recurring queries in BigQuery. It is a straightforward way to run SQL transformations on a regular basis without requiring extensive setup.

Error Handling and Retries: BigQuery scheduled queries run at the specified interval and can emit basic notifications, but they do not natively support conditional alerting such as sending an email only after three consecutive failures. This is where additional Google Cloud services like Pub/Sub and Cloud Functions come into play.

Pub/Sub for Notifications: By configuring a BigQuery scheduled query to publish messages to a Pub/Sub topic upon failure, you can create a decoupled and scalable notification system.

Cloud Functions: Cloud Functions can subscribe to the Pub/Sub topic and implement logic to count consecutive failures. After detecting three consecutive failures, the Cloud Function can then send an email notification using a service like SendGrid or Gmail API.

Implementation Steps:

Set up a BigQuery Scheduled Query:

Create a scheduled query in BigQuery to run your SQL transformation every two hours.

Configure the scheduled query to publish a notification to a Pub/Sub topic in case of a failure.

Create a Pub/Sub Topic:

Create a Pub/Sub topic that will receive messages from the scheduled query.

Develop a Cloud Function:

Write a Cloud Function that subscribes to the Pub/Sub topic.

Implement logic in the Cloud Function to track failure messages. If three consecutive failure messages are detected, the function sends an email notification.
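
A minimal sketch of that Cloud Function, assuming the scheduled-query run notifications arrive as Pub/Sub messages containing a state field, and that a (hypothetical) Firestore document is used to track the consecutive-failure count; the actual email delivery is left as a stub.

import base64
import json

from google.cloud import firestore

db = firestore.Client()
state_ref = db.collection("pipeline_state").document("aggregate_transform")  # hypothetical


def handle_run_notification(event, context):
    """Pub/Sub-triggered Cloud Function invoked for each scheduled-query run."""
    payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    run_state = payload.get("state", "")  # assumed to be e.g. "SUCCEEDED" or "FAILED"

    snapshot = state_ref.get()
    data = snapshot.to_dict() if snapshot.exists else {}
    failures = data.get("consecutive_failures", 0)

    if run_state == "FAILED":
        failures += 1
        if failures >= 3:
            send_alert_email(payload)  # stub: SendGrid, Gmail API, etc.
            failures = 0
    else:
        failures = 0

    state_ref.set({"consecutive_failures": failures})


def send_alert_email(payload):
    # Placeholder for the email provider integration.
    print(f"ALERT: three consecutive scheduled-query failures: {payload}")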

BigQuery Scheduled Queries

Pub/Sub Documentation

Cloud Functions Documentation

SendGrid Email API

Gmail API

You are planning to load some of your existing on-premises data into BigQuery on Google Cloud. You want to either stream or batch-load data, depending on your use case. Additionally, you want to mask some sensitive data before loading into BigQuery. You need to do this in a programmatic way while keeping costs to a minimum. What should you do?

A.
Use the BigQuery Data Transfer Service to schedule your migration. After the data is populated in BigQuery, use the connection to the Cloud Data Loss Prevention (Cloud DLP) API to de-identify the necessary data.
B.
Create your pipeline with Dataflow through the Apache Beam SDK for Python, customizing separate options within your code for streaming, batch processing, and Cloud DLP. Select BigQuery as your data sink.
C.
Use Cloud Data Fusion to design your pipeline, use the Cloud DLP plug-in to de-identify data within your pipeline, and then move the data into BigQuery.
D.
Set up Datastream to replicate your on-premises data to BigQuery.
Suggested answer: B

Explanation:

To load on-premises data into BigQuery while masking sensitive data, we need a solution that offers flexibility for both streaming and batch processing, as well as data masking capabilities. Here's a detailed explanation of why option B is the best choice:

Apache Beam and Dataflow:

Apache Beam SDK provides a unified programming model for both batch and stream data processing.

Google Cloud Dataflow is a fully managed service for executing Apache Beam pipelines, offering scalability and ease of use.

Customization for Different Use Cases:

By using the Apache Beam SDK, you can write custom pipelines that can handle both streaming and batch processing within the same framework.

This allows you to switch between streaming and batch modes based on your use case without changing the core logic of your data pipeline.

Data Masking with Cloud DLP:

Google Cloud Data Loss Prevention (DLP) API can be integrated into your Apache Beam pipeline to de-identify and mask sensitive data programmatically before loading it into BigQuery.

This ensures that sensitive data is handled securely and complies with privacy requirements.

Cost Efficiency:

Using Dataflow can be cost-effective because it is a fully managed service, reducing the operational overhead associated with managing your own infrastructure.

The pay-as-you-go model ensures you only pay for the resources you consume, which can help keep costs under control.

Implementation Steps:

Set up Apache Beam Pipeline:

Write a pipeline using the Apache Beam SDK for Python that reads data from your on-premises storage.

Add transformations for data processing, including the integration with Cloud DLP for data masking.

Configure Dataflow:

Deploy the Apache Beam pipeline on Google Cloud Dataflow.

Customize the pipeline options for both streaming and batch use cases.

Load Data into BigQuery:

Set BigQuery as the sink for your data in the Apache Beam pipeline.

Ensure the processed and masked data is loaded into the appropriate BigQuery tables.
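
A minimal Apache Beam sketch of this pattern, with hypothetical paths, project, and table names; the DLP call masks a single hypothetical 'email' field, and a production pipeline would reuse the DLP client in a DoFn's setup() rather than creating one per element.

import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from google.cloud import dlp_v2

PROJECT = "my-project"  # hypothetical


def mask_email(record):
    # Illustrative: de-identify the 'email' field with Cloud DLP character masking.
    client = dlp_v2.DlpServiceClient()
    response = client.deidentify_content(
        request={
            "parent": f"projects/{PROJECT}",
            "inspect_config": {"info_types": [{"name": "EMAIL_ADDRESS"}]},
            "deidentify_config": {
                "info_type_transformations": {
                    "transformations": [
                        {"primitive_transformation": {"character_mask_config": {"masking_character": "#"}}}
                    ]
                }
            },
            "item": {"value": record.get("email", "")},
        }
    )
    record["email"] = response.item.value
    return record


def run():
    # streaming=True would switch the same pipeline to the streaming use case.
    options = PipelineOptions(streaming=False)
    with beam.Pipeline(options=options) as p:
        (
            p
            | "Read" >> beam.io.ReadFromText("gs://my-bucket/exports/*.json")  # hypothetical
            | "Parse" >> beam.Map(json.loads)
            | "MaskPII" >> beam.Map(mask_email)
            | "WriteToBQ" >> beam.io.WriteToBigQuery(
                "my-project:analytics.customers",  # hypothetical
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
            )
        )


if __name__ == "__main__":
    run()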

Apache Beam Documentation

Google Cloud Dataflow Documentation

Google Cloud DLP Documentation

BigQuery Documentation

Your company operates in three domains: airlines, hotels, and ride-hailing services. Each domain has two teams: analytics and data science, which create data assets in BigQuery with the help of a central data platform team. However, as each domain is evolving rapidly, the central data platform team is becoming a bottleneck. This is causing delays in deriving insights from data, and resulting in stale data when pipelines are not kept up to date. You need to design a data mesh architecture by using Dataplex to eliminate the bottleneck. What should you do?

A.
1. Create one lake for each team. Inside each lake, create one zone for each domain. 2. Attach each of the BigQuery datasets created by the individual teams as assets to the respective zone. 3. Have the central data platform team manage all zones' data assets.
B.
1. Create one lake for each team. Inside each lake, create one zone for each domain. 2. Attach each of the BigQuery datasets created by the individual teams as assets to the respective zone. 3. Direct each domain to manage their own zone's data assets.
C.
1. Create one lake for each domain. Inside each lake, create one zone for each team. 2. Attach each of the BigQuery datasets created by the individual teams as assets to the respective zone. 3. Direct each domain to manage their own lake's data assets.
D.
1. Create one lake for each domain. Inside each lake, create one zone for each team. 2. Attach each of the BigQuery datasets created by the individual teams as assets to the respective zone. 3. Have the central data platform team manage all lakes' data assets.
Suggested answer: C

Explanation:

To design a data mesh architecture using Dataplex to eliminate bottlenecks caused by a central data platform team, consider the following:

Data Mesh Architecture:

Data mesh promotes a decentralized approach where domain teams manage their own data pipelines and assets, increasing agility and reducing bottlenecks.

Dataplex Lakes and Zones:

Lakes in Dataplex are logical containers for managing data at scale, and zones are subdivisions within lakes for organizing data based on domains, teams, or other criteria.

Domain and Team Management:

By creating a lake for each domain and zones for each of its teams, every domain can independently manage its own data assets without relying on the central data platform team.

This setup aligns with the principles of data mesh, promoting ownership and reducing delays in data processing and insights.

Implementation Steps:

Create Lakes and Zones:

Create a separate lake in Dataplex for each domain (airlines, hotels, ride-hailing).

Within each lake, create zones for the analytics and data science teams.

Attach BigQuery Datasets:

Attach the BigQuery datasets created by the respective teams as assets to their corresponding zones.

Decentralized Management:

Allow each domain to manage its own lake's data assets, giving the domain teams the autonomy to update and maintain their pipelines without depending on the central team.
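
As an illustration of the structure, here is a sketch using the google-cloud-dataplex Python client to create one domain lake, a team zone inside it, and a BigQuery dataset asset; all names are hypothetical and the field names follow the dataplex_v1 client as best understood, so treat it as a starting point rather than a definitive recipe.

from google.cloud import dataplex_v1

client = dataplex_v1.DataplexServiceClient()
parent = "projects/my-project/locations/us-central1"  # hypothetical

# One lake per domain, e.g. the airlines domain.
lake = client.create_lake(
    parent=parent,
    lake_id="airlines",
    lake=dataplex_v1.Lake(display_name="Airlines domain"),
).result()

# One zone per team inside the domain lake.
zone = client.create_zone(
    parent=lake.name,
    zone_id="analytics",
    zone=dataplex_v1.Zone(
        type_=dataplex_v1.Zone.Type.CURATED,
        resource_spec=dataplex_v1.Zone.ResourceSpec(
            location_type=dataplex_v1.Zone.ResourceSpec.LocationType.SINGLE_REGION
        ),
    ),
).result()

# Attach the team's BigQuery dataset to the zone as an asset, so the domain
# team owns and manages it from their own lake.
client.create_asset(
    parent=zone.name,
    asset_id="airlines-analytics-datasets",
    asset=dataplex_v1.Asset(
        resource_spec=dataplex_v1.Asset.ResourceSpec(
            type_=dataplex_v1.Asset.ResourceSpec.Type.BIGQUERY_DATASET,
            name="projects/my-project/datasets/airlines_analytics",  # hypothetical
        )
    ),
).result()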

Dataplex Documentation

BigQuery Documentation

Data Mesh Principles

You have important legal hold documents in a Cloud Storage bucket. You need to ensure that these documents are not deleted or modified. What should you do?

A.
Set a retention policy. Lock the retention policy.
B.
Set a retention policy. Set the default storage class to Archive for long-term digital preservation.
C.
Enable the Object Versioning feature. Add a lifecycle rule.
D.
Enable the Object Versioning feature. Create a copy in a bucket in a different region.
Suggested answer: A

Explanation:

To ensure that important legal hold documents in a Cloud Storage bucket are not deleted or modified, the most effective method is to set and lock a retention policy. Here's why this is the best choice:

Retention Policy:

A retention policy defines a retention period during which objects in the bucket cannot be deleted or modified. This ensures data immutability.

Once a retention policy is set and locked, it cannot be removed or reduced, providing strong protection against accidental or malicious deletions.

Locking the Retention Policy:

Locking a retention policy ensures that the retention period cannot be changed. This action is permanent and guarantees that the specified retention period will be enforced.

Steps to Implement:

Set the Retention Policy:

Define a retention period for the bucket to ensure that all objects are protected for the required duration.

Lock the Retention Policy:

Lock the retention policy to prevent any modifications, ensuring the immutability of the documents.
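
A minimal sketch with the google-cloud-storage Python client; the bucket name and the seven-year period are hypothetical.

from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("legal-hold-docs")  # hypothetical bucket name

# Set a retention period (here roughly seven years, expressed in seconds).
bucket.retention_period = 7 * 365 * 24 * 60 * 60
bucket.patch()

# Locking is irreversible: once locked, the retention period cannot be reduced
# or removed, and objects cannot be deleted or overwritten until they age out.
bucket.lock_retention_policy()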

Cloud Storage Retention Policy Documentation

How to Set a Retention Policy

You are running your BigQuery project in the on-demand billing model and are executing a change data capture (CDC) process that ingests data. The CDC process loads 1 GB of data every 10 minutes into a temporary table, and then performs a merge into a 10 TB target table. This process is very scan intensive and you want to explore options to enable a predictable cost model. You need to create a BigQuery reservation based on utilization information gathered from BigQuery Monitoring and apply the reservation to the CDC process. What should you do?

A.
Create a BigQuery reservation for the job.
B.
Create a BigQuery reservation for the service account running the job.
C.
Create a BigQuery reservation for the dataset.
D.
Create a BigQuery reservation for the project.
Suggested answer: D

Explanation:

https://cloud.google.com/blog/products/data-analytics/manage-bigquery-costs-with-custom-quotas.

Here's why creating a BigQuery reservation for the project is the most suitable solution:

Project-Level Reservation: Reservation slots are assigned to a project (or to a folder or organization), which means the reserved slots (processing capacity) are shared across all jobs and queries running within that project. Since your CDC process is a significant contributor to your BigQuery usage, assigning a reservation to the project ensures that the CDC merges always have access to the necessary capacity, regardless of other activities in the project.

Predictable Cost Model: Reservations provide a fixed, predictable cost model. Instead of paying the on-demand price for each query, you pay a fixed monthly fee for the reserved slots. This eliminates the variability of costs associated with on-demand billing, making it easier to budget and forecast your BigQuery expenses.

BigQuery Monitoring: You can use BigQuery Monitoring to analyze the historical usage patterns of your CDC process and other queries within your project. This information helps you determine the appropriate amount of slots to reserve, ensuring that you have enough capacity to handle your workload while optimizing costs.

Why other options are not suitable:

A. Create a BigQuery reservation for the job: BigQuery does not support reservations at the individual job level; reservation assignments are made to projects, folders, or organizations.

B. Create a BigQuery reservation for the service account running the job: Reservation assignments target projects, folders, or organizations, not individual users or service accounts, so this is not possible. A project-level assignment covers all jobs within the project, regardless of the service account used.

C. Create a BigQuery reservation for the dataset: BigQuery does not support reservations at the dataset level.

By creating a BigQuery reservation for your project based on your utilization analysis, you can achieve a predictable cost model while ensuring that your CDC process and other queries have the necessary resources to run smoothly.
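
To ground the utilization-gathering step, here is a sketch that estimates recent slot usage from the jobs timeline and then assigns the project to an existing reservation; the region qualifier, project IDs, and reservation name are hypothetical.

from google.cloud import bigquery, bigquery_reservation_v1

bq = bigquery.Client(project="cdc-project")  # hypothetical

# Approximate hourly slot usage over the last week from the jobs timeline.
usage_sql = """
SELECT
  TIMESTAMP_TRUNC(period_start, HOUR) AS hour,
  SUM(period_slot_ms) / (1000 * 3600) AS avg_slots
FROM `region-us`.INFORMATION_SCHEMA.JOBS_TIMELINE_BY_PROJECT
WHERE period_start > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
GROUP BY hour
ORDER BY hour
"""
for row in bq.query(usage_sql).result():
    print(row.hour, round(row.avg_slots, 1))

# Assign the whole project to an existing reservation so the scan-heavy CDC
# merges run on reserved slots instead of on-demand capacity.
res_client = bigquery_reservation_v1.ReservationServiceClient()
reservation_name = "projects/billing-admin-project/locations/US/reservations/cdc"  # hypothetical
assignment = bigquery_reservation_v1.Assignment(
    assignee="projects/cdc-project",
    job_type=bigquery_reservation_v1.Assignment.JobType.QUERY,
)
res_client.create_assignment(parent=reservation_name, assignment=assignment)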

A web server sends click events to a Pub/Sub topic as messages. The web server includes an eventTimestamp attribute in the messages, which is the time when the click occurred. You have a Dataflow streaming job that reads from this Pub/Sub topic through a subscription, applies some transformations, and writes the result to another Pub/Sub topic for use by the advertising department. The advertising department needs to receive each message within 30 seconds of the corresponding click occurrence, but they report receiving the messages late. Your Dataflow job's system lag is about 5 seconds, and the data freshness is about 40 seconds. Inspecting a few messages shows no more than 1 second of lag between their eventTimestamp and publishTime. What is the problem and what should you do?

A.
The advertising department is causing delays when consuming the messages. Work with the advertising department to fix this.
B.
Messages in your Dataflow job are processed in less than 30 seconds, but your job cannot keep up with the backlog in the Pub/Sub subscription. Optimize your job or increase the number of workers to fix this.
C.
The web server is not pushing messages fast enough to Pub/Sub. Work with the web server team to fix this.
D.
Messages in your Dataflow job are taking more than 30 seconds to process. Optimize your job or increase the number of workers to fix this.
Suggested answer: B

Explanation:

To ensure that the advertising department receives messages within 30 seconds of the click occurrence, and given the current system lag and data freshness metrics, the issue likely lies in the processing capacity of the Dataflow job. Here's why option B is the best choice:

System Lag and Data Freshness:

The system lag of 5 seconds indicates that Dataflow itself is processing messages relatively quickly.

However, the data freshness of 40 seconds suggests a significant delay before processing begins, indicating a backlog.

Backlog in Pub/Sub Subscription:

A backlog occurs when the rate of incoming messages exceeds the rate at which the Dataflow job can process them, causing delays.

Optimizing the Dataflow Job:

To handle the incoming message rate, the Dataflow job needs to be optimized or scaled up by increasing the number of workers, ensuring it can keep up with the message inflow.

Steps to Implement:

Analyze the Dataflow Job:

Inspect the Dataflow job metrics to identify bottlenecks and inefficiencies.

Optimize Processing Logic:

Optimize the transformations and operations within the Dataflow pipeline to improve processing efficiency.

Increase Number of Workers:

Scale the Dataflow job by increasing the number of workers to handle the higher load, reducing the backlog.
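
As an illustration of the scaling lever, here is a sketch of relaunching the streaming pipeline with more autoscaling headroom using the Beam Python SDK; the project, region, job name, topic, and subscription are hypothetical, and the worker counts are illustrative rather than recommendations.

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    streaming=True,
    runner="DataflowRunner",
    project="my-project",             # hypothetical
    region="us-central1",             # hypothetical
    job_name="click-events-enriched",
    max_num_workers=20,               # allow autoscaling to add workers and drain the backlog
    autoscaling_algorithm="THROUGHPUT_BASED",
)

with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadClicks" >> beam.io.ReadFromPubSub(
            subscription="projects/my-project/subscriptions/clicks-sub"
        )
        | "Transform" >> beam.Map(lambda msg: msg)  # placeholder for the real transformations
        | "Publish" >> beam.io.WriteToPubSub(
            topic="projects/my-project/topics/clicks-enriched"
        )
    )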

Dataflow Monitoring

Scaling Dataflow Jobs

You have a BigQuery dataset named 'customers'. All tables will be tagged by using a Data Catalog tag template named 'gdpr'. The template contains one mandatory field, 'has_sensitive_data', with a boolean value. All employees must be able to do a simple search and find tables in the dataset that have either true or false in the 'has_sensitive_data' field. However, only the Human Resources (HR) group should be able to see the data inside the tables for which 'has_sensitive_data' is true. You give the all employees group the bigquery.metadataViewer and bigquery.connectionUser roles on the dataset. You want to minimize configuration overhead. What should you do next?

A.
Create the 'gdpr' tag template with private visibility. Assign the bigquery.dataViewer role to the HR group on the tables that contain sensitive data.
B.
Create the 'gdpr' tag template with private visibility. Assign the datacatalog.tagTemplateViewer role on this tag to the all employees group, and assign the bigquery.dataViewer role to the HR group on the tables that contain sensitive data.
C.
Create the 'gdpr' tag template with public visibility. Assign the bigquery.dataViewer role to the HR group on the tables that contain sensitive data.
D.
Create the 'gdpr' tag template with public visibility. Assign the datacatalog.tagTemplateViewer role on this tag to the all employees group, and assign the bigquery.dataViewer role to the HR group on the tables that contain sensitive data.
Suggested answer: D

Explanation:

To ensure that all employees can search and find tables with GDPR tags while restricting data access to sensitive tables only to the HR group, follow these steps:

Data Catalog Tag Template:

Use Data Catalog to create a tag template named 'gdpr' with a boolean field 'has_sensitive_data'. Set the visibility to public so all employees can see the tags.

Roles and Permissions:

Assign the datacatalog.tagTemplateViewer role to the all employees group. This role allows users to view the tags and search for tables based on the 'has_sensitive_data' field.

Assign the bigquery.dataViewer role to the HR group specifically on tables that contain sensitive data. This ensures only HR can access the actual data in these tables.

Steps to Implement:

Create the GDPR Tag Template:

Define the tag template in Data Catalog with the necessary fields and set visibility to public.

Assign Roles:

Grant the datacatalog.tagTemplateViewer role to the all employees group for visibility into the tags.

Grant the bigquery.dataViewer role to the HR group on tables marked as having sensitive data.
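
A sketch of the tag-template step with the google-cloud-datacatalog Python client; the project and location are hypothetical, and the is_publicly_readable flag is included on the assumption that it is the field controlling public visibility in the client version used.

from google.cloud import datacatalog_v1

client = datacatalog_v1.DataCatalogClient()
parent = "projects/my-project/locations/us-central1"  # hypothetical

template = datacatalog_v1.TagTemplate()
template.display_name = "gdpr"
template.is_publicly_readable = True  # assumed field name for public visibility
template.fields["has_sensitive_data"] = datacatalog_v1.TagTemplateField(
    display_name="Has sensitive data",
    type_=datacatalog_v1.FieldType(
        primitive_type=datacatalog_v1.FieldType.PrimitiveType.BOOL
    ),
    is_required=True,
)

client.create_tag_template(
    parent=parent,
    tag_template_id="gdpr",
    tag_template=template,
)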

Data Catalog Documentation

Managing Access Control in BigQuery

IAM Roles in Data Catalog

You are architecting a data transformation solution for BigQuery. Your developers are proficient with SQL and want to use the ELT development technique. In addition, your developers need an intuitive coding environment and the ability to manage SQL as code. You need to identify a solution for your developers to build these pipelines. What should you do?

A.
Use Cloud Composer to load data and run SQL pipelines by using the BigQuery job operators.
B.
Use Dataflow jobs to read data from Pub/Sub, transform the data, and load the data to BigQuery.
C.
Use Dataform to build, manage, and schedule SQL pipelines.
D.
Use Data Fusion to build and execute ETL pipelines.
Suggested answer: C

Explanation:

To architect a data transformation solution for BigQuery that aligns with the ELT development technique and provides an intuitive coding environment for SQL-proficient developers, Dataform is an optimal choice. Here's why:

ELT Development Technique:

ELT (Extract, Load, Transform) is a process where data is first extracted and loaded into a data warehouse, and then transformed using SQL queries. This is different from ETL, where data is transformed before being loaded into the data warehouse.

BigQuery supports ELT, allowing developers to write SQL transformations directly in the data warehouse.

Dataform:

Dataform is a development environment designed specifically for data transformations in BigQuery and other SQL-based warehouses.

It provides tools for managing SQL as code, including version control and collaborative development.

Dataform integrates well with existing development workflows and supports scheduling and managing SQL-based data pipelines.

Intuitive Coding Environment:

Dataform offers an intuitive and user-friendly interface for writing and managing SQL queries.

It includes features like SQLX, a SQL dialect that extends standard SQL with features for modularity and reusability, which simplifies the development of complex transformation logic.

Managing SQL as Code:

Dataform supports version control systems like Git, enabling developers to manage their SQL transformations as code.

This allows for better collaboration, code reviews, and version tracking.

Dataform Documentation

BigQuery Documentation

Managing ELT Pipelines with Dataform
