
Google Professional Data Engineer Practice Test - Questions Answers, Page 33

Your organization has two Google Cloud projects, project A and project B. In project A, you have a Pub/Sub topic that receives data from confidential sources. Only the resources in project A should be able to access the data in that topic. You want to ensure that project B and any future project cannot access data in the project A topic. What should you do?

A. Configure VPC Service Controls in the organization with a perimeter around the VPC of project A.
B. Add firewall rules in project A so only traffic from the VPC in project A is permitted.
C. Configure VPC Service Controls in the organization with a perimeter around project A.
D. Use Identity and Access Management conditions to ensure that only users and service accounts in project A can access resources in project A.
Suggested answer: C

Explanation:

A VPC Service Controls perimeter around project A restricts access to Google-managed services in that project, such as the Pub/Sub topic, so that only resources inside the perimeter can reach the data. Project B and any future projects sit outside the perimeter and therefore cannot access the topic. Firewall rules (option B) and IAM conditions (option D) do not block API-level access to Pub/Sub from other projects, and perimeters are defined around projects rather than around a VPC (option A).

You are administering a BigQuery dataset that uses a customer-managed encryption key (CMEK). You need to share the dataset with a partner organization that does not have access to your CMEK. What should you do?

A. Create an authorized view that contains the CMEK to decrypt the data when accessed.
B. Provide the partner organization a copy of your CMEKs to decrypt the data.
C. Copy the tables you need to share to a dataset without CMEKs. Create an Analytics Hub listing for this dataset.
D. Export the tables as Parquet files to a Cloud Storage bucket and grant the storageinsights.viewer role on the bucket to the partner organization.
Suggested answer: C

Explanation:

If you want to share a BigQuery dataset that uses a customer-managed encryption key (CMEK) with a partner organization that does not have access to your CMEK, you cannot use an authorized view or hand over a copy of your key, because both options would compromise the security and privacy of your data. Instead, copy the tables you need to share into a dataset without CMEK, and then create an Analytics Hub listing for that dataset. Analytics Hub is a service that lets you securely share and discover data assets across your organization and with external partners. Through the listing, you can grant the partner organization access to the copied dataset and control the level and duration of the sharing.

Reference:

Customer-managed Cloud KMS keys

Authorized views

Analytics Hub overview

Creating an Analytics Hub listing
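
For illustration, a minimal sketch of the copy step with the google-cloud-bigquery Python client. The project, dataset, and table names are placeholders; the partner-facing dataset is created without a default CMEK, so the re-created tables fall back to Google-managed encryption and the dataset can then be published as an Analytics Hub listing.

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # hypothetical project ID

# Partner-facing dataset created WITHOUT a default CMEK.
shared = bigquery.Dataset("my-project.shared_for_partner")
shared.location = "US"
client.create_dataset(shared, exists_ok=True)

# Re-create each table in the non-CMEK dataset; the new tables use the
# destination dataset's (default, Google-managed) encryption settings.
for table in ["customers", "subscriptions"]:
    sql = f"""
    CREATE OR REPLACE TABLE `my-project.shared_for_partner.{table}` AS
    SELECT * FROM `my-project.cmek_dataset.{table}`
    """
    client.query(sql).result()  # wait for each copy to finish
```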

You are designing a data mesh on Google Cloud with multiple distinct data engineering teams building data products. The typical data curation design pattern consists of landing files in Cloud Storage, transforming raw data in Cloud Storage and BigQuery datasets, and storing the final curated data product in BigQuery datasets. You need to configure Dataplex to ensure that each team can access only the assets needed to build their data products. You also need to ensure that teams can easily share the curated data product. What should you do?

A. 1. Create a single Dataplex virtual lake and create a single zone to contain landing, raw, and curated data. 2. Provide each data engineering team access to the virtual lake.
B. 1. Create a single Dataplex virtual lake and create a single zone to contain landing, raw, and curated data. 2. Build separate assets for each data product within the zone. 3. Assign permissions to the data engineering teams at the zone level.
C. 1. Create a Dataplex virtual lake for each data product, and create a single zone to contain landing, raw, and curated data. 2. Provide the data engineering teams with full access to the virtual lake assigned to their data product.
D. 1. Create a Dataplex virtual lake for each data product, and create multiple zones for landing, raw, and curated data. 2. Provide the data engineering teams with full access to the virtual lake assigned to their data product.
Suggested answer: D

Explanation:

This option is the best way to configure Dataplex for a data mesh architecture, as it allows each data engineering team to have full ownership and control over their data products, while also enabling easy discovery and sharing of the curated data across the organization [1][2]. By creating a Dataplex virtual lake for each data product, you can isolate the data assets and resources for each domain, and avoid conflicts and dependencies between different teams [3]. By creating multiple zones for landing, raw, and curated data, you can enforce different security and governance policies for each stage of the data curation process, and ensure that only authorized users can access the data assets [4][5]. By providing the data engineering teams with full access to the virtual lake assigned to their data product, you empower them to manage and monitor their data products, and to leverage Dataplex features such as tagging, data quality, and lineage.

Option A is not suitable, as it creates a single point of failure and a bottleneck for the data mesh, and does not allow fine-grained access control and governance for different data products [2]. Option B is also not suitable, as it does not isolate the data assets and resources for each data product, and it assigns permissions at the zone level, which may not reflect the different roles and responsibilities of the data engineering teams [3][4]. Option C is better than options A and B, but it does not create separate zones for landing, raw, and curated data, which may compromise the security and quality of the data products [5].

Reference:

1: Building a data mesh on Google Cloud using BigQuery and Dataplex | Google Cloud Blog

2: Data Mesh - 7 Effective Practices to Get Started - Confluent

3: Best practices | Dataplex | Google Cloud

4: Secure your lake | Dataplex | Google Cloud

5: Zones | Dataplex | Google Cloud

6: Managing a Data Mesh with Dataplex -- ROI Training
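
As a rough sketch of this layout, assuming the google-cloud-dataplex Python client: the project, location, and IDs below are placeholders, and "landing" is modeled as an additional RAW-type zone because Dataplex zones are either raw or curated.

```python
from google.cloud import dataplex_v1

client = dataplex_v1.DataplexServiceClient()
parent = "projects/my-project/locations/europe-west3"  # hypothetical

# One virtual lake per data product.
lake = client.create_lake(
    parent=parent,
    lake_id="sales-data-product",
    lake=dataplex_v1.Lake(display_name="Sales data product"),
).result()

# Separate zones for each curation stage: landing and raw use the RAW zone
# type, the final data product uses the CURATED zone type.
zone_types = {
    "landing": dataplex_v1.Zone.Type.RAW,
    "raw": dataplex_v1.Zone.Type.RAW,
    "curated": dataplex_v1.Zone.Type.CURATED,
}
for zone_id, zone_type in zone_types.items():
    client.create_zone(
        parent=lake.name,
        zone_id=zone_id,
        zone=dataplex_v1.Zone(
            type_=zone_type,
            resource_spec=dataplex_v1.Zone.ResourceSpec(
                location_type=dataplex_v1.Zone.ResourceSpec.LocationType.SINGLE_REGION
            ),
        ),
    ).result()
```

Each team can then be granted a Dataplex role (for example, an editor role) on its own lake only, while the curated zone's attached BigQuery datasets remain shareable with other teams.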

You are on the data governance team and are implementing security requirements to deploy resources. You need to ensure that resources are limited to only the europe-west3 region. You want to follow Google-recommended practices. What should you do?

A. Deploy resources with Terraform and implement a variable validation rule to ensure that the region is set to the europe-west3 region for all resources.
B. Set the constraints/gcp.resourceLocations organization policy constraint to in:eu-locations.
C. Create a Cloud Function to monitor all resources created and automatically destroy the ones created outside the europe-west3 region.
D. Set the constraints/gcp.resourceLocations organization policy constraint to in:europe-west3-locations.
Suggested answer: D

Explanation:

To ensure that resources are limited to only the europe-west3 region, set the constraints/gcp.resourceLocations organization policy constraint to in:europe-west3-locations. This policy restricts the deployment of resources to the specified locations, in this case the europe-west3 region. Setting the constraint enforces location compliance across your Google Cloud resources, which aligns with Google-recommended practices for data governance and regulatory compliance.

Professional Data Engineer Certification Exam Guide | Learn - Google Cloud

Preparing for Google Cloud Certification: Cloud Data Engineer

Professional Data Engineer Certification | Learn | Google Cloud

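
A minimal sketch of setting this constraint programmatically, assuming the google-cloud-org-policy client library; the organization ID is a placeholder, and the same policy can also be set from the console or with gcloud.

```python
from google.cloud import orgpolicy_v2

client = orgpolicy_v2.OrgPolicyClient()
org_id = "123456789012"  # hypothetical organization ID

# Allow resource creation only in the europe-west3 value group.
policy = orgpolicy_v2.Policy(
    name=f"organizations/{org_id}/policies/gcp.resourceLocations",
    spec=orgpolicy_v2.PolicySpec(
        rules=[
            orgpolicy_v2.PolicySpec.PolicyRule(
                values=orgpolicy_v2.PolicySpec.PolicyRule.StringValues(
                    allowed_values=["in:europe-west3-locations"]
                )
            )
        ]
    ),
)

# Creates the policy on the organization node; use update_policy instead if a
# policy for this constraint already exists.
client.create_policy(parent=f"organizations/{org_id}", policy=policy)
```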

You have a BigQuery table that contains customer data, including sensitive information such as names and addresses. You need to share the customer data with your data analytics and consumer support teams securely. The data analytics team needs to access the data of all the customers, but must not be able to access the sensitive data. The consumer support team needs access to all data columns, but must not be able to access customers that no longer have active contracts. You enforced these requirements by using an authorized dataset and policy tags. After implementing these steps, the data analytics team reports that they still have access to the sensitive columns. You need to ensure that the data analytics team does not have access to restricted data. What should you do?

Choose 2 answers

A. Create two separate authorized datasets; one for the data analytics team and another for the consumer support team.
B. Ensure that the data analytics team members do not have the Data Catalog Fine-Grained Reader role for the policy tags.
C. Enforce access control in the policy tag taxonomy.
D. Remove the bigquery.dataViewer role from the data analytics team on the authorized datasets.
E. Replace the authorized dataset with an authorized view. Use row-level security and apply a filter_expression to limit data access.
Suggested answer: B, C

Explanation:

To ensure that the data analytics team does not have access to sensitive columns, you should:

B) Ensure that the data analytics team members do not have the Data Catalog Fine-Grained Reader role for the policy tags. This role grants the ability to read data in columns protected by those policy tags, so any member who holds it can still see the sensitive values.

C) Enforce access control in the policy tag taxonomy. Column-level security only takes effect once access control is enforced on the taxonomy; with enforcement on, access to tagged columns is restricted to users who hold the Fine-Grained Reader role on the corresponding policy tags.
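
As a quick check, a small sketch using the google-cloud-datacatalog client to list who currently holds the Fine-Grained Reader role on a sensitive policy tag; the policy tag resource name is a placeholder.

```python
from google.cloud import datacatalog_v1

# Hypothetical policy tag resource name; replace with your taxonomy/policy tag IDs.
POLICY_TAG = (
    "projects/my-project/locations/us/taxonomies/1234567890/policyTags/9876543210"
)

client = datacatalog_v1.PolicyTagManagerClient()

# Inspect who can read columns protected by this policy tag. The analytics
# group should NOT appear under roles/datacatalog.categoryFineGrainedReader.
policy = client.get_iam_policy(request={"resource": POLICY_TAG})
for binding in policy.bindings:
    if binding.role == "roles/datacatalog.categoryFineGrainedReader":
        print("Fine-Grained Readers:", list(binding.members))
```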

You are building a streaming Dataflow pipeline that ingests noise level data from hundreds of sensors placed near construction sites across a city. The sensors measure noise level every ten seconds, and send that data to the pipeline when levels rise above 70 dBA. You need to detect the average noise level from a sensor when data is received for a duration of more than 30 minutes, but the window ends when no data has been received for 15 minutes. What should you do?


A. Use session windows with a 30-minute gap duration.
B. Use tumbling windows with a 15-minute window and a fifteen-minute .withAllowedLateness operator.
C. Use session windows with a 15-minute gap duration.
D. Use hopping windows with a 15-minute window, and a thirty-minute period.
Suggested answer: C

Explanation:

Session windows are dynamic windows that group elements based on periods of activity, which makes them a good fit for streaming data that is irregularly distributed in time. In this case, the sensors only send noise level data when it exceeds the threshold, and the duration of the noise events varies. Session windows can therefore capture the average noise level for each sensor during a period of high noise, and close the window when no data arrives for the specified gap duration. The gap duration should be 15 minutes, because the requirement is to end the window when no data has been received for 15 minutes; a 30-minute gap would keep the window open too long and would not close it after 15 minutes of inactivity. Tumbling windows and hopping windows are fixed windows based on a fixed time interval; they would split or overlap the noise events from the sensors and do not account for periods of inactivity.

Reference:

Windowing concepts

Session windows

Windowing in Dataflow
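
A minimal Apache Beam (Python SDK) sketch of this windowing choice; the Pub/Sub topic name and message parsing are illustrative, and streaming pipeline options (runner, --streaming, and so on) are omitted.

```python
import apache_beam as beam
from apache_beam.transforms.combiners import Mean
from apache_beam.transforms.window import Sessions


def parse_reading(msg: bytes):
    # Assumes messages of the form "sensor_id,noise_level".
    sensor_id, level = msg.decode("utf-8").split(",")
    return sensor_id, float(level)


with beam.Pipeline() as pipeline:
    (
        pipeline
        | "Read" >> beam.io.ReadFromPubSub(topic="projects/my-project/topics/noise")
        | "Parse" >> beam.Map(parse_reading)
        # Session window closes after 15 minutes without data from a sensor.
        | "SessionWindow" >> beam.WindowInto(Sessions(gap_size=15 * 60))
        | "MeanPerSensor" >> Mean.PerKey()
        | "Print" >> beam.Map(print)
    )
```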

You have a BigQuery table that ingests data directly from a Pub/Sub subscription. The ingested data is encrypted with a Google-managed encryption key. You need to meet a new organization policy that requires you to use keys from a centralized Cloud Key Management Service (Cloud KMS) project to encrypt data at rest. What should you do?

A. Create a new BigQuery table by using customer-managed encryption keys (CMEK), and migrate the data from the old BigQuery table.
B. Create a new BigQuery table and Pub/Sub topic by using customer-managed encryption keys (CMEK), and migrate the data from the old BigQuery table.
C. Create a new Pub/Sub topic with CMEK and use the existing BigQuery table with a Google-managed encryption key.
D. Use a Cloud KMS encryption key with Dataflow to ingest the existing Pub/Sub subscription into the existing BigQuery table.
Suggested answer: A

Explanation:

To use CMEK for BigQuery, you need to create a key ring and a key in Cloud KMS, and then specify the key resource name when creating or updating a BigQuery table. You cannot change the encryption type of an existing table, so you need to create a new table with CMEK and copy the data into it from the old table that uses a Google-managed encryption key.

Customer-managed Cloud KMS keys | BigQuery | Google Cloud

Creating and managing encryption keys | Cloud KMS Documentation | Google Cloud
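
A minimal sketch of the migration step with the google-cloud-bigquery client; the key and table names are placeholders, and the key lives in the centralized Cloud KMS project.

```python
from google.cloud import bigquery

# Hypothetical resource names.
KMS_KEY = (
    "projects/central-kms-project/locations/us/keyRings/bq-ring/cryptoKeys/bq-key"
)
SOURCE = "my-project.dataset.events"            # existing Google-managed-key table
DESTINATION = "my-project.dataset.events_cmek"  # new CMEK-protected table

client = bigquery.Client()

# Copy the existing data into a new table encrypted with the centralized CMEK.
job_config = bigquery.CopyJobConfig(
    destination_encryption_configuration=bigquery.EncryptionConfiguration(
        kms_key_name=KMS_KEY
    )
)
client.copy_table(SOURCE, DESTINATION, job_config=job_config).result()
```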

You are designing a fault-tolerant architecture to store data in a regional BigQuery dataset. You need to ensure that your application is able to recover from a corruption event in your tables that occurred within the past seven days. You want to adopt managed services with the lowest RPO and most cost-effective solution. What should you do?

A. Export the data from BigQuery into a new table that excludes the corrupted data.
B. Migrate your data to multi-region BigQuery buckets.
C. Access historical data by using time travel in BigQuery.
D. Create a BigQuery table snapshot on a daily basis.
Suggested answer: C

Explanation:

Time travel is a BigQuery feature that lets you query and recover data from any point within the past seven days. You can use the FOR SYSTEM_TIME AS OF clause in your SQL query to specify the timestamp of the data you want to access, and so restore a table to its state before the corruption event occurred. Time travel is enabled by default for all datasets and requires no additional configuration, which makes it the lowest-RPO and most cost-effective option here.

Data retention with time travel and fail-safe | BigQuery | Google Cloud

BigQuery Time Travel: How to access Historical Data? | Easy Steps
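
For example, a short sketch with the google-cloud-bigquery client that restores the table's state from one hour before the corruption; the table names and offset are illustrative.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Read the table as it existed one hour ago and write it to a recovery table.
query = """
SELECT *
FROM `my-project.sales.orders`
FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
"""
job_config = bigquery.QueryJobConfig(
    destination="my-project.sales.orders_restored",
    write_disposition="WRITE_TRUNCATE",
)
client.query(query, job_config=job_config).result()
```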

You are developing an Apache Beam pipeline to extract data from a Cloud SQL instance by using JdbcIO. You have two projects running in Google Cloud. The pipeline will be deployed and executed on Dataflow in Project A. The Cloud SQL instance is running in Project B and does not have a public IP address. After deploying the pipeline, you noticed that the pipeline failed to extract data from the Cloud SQL instance due to connection failure. You verified that VPC Service Controls and Shared VPC are not in use in these projects. You want to resolve this error while ensuring that the data does not go through the public internet. What should you do?

A. Set up VPC Network Peering between Project A and Project B. Add a firewall rule to allow the peered subnet range to access all instances on the network.
B. Turn off the external IP addresses on the Dataflow workers. Enable Cloud NAT in Project A.
C. Set up VPC Network Peering between Project A and Project B. Create a Compute Engine instance without an external IP address in Project B on the peered subnet to serve as a proxy server to the Cloud SQL database.
D. Add the external IP addresses of the Dataflow workers as authorized networks in the Cloud SQL instance.
Suggested answer: C

Explanation:

Option A is incorrect because VPC Network Peering alone does not enable connectivity to a Cloud SQL instance with a private IP address. The instance is reached through private services access, a separate peering between Project B's VPC and the service producer network, and because VPC peering is not transitive, the peered VPC in Project A cannot reach the instance directly.

Option B is incorrect because Cloud NAT does not help reach a Cloud SQL instance with a private IP address. Cloud NAT only provides outbound internet connectivity for resources that do not have public IP addresses, such as VMs, GKE clusters, and serverless instances.

Option C is correct because it uses a Compute Engine instance as a proxy server to reach the Cloud SQL database over the peered network. The proxy server does not need an external IP address because it communicates with the Dataflow workers and the Cloud SQL instance over internal IP addresses, so no traffic leaves the private network. You install the Cloud SQL Auth Proxy on the proxy server and configure it to use a service account that has the Cloud SQL Client role.

Option D is incorrect because it requires assigning public IP addresses to the Dataflow workers, which sends the data over the public internet and violates the requirement. Moreover, authorized networks do not apply to Cloud SQL instances that only have private IP addresses.

You are designing a data warehouse in BigQuery to analyze sales data for a telecommunications service provider. You need to create a data model for customers, products, and subscriptions. All customers, products, and subscriptions can be updated monthly, but you must maintain a historical record of all data. You plan to use the visualization layer for current and historical reporting. You need to ensure that the data model is simple, easy to use, and cost-effective. What should you do?

A. Create a normalized model with tables for each entity. Use snapshots before updates to track historical data.
B. Create a normalized model with tables for each entity. Keep all input files in a Cloud Storage bucket to track historical data.
C. Create a denormalized model with nested and repeated fields. Update the table and use snapshots to track historical data.
D. Create a denormalized, append-only model with nested and repeated fields. Use the ingestion timestamp to track historical data.
Suggested answer: D

Explanation:

- A denormalized, append-only model simplifies query complexity by eliminating the need for joins.
- Adding data with an ingestion timestamp allows for easy retrieval of both current and historical states.
- Instead of updating records, new records are appended, which maintains historical information without the need to create separate snapshots.
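
For example, with an append-only table the current state is simply the latest row per business key; the table and column names below are illustrative.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Latest record per customer_id, ordered by ingestion_timestamp, gives the
# current view; omitting the filter returns the full history.
current_customers = """
SELECT * EXCEPT (row_num)
FROM (
  SELECT
    *,
    ROW_NUMBER() OVER (
      PARTITION BY customer_id
      ORDER BY ingestion_timestamp DESC
    ) AS row_num
  FROM `my-project.sales_dw.customers`
)
WHERE row_num = 1
"""
for row in client.query(current_customers).result():
    print(dict(row))
```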
