Google Professional Data Engineer Practice Test - Questions Answers, Page 26

You have uploaded 5 years of log data to Cloud Storage. A user reported that some data points in the log data are outside of their expected ranges, which indicates errors. You need to address this issue and be able to run the process again in the future while keeping the original data for compliance reasons. What should you do?

A. Import the data from Cloud Storage into BigQuery. Create a new BigQuery table, and skip the rows with errors.
B. Create a Compute Engine instance and create a new copy of the data in Cloud Storage. Skip the rows with errors.
C. Create a Cloud Dataflow workflow that reads the data from Cloud Storage, checks for values outside the expected range, sets the value to an appropriate default, and writes the updated records to a new dataset in Cloud Storage.
D. Create a Cloud Dataflow workflow that reads the data from Cloud Storage, checks for values outside the expected range, sets the value to an appropriate default, and writes the updated records to the same dataset in Cloud Storage.
Suggested answer: D
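
For context, a minimal Apache Beam (Python) sketch of the Dataflow cleansing step described in options C and D might look like the following. The bucket paths, field position, valid range, and default value are all hypothetical placeholders, not part of the question.

```python
# Hedged sketch of the Dataflow cleansing step from options C/D. All paths,
# the field position, and the valid range are hypothetical placeholders.
import apache_beam as beam

VALID_MIN, VALID_MAX, DEFAULT = 0.0, 100.0, 0.0  # assumed expected range

def clamp_out_of_range(line):
    fields = line.split(",")
    value = float(fields[2])                      # assumed metric column
    if not VALID_MIN <= value <= VALID_MAX:
        fields[2] = str(DEFAULT)                  # replace errors with a default
    return ",".join(fields)

with beam.Pipeline() as p:
    (p
     | "Read logs" >> beam.io.ReadFromText("gs://example-logs/raw/*.csv")
     | "Fix ranges" >> beam.Map(clamp_out_of_range)
     | "Write cleansed copy" >> beam.io.WriteToText("gs://example-logs/cleansed/part"))
```

Writing the cleansed records to a separate output prefix, as above, leaves the original files untouched for compliance; the output location shown is only an assumption.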

An aerospace company uses a proprietary data format to store its flight data. You need to connect this new data source to BigQuery and stream the data into BigQuery. You want to efficiently import the data into BigQuery while consuming as few resources as possible.

What should you do?

A. Use a standard Dataflow pipeline to store the raw data in BigQuery, and then transform the format later when the data is used.
B. Write a shell script that triggers a Cloud Function that performs periodic ETL batch jobs on the new data source.
C. Use Apache Hive to write a Dataproc job that streams the data into BigQuery in CSV format.
D. Use an Apache Beam custom connector to write a Dataflow pipeline that streams the data into BigQuery in Avro format.
Suggested answer: D
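
As a rough illustration of option D, a Dataflow pipeline can wrap the proprietary format in a custom parsing step and stream rows into BigQuery. The decoder logic, topic, table, and schema below are hypothetical stand-ins for the company's own connector.

```python
# Hedged sketch of option D: parse a proprietary record format in a custom
# DoFn and stream the rows into BigQuery. Names and schema are hypothetical.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

class ParseFlightRecord(beam.DoFn):
    def process(self, raw_bytes):
        # Placeholder for the company's proprietary decoder.
        tail_number, altitude = raw_bytes.decode("utf-8").split("|")
        yield {"tail_number": tail_number, "altitude": float(altitude)}

options = PipelineOptions(streaming=True)
with beam.Pipeline(options=options) as p:
    (p
     | "Read raw records" >> beam.io.ReadFromPubSub(topic="projects/example/topics/flight-data")
     | "Parse proprietary format" >> beam.ParDo(ParseFlightRecord())
     | "Stream to BigQuery" >> beam.io.WriteToBigQuery(
           "example-project:telemetry.flight_data",
           schema="tail_number:STRING,altitude:FLOAT",
           write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))
```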

You are using BigQuery and Data Studio to design a customer-facing dashboard that displays large quantities of aggregated data. You expect a high volume of concurrent users. You need to optimize the dashboard to provide quick visualizations with minimal latency. What should you do?

A. Use BigQuery BI Engine with materialized views.
B. Use BigQuery BI Engine with streaming data.
C. Use BigQuery BI Engine with authorized views.
D. Use BigQuery BI Engine with logical views.
Suggested answer: B
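
For reference, the materialized-view approach named in option A precomputes the aggregations that the dashboard reads, so BI Engine can serve them with low latency. The project, dataset, table, and columns in this sketch are hypothetical; the DDL is issued here through the BigQuery Python client.

```python
# Hedged sketch: create a BigQuery materialized view over hypothetical tables
# so the pre-aggregated results can back the dashboard.
from google.cloud import bigquery

client = bigquery.Client()
client.query("""
CREATE MATERIALIZED VIEW `example-project.dashboards.daily_sales_mv` AS
SELECT customer_id, DATE(order_ts) AS order_date, SUM(amount) AS total_amount
FROM `example-project.sales.orders`
GROUP BY customer_id, order_date
""").result()
```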

You need ads data to serve AI models and historical data for analytics. Longtail and outlier data points need to be identified. You want to cleanse the data in near-real time before running it through AI models. What should you do?

A. Use BigQuery to ingest, prepare, and then analyze the data, and then run queries to create views.
B. Use Cloud Storage as a data warehouse, shell scripts for processing, and BigQuery to create views for desired datasets.
C. Use Dataflow to identify longtail and outlier data points programmatically, with BigQuery as a sink.
D. Use Cloud Composer to identify longtail and outlier data points, and then output a usable dataset to BigQuery.
Suggested answer: A
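
As a hedged illustration of the BigQuery-only approach in the suggested answer, a view can flag longtail and outlier points directly in SQL. The table, column, and three-standard-deviation threshold below are assumptions chosen for the example.

```python
# Hedged sketch: a BigQuery view that flags outliers (here, values more than
# three standard deviations from the mean). All names are hypothetical.
from google.cloud import bigquery

client = bigquery.Client()
client.query("""
CREATE OR REPLACE VIEW `example-project.ads.cleansed_events` AS
WITH stats AS (
  SELECT AVG(bid_amount) AS mean_bid, STDDEV(bid_amount) AS sd_bid
  FROM `example-project.ads.raw_events`
)
SELECT e.*, ABS(e.bid_amount - s.mean_bid) > 3 * s.sd_bid AS is_outlier
FROM `example-project.ads.raw_events` AS e CROSS JOIN stats AS s
""").result()
```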

The Development and External teams have the Project Viewer Identity and Access Management (IAM) role in a folder named Visualization. You want the Development Team to be able to read data from both Cloud Storage and BigQuery, but the External Team should only be able to read data from BigQuery. What should you do?

A. Remove Cloud Storage IAM permissions from the External Team on the acme-raw-data project.
B. Create Virtual Private Cloud (VPC) firewall rules on the acme-raw-data project that deny all ingress traffic from the External Team CIDR range.
C. Create a VPC Service Controls perimeter containing both projects and BigQuery as a restricted API. Add the External Team users to the perimeter's Access Level.
D. Create a VPC Service Controls perimeter containing both projects and Cloud Storage as a restricted API. Add the Development Team users to the perimeter's Access Level.
Suggested answer: C

A TensorFlow machine learning model on Compute Engine virtual machines (n2-standard-32) takes two days to complete training. The model has custom TensorFlow operations that must run partially on a CPU. You want to reduce the training time in a cost-effective manner. What should you do?

A. Change the VM type to n2-highmem-32.
B. Change the VM type to e2-standard-32.
C. Train the model using a VM with a GPU hardware accelerator.
D. Train the model using a VM with a TPU hardware accelerator.
Suggested answer: C
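
A hedged TensorFlow sketch of the GPU approach: keep the bulk of the training step on the GPU while explicitly placing the CPU-only custom operation on the host. The model, custom op, and loss are placeholders, not the question's actual workload.

```python
# Hedged sketch: pin a CPU-only custom op to the host while the rest of the
# training step runs on the GPU. Model, op, and data are hypothetical.
import tensorflow as tf

def custom_cpu_op(x):
    # Stand-in for the proprietary custom TensorFlow operation.
    return x * 2.0

@tf.function
def train_step(model, optimizer, features, labels):
    with tf.GradientTape() as tape:
        with tf.device("/CPU:0"):                          # custom op stays on the CPU
            preprocessed = custom_cpu_op(features)
        predictions = model(preprocessed, training=True)   # GPU-accelerated layers
        loss = tf.reduce_mean(tf.keras.losses.mse(labels, predictions))
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    return loss
```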

An online brokerage company requires a high-volume trade processing architecture. You need to create a secure queuing system that triggers jobs. The jobs will run in Google Cloud and call the company's Python API to execute trades. You need to efficiently implement a solution. What should you do?

A. Use Cloud Composer to subscribe to a Pub/Sub topic and call the Python API.
B. Use a Pub/Sub push subscription to trigger a Cloud Function to pass the data to the Python API.
C. Write an application that makes a queue in a NoSQL database.
D. Write an application hosted on a Compute Engine instance that makes a push subscription to the Pub/Sub topic.
Suggested answer: C
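
For comparison, the Pub/Sub-triggered Cloud Function pattern named in option B would look roughly like the sketch below. The message fields and the `execute_trade` import stand in for the company's proprietary Python API and are hypothetical.

```python
# Hedged sketch of the option B pattern: a Pub/Sub-triggered Cloud Function
# (1st gen signature) that decodes a queued trade and calls the trading API.
import base64
import json

from company_trading_api import execute_trade  # hypothetical in-house Python API

def process_trade(event, context):
    """Runs once per message published to the trades topic."""
    order = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    execute_trade(symbol=order["symbol"], quantity=order["quantity"],
                  side=order["side"])
```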

You are designing a pipeline that publishes application events to a Pub/Sub topic. You need to aggregate events across hourly intervals before loading the results to BigQuery for analysis. Your solution must be scalable so it can process and load large volumes of events to BigQuery. What should you do?

A. Create a streaming Dataflow job to continually read from the Pub/Sub topic and perform the necessary aggregations using tumbling windows.
B. Schedule a batch Dataflow job to run hourly, pulling all available messages from the Pub/Sub topic and performing the necessary aggregations.
C. Schedule a Cloud Function to run hourly, pulling all available messages from the Pub/Sub topic and performing the necessary aggregations.
D. Create a Cloud Function to perform the necessary data processing that executes using the Pub/Sub trigger every time a new message is published to the topic.
Suggested answer: A
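
A hedged Beam (Python) sketch of the streaming approach in the suggested answer: hourly tumbling (fixed) windows over the Pub/Sub stream, with the aggregated counts written to BigQuery. The topic, table, and event schema are assumptions.

```python
# Hedged sketch: streaming Dataflow job aggregating events in hourly tumbling
# windows and loading the counts into BigQuery. Names are hypothetical.
import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(streaming=True)
with beam.Pipeline(options=options) as p:
    (p
     | "Read events" >> beam.io.ReadFromPubSub(topic="projects/example/topics/app-events")
     | "Parse" >> beam.Map(lambda b: json.loads(b.decode("utf-8"))["event_type"])
     | "Hourly windows" >> beam.WindowInto(beam.window.FixedWindows(60 * 60))
     | "Count per type" >> beam.combiners.Count.PerElement()
     | "To rows" >> beam.Map(lambda kv: {"event_type": kv[0], "event_count": kv[1]})
     | "Write to BigQuery" >> beam.io.WriteToBigQuery(
           "example-project:analytics.hourly_event_counts",
           schema="event_type:STRING,event_count:INTEGER",
           write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))
```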

Your company is migrating its on-premises data warehousing solution to BigQuery. The existing data warehouse uses trigger-based change data capture (CDC) to apply daily updates from transactional database sources. Your company wants to use BigQuery to improve its handling of CDC and to optimize the performance of the data warehouse. Source system changes must be available for query in near-real time using log-based CDC streams. You need to ensure that changes in the BigQuery reporting table are available with minimal latency and reduced overhead. What should you do?

Choose 2 answers

A. Perform a DML INSERT, UPDATE, or DELETE to replicate each CDC record in the reporting table in real time.
B. Periodically DELETE outdated records from the reporting table. Periodically use a DML MERGE to simultaneously perform DML INSERT, UPDATE, and DELETE operations in the reporting table.
C. Insert each new CDC record and corresponding operation type into a staging table in real time.
D. Insert each new CDC record and corresponding operation type into the reporting table in real time, and use a materialized view to expose only the current version of each unique record.
Suggested answer: B, D
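
To make the periodic MERGE pattern concrete, the sketch below folds staged CDC records into a reporting table, applying inserts, updates, and deletes in one statement. The table names, key, operation codes, and columns are hypothetical; the statement is run here through the BigQuery Python client.

```python
# Hedged sketch: periodically MERGE staged CDC records into the reporting
# table, applying inserts, updates, and deletes. All names are hypothetical.
from google.cloud import bigquery

client = bigquery.Client()
client.query("""
MERGE `example-project.dw.reporting_orders` AS r
USING `example-project.dw.orders_cdc_staging` AS s
ON r.order_id = s.order_id
WHEN MATCHED AND s.operation = 'DELETE' THEN
  DELETE
WHEN MATCHED AND s.operation = 'UPDATE' THEN
  UPDATE SET status = s.status, amount = s.amount
WHEN NOT MATCHED AND s.operation = 'INSERT' THEN
  INSERT (order_id, status, amount) VALUES (s.order_id, s.status, s.amount)
""").result()
```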