Google Professional Data Engineer Practice Test - Questions Answers, Page 20

You use a dataset in BigQuery for analysis. You want to provide third-party companies with access to the same dataset. You need to keep the costs of data sharing low and ensure that the data is current.

Which solution should you choose?

A. Create an authorized view on the BigQuery table to control data access, and provide third-party companies with access to that view.
B. Use Cloud Scheduler to export the data on a regular basis to Cloud Storage, and provide third-party companies with access to the bucket.
C. Create a separate dataset in BigQuery that contains the relevant data to share, and provide third-party companies with access to the new dataset.
D. Create a Cloud Dataflow job that reads the data in frequent time intervals, and writes it to the relevant BigQuery dataset or Cloud Storage bucket for third-party companies to use.

Suggested answer: B
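For reference, a minimal sketch of the authorized-view mechanics described in option A, using the BigQuery Python client. The project, dataset, table, and column names (my-project, source_data, shared_views, trips) are placeholders, not part of the question.

```python
# Hypothetical sketch: expose a subset of a dataset through an authorized view.
from google.cloud import bigquery

client = bigquery.Client()

# 1. Create a view in a separate dataset that exposes only the data to share.
view = bigquery.Table("my-project.shared_views.trips_view")
view.view_query = "SELECT trip_id, start_time, fare FROM `my-project.source_data.trips`"
view = client.create_table(view)

# 2. Authorize the view against the source dataset, so partners only need
#    access to the view's dataset, not to the underlying tables.
source_dataset = client.get_dataset("my-project.source_data")
entries = list(source_dataset.access_entries)
entries.append(bigquery.AccessEntry(None, "view", view.reference.to_api_repr()))
source_dataset.access_entries = entries
client.update_dataset(source_dataset, ["access_entries"])
```

Because the view is evaluated at query time, third parties always see the current data and no duplicate copy is stored.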

A shipping company has live package-tracking data that is sent to an Apache Kafka stream in real time. This is then loaded into BigQuery. Analysts in your company want to query the tracking data in BigQuery to analyze geospatial trends in the lifecycle of a package. The table was originally created with ingest-date partitioning. Over time, the query processing time has increased. You need to implement a change that would improve query performance in BigQuery. What should you do?

A. Implement clustering in BigQuery on the ingest date column.
B. Implement clustering in BigQuery on the package-tracking ID column.
C. Tier older data onto Cloud Storage files, and leverage extended tables.
D. Re-create the table using data partitioning on the package delivery date.

Suggested answer: A
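For context, a rough sketch of re-creating a table with partitioning and clustering through a DDL statement run from the BigQuery Python client. The table and column names (package_tracking, ingest_ts, tracking_id) are placeholders; which column to cluster on is exactly what the question is testing.

```python
# Hypothetical sketch: rebuild a table as partitioned and clustered via DDL.
from google.cloud import bigquery

client = bigquery.Client()

ddl = """
CREATE TABLE `my-project.logistics.package_tracking_clustered`
PARTITION BY DATE(ingest_ts)          -- keeps the existing ingest-date partitioning
CLUSTER BY tracking_id                -- placeholder clustering column
AS
SELECT * FROM `my-project.logistics.package_tracking`
"""

# DDL runs as an ordinary query job; the result is the new clustered table.
client.query(ddl).result()
```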

You are designing a data processing pipeline. The pipeline must be able to scale automatically as load increases. Messages must be processed at least once, and must be ordered within windows of 1 hour. How should you design the solution?

A. Use Apache Kafka for message ingestion and use Cloud Dataproc for streaming analysis.
B. Use Apache Kafka for message ingestion and use Cloud Dataflow for streaming analysis.
C. Use Cloud Pub/Sub for message ingestion and Cloud Dataproc for streaming analysis.
D. Use Cloud Pub/Sub for message ingestion and Cloud Dataflow for streaming analysis.

Suggested answer: D
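As an illustration of option D, a minimal Apache Beam sketch (runnable on Dataflow) that reads from Cloud Pub/Sub and aggregates inside fixed one-hour windows. The topic name and the key/value parsing are placeholder assumptions.

```python
# Hypothetical sketch: autoscaling streaming pipeline with one-hour windows.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms.window import FixedWindows

options = PipelineOptions(streaming=True)  # add runner/project flags to run on Dataflow

with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadFromPubSub" >> beam.io.ReadFromPubSub(topic="projects/my-project/topics/events")
        | "KeyByPayload" >> beam.Map(lambda msg: (msg.decode("utf-8"), 1))
        | "HourlyWindows" >> beam.WindowInto(FixedWindows(60 * 60))
        | "CountPerKey" >> beam.CombinePerKey(sum)
        | "LogResult" >> beam.Map(print)  # replace with a real sink such as WriteToBigQuery
    )
```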

You need to set access to BigQuery for different departments within your company. Your solution should comply with the following requirements:

Each department should have access only to their data.

Each department will have one or more leads who need to be able to create and update tables and provide them to their team.

Each department has data analysts who need to be able to query but not modify data.

How should you set access to the data in BigQuery?

A. Create a dataset for each department. Assign the department leads the role of OWNER, and assign the data analysts the role of WRITER on their dataset.
B. Create a dataset for each department. Assign the department leads the role of WRITER, and assign the data analysts the role of READER on their dataset.
C. Create a table for each department. Assign the department leads the role of Owner, and assign the data analysts the role of Editor on the project the table is in.
D. Create a table for each department. Assign the department leads the role of Editor, and assign the data analysts the role of Viewer on the project the table is in.

Suggested answer: D
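For reference, the dataset-level WRITER and READER roles mentioned in options A and B are granted through BigQuery access entries. A minimal sketch with the Python client; the dataset and group addresses (finance, finance-leads@example.com, finance-analysts@example.com) are hypothetical.

```python
# Hypothetical sketch: grant dataset-level roles to department groups.
from google.cloud import bigquery

client = bigquery.Client()
dataset = client.get_dataset("my-project.finance")

entries = list(dataset.access_entries)
# Leads can create and update tables in the department's dataset.
entries.append(bigquery.AccessEntry("WRITER", "groupByEmail", "finance-leads@example.com"))
# Analysts can query but not modify the data.
entries.append(bigquery.AccessEntry("READER", "groupByEmail", "finance-analysts@example.com"))
dataset.access_entries = entries
client.update_dataset(dataset, ["access_entries"])
```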

You operate a database that stores stock trades and an application that retrieves average stock price for a given company over an adjustable window of time. The data is stored in Cloud Bigtable where the datetime of the stock trade is the beginning of the row key. Your application has thousands of concurrent users, and you notice that performance is starting to degrade as more stocks are added.

What should you do to improve the performance of your application?

A. Change the row key syntax in your Cloud Bigtable table to begin with the stock symbol.
B. Change the row key syntax in your Cloud Bigtable table to begin with a random number per second.
C. Change the data pipeline to use BigQuery for storing stock trades, and update your application.
D. Use Cloud Dataflow to write a summary of each day's stock trades to an Avro file on Cloud Storage. Update your application to read from Cloud Storage and Cloud Bigtable to compute the responses.

Suggested answer: A
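To illustrate option A, a small sketch of a row key that starts with the stock symbol followed by the trade timestamp, so a per-symbol time-range query becomes a contiguous row scan instead of a scan across all symbols. The instance, table, column family, and key format are placeholder assumptions.

```python
# Hypothetical sketch: symbol-first row keys in Cloud Bigtable.
from google.cloud import bigtable
from google.cloud.bigtable.row_set import RowSet

client = bigtable.Client(project="my-project")
table = client.instance("trades-instance").table("trades")

# Row key: SYMBOL#timestamp, e.g. "GOOG#20240510T143000".
row = table.direct_row(b"GOOG#20240510T143000")
row.set_cell("prices", "trade_price", b"172.35")
row.commit()

# Read all GOOG trades for one day with a single contiguous row-range scan.
row_set = RowSet()
row_set.add_row_range_from_keys(b"GOOG#20240510", b"GOOG#20240511")
for partial_row in table.read_rows(row_set=row_set):
    print(partial_row.row_key)
```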

You are operating a Cloud Dataflow streaming pipeline. The pipeline aggregates events from a Cloud Pub/Sub subscription source, within a window, and sinks the resulting aggregation to a Cloud Storage bucket. The source has consistent throughput. You want to monitor and alert on the behavior of the pipeline with Cloud Stackdriver to ensure that it is processing data. Which Stackdriver alerts should you create?

A. An alert based on a decrease of subscription/num_undelivered_messages for the source and a rate of change increase of instance/storage/used_bytes for the destination
B. An alert based on an increase of subscription/num_undelivered_messages for the source and a rate of change decrease of instance/storage/used_bytes for the destination
C. An alert based on a decrease of instance/storage/used_bytes for the source and a rate of change increase of subscription/num_undelivered_messages for the destination
D. An alert based on an increase of instance/storage/used_bytes for the source and a rate of change decrease of subscription/num_undelivered_messages for the destination

Suggested answer: B
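For context, a sketch of one half of answer B: an alert policy that fires when subscription/num_undelivered_messages stays above a threshold, created with the Cloud Monitoring Python client. The project name, threshold, and duration are placeholder assumptions.

```python
# Hypothetical sketch: alert when the Pub/Sub backlog stops being drained.
from google.cloud import monitoring_v3

client = monitoring_v3.AlertPolicyServiceClient()

condition = monitoring_v3.AlertPolicy.Condition(
    display_name="Pub/Sub backlog growing",
    condition_threshold=monitoring_v3.AlertPolicy.Condition.MetricThreshold(
        filter=(
            'metric.type="pubsub.googleapis.com/subscription/num_undelivered_messages" '
            'AND resource.type="pubsub_subscription"'
        ),
        comparison=monitoring_v3.ComparisonType.COMPARISON_GT,
        threshold_value=1000,            # placeholder backlog threshold
        duration={"seconds": 300},       # must persist for 5 minutes before firing
    ),
)

policy = monitoring_v3.AlertPolicy(
    display_name="Dataflow pipeline not draining its subscription",
    combiner=monitoring_v3.AlertPolicy.ConditionCombinerType.AND,
    conditions=[condition],
)

created = client.create_alert_policy(name="projects/my-project", alert_policy=policy)
print(created.name)
```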

You currently have a single on-premises Kafka cluster in a data center in the us-east region that is responsible for ingesting messages from IoT devices globally. Because large parts of the globe have poor internet connectivity, messages sometimes batch at the edge, come in all at once, and cause a spike in load on your Kafka cluster. This is becoming difficult to manage and prohibitively expensive. What is the Google-recommended cloud native architecture for this scenario?

A. Edge TPUs as sensor devices for storing and transmitting the messages.
B. Cloud Dataflow connected to the Kafka cluster to scale the processing of incoming messages.
C. An IoT gateway connected to Cloud Pub/Sub, with Cloud Dataflow to read and process the messages from Cloud Pub/Sub.
D. A Kafka cluster virtualized on Compute Engine in us-east with Cloud Load Balancing to connect to the devices around the world.

Suggested answer: C
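As a small illustration of the Pub/Sub ingestion side of option C, a gateway-style publisher using the Pub/Sub Python client; Pub/Sub absorbs bursty traffic globally before Dataflow processes it. The topic name, attribute, and payload shape are placeholder assumptions.

```python
# Hypothetical sketch: an IoT gateway publishing device messages to Pub/Sub.
import json
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project", "iot-telemetry")

def publish_reading(device_id: str, payload: dict) -> None:
    """Publish one telemetry reading; Pub/Sub buffers load spikes for downstream Dataflow."""
    data = json.dumps(payload).encode("utf-8")
    future = publisher.publish(topic_path, data, device_id=device_id)
    future.result()  # block until Pub/Sub acknowledges the message

publish_reading("sensor-42", {"temp_c": 21.7, "ts": "2024-05-10T14:30:00Z"})
```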

You decided to use Cloud Datastore to ingest vehicle telemetry data in real time. You want to build a storage system that will account for the long-term data growth, while keeping the costs low. You also want to create snapshots of the data periodically, so that you can make a point-in-time (PIT) recovery, or clone a copy of the data for Cloud Datastore in a different environment. You want to archive these snapshots for a long time. Which two methods can accomplish this?

Choose 2 answers.

A. Use managed export, and store the data in a Cloud Storage bucket using Nearline or Coldline class.
B. Use managed export, and then import to Cloud Datastore in a separate project under a unique namespace reserved for that export.
C. Use managed export, and then import the data into a BigQuery table created just for that export, and delete temporary export files.
D. Write an application that uses Cloud Datastore client libraries to read all the entities. Treat each entity as a BigQuery table row via BigQuery streaming insert. Assign an export timestamp for each export, and attach it as an extra column for each row. Make sure that the BigQuery table is partitioned using the export timestamp column.
E. Write an application that uses Cloud Datastore client libraries to read all the entities. Format the exported data into a JSON file. Apply compression before storing the data in Cloud Source Repositories.

Suggested answer: C, E
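For reference, a minimal sketch of triggering a Datastore managed export to a Cloud Storage bucket, which is the starting point for the export-based options above. The project ID and bucket path are placeholders.

```python
# Hypothetical sketch: managed export of Datastore entities to Cloud Storage.
from google.cloud import datastore_admin_v1

client = datastore_admin_v1.DatastoreAdminClient()

operation = client.export_entities(
    request={
        "project_id": "my-project",
        # Bucket could use the Nearline or Coldline class for long-term archival.
        "output_url_prefix": "gs://my-datastore-archive/2024-05-10",
    }
)
response = operation.result()  # long-running operation; blocks until the export finishes
print(response.output_url)     # prefix of the exported snapshot files
```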

You need to create a data pipeline that copies time-series transaction data so that it can be queried from within BigQuery by your data science team for analysis. Every hour, thousands of transactions are updated with a new status. The size of the initial dataset is 1.5 PB, and it will grow by 3 TB per day. The data is heavily structured, and your data science team will build machine learning models based on this data. You want to maximize performance and usability for your data science team. Which two strategies should you adopt? Choose 2 answers.

A. Denormalize the data as much as possible.
B. Preserve the structure of the data as much as possible.
C. Use BigQuery UPDATE to further reduce the size of the dataset.
D. Develop a data pipeline where status updates are appended to BigQuery instead of updated.
E. Copy a daily snapshot of transaction data to Cloud Storage and store it as an Avro file. Use BigQuery's support for external data sources to query.

Suggested answer: A, E
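To illustrate the append-only pattern described in option D, a small sketch that streams status events into BigQuery and resolves the latest status at query time rather than running UPDATE statements. The table, column names, and values are hypothetical.

```python
# Hypothetical sketch: append status events and compute the latest status on read.
from google.cloud import bigquery

client = bigquery.Client()
table_id = "my-project.payments.transaction_status_events"

# Append a new status event; earlier rows for the transaction are never updated.
rows = [{"transaction_id": "txn-123", "status": "SETTLED", "event_ts": "2024-05-10T14:30:00Z"}]
errors = client.insert_rows_json(table_id, rows)
assert not errors, errors

# Latest status per transaction, resolved at query time.
latest_sql = """
SELECT transaction_id, status
FROM `my-project.payments.transaction_status_events`
QUALIFY ROW_NUMBER() OVER (PARTITION BY transaction_id ORDER BY event_ts DESC) = 1
"""
for row in client.query(latest_sql).result():
    print(row.transaction_id, row.status)
```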

You are designing a cloud-native historical data processing system to meet the following conditions:

The data being analyzed is in CSV, Avro, and PDF formats and will be accessed by multiple analysis tools including Cloud Dataproc, BigQuery, and Compute Engine.

A streaming data pipeline stores new data daily.

Performance is not a factor in the solution.

The solution design should maximize availability.

How should you design data storage for this solution?

A. Create a Cloud Dataproc cluster with high availability. Store the data in HDFS, and perform analysis as needed.
B. Store the data in BigQuery. Access the data using the BigQuery Connector or Cloud Dataproc and Compute Engine.
C. Store the data in a regional Cloud Storage bucket. Access the bucket directly using Cloud Dataproc, BigQuery, and Compute Engine.
D. Store the data in a multi-regional Cloud Storage bucket. Access the data directly using Cloud Dataproc, BigQuery, and Compute Engine.

Suggested answer: D
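For reference, a minimal sketch of creating a multi-region Cloud Storage bucket (option D) with the Python client; Dataproc, BigQuery external tables, and Compute Engine can then read the objects directly. The bucket name and location are placeholders.

```python
# Hypothetical sketch: create a multi-region Cloud Storage bucket for shared data.
from google.cloud import storage

client = storage.Client(project="my-project")

bucket = storage.Bucket(client, name="my-historical-data")
bucket.storage_class = "STANDARD"
# "US" is a multi-region location, which maximizes availability of the objects.
new_bucket = client.create_bucket(bucket, location="US")
print(new_bucket.location, new_bucket.location_type)
```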