Google Professional Data Engineer Practice Test - Questions and Answers, Page 22

You are building a new application that you need to collect data from in a scalable way. Data arrives continuously from the application throughout the day, and you expect to generate approximately 150 GB of JSON data per day by the end of the year. Your requirements are:

Decoupling producer from consumer

Space and cost-efficient storage of the raw ingested data, which is to be stored indefinitely

Near real-time SQL query

Maintain at least 2 years of historical data, which will be queried with SQL

Which pipeline should you use to meet these requirements?

A. Create an application that provides an API. Write a tool to poll the API and write data to Cloud Storage as gzipped JSON files.
B. Create an application that writes to a Cloud SQL database to store the data. Set up periodic exports of the database to write to Cloud Storage and load into BigQuery.
C. Create an application that publishes events to Cloud Pub/Sub, and create Spark jobs on Cloud Dataproc to convert the JSON data to Avro format, stored on HDFS on Persistent Disk.
D. Create an application that publishes events to Cloud Pub/Sub, and create a Cloud Dataflow pipeline that transforms the JSON event payloads to Avro, writing the data to Cloud Storage and BigQuery.
Suggested answer: D (Cloud Pub/Sub decouples producers from consumers, Avro on Cloud Storage is a compact raw archive, and BigQuery provides near real-time SQL over the history.)
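For reference, a minimal Apache Beam (Python) sketch of the Pub/Sub-to-BigQuery leg of option D; the topic, dataset, and table names are placeholders, the table is assumed to already exist, and the Avro archive branch to Cloud Storage is only noted in a comment:

```python
import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as p:
    events = (
        p
        | "ReadFromPubSub" >> beam.io.ReadFromPubSub(topic="projects/my-project/topics/events")
        | "ParseJson" >> beam.Map(json.loads)  # decode each Pub/Sub message payload
    )

    # Near real-time SQL: stream the parsed rows into an existing BigQuery table.
    events | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
        "my-project:analytics.events",
        write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
    )

    # The raw-archive branch (window the stream and write Avro files to a
    # gs://my-raw-bucket/ path) would hang off the same `events` collection.
```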

You are running a pipeline in Cloud Dataflow that receives messages from a Cloud Pub/Sub topic and writes the results to a BigQuery dataset in the EU. Currently, your pipeline is located in europe-west4 and has a maximum of 3 workers, instance type n1-standard-1. You notice that during peak periods, your pipeline is struggling to process records in a timely fashion when all 3 workers are at maximum CPU utilization. Which two actions can you take to increase the performance of your pipeline? (Choose two.)

A. Increase the number of max workers
B. Use a larger instance type for your Cloud Dataflow workers
C. Change the zone of your Cloud Dataflow pipeline to run in us-central1
D. Create a temporary table in Cloud Bigtable that will act as a buffer for new data. Create a new step in your pipeline to write to this table first, and then create a new pipeline to write from Cloud Bigtable to BigQuery
E. Create a temporary table in Cloud Spanner that will act as a buffer for new data. Create a new step in your pipeline to write to this table first, and then create a new pipeline to write from Cloud Spanner to BigQuery
Suggested answer: A, B
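Both fixes map directly onto pipeline options; a hedged sketch for a Python pipeline, with the project and bucket names as placeholders:

```python
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    runner="DataflowRunner",
    project="my-project",
    region="europe-west4",          # stay in the EU, near the BigQuery dataset
    temp_location="gs://my-bucket/tmp",
    streaming=True,
    max_num_workers=10,             # answer A: let autoscaling go beyond 3 workers
    machine_type="n1-standard-4",   # answer B: a larger worker type than n1-standard-1
)
```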

You have a data pipeline with a Cloud Dataflow job that aggregates and writes time series metrics to Cloud Bigtable. This data feeds a dashboard used by thousands of users across the organization. You need to support additional concurrent users and reduce the amount of time required to write the data. Which two actions should you take? (Choose two.)

A. Configure your Cloud Dataflow pipeline to use local execution
B. Increase the maximum number of Cloud Dataflow workers by setting maxNumWorkers in PipelineOptions
C. Increase the number of nodes in the Cloud Bigtable cluster
D. Modify your Cloud Dataflow pipeline to use the Flatten transform before writing to Cloud Bigtable
E. Modify your Cloud Dataflow pipeline to use the CoGroupByKey transform before writing to Cloud Bigtable
Suggested answer: B, C
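Answer B is the same maxNumWorkers/max_num_workers setting shown in the previous sketch; answer C can be scripted with the Bigtable admin client, roughly as follows (the instance and cluster IDs are placeholders):

```python
from google.cloud import bigtable

client = bigtable.Client(project="my-project", admin=True)
cluster = client.instance("metrics-instance").cluster("metrics-cluster-c1")

cluster.reload()            # fetch the current configuration
cluster.serve_nodes = 9     # add nodes to absorb the extra read and write load
cluster.update()
```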

You have several Spark jobs that run on a Cloud Dataproc cluster on a schedule. Some of the jobs run in sequence, and some of the jobs run concurrently. You need to automate this process. What should you do?

A. Create a Cloud Dataproc Workflow Template
B. Create an initialization action to execute the jobs
C. Create a Directed Acyclic Graph in Cloud Composer
D. Create a Bash script that uses the Cloud SDK to create a cluster, execute jobs, and then tear down the cluster
Suggested answer: C
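A sketch of what the Composer DAG might look like, assuming an existing Dataproc cluster and a jobs.jar with four Spark main classes (all names are placeholders); job_a runs first, job_b and job_c run concurrently, and job_d runs last:

```python
from datetime import datetime
from airflow import DAG
from airflow.providers.google.cloud.operators.dataproc import DataprocSubmitJobOperator

PROJECT = "my-project"
REGION = "europe-west4"
CLUSTER = "spark-cluster"

def spark_job(main_class):
    # Dataproc job definition pointing at the shared jar on Cloud Storage.
    return {
        "reference": {"project_id": PROJECT},
        "placement": {"cluster_name": CLUSTER},
        "spark_job": {"main_class": main_class, "jar_file_uris": ["gs://my-bucket/jobs.jar"]},
    }

with DAG("scheduled_spark_jobs", start_date=datetime(2024, 1, 1),
         schedule_interval="@daily", catchup=False) as dag:
    job_a = DataprocSubmitJobOperator(task_id="job_a", project_id=PROJECT,
                                      region=REGION, job=spark_job("com.example.JobA"))
    job_b = DataprocSubmitJobOperator(task_id="job_b", project_id=PROJECT,
                                      region=REGION, job=spark_job("com.example.JobB"))
    job_c = DataprocSubmitJobOperator(task_id="job_c", project_id=PROJECT,
                                      region=REGION, job=spark_job("com.example.JobC"))
    job_d = DataprocSubmitJobOperator(task_id="job_d", project_id=PROJECT,
                                      region=REGION, job=spark_job("com.example.JobD"))

    # Sequence plus concurrency: A, then B and C in parallel, then D.
    job_a >> [job_b, job_c]
    [job_b, job_c] >> job_d
```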

You are building a new data pipeline to share data between two different types of applications: job generators and job runners. Your solution must scale to accommodate increases in usage and must accommodate the addition of new applications without negatively affecting the performance of existing ones. What should you do?

A. Create an API using App Engine to receive and send messages to the applications
B. Use a Cloud Pub/Sub topic to publish jobs, and use subscriptions to execute them
C. Create a table on Cloud SQL, and insert and delete rows with the job information
D. Create a table on Cloud Spanner, and insert and delete rows with the job information
Suggested answer: B (a Pub/Sub topic scales with load, and new applications attach as new subscriptions without affecting existing subscribers.)
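A minimal Pub/Sub sketch of answer B, assuming a `jobs` topic and one subscription per runner application (the project, topic, and subscription names are placeholders):

```python
from google.cloud import pubsub_v1

project_id = "my-project"

# Job generator: publish a job message to the shared topic.
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(project_id, "jobs")
publisher.publish(topic_path, data=b'{"job_id": "123", "type": "render"}').result()

# Job runner: pull from its own subscription. Adding a new runner application only
# means attaching another subscription to the same topic, so existing runners are
# unaffected.
subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path(project_id, "jobs-runner-a")

def callback(message):
    print("processing", message.data)
    message.ack()

future = subscriber.subscribe(subscription_path, callback=callback)
# A real runner would block on future.result() to keep pulling messages.
```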

You need to create a new transaction table in Cloud Spanner that stores product sales data. You are deciding what to use as a primary key. From a performance perspective, which strategy should you choose?

A. The current epoch time
B. A concatenation of the product name and the current epoch time
C. A random universally unique identifier number (version 4 UUID)
D. The original order identification number from the sales system, which is a monotonically increasing integer
Suggested answer: C
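A short sketch of answer C with the Spanner client library: the key is generated client-side as a UUIDv4 so inserts spread across splits instead of hotspotting the last one (the instance, database, table, and column names are placeholders):

```python
import uuid
from google.cloud import spanner

client = spanner.Client(project="my-project")
database = client.instance("sales-instance").database("sales-db")

with database.batch() as batch:
    batch.insert(
        table="transactions",
        columns=("transaction_id", "product_name", "amount"),
        values=[(str(uuid.uuid4()), "widget", 19.99)],  # random key, no hotspot
    )
```

By contrast, an epoch timestamp or a monotonically increasing order ID concentrates all new writes on the same split.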

Data Analysts in your company have the Cloud IAM Owner role assigned to them in their projects to allow them to work with multiple GCP products. Your organization requires that all BigQuery data access logs be retained for 6 months. You need to ensure that only audit personnel in your company can access the data access logs for all projects. What should you do?

A. Enable data access logs in each Data Analyst's project. Restrict access to Stackdriver Logging via Cloud IAM roles.
B. Export the data access logs via a project-level export sink to a Cloud Storage bucket in the Data Analysts' projects. Restrict access to the Cloud Storage bucket.
C. Export the data access logs via a project-level export sink to a Cloud Storage bucket in a newly created project for audit logs. Restrict access to the project with the exported logs.
D. Export the data access logs via an aggregated export sink to a Cloud Storage bucket in a newly created project for audit logs. Restrict access to the project that contains the exported logs.
Suggested answer: D
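A hedged sketch of answer D using the Logging configuration API; the organization ID, bucket name, and filter are placeholders, and the same sink is more commonly created with gcloud logging sinks create ... --organization=... --include-children:

```python
from google.cloud.logging_v2.services.config_service_v2 import ConfigServiceV2Client
from google.cloud.logging_v2.types import LogSink

config_client = ConfigServiceV2Client()

sink = LogSink(
    name="bq-data-access-audit",
    destination="storage.googleapis.com/my-audit-logs-bucket",  # bucket in the new audit project
    filter='logName:"cloudaudit.googleapis.com%2Fdata_access" '
           'AND protoPayload.serviceName="bigquery.googleapis.com"',
    include_children=True,  # aggregate the logs of every project under the organization
)

config_client.create_sink(parent="organizations/123456789", sink=sink)
```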

Each analytics team in your organization is running BigQuery jobs in their own projects. You want to enable each team to monitor slot usage within their projects. What should you do?

A. Create a Stackdriver Monitoring dashboard based on the BigQuery metric query/scanned_bytes
B. Create a Stackdriver Monitoring dashboard based on the BigQuery metric slots/allocated_for_project
C. Create a log export for each project, capture the BigQuery job execution logs, create a custom metric based on the totalSlotMs, and create a Stackdriver Monitoring dashboard based on the custom metric
D. Create an aggregated log export at the organization level, capture the BigQuery job execution logs, create a custom metric based on the totalSlotMs, and create a Stackdriver Monitoring dashboard based on the custom metric
Suggested answer: D
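Not the Monitoring dashboard from the answer, but a complementary, hedged way for a team to inspect the same per-project slot signal (total_slot_ms per job) directly from BigQuery's INFORMATION_SCHEMA with the client library; the project and region qualifier are placeholders:

```python
from google.cloud import bigquery

client = bigquery.Client(project="analytics-team-a")

query = """
SELECT
  job_id,
  user_email,
  total_slot_ms,
  TIMESTAMP_DIFF(end_time, start_time, MILLISECOND) AS elapsed_ms
FROM `region-eu`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
ORDER BY total_slot_ms DESC
LIMIT 20
"""

for row in client.query(query).result():
    print(row.job_id, row.user_email, row.total_slot_ms, row.elapsed_ms)
```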

You are operating a streaming Cloud Dataflow pipeline. Your engineers have a new version of the pipeline with a different windowing algorithm and triggering strategy. You want to update the running pipeline with the new version. You want to ensure that no data is lost during the update.

What should you do?

A. Update the Cloud Dataflow pipeline in flight by passing the --update option with the --jobName set to the existing job name
B. Update the Cloud Dataflow pipeline in flight by passing the --update option with the --jobName set to a new unique job name
C. Stop the Cloud Dataflow pipeline with the Cancel option. Create a new Cloud Dataflow job with the updated code
D. Stop the Cloud Dataflow pipeline with the Drain option. Create a new Cloud Dataflow job with the updated code
Suggested answer: A
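For a Python pipeline, answer A corresponds to relaunching the new code with the update flag and the existing job name (the names below are placeholders); the Java equivalent passes --update and --jobName on the command line:

```python
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    runner="DataflowRunner",
    project="my-project",
    region="europe-west4",
    streaming=True,
    update=True,                  # replace the running job in place, preserving in-flight data
    job_name="metrics-pipeline",  # must match the name of the job being updated
    temp_location="gs://my-bucket/tmp",
)
```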

You need to move 2 PB of historical data from an on-premises storage appliance to Cloud Storage within six months, and your outbound network capacity is constrained to 20 Mb/sec. How should you migrate this data to Cloud Storage?

A. Use Transfer Appliance to copy the data to Cloud Storage
B. Use gsutil cp -J to compress the content being uploaded to Cloud Storage
C. Create a private URL for the historical data, and then use Storage Transfer Service to copy the data to Cloud Storage
D. Use trickle or ionice along with gsutil cp to limit the amount of bandwidth gsutil utilizes to less than 20 Mb/sec so it does not interfere with the production traffic
Suggested answer: A
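A quick back-of-the-envelope check (assuming 30-day months) of why the network-based options cannot work here, which is what makes Transfer Appliance the practical choice:

```python
# How much data a saturated 20 Mb/s link can move in six months.
link_mbps = 20                                     # megabits per second
seconds = 6 * 30 * 24 * 3600                       # ~6 months in seconds
transferable_tb = link_mbps / 8 * seconds / 1e6    # Mb/s -> MB/s, MB -> TB
print(f"~{transferable_tb:.0f} TB in six months vs. 2,000 TB required")  # roughly 39 TB
```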