
Google Professional Data Engineer Practice Test - Questions Answers, Page 10

The _________ for Cloud Bigtable makes it possible to use Cloud Bigtable in a Cloud Dataflow pipeline.

A. Cloud Dataflow connector
B. Dataflow SDK
C. BigQuery API
D. BigQuery Data Transfer Service
Suggested answer: A

Explanation:

The Cloud Dataflow connector for Cloud Bigtable makes it possible to use Cloud Bigtable in a Cloud Dataflow pipeline. You can use the connector for both batch and streaming operations.

Reference: https://cloud.google.com/bigtable/docs/dataflow-hbase
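The connector in the reference above is the Java, HBase-based connector. As a rough sketch of the same idea using the Apache Beam Python SDK (the project, instance, table, and column-family IDs below are placeholders, not values from the question), a pipeline can write rows to Cloud Bigtable like this:

# Illustrative sketch only: writing to Cloud Bigtable from a Beam/Dataflow
# pipeline with the Python SDK's bigtableio connector. All IDs are placeholders.
import datetime

import apache_beam as beam
from apache_beam.io.gcp.bigtableio import WriteToBigTable
from google.cloud.bigtable.row import DirectRow


def to_bigtable_row(element):
    # Turn a (row_key, value) pair into a Bigtable DirectRow mutation.
    row_key, value = element
    row = DirectRow(row_key=row_key.encode('utf-8'))
    row.set_cell('cf1', b'name', value.encode('utf-8'),
                 timestamp=datetime.datetime.utcnow())
    return row


with beam.Pipeline() as pipeline:
    (pipeline
     | 'Create' >> beam.Create([('customer#1', 'Tom'), ('customer#2', 'Tim')])
     | 'ToRows' >> beam.Map(to_bigtable_row)
     | 'Write' >> WriteToBigTable(project_id='my-project',
                                  instance_id='my-instance',
                                  table_id='my-table'))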

Does Dataflow process batch data pipelines or streaming data pipelines?

A. Only Batch Data Pipelines
B. Both Batch and Streaming Data Pipelines
C. Only Streaming Data Pipelines
D. None of the above
Suggested answer: B

Explanation:

Dataflow uses a unified processing model and can execute both streaming and batch data pipelines.

Reference: https://cloud.google.com/dataflow/
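As a minimal sketch of that unified model (the file pattern and Pub/Sub topic below are placeholders), the same transforms can sit behind either a bounded or an unbounded source; only the read step changes:

# Minimal sketch: identical transforms for batch and streaming; only the
# source differs. File pattern and Pub/Sub topic are placeholders.
import apache_beam as beam
from apache_beam.io import ReadFromText
from apache_beam.io.gcp.pubsub import ReadFromPubSub
from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions

STREAMING = False  # flip to True for an unbounded (streaming) pipeline


def add_processing(source):
    # The processing logic is shared by both modes.
    return (source
            | 'Decode' >> beam.Map(
                lambda e: e.decode('utf-8') if isinstance(e, bytes) else e)
            | 'Upper' >> beam.Map(str.upper))


options = PipelineOptions()
options.view_as(StandardOptions).streaming = STREAMING

with beam.Pipeline(options=options) as pipeline:
    if STREAMING:
        records = pipeline | 'ReadPubSub' >> ReadFromPubSub(
            topic='projects/my-project/topics/my-topic')
    else:
        records = pipeline | 'ReadFile' >> ReadFromText('gs://my-bucket/input*.csv')
    add_processing(records)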

You are planning to use Google's Dataflow SDK to analyze customer data such as the records displayed below.

Your project requirement is to extract only the customer name from the data source and then write it to an output PCollection.

Tom,555 X street

Tim,553 Y street

Sam, 111 Z street

Which operation is best suited for the above data processing requirement?

A. ParDo
B. Sink API
C. Source API
D. Data extraction
Suggested answer: A

Explanation:

In the Google Cloud Dataflow SDK, you can use ParDo to extract only the customer name from each element in your PCollection.

Reference: https://cloud.google.com/dataflow/model/par-do
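A minimal sketch of that ParDo, applied to the three records from the question (run here against an in-memory Create source rather than a real data source):

# Minimal sketch: use ParDo to keep only the customer name from each
# "name,address" record shown in the question.
import apache_beam as beam


class ExtractName(beam.DoFn):
    def process(self, element):
        # "Tom,555 X street" -> "Tom"
        yield element.split(',')[0].strip()


with beam.Pipeline() as pipeline:
    (pipeline
     | 'Create' >> beam.Create(['Tom,555 X street',
                                'Tim,553 Y street',
                                'Sam, 111 Z street'])
     | 'ExtractName' >> beam.ParDo(ExtractName())
     | 'Print' >> beam.Map(print))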

Which Cloud Dataflow / Beam feature should you use to aggregate data in an unbounded data source every hour based on the time when the data entered the pipeline?

A. An hourly watermark
B. An event time trigger
C. The withAllowedLateness method
D. A processing time trigger
Suggested answer: D

Explanation:

When collecting and grouping data into windows, Beam uses triggers to determine when to emit the aggregated results of each window.

Processing time triggers. These triggers operate on the processing time, which is the time when the data element is processed at any given stage in the pipeline.

Event time triggers. These triggers operate on the event time, as indicated by the timestamp on each data element. Beam's default trigger is event time-based.

Reference: https://beam.apache.org/documentation/programming-guide/#triggers
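A sketch of an hourly processing-time trigger over an unbounded source (the Pub/Sub topic is a placeholder), which emits whatever has arrived roughly every hour of processing time, regardless of event timestamps:

# Sketch: emit aggregated results about once per hour of processing time,
# regardless of event timestamps. The Pub/Sub topic is a placeholder and the
# pipeline would be run with the streaming option enabled.
import apache_beam as beam
from apache_beam.io.gcp.pubsub import ReadFromPubSub
from apache_beam.transforms import trigger, window
from apache_beam.transforms.combiners import CountCombineFn

with beam.Pipeline() as pipeline:
    (pipeline
     | 'Read' >> ReadFromPubSub(topic='projects/my-project/topics/my-topic')
     | 'HourlyByProcessingTime' >> beam.WindowInto(
           window.GlobalWindows(),
           trigger=trigger.Repeatedly(trigger.AfterProcessingTime(60 * 60)),
           accumulation_mode=trigger.AccumulationMode.DISCARDING)
     | 'Count' >> beam.CombineGlobally(CountCombineFn()).without_defaults())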

Which of the following is NOT true about Dataflow pipelines?

A. Dataflow pipelines are tied to Dataflow, and cannot be run on any other runner
B. Dataflow pipelines can consume data from other Google Cloud services
C. Dataflow pipelines can be programmed in Java
D. Dataflow pipelines use a unified programming model, so can work both with streaming and batch data sources
Suggested answer: A

Explanation:

Dataflow pipelines can also run on alternative runners such as Apache Spark and Apache Flink, because they are built using the Apache Beam SDKs.

Reference: https://cloud.google.com/dataflow/
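A minimal sketch of that portability: the transforms stay the same and only the --runner pipeline option (plus runner-specific settings) changes, for example DirectRunner, DataflowRunner, SparkRunner, or FlinkRunner:

# Sketch: the same Beam pipeline targets different runners by changing the
# --runner option; the transforms themselves do not change.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Swap 'DirectRunner' for 'DataflowRunner', 'SparkRunner' or 'FlinkRunner'
# (together with the options each runner needs) without touching the transforms.
options = PipelineOptions(['--runner=DirectRunner'])

with beam.Pipeline(options=options) as pipeline:
    (pipeline
     | 'Create' >> beam.Create([1, 2, 3])
     | 'Square' >> beam.Map(lambda x: x * x)
     | 'Print' >> beam.Map(print))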

You are developing a software application using Google's Dataflow SDK, and want to use conditionals, for loops, and other complex programming structures to create a branching pipeline. Which component will be used for the data processing operation?

A. PCollection
B. Transform
C. Pipeline
D. Sink API
Suggested answer: B

Explanation:

In Google Cloud, the Dataflow SDK provides a transform component, which is responsible for the data processing operation. You can use conditionals, for loops, and other complex programming structures to create a branching pipeline.

Reference: https://cloud.google.com/dataflow/model/programming-model
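A minimal sketch of a branching pipeline built with transforms, where an ordinary Python conditional decides how the graph is constructed and one PCollection feeds more than one transform:

# Sketch: ordinary Python control flow (ifs, for loops) can be used while
# building the graph, and one PCollection can feed several transforms,
# producing a branching pipeline.
import apache_beam as beam

KEEP_LONG_NAMES = True  # illustrative flag controlling how the graph is built

with beam.Pipeline() as pipeline:
    names = pipeline | 'Create' >> beam.Create(['Tom', 'Tim', 'Samantha'])

    # Branch 1: upper-case every name.
    upper = names | 'Upper' >> beam.Map(str.upper)

    # Branch 2: optionally filter, depending on a plain Python conditional.
    if KEEP_LONG_NAMES:
        (names
         | 'LongOnly' >> beam.Filter(lambda n: len(n) > 3)
         | 'PrintLong' >> beam.Map(print))

    upper | 'PrintUpper' >> beam.Map(print)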

Which of the following IAM roles does your Compute Engine account require to be able to run pipeline jobs?

A. dataflow.worker
B. dataflow.compute
C. dataflow.developer
D. dataflow.viewer
Suggested answer: A

Explanation:

The dataflow.worker role provides the permissions necessary for a Compute Engine service account to execute work units for a Dataflow pipeline.

Reference: https://cloud.google.com/dataflow/access-control

Which of the following is not true about Dataflow pipelines?

A. Pipelines are a set of operations
B. Pipelines represent a data processing job
C. Pipelines represent a directed graph of steps
D. Pipelines can share data between instances
Suggested answer: D

Explanation:

The data and transforms in a pipeline are unique to, and owned by, that pipeline. While your program can create multiple pipelines, pipelines cannot share data or transforms.

Reference: https://cloud.google.com/dataflow/model/pipelines

By default, which of the following windowing behaviors does Dataflow apply to unbounded data sets?

A. Windows at every 100 MB of data
B. Single, Global Window
C. Windows at every 1 minute
D. Windows at every 10 minutes
Suggested answer: B

Explanation:

Dataflow's default windowing behavior is to assign all elements of a PCollection to a single, global window, even for unbounded PCollections.

Reference: https://cloud.google.com/dataflow/model/pcollection
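A sketch of overriding that default: applying a WindowInto transform (fixed one-minute windows here) to an unbounded source instead of the single global window (the Pub/Sub topic is a placeholder):

# Sketch: override the default single global window with fixed one-minute
# windows for an unbounded source. The Pub/Sub topic is a placeholder and the
# pipeline would be run with the streaming option enabled.
import apache_beam as beam
from apache_beam.io.gcp.pubsub import ReadFromPubSub
from apache_beam.transforms import window
from apache_beam.transforms.combiners import CountCombineFn

with beam.Pipeline() as pipeline:
    (pipeline
     | 'Read' >> ReadFromPubSub(topic='projects/my-project/topics/my-topic')
     | 'FixedWindows' >> beam.WindowInto(window.FixedWindows(60))
     | 'CountPerWindow' >> beam.CombineGlobally(CountCombineFn()).without_defaults())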

Which of the following job types are supported by Cloud Dataproc (select 3 answers)?

A. Hive
B. Pig
C. YARN
D. Spark
Suggested answer: A, B, D

Explanation:

Cloud Dataproc provides out-of-the box and end-to-end support for many of the most popular job types, including Spark, Spark SQL, PySpark, MapReduce, Hive, and Pig jobs.

Reference: https://cloud.google.com/dataproc/docs/resources/faq#what_type_of_jobs_can_i_run
