Google Professional Data Engineer Practice Test - Questions Answers, Page 9

If a dataset contains rows with individual people and columns for year of birth, country, and income, how many of the columns are continuous and how many are categorical?

A. 1 continuous and 2 categorical
B. 3 categorical
C. 3 continuous
D. 2 continuous and 1 categorical
Suggested answer: D

Explanation:

The columns can be grouped into two types, categorical and continuous:

A column is called categorical if its value can only be one of the categories in a finite set. For example, the native country of a person (U.S., India, Japan, etc.) or the education level (high school, college, etc.) are categorical columns.

A column is called continuous if its value can be any numerical value in a continuous range. For example, the capital gain of a person (e.g. $14,084) is a continuous column.

Year of birth and income are continuous columns. Country is a categorical column.

You could use bucketization to turn year of birth and/or income into categorical features, but the raw columns are continuous.
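
As a hedged illustration of the distinction (not part of the cited tutorial), the three columns could be declared roughly as follows with TensorFlow's tf.feature_column API; the vocabulary list and bucket boundaries are assumed values.

```python
import tensorflow as tf

# Continuous (numeric) columns: values can be any number in a range.
year_of_birth = tf.feature_column.numeric_column("year_of_birth")
income = tf.feature_column.numeric_column("income")

# Categorical column: values come from a finite set of categories.
country = tf.feature_column.categorical_column_with_vocabulary_list(
    "country", vocabulary_list=["US", "India", "Japan", "Other"])

# Optional bucketization: treat a continuous column as categorical by
# grouping years of birth into decades (illustrative boundaries).
birth_decade = tf.feature_column.bucketized_column(
    year_of_birth, boundaries=[1950, 1960, 1970, 1980, 1990, 2000])
```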

Reference: https://www.tensorflow.org/tutorials/wide#reading_the_census_data

Which of the following are examples of hyperparameters? (Select 2 answers.)

A. Number of hidden layers
B. Number of nodes in each hidden layer
C. Biases
D. Weights
Suggested answer: A, B

Explanation:

Whereas model parameters are variables that get adjusted by training with existing data, hyperparameters are variables about the training process itself. For example, part of setting up a deep neural network is deciding how many "hidden" layers of nodes to use between the input layer and the output layer, as well as how many nodes each layer should use. These variables are not directly related to the training data at all; they are configuration variables. Another difference is that parameters change during a training job, while hyperparameters are usually constant during a job.

Weights and biases are variables that get adjusted during the training process, so they are not hyperparameters.
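
For illustration only (not from the cited documentation), here is a minimal Keras sketch in which the number of hidden layers and the nodes per layer are hyperparameters fixed before training, while the weights and biases inside each Dense layer are the parameters that training adjusts. The specific values and the input width are assumptions.

```python
import tensorflow as tf

# Hyperparameters: chosen before training and held constant during the job.
NUM_HIDDEN_LAYERS = 3   # assumed value
NODES_PER_LAYER = 64    # assumed value

model = tf.keras.Sequential()
model.add(tf.keras.Input(shape=(10,)))  # assumed input width of 10 features
for _ in range(NUM_HIDDEN_LAYERS):
    model.add(tf.keras.layers.Dense(NODES_PER_LAYER, activation="relu"))
model.add(tf.keras.layers.Dense(1))

# Parameters: the weights and biases inside each Dense layer. These are the
# values that training (model.fit) adjusts; they are not set by hand.
model.compile(optimizer="adam", loss="mse")
```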

Reference: https://cloud.google.com/ml-engine/docs/hyperparameter-tuning-overview

Which of the following are feature engineering techniques? (Select 2 answers)

A. Hidden feature layers
B. Feature prioritization
C. Crossed feature columns
D. Bucketization of a continuous feature
Suggested answer: C, D

Explanation:

Selecting and crafting the right set of feature columns is key to learning an effective model.

Bucketization is a process of dividing the entire range of a continuous feature into a set of consecutive bins/buckets, and then converting the original numerical feature into a bucket ID (as a categorical feature) depending on which bucket that value falls into.

Using each base feature column separately may not be enough to explain the data. To learn the differences between different feature combinations, we can add crossed feature columns to the model.
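
A minimal sketch of both techniques with the same tf.feature_column API, assuming hypothetical "age" and "education" features and illustrative bucket boundaries.

```python
import tensorflow as tf

age = tf.feature_column.numeric_column("age")
education = tf.feature_column.categorical_column_with_vocabulary_list(
    "education", ["high_school", "college", "graduate"])

# Bucketization: convert the continuous age column into categorical buckets.
age_buckets = tf.feature_column.bucketized_column(
    age, boundaries=[18, 25, 35, 45, 55, 65])

# Crossed feature column: let the model learn interactions between the
# age bucket and the education level.
age_x_education = tf.feature_column.crossed_column(
    [age_buckets, education], hash_bucket_size=1000)
```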

Reference:

https://www.tensorflow.org/tutorials/wide#selecting_and_engineering_features_for_the_model

You want to use a BigQuery table as a data sink. In which writing mode(s) can you use BigQuery as a sink?

A. Both batch and streaming
B. BigQuery cannot be used as a sink
C. Only batch
D. Only streaming
Suggested answer: A

Explanation:

When you apply a BigQueryIO.Write transform in batch mode to write to a single table, Dataflow invokes a BigQuery load job. When you apply a BigQueryIO.Write transform in streaming mode, or in batch mode using a function to specify the destination table, Dataflow uses BigQuery's streaming inserts.
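
The explanation above refers to the Dataflow Java SDK's BigQueryIO.Write; as a rough analogue, here is a sketch using the Apache Beam Python SDK's WriteToBigQuery, with a placeholder table reference and schema.

```python
import apache_beam as beam

# Writing a PCollection of dicts to a BigQuery sink. In a bounded (batch)
# pipeline this is typically executed as a BigQuery load job; in a streaming
# pipeline it uses streaming inserts, mirroring the behavior described above.
with beam.Pipeline() as pipeline:
    (
        pipeline
        | "Create rows" >> beam.Create([{"name": "alice", "score": 10}])
        | "Write to BigQuery" >> beam.io.WriteToBigQuery(
            "my-project:my_dataset.my_table",      # placeholder table
            schema="name:STRING,score:INTEGER",    # placeholder schema
            create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        )
    )
```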

Reference: https://cloud.google.com/dataflow/model/bigquery-io

You have a job that you want to cancel. It is a streaming pipeline, and you want to ensure that any data that is in-flight is processed and written to the output. Which of the following commands can you use on the Dataflow monitoring console to stop the pipeline job?

A. Cancel
B. Drain
C. Stop
D. Finish
Suggested answer: B

Explanation:

Using the Drain option to stop your job tells the Dataflow service to finish your job in its current state.

Your job will immediately stop ingesting new data from input sources, but the Dataflow service will preserve any existing resources (such as worker instances) to finish processing and writing any buffered data in your pipeline.

Reference: https://cloud.google.com/dataflow/pipelines/stopping-a-pipeline

When running a pipeline that has a BigQuery source on your local machine, you continue to get permission denied errors. What could be the reason for that?

A. Your gcloud does not have access to the BigQuery resources
B. BigQuery cannot be accessed from local machines
C. You are missing gcloud on your machine
D. Pipelines cannot be run locally
Suggested answer: A

Explanation:

When reading from a Dataflow source or writing to a Dataflow sink using DirectPipelineRunner, the Cloud Platform account that you configured with the gcloud executable needs access to the corresponding source or sink.

Reference: https://cloud.google.com/dataflow/javasdk/JavaDoc/com/google/cloud/dataflow/sdk/runners/DirectPipelineRunner

What Dataflow concept determines when a Window's contents should be output based on certain criteria being met?

A. Sessions
B. OutputCriteria
C. Windows
D. Triggers
Suggested answer: D

Explanation:

Triggers control when the elements for a specific key and window are output. As elements arrive, they are put into one or more windows by a Window transform and its associated WindowFn, and then passed to the associated Trigger to determine whether the Window's contents should be output.
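
As a hedged analogue in the Apache Beam Python SDK (the reference below is to the Java SDK), here is a sketch in which WindowInto assigns elements to fixed windows and the attached trigger decides when each window's contents are emitted; the window size, timestamps, and sample data are illustrative.

```python
import apache_beam as beam
from apache_beam.transforms import trigger, window

with beam.Pipeline() as p:
    (
        p
        | beam.Create([("user1", 1), ("user1", 2), ("user2", 5)])
        # Attach an (arbitrary, illustrative) event timestamp to each element.
        | beam.Map(lambda kv: window.TimestampedValue(kv, 1000))
        # Assign elements to 60-second fixed windows; the trigger decides when
        # each window's contents are output (at the watermark, then again for
        # every 10 late elements).
        | beam.WindowInto(
            window.FixedWindows(60),
            trigger=trigger.AfterWatermark(late=trigger.AfterCount(10)),
            accumulation_mode=trigger.AccumulationMode.DISCARDING,
        )
        | beam.CombinePerKey(sum)
        | beam.Map(print)
    )
```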

Reference: https://cloud.google.com/dataflow/javasdk/JavaDoc/com/google/cloud/dataflow/sdk/transforms/windowing/Trigger

Which of the following is NOT one of the three main types of triggers that Dataflow supports?

A. Trigger based on element size in bytes
B. Trigger that is a combination of other triggers
C. Trigger based on element count
D. Trigger based on time
Suggested answer: A

Explanation:

There are three major kinds of triggers that Dataflow supports:

1. Time-based triggers.
2. Data-driven triggers. You can set a trigger to emit results from a window when that window has received a certain number of data elements.
3. Composite triggers. These triggers combine multiple time-based or data-driven triggers in some logical way.
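
For illustration, one plausible construction of each kind using the Beam Python SDK trigger classes; the delays and counts are assumed values.

```python
from apache_beam.transforms import trigger

# 1. Time-based trigger: fire a pane 30 seconds of processing time after the
#    first element arrives in the window.
time_based = trigger.AfterProcessingTime(30)

# 2. Data-driven trigger: fire once a window has received 100 elements.
data_driven = trigger.AfterCount(100)

# 3. Composite trigger: fire when either of the above conditions is met.
composite = trigger.AfterAny(time_based, data_driven)
```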

Reference: https://cloud.google.com/dataflow/model/triggers

Which Java SDK class can you use to run your Dataflow programs locally?

A. LocalRunner
B. DirectPipelineRunner
C. MachineRunner
D. LocalPipelineRunner
Suggested answer: B

Explanation:

DirectPipelineRunner allows you to execute operations in the pipeline directly, without any optimization. It is useful for small local execution and tests.
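
The answer is a class from the Dataflow Java SDK, so the following is only an analogy: a sketch of running a pipeline locally with the Apache Beam Python SDK's DirectRunner.

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Execute the pipeline on the local machine, without a Dataflow service job.
options = PipelineOptions(runner="DirectRunner")
with beam.Pipeline(options=options) as p:
    (
        p
        | beam.Create([1, 2, 3])
        | beam.Map(lambda x: x * 2)
        | beam.Map(print)
    )
```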

Reference: https://cloud.google.com/dataflow/javasdk/JavaDoc/com/google/cloud/dataflow/sdk/runners/DirectPipelineRunner

The Dataflow SDKs have been recently transitioned into which Apache service?

A. Apache Spark
B. Apache Hadoop
C. Apache Kafka
D. Apache Beam
Suggested answer: D

Explanation:

The Dataflow SDKs are being transitioned to Apache Beam, per Google's announcement.

Reference: https://cloud.google.com/dataflow/docs/
