
Google Professional Data Engineer Practice Test - Questions Answers, Page 9

List of questions

Question 81


If a dataset contains rows with individual people and columns for year of birth, country, and income, how many of the columns are continuous and how many are categorical?

A. 1 continuous and 2 categorical
B. 3 categorical
C. 3 continuous
D. 2 continuous and 1 categorical
Suggested answer: D

Explanation:

The columns can be grouped into two types, categorical and continuous columns:

A column is called categorical if its value can only be one of the categories in a finite set. For example, the native country of a person (U.S., India, Japan, etc.) or the education level (high school, college, etc.) are categorical columns.

A column is called continuous if its value can be any numerical value in a continuous range. For example, the capital gain of a person (e.g. $14,084) is a continuous column.

Year of birth and income are continuous columns. Country is a categorical column.

You could use bucketization to turn year of birth and/or income into categorical features, but the raw columns are continuous.
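For illustration, a minimal sketch of how these columns might be declared with the TensorFlow feature-column API from the referenced tutorial; the column names, vocabulary values, and bucket boundaries are assumptions:

    import tensorflow as tf

    # Continuous (numeric) columns: values can fall anywhere in a numerical range.
    year_of_birth = tf.feature_column.numeric_column("year_of_birth")
    income = tf.feature_column.numeric_column("income")

    # Categorical column: values come from a finite set of categories.
    country = tf.feature_column.categorical_column_with_vocabulary_list(
        "country", ["US", "India", "Japan"])

    # Bucketization (optional) turns a continuous column into a categorical one.
    birth_buckets = tf.feature_column.bucketized_column(
        year_of_birth, boundaries=[1950, 1970, 1990, 2010])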

Reference: https://www.tensorflow.org/tutorials/wide#reading_the_census_data


Question 82


Which of the following are examples of hyperparameters? (Select 2 answers.)

A. Number of hidden layers
B. Number of nodes in each hidden layer
C. Biases
D. Weights
Suggested answer: A, B

Explanation:

If model parameters are variables that get adjusted by training with existing data, your hyperparameters are the variables about the training process itself. For example, part of setting up a deep neural network is deciding how many "hidden" layers of nodes to use between the input layer and the output layer, as well as how many nodes each layer should use. These variables are not directly related to the training data at all. They are configuration variables. Another difference is that parameters change during a training job, while the hyperparameters are usually constant during a job.

Weights and biases are variables that get adjusted during the training process, so they are not hyperparameters.
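As a short illustrative sketch (hypothetical model, assumed layer sizes), the number of hidden layers and the number of nodes per layer are configuration chosen before training, while the weights and biases inside each layer are the parameters the optimizer adjusts:

    import tensorflow as tf

    # Hyperparameters: configuration fixed before training (illustrative values).
    NUM_HIDDEN_LAYERS = 2
    NODES_PER_LAYER = 64

    model = tf.keras.Sequential([tf.keras.Input(shape=(10,))])
    for _ in range(NUM_HIDDEN_LAYERS):
        model.add(tf.keras.layers.Dense(NODES_PER_LAYER, activation="relu"))
    model.add(tf.keras.layers.Dense(1))

    # Parameters: the weights and biases inside the Dense layers, learned from data.
    model.compile(optimizer="adam", loss="mse")
    print(model.count_params())  # total number of trainable weights and biases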

Reference: https://cloud.google.com/ml-engine/docs/hyperparameter-tuning-overview


Question 83


Which of the following are feature engineering techniques? (Select 2 answers)

A. Hidden feature layers
B. Feature prioritization
C. Crossed feature columns
D. Bucketization of a continuous feature
Suggested answer: C, D

Explanation:

Selecting and crafting the right set of feature columns is key to learning an effective model.

Bucketization is a process of dividing the entire range of a continuous feature into a set of consecutive bins/buckets, and then converting the original numerical feature into a bucket ID (as a categorical feature) depending on which bucket that value falls into.

Using each base feature column separately may not be enough to explain the data. To learn the differences between different feature combinations, we can add crossed feature columns to the model.
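A brief sketch of both techniques with the TensorFlow feature-column API from the referenced tutorial; the feature names, vocabulary, boundaries, and hash bucket size are assumptions:

    import tensorflow as tf

    age = tf.feature_column.numeric_column("age")
    education = tf.feature_column.categorical_column_with_vocabulary_list(
        "education", ["high_school", "college", "graduate"])

    # Bucketization: divide the continuous range into consecutive buckets,
    # converting the numeric feature into a categorical one.
    age_buckets = tf.feature_column.bucketized_column(
        age, boundaries=[18, 25, 35, 50, 65])

    # Crossed feature column: capture interactions between feature combinations.
    age_x_education = tf.feature_column.crossed_column(
        [age_buckets, education], hash_bucket_size=1000)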

Reference:

https://www.tensorflow.org/tutorials/wide#selecting_and_engineering_features_for_the_model


Question 84


You want to use a BigQuery table as a data sink. In which writing mode(s) can you use BigQuery as a sink?

A. Both batch and streaming
B. BigQuery cannot be used as a sink
C. Only batch
D. Only streaming
Suggested answer: A

Explanation:

When you apply a BigQueryIO.Write transform in batch mode to write to a single table, Dataflow invokes a BigQuery load job. When you apply a BigQueryIO.Write transform in streaming mode, or in batch mode using a function to specify the destination table, Dataflow uses BigQuery's streaming inserts.
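A minimal sketch of BigQuery as a sink using the Apache Beam Python SDK (the successor to the Dataflow SDK); the project, dataset, table, and schema are placeholder assumptions. In a bounded (batch) pipeline this results in a load job, while in an unbounded (streaming) pipeline streaming inserts are used:

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    with beam.Pipeline(options=PipelineOptions()) as p:
        (p
         | "CreateRows" >> beam.Create([{"name": "alice", "score": 10}])
         | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
               "my-project:my_dataset.my_table",  # placeholder table
               schema="name:STRING,score:INTEGER",
               create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
               write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))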

Reference: https://cloud.google.com/dataflow/model/bigquery-io


Question 85


You have a job that you want to cancel. It is a streaming pipeline, and you want to ensure that any data that is in-flight is processed and written to the output. Which of the following commands can you use on the Dataflow monitoring console to stop the pipeline job?

A. Cancel
B. Drain
C. Stop
D. Finish
Suggested answer: B

Explanation:

Using the Drain option to stop your job tells the Dataflow service to finish your job in its current state.

Your job will immediately stop ingesting new data from input sources, but the Dataflow service will preserve any existing resources (such as worker instances) to finish processing and writing any buffered data in your pipeline.

Reference: https://cloud.google.com/dataflow/pipelines/stopping-a-pipeline


Question 86


When running a pipeline that has a BigQuery source on your local machine, you continue to get permission denied errors. What could be the reason for that?

A. Your gcloud does not have access to the BigQuery resources
B. BigQuery cannot be accessed from local machines
C. You are missing gcloud on your machine
D. Pipelines cannot be run locally
Suggested answer: A

Explanation:

When reading from a Dataflow source or writing to a Dataflow sink using DirectPipelineRunner, the Cloud Platform account that you configured with the gcloud executable will need access to the corresponding source/sink.

Reference: https://cloud.google.com/dataflow/javasdk/JavaDoc/com/google/cloud/dataflow/sdk/runners/DirectPipelineRunner


Question 87


What Dataflow concept determines when a Window's contents should be output based on certain criteria being met?

A. Sessions
B. OutputCriteria
C. Windows
D. Triggers
Suggested answer: D

Explanation:

Triggers control when the elements for a specific key and window are output. As elements arrive, they are put into one or more windows by a Window transform and its associated WindowFn, and then passed to the associated Trigger to determine if the window's contents should be output.
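A hedged sketch of windowing with a trigger in the Apache Beam Python SDK (the successor to the Dataflow SDK referenced below); the window size, element count, and input data are assumptions:

    import apache_beam as beam
    from apache_beam import window
    from apache_beam.transforms import trigger

    with beam.Pipeline() as p:
        (p
         | "Create" >> beam.Create([("key", 1), ("key", 2)])  # stand-in for a real source
         | "Timestamp" >> beam.Map(lambda kv: window.TimestampedValue(kv, 0))
         | "Window" >> beam.WindowInto(
               window.FixedWindows(60),  # 60-second fixed windows
               # Emit when the watermark passes the end of the window,
               # with early firings after every 100 elements.
               trigger=trigger.AfterWatermark(early=trigger.AfterCount(100)),
               accumulation_mode=trigger.AccumulationMode.DISCARDING)
         | "Sum" >> beam.CombinePerKey(sum))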

Reference: https://cloud.google.com/dataflow/javasdk/JavaDoc/com/google/cloud/dataflow/sdk/transforms/windowing/Trigger


Question 88


Which of the following is NOT one of the three main types of triggers that Dataflow supports?

A. Trigger based on element size in bytes
B. Trigger that is a combination of other triggers
C. Trigger based on element count
D. Trigger based on time
Suggested answer: A

Explanation:

There are three major kinds of triggers that Dataflow supports:

1. Time-based triggers.

2. Data-driven triggers. You can set a trigger to emit results from a window when that window has received a certain number of data elements.

3. Composite triggers. These triggers combine multiple time-based or data-driven triggers in some logical way.
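For illustration, a composite trigger in the Apache Beam Python SDK that combines a time-based trigger with a data-driven one (the thresholds are arbitrary assumptions):

    from apache_beam.transforms import trigger

    # Fire when either 60 seconds of processing time have elapsed (time-based)
    # or 100 elements have arrived (data-driven), whichever happens first.
    composite = trigger.AfterAny(
        trigger.AfterProcessingTime(60),
        trigger.AfterCount(100))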

Reference: https://cloud.google.com/dataflow/model/triggers


Question 89


Which Java SDK class can you use to run your Dataflow programs locally?

A. LocalRunner
B. DirectPipelineRunner
C. MachineRunner
D. LocalPipelineRunner
Suggested answer: B

Explanation:

DirectPipelineRunner allows you to execute operations in the pipeline directly, without any optimization. It is useful for small local executions and tests.
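DirectPipelineRunner is a class in the original Dataflow Java SDK; in the current Apache Beam SDKs the local runner is named DirectRunner. A small sketch of local execution using the Beam Python SDK, given as an assumed modern equivalent:

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # Execute the pipeline entirely on the local machine, without the Dataflow service.
    options = PipelineOptions(runner="DirectRunner")
    with beam.Pipeline(options=options) as p:
        (p
         | beam.Create([1, 2, 3])
         | beam.Map(lambda x: x * 2)
         | beam.Map(print))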

Reference: https://cloud.google.com/dataflow/javasdk/JavaDoc/com/google/cloud/dataflow/sdk/runners/DirectPipelineRunner


Question 90


The Dataflow SDKs have been recently transitioned into which Apache service?

A. Apache Spark
B. Apache Hadoop
C. Apache Kafka
D. Apache Beam
Suggested answer: D

Explanation:

The Dataflow SDKs are being transitioned to Apache Beam, as announced by Google.

Reference: https://cloud.google.com/dataflow/docs/
