ExamGecko

Google Professional Data Engineer Practice Test - Questions Answers, Page 12

Cloud Dataproc charges you only for what you really use with _____ billing.

A. month-by-month
B. minute-by-minute
C. week-by-week
D. hour-by-hour
Suggested answer: B

Explanation:

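One of the advantages of Cloud Dataproc is its low cost. Dataproc charges only for what you actually use, with minute-by-minute billing and a low ten-minute minimum billing period. Note that current Dataproc pricing is per-second with a one-minute minimum; minute-by-minute remains the best of the answer choices given.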
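The billing rule in the explanation can be modeled with a small function. This is a sketch under stated assumptions, not Google's pricing formula: it assumes a run is rounded up to the next whole billing increment and then raised to the minimum billing period, and `billed_seconds` is a hypothetical helper. The defaults model minute-by-minute billing with a ten-minute minimum.

```python
import math


def billed_seconds(runtime_s: float, granularity_s: int = 60, minimum_s: int = 600) -> int:
    """Return the billable seconds for one cluster run (illustrative model only).

    Assumption: usage is rounded up to a whole number of billing increments,
    then raised to the minimum billing period.
    """
    if runtime_s < 0:
        raise ValueError("runtime must be non-negative")
    # Round up to the next whole billing increment (a partial minute counts as a minute).
    rounded = math.ceil(runtime_s / granularity_s) * granularity_s
    # Short runs are still charged for the minimum billing period.
    return max(minimum_s, rounded)


print(billed_seconds(125))  # 600: a 2 min 5 s run is under the ten-minute minimum
print(billed_seconds(725))  # 780: 12 min 5 s rounds up to 13 whole minutes
```

The `granularity_s` and `minimum_s` parameters make it easy to compare other billing granularities against the same runtimes.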

Reference: https://cloud.google.com/dataproc/docs/concepts/overview

The YARN ResourceManager and the HDFS NameNode interfaces are available on a Cloud Dataproc cluster ____.

A. application node
B. conditional node
C. master node
D. worker node
Suggested answer: C

Explanation:

The YARN ResourceManager and the HDFS NameNode interfaces are available on the Cloud Dataproc cluster's master node. The cluster master-host-name is the name of your Cloud Dataproc cluster followed by an -m suffix. For example, if your cluster is named "my-cluster", the master-host-name would be "my-cluster-m".
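A short sketch of the naming rule above, assuming a standard single-master cluster (high-availability clusters name their masters -m-0, -m-1, and -m-2 instead) and the default Hadoop web UI ports: 8088 for the YARN ResourceManager, and 9870 for the HDFS NameNode on Hadoop 3 (50070 on Hadoop 2). The helper names are hypothetical.

```python
def master_host_name(cluster_name: str) -> str:
    """Return the master node host name for a standard (single-master) cluster."""
    # Dataproc appends "-m" to the cluster name for the master node.
    return f"{cluster_name}-m"


def web_interface_urls(cluster_name: str, hadoop_major: int = 3) -> dict:
    """Build the YARN ResourceManager and HDFS NameNode URLs on the master node.

    Assumption: default Hadoop web UI ports. The NameNode UI moved from
    port 50070 (Hadoop 2) to 9870 (Hadoop 3).
    """
    host = master_host_name(cluster_name)
    namenode_port = 9870 if hadoop_major >= 3 else 50070
    return {
        "YARN ResourceManager": f"http://{host}:8088",
        "HDFS NameNode": f"http://{host}:{namenode_port}",
    }


for name, url in web_interface_urls("my-cluster").items():
    print(f"{name}: {url}")
```

These host names resolve only from inside the cluster's network, which is why a browser on your own machine needs a secure connection to the master node to reach them.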

Reference: https://cloud.google.com/dataproc/docs/concepts/cluster-web-interfaces#interfaces

Which of these is NOT a way to customize the software on Dataproc cluster instances?

A. Set initialization actions
B. Modify configuration files using cluster properties
C. Configure the cluster using Cloud Deployment Manager
D. Log into the master node and make changes from there
Suggested answer: C

Explanation:

Options A, B, and D are all ways to customize the software on Dataproc cluster instances, which leaves C:

You can access the master node of the cluster by clicking the SSH button next to it in the Cloud Console and make changes from there.

You can use the --properties option of the gcloud dataproc clusters create command in the Google Cloud SDK to modify many common configuration files when creating a cluster.

When creating a Cloud Dataproc cluster, you can specify initialization actions in executables and/or scripts that Cloud Dataproc will run on all nodes in your cluster immediately after the cluster is set up. [https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/init-actions]
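The `--properties` and initialization-action approaches can both be sketched as an argument list for `gcloud dataproc clusters create`. This assumes the `prefix:property=value` format for `--properties`, where the prefix selects the configuration file (such as `spark` or `core`); the cluster name, region, bucket path, and `cluster_create_args` helper are hypothetical.

```python
from __future__ import annotations


def cluster_create_args(
    name: str,
    region: str,
    properties: dict[str, dict[str, str]],
    init_actions: list[str],
) -> list[str]:
    """Assemble a `gcloud dataproc clusters create` argument list.

    `properties` maps a config-file prefix (e.g. "spark", "core") to the
    key/value settings to write into that file.
    """
    pairs = []
    for prefix, settings in properties.items():
        for key, value in settings.items():
            # gcloud splits --properties on commas, so reject values containing one
            # (those would need gcloud's alternate-delimiter syntax).
            if "," in value:
                raise ValueError(f"value for {prefix}:{key} contains a comma")
            pairs.append(f"{prefix}:{key}={value}")
    args = ["gcloud", "dataproc", "clusters", "create", name, f"--region={region}"]
    if pairs:
        args.append("--properties=" + ",".join(pairs))
    if init_actions:
        # Cloud Storage scripts that run on every node after the cluster is set up.
        args.append("--initialization-actions=" + ",".join(init_actions))
    return args


args = cluster_create_args(
    "my-cluster",
    "us-central1",
    {"spark": {"spark.executor.memory": "4g"}},
    ["gs://my-bucket/install-tools.sh"],  # hypothetical script location
)
print(" ".join(args))
```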

Reference: https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/cluster-properties

To securely transfer web traffic from your computer's web browser to the Cloud Dataproc cluster, you should use a(n) _____.

A. VPN connection
B. Special browser
C. SSH tunnel
D. FTP connection
Suggested answer: C

Explanation:

To connect to the web interfaces, it is recommended to use an SSH tunnel to create a secure connection to the master node.
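The tunnel can be sketched as the command lists below, assuming the `gcloud` CLI and Google Chrome. `gcloud compute ssh` passes everything after `--` to `ssh`, where `-D` opens a local SOCKS proxy and `-N` skips running a remote command; the browser is then pointed at that proxy. The project, zone, port 1080, and helper names are illustrative placeholders.

```python
def socks_tunnel_command(cluster_name: str, project: str, zone: str, port: int = 1080) -> list[str]:
    """Build the gcloud command that opens an SSH tunnel with dynamic port forwarding."""
    return [
        "gcloud", "compute", "ssh", f"{cluster_name}-m",  # connect to the master node
        f"--project={project}", f"--zone={zone}",
        "--",              # everything after this is passed to ssh itself
        "-D", str(port),   # dynamic (SOCKS) forwarding on localhost:port
        "-N",              # no remote command, just keep the tunnel open
    ]


def browser_proxy_args(host: str, port: int = 1080) -> list[str]:
    """Chrome flags that send browser traffic through the SOCKS proxy."""
    return [
        f"--proxy-server=socks5://localhost:{port}",
        f"--user-data-dir=/tmp/{host}",  # separate profile, so normal browsing is unaffected
    ]


cmd = socks_tunnel_command("my-cluster", "my-project", "us-central1-a")
print(" ".join(cmd))
# Passing `cmd` to subprocess.run() would open the tunnel; it is not executed here.
print(" ".join(browser_proxy_args("my-cluster-m")))
```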

Reference: https://cloud.google.com/dataproc/docs/concepts/cluster-web-interfaces#connecting_to_the_web_interfaces

All Google Cloud Bigtable client requests go through a front-end server ______ they are sent to a Cloud Bigtable node.

A. before
B. after
C. only if
D. once
Suggested answer: A

Explanation:

In the Cloud Bigtable architecture, all client requests go through a front-end server before they are sent to a Cloud Bigtable node.

The nodes are organized into a Cloud Bigtable cluster, which belongs to a Cloud Bigtable instance, which is a container for the cluster. Each node in the cluster handles a subset of the requests to the cluster.

Adding nodes to a cluster increases the number of simultaneous requests that the cluster can handle, as well as the maximum throughput for the entire cluster.
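The routing step can be illustrated with a toy model. This is a simplification, not Bigtable's implementation: it assumes the front end knows which node serves each tablet (a contiguous range of sorted row keys) and forwards each request by looking up the tablet that contains its row key. `FrontEnd`, the split points, and the node names are hypothetical.

```python
import bisect


class FrontEnd:
    """Toy front-end server: forwards each request to the node serving its tablet.

    Tablet i covers row keys in [starts[i], starts[i + 1]); the first start
    key is "" so that every row key falls into some tablet.
    """

    def __init__(self, starts: list, owners: list):
        if not starts or starts[0] != "" or starts != sorted(starts):
            raise ValueError("starts must be sorted and begin with ''")
        if len(starts) != len(owners):
            raise ValueError("need exactly one owning node per tablet")
        self.starts = starts
        self.owners = owners

    def route(self, row_key: str) -> str:
        # Find the last tablet whose start key is <= row_key.
        i = bisect.bisect_right(self.starts, row_key) - 1
        return self.owners[i]


fe = FrontEnd(starts=["", "g", "p"], owners=["node-a", "node-b", "node-c"])
for key in ["apple", "grape", "zebra"]:
    print(key, "->", fe.route(key))
```

In this model, adding a node means handing it some of the tablets, so the same stream of requests is spread over more nodes.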

Reference: https://cloud.google.com/bigtable/docs/overview

What is the general recommendation when designing your row keys for a Cloud Bigtable schema?

A. Include multiple time series values within the row key
B. Keep the row key as an 8-bit integer
C. Keep your row key reasonably short
D. Keep your row key as long as the field permits
Suggested answer: C

Explanation:

As a general guideline, keep your row keys reasonably short. Long row keys take up additional memory and storage and increase the time it takes to get responses from the Cloud Bigtable server.
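A small key-building helper shows the idea, assuming string components joined by a short delimiter. It also enforces Bigtable's hard limit of 4 KB per row key, although keys should normally be far shorter than that. `make_row_key` and the example components are hypothetical.

```python
MAX_ROW_KEY_BYTES = 4 * 1024  # Bigtable's hard limit per row key


def make_row_key(*parts: str, sep: str = "#") -> bytes:
    """Join key components with a short delimiter and enforce the size limit."""
    key = sep.join(parts).encode("utf-8")
    if len(key) > MAX_ROW_KEY_BYTES:
        raise ValueError(f"row key is {len(key)} bytes; limit is {MAX_ROW_KEY_BYTES}")
    return key


compact = make_row_key("u123", "20240101")
verbose = make_row_key("user_identifier=123", "event_date=2024-01-01")
# Same information, but the verbose key costs more bytes in every row that uses it.
print(compact, len(compact))
print(verbose, len(verbose))
```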

Reference: https://cloud.google.com/bigtable/docs/schema-design#row-keys

Which of the following statements is NOT true regarding Bigtable access roles?

A. Using IAM roles, you cannot give a user access to only one table in a project, rather than all tables in a project.
B. To give a user access to only one table in a project, grant the user the Bigtable Editor role for that table.
C. You can configure access control only at the project level.
D. To give a user access to only one table in a project, you must configure access through your application.
Suggested answer: B

Explanation:

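At the time this question was written, Cloud Bigtable access control could be configured only at the project level (Cloud Bigtable now also supports IAM policies on individual instances and tables, though there is no predefined "Bigtable Editor" role). For example, at the project level you can grant the ability to: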

Read from, but not write to, any table within the project.

Read from and write to any table within the project, but not manage instances.

Read from and write to any table within the project, and manage instances.
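The three project-level examples correspond roughly to Bigtable's predefined IAM roles `roles/bigtable.reader`, `roles/bigtable.user`, and `roles/bigtable.admin`. The sketch below picks the least-privileged of these for a required set of capabilities; the capability sets are a deliberate simplification of each role's actual permission list, and `least_privilege_role` is a hypothetical helper.

```python
from __future__ import annotations

# Simplified capability sets for Bigtable's predefined roles (illustrative only).
ROLE_CAPABILITIES = {
    "roles/bigtable.reader": {"read"},
    "roles/bigtable.user": {"read", "write"},
    "roles/bigtable.admin": {"read", "write", "manage_instances"},
}


def least_privilege_role(needed: set[str]) -> str:
    """Return the role with the fewest capabilities that still covers `needed`."""
    candidates = [
        (len(caps), role) for role, caps in ROLE_CAPABILITIES.items() if needed <= caps
    ]
    if not candidates:
        raise ValueError(f"no predefined role covers {sorted(needed)}")
    return min(candidates)[1]


print(least_privilege_role({"read"}))                              # read-only access
print(least_privilege_role({"read", "write"}))                     # read/write, no instance management
print(least_privilege_role({"read", "write", "manage_instances"}))
```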

Reference: https://cloud.google.com/bigtable/docs/access-control

For the best possible performance, what is the recommended zone for your Compute Engine instance and Cloud Bigtable instance?

A. Have the Compute Engine instance in the furthest zone from the Cloud Bigtable instance.
B. Have the Compute Engine instance and the Cloud Bigtable instance in different zones.
C. Have both the Compute Engine instance and the Cloud Bigtable instance in the same zone.
D. Have the Cloud Bigtable instance in the same zone as all of the consumers of your data.
Suggested answer: C

Explanation:

For the best possible performance, create your Compute Engine instance in the same zone as your Cloud Bigtable instance. If it's not possible to create an instance in the same zone, create your instance in another zone within the same region. For example, if your Cloud Bigtable instance is located in us-central1-b, you could create your instance in us-central1-f. This change may result in several milliseconds of additional latency for each Cloud Bigtable request.

Avoid creating your Compute Engine instance in a different region from your Cloud Bigtable instance, which can add hundreds of milliseconds of latency to each Cloud Bigtable request.
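The placement guidance can be expressed as a small check, assuming the standard zone naming scheme in which a zone is its region plus a `-<letter>` suffix (so `us-central1-b` is in `us-central1`). The helper names and return labels are hypothetical; the comments restate the latency figures above.

```python
def region_of(zone: str) -> str:
    """'us-central1-b' -> 'us-central1' (a zone is its region plus a '-<letter>' suffix)."""
    region, sep, suffix = zone.rpartition("-")
    if not sep or not region or not suffix:
        raise ValueError(f"not a zone name: {zone!r}")
    return region


def placement(compute_zone: str, bigtable_zone: str) -> str:
    """Classify a Compute Engine / Cloud Bigtable placement."""
    if compute_zone == bigtable_zone:
        return "same-zone"      # best possible performance
    if region_of(compute_zone) == region_of(bigtable_zone):
        return "same-region"    # may add several milliseconds per request
    return "cross-region"       # avoid: can add hundreds of milliseconds per request


print(placement("us-central1-b", "us-central1-b"))
print(placement("us-central1-f", "us-central1-b"))
print(placement("europe-west1-b", "us-central1-b"))
```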

Reference: https://cloud.google.com/bigtable/docs/creating-compute-instance

Which row keys are likely to cause a disproportionate number of reads and/or writes on a particular node in a Bigtable cluster (select 2 answers)?

A. A sequential numeric ID
B. A timestamp followed by a stock symbol
C. A non-sequential numeric ID
D. A stock symbol followed by a timestamp
Suggested answer: A, B

Explanation:

...using a timestamp as the first element of a row key can cause a variety of problems.

In brief, when a row key for a time series begins with a timestamp, all of your writes will target a single node, fill that node, and then move on to the next node in the cluster, resulting in hotspotting.

Suppose your system assigns a numeric ID to each of your application's users. You might be tempted to use the user's numeric ID as the row key for your table. However, since new users are more likely to be active users, this approach is likely to push most of your traffic to a small number of nodes.
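Because Bigtable stores rows sorted lexicographically by row key, the effect can be shown by sorting keys and checking where the newest writes land. In this sketch (helper names and data are hypothetical; timestamps are zero-padded so string order matches numeric order), timestamp-first keys put the latest batch in one contiguous range at the end of the keyspace, while symbol-first keys (option D) spread it across the table.

```python
def timestamp_first(symbol: str, ts: int) -> str:
    # Zero-pad so lexicographic order matches numeric order.
    return f"{ts:010d}#{symbol}"


def symbol_first(symbol: str, ts: int) -> str:
    return f"{symbol}#{ts:010d}"


def newest_write_positions(make_key, symbols, timestamps):
    """Positions, in sorted row-key order, of the most recent batch of writes."""
    keys = sorted(make_key(s, t) for s in symbols for t in timestamps)
    latest = max(timestamps)
    newest = {make_key(s, latest) for s in symbols}
    return [i for i, k in enumerate(keys) if k in newest]


symbols = ["AAPL", "GOOG", "MSFT"]
timestamps = [1000, 1001, 1002]
print(newest_write_positions(timestamp_first, symbols, timestamps))  # [6, 7, 8]: one contiguous range
print(newest_write_positions(symbol_first, symbols, timestamps))     # [2, 5, 8]: spread out
```

A sequential numeric ID behaves like the timestamp-first key here: each new ID sorts after all existing ones, so new traffic lands at the end of the keyspace.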

[https://cloud.google.com/bigtable/docs/schema-design]

Reference: https://cloud.google.com/bigtable/docs/schema-design-timeseries#ensure_that_your_row_key_avoids_hotspotting

When a Cloud Bigtable node fails, ____ is lost.

A. all data
B. no data
C. the last transaction
D. the time dimension
Suggested answer: B

Explanation:

A Cloud Bigtable table is sharded into blocks of contiguous rows, called tablets, to help balance the workload of queries. Tablets are stored on Colossus, Google's file system, in SSTable format. Each tablet is associated with a specific Cloud Bigtable node.

Data is never stored in Cloud Bigtable nodes themselves; each node has pointers to a set of tablets that are stored on Colossus. As a result:

Rebalancing tablets from one node to another is very fast, because the actual data is not copied. Cloud Bigtable simply updates the pointers for each node.

Recovery from the failure of a Cloud Bigtable node is very fast, because only metadata needs to be migrated to the replacement node.

When a Cloud Bigtable node fails, no data is lost.
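The pointer-based design can be sketched with a toy cluster in which nodes hold only tablet assignments and the tablet data lives in a separate shared store standing in for Colossus. It is a simplified model, not Bigtable's implementation, and `ToyCluster` is hypothetical: failing a node only reassigns pointers, so the stored data is unchanged.

```python
class ToyCluster:
    """Nodes hold pointers to tablets; tablet data lives in shared storage."""

    def __init__(self, nodes: list, tablets: dict):
        if not nodes:
            raise ValueError("a cluster needs at least one node")
        self.storage = dict(tablets)  # stands in for Colossus
        self.nodes = list(nodes)
        # Assign tablets to nodes round-robin; only tablet names (pointers) are stored.
        self.assignment = {
            t: self.nodes[i % len(self.nodes)] for i, t in enumerate(sorted(self.storage))
        }

    def fail_node(self, node: str) -> None:
        if node not in self.nodes:
            raise ValueError(f"unknown node {node!r}")
        survivors = [n for n in self.nodes if n != node]
        if not survivors:
            raise RuntimeError("no surviving node to take over the tablets")
        self.nodes = survivors
        orphaned = [t for t, n in self.assignment.items() if n == node]
        for i, t in enumerate(orphaned):
            # Only the pointer moves; self.storage is never touched.
            self.assignment[t] = survivors[i % len(survivors)]


cluster = ToyCluster(["n1", "n2"], {"t1": "rows a-f", "t2": "rows g-o", "t3": "rows p-z"})
before = dict(cluster.storage)
cluster.fail_node("n1")
print(cluster.assignment)         # every tablet now points at n2
print(cluster.storage == before)  # True: no data was copied or lost
```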

Reference: https://cloud.google.com/bigtable/docs/overview

Total 372 questions