Google Professional Data Engineer Practice Test - Questions Answers, Page 11
List of questions
Question 101
What are the minimum permissions needed for a service account used with Google Dataproc?
Explanation:
Service accounts authenticate applications running on your virtual machine instances to other Google Cloud Platform services. For example, if you write an application that reads and writes files on Google Cloud Storage, it must first authenticate to the Google Cloud Storage API. At a minimum, service accounts used with Cloud Dataproc need permissions to read and write to Google Cloud Storage, and to write to Google Cloud Logging.
Reference: https://cloud.google.com/dataproc/docs/concepts/service-accounts#important_notes
Question 102
Which role must be assigned to a service account used by the virtual machines in a Dataproc cluster so they can execute jobs?
Explanation:
Service accounts used with Cloud Dataproc must have the Dataproc Worker role (roles/dataproc.worker), or have all of the permissions granted by that role.
Reference: https://cloud.google.com/dataproc/docs/concepts/service-accounts#important_notes
Question 103
When creating a new Cloud Dataproc cluster with the projects.regions.clusters.create operation, these four values are required: project, region, name, and ____.
Explanation:
At a minimum, you must specify four values when creating a new cluster with the projects.regions.clusters.create operation:
The project in which the cluster will be created
The region to use
The name of the cluster
The zone in which the cluster will be created
You can specify many more details beyond these minimum requirements. For example, you can also specify the number of workers, whether preemptible compute should be used, and the network settings.
Reference: https://cloud.google.com/dataproc/docs/tutorials/python-libraryexample#create_a_new_cloud_dataproc_cluste
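The four required values can be illustrated with a minimal sketch that assembles the request for projects.regions.clusters.create without sending it. The helper name build_create_cluster_request is hypothetical, and the body field names (projectId, clusterName, config.gceClusterConfig.zoneUri) are assumed to match the v1 REST resource; check them against the current API reference before relying on them.

```python
def build_create_cluster_request(project_id, region, cluster_name, zone):
    """Assemble (but do not send) a minimal clusters.create request.

    Hypothetical helper: it only checks that the four required values
    are present and arranges them in the assumed v1 REST layout.
    """
    required = [
        ("project", project_id),
        ("region", region),
        ("name", cluster_name),
        ("zone", zone),
    ]
    for label, value in required:
        if not value:
            raise ValueError(f"{label} is required")

    return {
        # Project and region travel in the request path.
        "path": f"projects/{project_id}/regions/{region}/clusters",
        # Cluster name and zone travel in the request body.
        "body": {
            "projectId": project_id,
            "clusterName": cluster_name,
            "config": {"gceClusterConfig": {"zoneUri": zone}},
        },
    }


request = build_create_cluster_request(
    "my-project", "us-central1", "demo-cluster", "us-central1-a"
)
print(request["path"])
```

Optional settings such as the worker count or network configuration would be added under the same config key.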
Question 104
Which Google Cloud Platform service is an alternative to Hadoop with Hive?
Explanation:
Apache Hive is a data warehouse software project built on top of Apache Hadoop for providing data summarization, query, and analysis.
Google BigQuery is an enterprise data warehouse.
Reference: https://en.wikipedia.org/wiki/Apache_Hive
Question 105
Which of these rules apply when you add preemptible workers to a Dataproc cluster (select 2 answers)?
Explanation:
The following rules will apply when you use preemptible workers with a Cloud Dataproc cluster:
- Processing only: Since preemptibles can be reclaimed at any time, preemptible workers do not store data. Preemptibles added to a Cloud Dataproc cluster only function as processing nodes.
- No preemptible-only clusters: To ensure clusters do not lose all workers, Cloud Dataproc cannot create preemptible-only clusters.
- Persistent disk size: By default, all preemptible workers are created with the smaller of 100 GB or the primary worker boot disk size. This disk space is used for local caching of data and is not available through HDFS.
The managed group automatically re-adds workers lost due to reclamation as capacity permits.
Reference: https://cloud.google.com/dataproc/docs/concepts/preemptible-vms
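Two of the rules above are simple enough to express as checks. The following is a sketch only, not Dataproc's own validation logic: the function names are hypothetical, and the 100 GB cap is taken from the explanation above rather than read from any API.

```python
# Cap quoted in the explanation above (assumption: value in GB).
DEFAULT_PREEMPTIBLE_DISK_CAP_GB = 100


def default_preemptible_disk_gb(primary_boot_disk_gb):
    """Default disk size: the smaller of the cap and the primary boot disk."""
    return min(DEFAULT_PREEMPTIBLE_DISK_CAP_GB, primary_boot_disk_gb)


def check_worker_counts(primary_workers, preemptible_workers):
    """Reject worker counts that would describe a preemptible-only cluster."""
    if primary_workers < 0 or preemptible_workers < 0:
        raise ValueError("worker counts must be non-negative")
    if preemptible_workers > 0 and primary_workers == 0:
        raise ValueError("preemptible-only clusters are not allowed")


check_worker_counts(primary_workers=2, preemptible_workers=4)  # passes
print(default_preemptible_disk_gb(500))
print(default_preemptible_disk_gb(50))
```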
Question 106
When using Cloud Dataproc clusters, you can access the YARN web interface by configuring a browser to connect through a ____ proxy.
Explanation:
When using Cloud Dataproc clusters, configure your browser to use the SOCKS proxy. The SOCKS proxy routes data intended for the Cloud Dataproc cluster through an SSH tunnel.
Reference: https://cloud.google.com/dataproc/docs/concepts/cluster-web-interfaces#interfaces
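The tunnel-plus-proxy setup described above can be sketched as two small helpers that only build the command-line pieces. The helper names, the host string, and port 1080 are assumptions for illustration; `-D` (dynamic port forwarding) and `-N` (no remote command) are standard OpenSSH options, and `--proxy-server` is a Chromium command-line switch.

```python
def socks_tunnel_command(master_host, local_port=1080):
    """Build an ssh argument list that opens a local SOCKS proxy.

    Hypothetical helper: master_host is whatever address reaches the
    cluster's master node from your machine.
    """
    if not 1 <= local_port <= 65535:
        raise ValueError("local_port must be between 1 and 65535")
    # -N: do not run a remote command; -D: dynamic (SOCKS) port forwarding.
    return ["ssh", "-N", "-D", str(local_port), master_host]


def browser_proxy_flag(local_port=1080):
    """Build the browser switch that sends traffic through the local proxy."""
    return f"--proxy-server=socks5://localhost:{local_port}"


print(" ".join(socks_tunnel_command("user@cluster-master")))
print(browser_proxy_flag())
```

With the tunnel running, a browser started with that switch reaches the cluster's web interfaces through the SSH connection rather than over an open port.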
Question 107
Cloud Dataproc is a managed Apache Hadoop and Apache _____ service.
Explanation:
Cloud Dataproc is a managed Apache Spark and Apache Hadoop service that lets you use open source data tools for batch processing, querying, streaming, and machine learning.
Reference: https://cloud.google.com/dataproc/docs/
Question 108
Which action can a Cloud Dataproc Viewer perform?
Explanation:
A Cloud Dataproc Viewer is limited in its actions based on its role. A viewer can only list clusters, get cluster details, list jobs, get job details, list operations, and get operation details.
Reference: https://cloud.google.com/dataproc/docs/concepts/iam#iam_roles_and_cloud_dataproc_operations_summary
Question 109
Dataproc clusters contain many configuration files. To update these files, you will need to use the --properties option. The format for the option is: file_prefix:property=_____.
Explanation:
To make updating files and properties easy, the --properties flag uses a special format to specify the configuration file, the property within that file, and the value it should be set to. The format is: file_prefix:property=value.
Reference: https://cloud.google.com/dataproc/docs/concepts/cluster-properties#formatting
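To make the file_prefix:property=value format concrete, here is a small parsing sketch. The function name parse_property is hypothetical, and the example property is an assumption chosen for illustration; the parser only splits the string and does not check that the prefix or property actually exists.

```python
def parse_property(spec):
    """Split 'file_prefix:property=value' into its three parts."""
    # Split on the first '=' so the value itself may contain '='.
    prefix_and_name, equals, value = spec.partition("=")
    if not equals:
        raise ValueError(f"missing '=' in {spec!r}")
    # Split on the first ':' to separate the file prefix from the property.
    prefix, colon, name = prefix_and_name.partition(":")
    if not colon or not prefix or not name:
        raise ValueError(f"expected file_prefix:property in {spec!r}")
    return prefix, name, value


prefix, name, value = parse_property("spark:spark.executor.memory=4g")
print(prefix, name, value)
```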
Question 110
Scaling a Cloud Dataproc cluster typically involves ____.
Explanation:
After creating a Cloud Dataproc cluster, you can scale the cluster by increasing or decreasing the number of worker nodes in the cluster at any time, even when jobs are running on the cluster. Cloud Dataproc clusters are typically scaled to:
1) increase the number of workers to make a job run faster
2) decrease the number of workers to save money
3) increase the number of nodes to expand available Hadoop Distributed Filesystem (HDFS) storage
Reference: https://cloud.google.com/dataproc/docs/concepts/scaling-clusters
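Changing the worker count can be sketched as a patch request that touches only that one field. This helper is hypothetical and sends nothing; the update mask string and the workerConfig.numInstances field are assumed to follow the v1 clusters.patch layout and should be confirmed against the API reference.

```python
def build_scale_request(project_id, region, cluster_name, num_workers):
    """Assemble (but do not send) a request that resizes the primary workers."""
    if not isinstance(num_workers, int) or num_workers < 0:
        raise ValueError("num_workers must be a non-negative integer")
    return {
        "path": f"projects/{project_id}/regions/{region}/clusters/{cluster_name}",
        # Assumed mask: only the primary worker count is updated.
        "update_mask": "config.worker_config.num_instances",
        "body": {"config": {"workerConfig": {"numInstances": num_workers}}},
    }


# Scale up to speed up a job, or scale down later to save money.
scale_up = build_scale_request("my-project", "us-central1", "demo-cluster", 10)
print(scale_up["body"])
```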
Question