Google Professional Data Engineer Practice Test - Questions Answers, Page 24

You plan to deploy Cloud SQL using MySQL. You need to ensure high availability in the event of a zone failure. What should you do?

A. Create a Cloud SQL instance in one zone, and create a failover replica in another zone within the same region.
B. Create a Cloud SQL instance in one zone, and create a read replica in another zone within the same region.
C. Create a Cloud SQL instance in one zone, and configure an external read replica in a zone in a different region.
D. Create a Cloud SQL instance in a region, and configure automatic backup to a Cloud Storage bucket in the same region.
Suggested answer: A

Explanation:

A failover replica in a second zone of the same region is what provides Cloud SQL high availability against a zone failure; read replicas and cross-region external replicas do not fail over automatically, and backups alone do not keep the instance available.
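
For reference, a minimal sketch of option A with the gcloud CLI (instance names, tiers, and zones are placeholders; flag names vary across gcloud releases, and newer Cloud SQL versions express the same idea with --availability-type=REGIONAL instead of an explicit failover replica):

  # Primary MySQL instance in one zone
  gcloud sql instances create prod-mysql \
      --database-version=MYSQL_5_7 --tier=db-n1-standard-2 \
      --region=us-central1 --gce-zone=us-central1-a

  # Failover replica in a different zone of the same region (legacy MySQL HA)
  gcloud sql instances create prod-mysql-failover \
      --master-instance-name=prod-mysql \
      --replica-type=FAILOVER --gce-zone=us-central1-b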

Your company is selecting a system to centralize data ingestion and delivery. You are considering messaging and data integration systems to address the requirements. The key requirements are:

The ability to seek to a particular offset in a topic, possibly back to the start of all data ever captured

Support for publish/subscribe semantics on hundreds of topics

Retain per-key ordering

Which system should you choose?

A. Apache Kafka
B. Cloud Storage
C. Cloud Pub/Sub
D. Firebase Cloud Messaging
Suggested answer: A
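
As an illustration of the offset-seeking requirement that, at the time this question was written, ruled out Cloud Pub/Sub: Kafka's stock console tools (broker, topic, and group names below are placeholders) let a consumer replay a topic from the earliest retained offset or rewind a consumer group:

  # Replay every record in a topic from the earliest retained offset
  kafka-console-consumer.sh --bootstrap-server broker:9092 \
      --topic events --from-beginning

  # Rewind an existing consumer group to the start of the topic
  kafka-consumer-groups.sh --bootstrap-server broker:9092 \
      --group analytics --topic events \
      --reset-offsets --to-earliest --execute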

You are planning to migrate your current on-premises Apache Hadoop deployment to the cloud. You need to ensure that the deployment is as fault-tolerant and cost-effective as possible for long-running batch jobs. You want to use a managed service. What should you do?

A. Deploy a Cloud Dataproc cluster. Use a standard persistent disk and 50% preemptible workers. Store data in Cloud Storage, and change references in scripts from hdfs:// to gs://
B. Deploy a Cloud Dataproc cluster. Use an SSD persistent disk and 50% preemptible workers. Store data in Cloud Storage, and change references in scripts from hdfs:// to gs://
C. Install Hadoop and Spark on a 10-node Compute Engine instance group with standard instances. Install the Cloud Storage connector, and store the data in Cloud Storage. Change references in scripts from hdfs:// to gs://
D. Install Hadoop and Spark on a 10-node Compute Engine instance group with preemptible instances. Store data in HDFS. Change references in scripts from hdfs:// to gs://
Suggested answer: A
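
A rough sketch of what option A might look like with the gcloud CLI (cluster name, region, and worker counts are placeholders; newer gcloud releases call preemptible workers "secondary workers"):

  gcloud dataproc clusters create batch-cluster \
      --region=us-central1 \
      --num-workers=4 --num-preemptible-workers=4 \
      --master-boot-disk-type=pd-standard \
      --worker-boot-disk-type=pd-standard

  # Jobs then read from Cloud Storage instead of HDFS, e.g.
  # hdfs://namenode/data/input  ->  gs://my-bucket/data/input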

Your team is working on a binary classification problem. You have trained a support vector machine (SVM) classifier with default parameters, and received an area under the curve (AUC) of 0.87 on the validation set. You want to increase the AUC of the model. What should you do?

A. Perform hyperparameter tuning
B. Train a classifier with deep neural networks, because neural networks would always beat SVMs
C. Deploy the model and measure the real-world AUC; it's always higher because of generalization
D. Scale predictions you get out of the model (tune a scaling factor as a hyperparameter) in order to get the highest AUC
Suggested answer: A

Explanation:

https://towardsdatascience.com/understanding-hyperparameters-and-its-optimisation-techniques-f0debba07568
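
The question does not prescribe a tuning tool; as one hedged possibility (an assumption, not part of the question), a hyperparameter search over the SVM's parameters could be driven by AI Platform Training, with the search space and the AUC objective declared in a config file (names below are placeholders):

  # Illustrative only: hptuning.yaml would declare the SVM parameters to search
  # (e.g. C and gamma) with goal MAXIMIZE on a reported "auc" metric.
  gcloud ai-platform jobs submit training svm_hptune_01 \
      --region=us-central1 --module-name=trainer.task \
      --package-path=trainer/ --runtime-version=2.1 \
      --python-version=3.7 --config=hptuning.yaml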

You need to deploy additional dependencies to all nodes of a Cloud Dataproc cluster at startup using an existing initialization action. Company security policies require that Cloud Dataproc nodes do not have access to the Internet, so public initialization actions cannot fetch resources. What should you do?

A. Deploy the Cloud SQL Proxy on the Cloud Dataproc master
B. Use an SSH tunnel to give the Cloud Dataproc cluster access to the Internet
C. Copy all dependencies to a Cloud Storage bucket within your VPC security perimeter
D. Use Resource Manager to add the service account used by the Cloud Dataproc cluster to the Network User role
Suggested answer: C
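
A minimal sketch of option C with the gcloud CLI (bucket, script, cluster, and region names are placeholders); the cluster pulls the startup script and its dependencies from Cloud Storage rather than the public Internet:

  # Stage the existing initialization action and its dependencies in your own bucket
  gsutil cp init-deps.sh gs://my-private-bucket/init/init-deps.sh
  gsutil cp -r deps/ gs://my-private-bucket/init/

  # Create the cluster with internal IPs only, pointing at the staged script
  gcloud dataproc clusters create secure-cluster \
      --region=us-central1 --no-address \
      --initialization-actions=gs://my-private-bucket/init/init-deps.sh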

You need to choose a database for a new project that has the following requirements:

Fully managed

Able to automatically scale up

Transactionally consistent

Able to scale up to 6 TB

Able to be queried using SQL

Which database do you choose?

A. Cloud SQL
B. Cloud Bigtable
C. Cloud Spanner
D. Cloud Datastore
Suggested answer: C
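
For context, standing up a Spanner instance and a SQL-queryable database is a couple of gcloud commands (names and config are placeholders); at roughly 2 TB of storage per node when this question was written, three nodes would cover the 6 TB requirement:

  gcloud spanner instances create proj-instance \
      --config=regional-us-central1 --nodes=3 \
      --description="Transactional store"

  gcloud spanner databases create orders --instance=proj-instance

  gcloud spanner databases execute-sql orders --instance=proj-instance \
      --sql='SELECT 1'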

You work for a mid-sized enterprise that needs to move its operational system transaction data from an on-premises database to GCP. The database is about 20 TB in size. Which database should you choose?

A. Cloud SQL
B. Cloud Bigtable
C. Cloud Spanner
D. Cloud Datastore
Suggested answer: A
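
Assuming the 20 TB fits within Cloud SQL's storage ceiling (which has grown over time), the migration target could be provisioned roughly like this (instance name, tier, and sizes are placeholders):

  gcloud sql instances create ops-mysql \
      --database-version=MYSQL_8_0 \
      --tier=db-custom-16-61440 \
      --region=us-central1 \
      --storage-type=SSD --storage-size=20480GB \
      --storage-auto-increase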

You need to choose a database to store time series CPU and memory usage for millions of computers. You need to store this data in one-second interval samples. Analysts will be performing real-time, ad hoc analytics against the database.

You want to avoid being charged for every query executed and ensure that the schema design will allow for future growth of the dataset. Which database and data model should you choose?

A. Create a table in BigQuery, and append the new samples for CPU and memory to the table
B. Create a wide table in BigQuery, create a column for the sample value at each second, and update the row with the interval for each second
C. Create a narrow table in Cloud Bigtable with a row key that combines the Compute Engine computer identifier with the sample time at each second
D. Create a wide table in Cloud Bigtable with a row key that combines the computer identifier with the sample time at each minute, and combine the values for each second as column data.
Suggested answer: C

Explanation:

A tall and narrow table has a small number of events per row (possibly just one), whereas a short and wide table has a large number of events per row. For time-series data you should generally use tall and narrow tables, for two reasons: storing one event per row makes it easier to run queries against your data, and storing many events per row makes it more likely that the total row size will exceed the recommended maximum (see "Rows can be big but are not infinite" in the Bigtable documentation).

https://cloud.google.com/bigtable/docs/schema-design-time-series#patterns_for_row_key_design
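
A minimal sketch of the tall-and-narrow pattern with the cbt CLI (project, instance, table, and row-key values are placeholders); each row holds one sample, keyed by machine identifier plus timestamp:

  cbt -project=my-project -instance=metrics-instance createtable machine_metrics
  cbt -project=my-project -instance=metrics-instance createfamily machine_metrics stats

  # One row per machine per second: row key = <machine-id>#<formatted-timestamp>
  cbt -project=my-project -instance=metrics-instance set machine_metrics \
      vm-00421#20240101120000 stats:cpu=0.73 stats:mem=0.58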

You want to archive data in Cloud Storage. Because some data is very sensitive, you want to use the "Trust No One" (TNO) approach to encrypt your data to prevent the cloud provider staff from decrypting your data. What should you do?

A. Use gcloud kms keys create to create a symmetric key. Then use gcloud kms encrypt to encrypt each archival file with the key and unique additional authenticated data (AAD). Use gsutil cp to upload each encrypted file to the Cloud Storage bucket, and keep the AAD outside of Google Cloud.
B. Use gcloud kms keys create to create a symmetric key. Then use gcloud kms encrypt to encrypt each archival file with the key. Use gsutil cp to upload each encrypted file to the Cloud Storage bucket. Manually destroy the key previously used for encryption, and rotate the key once.
C. Specify customer-supplied encryption key (CSEK) in the .boto configuration file. Use gsutil cp to upload each archival file to the Cloud Storage bucket. Save the CSEK in Cloud Memorystore as permanent storage of the secret.
D. Specify customer-supplied encryption key (CSEK) in the .boto configuration file. Use gsutil cp to upload each archival file to the Cloud Storage bucket. Save the CSEK in a different project that only the security team can access.
Suggested answer: B
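
A sketch of the gcloud/gsutil steps the suggested answer describes (key ring, key, file, and bucket names are placeholders):

  gcloud kms keyrings create archive-ring --location=global
  gcloud kms keys create archive-key --location=global \
      --keyring=archive-ring --purpose=encryption

  gcloud kms encrypt --location=global --keyring=archive-ring --key=archive-key \
      --plaintext-file=archive-2019.tar --ciphertext-file=archive-2019.tar.enc

  gsutil cp archive-2019.tar.enc gs://my-archive-bucket/

  # Destroy the key version used for encryption once the archive is written
  gcloud kms keys versions destroy 1 \
      --location=global --keyring=archive-ring --key=archive-key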

You have data pipelines running on BigQuery, Cloud Dataflow, and Cloud Dataproc. You need to perform health checks and monitor their behavior, and then notify the team managing the pipelines if they fail. You also need to be able to work across multiple projects. Your preference is to use managed products or features of the platform. What should you do?

A. Export the information to Cloud Stackdriver, and set up an Alerting policy
B. Run a Virtual Machine in Compute Engine with Airflow, and export the information to Stackdriver
C. Export the logs to BigQuery, and set up App Engine to read that information and send emails if you find a failure in the logs
D. Develop an App Engine application to consume logs using GCP API calls, and send emails if you find a failure in the logs
Suggested answer: A

Explanation:

Stackdriver (Cloud Monitoring and Logging) is the managed option, supports multi-project workspaces, and alerting policies can notify the team when a pipeline fails; running Airflow on a self-managed Compute Engine VM contradicts the stated preference for managed products.
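
As one illustration of option A (the metric name and log filter below are placeholders, not prescribed by the question), a log-based metric can capture pipeline failures, and an alerting policy in Cloud Monitoring can then notify the owning team:

  # Count failed Dataflow jobs across the monitored projects (illustrative filter)
  gcloud logging metrics create dataflow_job_failures \
      --description="Failed Dataflow jobs" \
      --log-filter='resource.type="dataflow_step" AND severity>=ERROR'

  # An alerting policy on this metric (created in the Cloud Monitoring UI or via
  # "gcloud alpha monitoring policies create") then emails or pages the pipeline team.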