Google Professional Data Engineer Practice Test - Questions Answers, Page 27

List of questions

Question 261

You've migrated a Hadoop job from an on-premises cluster to Dataproc and Cloud Storage. Your Spark job is a complex analytical workload that consists of many shuffling operations, and the initial data are Parquet files (on average 200-400 MB in size each). You see some degradation in performance after the migration to Dataproc, so you'd like to optimize for it. Your organization is very cost-sensitive, so you'd like to continue using Dataproc on preemptibles (with 2 non-preemptible workers only) for this workload. What should you do?

A. Switch from HDDs to SSDs, and override the preemptible VM configuration to increase the boot disk size.
B. Increase the size of your Parquet files to ensure they are at least 1 GB each.
C. Switch to TFRecord format (approximately 200 MB per file) instead of Parquet files.
D. Switch from HDDs to SSDs, copy the initial data from Cloud Storage to Hadoop Distributed File System (HDFS), run the Spark job, and copy the results back to Cloud Storage.
Suggested answer: A
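
To make option A concrete, here is a minimal sketch using the google-cloud-dataproc Python client; the project, region, cluster name, machine types, and disk sizes are hypothetical placeholders, not values taken from the question. It requests pd-ssd boot disks (and a larger boot disk) for both the 2 non-preemptible workers and the preemptible secondary workers, so shuffle spills are not bottlenecked on HDD throughput.

```python
# Sketch only (hypothetical names/sizes): create a Dataproc cluster whose preemptible
# secondary workers are overridden to use SSD boot disks, per option A.
from google.cloud import dataproc_v1

project_id, region, cluster_name = "my-project", "us-central1", "spark-shuffle-cluster"

client = dataproc_v1.ClusterControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

cluster = {
    "project_id": project_id,
    "cluster_name": cluster_name,
    "config": {
        "worker_config": {  # the 2 non-preemptible workers
            "num_instances": 2,
            "machine_type_uri": "n2-standard-8",
            "disk_config": {"boot_disk_type": "pd-ssd", "boot_disk_size_gb": 1000},
        },
        "secondary_worker_config": {  # preemptible workers, overridden to use SSDs
            "num_instances": 10,
            "preemptibility": dataproc_v1.InstanceGroupConfig.Preemptibility.PREEMPTIBLE,
            "disk_config": {"boot_disk_type": "pd-ssd", "boot_disk_size_gb": 1000},
        },
    },
}

operation = client.create_cluster(
    request={"project_id": project_id, "region": region, "cluster": cluster}
)
print(operation.result().cluster_name)
```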

Question 262

Your company currently runs a large on-premises cluster using Spark, Hive, and Hadoop Distributed File System (HDFS) in a colocation facility. The cluster is designed to support peak usage on the system; however, many jobs are batch in nature, and usage of the cluster fluctuates quite dramatically.

Your company is eager to move to the cloud to reduce the overhead associated with on-premises infrastructure and maintenance, and to benefit from the cost savings. They are also hoping to modernize their existing infrastructure to use more serverless offerings in order to take advantage of the cloud. Because of the timing of their contract renewal with the colocation facility, they have only 2 months for their initial migration. How should you recommend they approach the upcoming migration strategy so they can maximize their cost savings in the cloud while still executing the migration in time?

A. Migrate the workloads to Dataproc plus HDFS; modernize later.
B. Migrate the workloads to Dataproc plus Cloud Storage; modernize later.
C. Migrate the Spark workload to Dataproc plus HDFS, and modernize the Hive workload for BigQuery.
D. Modernize the Spark workload for Dataflow and the Hive workload for BigQuery.
Suggested answer: D
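
Whichever option is chosen, a useful detail behind the "Dataproc plus Cloud Storage" choices is that Dataproc clusters ship with the Cloud Storage connector, so lifting and shifting a Spark job is usually a matter of repointing hdfs:// paths to gs:// paths rather than rewriting logic. A minimal sketch with hypothetical bucket, prefix, and column names:

```python
# Sketch only: the same Spark job, with its I/O moved from HDFS to Cloud Storage
# (bucket/prefix/column names here are hypothetical).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sales-batch").getOrCreate()

# Before (on-premises): spark.read.parquet("hdfs:///data/sales/2024/")
sales = spark.read.parquet("gs://example-landing-bucket/sales/2024/")

daily_totals = sales.groupBy("sale_date").sum("amount")

# Before: daily_totals.write.parquet("hdfs:///output/daily_totals/")
daily_totals.write.mode("overwrite").parquet("gs://example-curated-bucket/daily_totals/")
```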

Question 263

You are collecting IoT sensor data from millions of devices across the world and storing the data in BigQuery. Your access pattern is based on recent data filtered by location_id and device_version, with the following query:

[Image: the query referenced above; not reproduced in this text version]

You want to optimize your queries for cost and performance. How should you structure your data?

A. Partition table data by create_date, location_id, and device_version.
B. Partition table data by create_date; cluster table data by location_id and device_version.
C. Cluster table data by create_date, location_id, and device_version.
D. Cluster table data by create_date; partition by location_id and device_version.
Suggested answer: C

Question 264

You want to optimize your queries for cost and performance. How should you structure your data?

A. Partition table data by create_date, location_id, and device_version.
B. Partition table data by create_date; cluster table data by location_id and device_version.
C. Cluster table data by create_date, location_id, and device_version.
D. Cluster table data by create_date; partition by location_id and device_version.
Suggested answer: B
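
As a rough illustration of the suggested answer here (partition by create_date, cluster by location_id and device_version), below is a sketch using the google-cloud-bigquery Python client; the project, dataset, table, and schema are hypothetical.

```python
# Sketch (hypothetical dataset/table/schema): a table partitioned on create_date
# and clustered on location_id and device_version.
from google.cloud import bigquery

client = bigquery.Client()
table = bigquery.Table(
    "my-project.iot.sensor_events",
    schema=[
        bigquery.SchemaField("create_date", "TIMESTAMP"),
        bigquery.SchemaField("location_id", "STRING"),
        bigquery.SchemaField("device_version", "STRING"),
        bigquery.SchemaField("reading", "FLOAT"),
    ],
)
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY, field="create_date"
)
table.clustering_fields = ["location_id", "device_version"]
client.create_table(table)
```

Partitioning on create_date lets queries over recent data prune old partitions, while clustering on location_id and device_version reduces the bytes scanned for the filters in the question.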

Question 265

A live TV show asks viewers to cast votes using their mobile phones. The event generates a large volume of data during a 3-minute period. You are in charge of the voting infrastructure and must ensure that the platform can handle the load and that all votes are processed. You must display partial results while voting is open. After voting closes, you need to count the votes exactly once while optimizing cost. What should you do?

A. Create a Memorystore instance with a high availability (HA) configuration.
B. Write votes to a Pub/Sub topic, and have Cloud Functions subscribe to it and write votes to BigQuery.
C. Write votes to a Pub/Sub topic and load them into both Bigtable and BigQuery via a Dataflow pipeline. Query Bigtable for real-time results and BigQuery for later analysis. Shut down the Bigtable instance when voting concludes.
D. Create a Cloud SQL for PostgreSQL database with a high availability (HA) configuration and multiple read replicas.
Suggested answer: C
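
To make option C concrete, here is a hedged sketch of a streaming Apache Beam (Dataflow) pipeline that reads votes from Pub/Sub and appends them to BigQuery. The project, topic, table, and schema names are hypothetical, and the Bigtable branch is only indicated in a comment to keep the sketch short.

```python
# Sketch only (hypothetical names): stream votes from Pub/Sub into BigQuery with
# Apache Beam; a second branch of `votes` could also be written to Bigtable for
# low-latency partial results while voting is open.
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(streaming=True)  # add runner/project flags when deploying to Dataflow

with beam.Pipeline(options=options) as p:
    votes = (
        p
        | "ReadVotes" >> beam.io.ReadFromPubSub(topic="projects/my-project/topics/votes")
        | "ParseJson" >> beam.Map(json.loads)
    )
    votes | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
        "my-project:voting.votes",
        schema="viewer_id:STRING,candidate:STRING,vote_ts:TIMESTAMP",
        write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
    )
```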

Question 266

You are updating the code for a subscriber to a Pub/Sub feed. You are concerned that upon deployment the subscriber may erroneously acknowledge messages, leading to message loss. Your subscriber is not set up to retain acknowledged messages. What should you do to ensure that you can recover from errors after deployment?

A. Use Cloud Build for your deployment. If an error occurs after deployment, use a Seek operation to locate a timestamp logged by Cloud Build at the start of the deployment.
B. Create a Pub/Sub snapshot before deploying new subscriber code. Use a Seek operation to re-deliver messages that became available after the snapshot was created.
C. Set up the Pub/Sub emulator on your local machine. Validate the behavior of your new subscriber logic before deploying it to production.
D. Enable dead lettering on the Pub/Sub topic to capture messages that aren't successfully acknowledged. If an error occurs after deployment, re-deliver any messages captured by the dead-letter queue.
Suggested answer: B
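
A minimal sketch of the snapshot-then-seek workflow from the suggested answer, using the google-cloud-pubsub Python client; the project, subscription, and snapshot names are hypothetical.

```python
# Sketch (hypothetical names): snapshot a subscription before deploying new
# subscriber code, then seek back to the snapshot if the deployment misbehaves.
from google.cloud import pubsub_v1

project_id = "my-project"
subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path(project_id, "orders-sub")
snapshot_path = subscriber.snapshot_path(project_id, "pre-deploy-orders-sub")

# 1. Before deployment: capture the subscription's unacknowledged message state.
subscriber.create_snapshot(
    request={"name": snapshot_path, "subscription": subscription_path}
)

# 2. If the new subscriber acknowledged messages it shouldn't have, rewind to the
#    snapshot so those messages are redelivered.
subscriber.seek(request={"subscription": subscription_path, "snapshot": snapshot_path})
```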

Question 267

Government regulations in the banking industry mandate the protection of clients' personally identifiable information (PII). Your company requires PII to be access controlled, encrypted, and compliant with major data protection standards. In addition to using Cloud Data Loss Prevention (Cloud DLP), you want to follow Google-recommended practices and use service accounts to control access to PII. What should you do?

A. Assign the required Identity and Access Management (IAM) roles to every employee, and create a single service account to access protected resources.
B. Use one service account to access a Cloud SQL database, and use separate service accounts for each human user.
C. Use Cloud Storage to comply with major data protection standards. Use one service account shared by all users.
D. Use Cloud Storage to comply with major data protection standards. Use multiple service accounts attached to IAM groups to grant the appropriate access to each group.
Suggested answer: D
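
The suggested answer hinges on granting access at the group level rather than per user. As one hedged illustration (the bucket name, group address, and role are hypothetical, and this is only one way to apply the practice), the sketch below binds a viewer role to a group on a Cloud Storage bucket that holds PII.

```python
# Sketch only (hypothetical bucket/group): grant a group read access to a PII bucket
# via an IAM binding instead of granting roles user by user.
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("example-pii-bucket")

policy = bucket.get_iam_policy(requested_policy_version=3)
policy.bindings.append(
    {
        "role": "roles/storage.objectViewer",
        "members": {"group:pii-readers@example.com"},
    }
)
bucket.set_iam_policy(policy)
```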

Question 268

You are migrating a table to BigQuery and are deciding on the data model. Your table stores information related to purchases made across several store locations, and includes information like the time of the transaction, items purchased, the store ID, and the city and state in which the store is located. You frequently query this table to see how many of each item were sold over the past 30 days and to look at purchasing trends by state, city, and individual store. You want to model this table to minimize query time and cost. What should you do?

A. Partition by transaction time; cluster by state first, then city, then store ID.
B. Partition by transaction time; cluster by store ID first, then city, then state.
C. Top-level cluster by state first, then city, then store ID.
D. Top-level cluster by store ID first, then city, then state.
Suggested answer: C
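
For reference, the clustering order named in the suggested answer (state, then city, then store ID) can be expressed as DDL; below is a sketch run through the BigQuery Python client with hypothetical project, dataset, table, and column names. Option A would differ only by adding a PARTITION BY clause on the transaction timestamp, as noted in the SQL comment.

```python
# Sketch (hypothetical names): a top-level clustered table ordered state, city, store_id.
from google.cloud import bigquery

client = bigquery.Client()
ddl = """
CREATE TABLE `my-project.retail.purchases`
(
  transaction_ts TIMESTAMP,
  item_id        STRING,
  store_id       STRING,
  city           STRING,
  state          STRING
)
-- Option A would additionally include: PARTITION BY DATE(transaction_ts)
CLUSTER BY state, city, store_id
"""
client.query(ddl).result()
```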

Question 269

You are building a data pipeline on Google Cloud. You need to prepare data using a casual method for a machine-learning process. You want to support a logistic regression model. You also need to monitor and adjust for null values, which must remain real-valued and cannot be removed. What should you do?

A. Use Cloud Dataprep to find null values in sample source data. Convert all nulls to 'none' using a Cloud Dataproc job.
B. Use Cloud Dataprep to find null values in sample source data. Convert all nulls to 0 using a Cloud Dataprep job.
C. Use Cloud Dataflow to find null values in sample source data. Convert all nulls to 'none' using a Cloud Dataprep job.
D. Use Cloud Dataflow to find null values in sample source data. Convert all nulls to 0 using a custom script.
Suggested answer: C
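
The reasoning behind converting nulls to 0 (rather than to a string such as 'none') is that logistic regression needs every feature to stay numeric. The hedged sketch below, using pandas and scikit-learn with made-up column names and values, shows the equivalent transformation outside Cloud Dataprep.

```python
# Sketch (made-up data/columns): nulls must become a real number (here 0), not a string,
# for a logistic regression model to train on the feature.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.DataFrame(
    {
        "monthly_spend": [120.0, np.nan, 75.5, np.nan, 230.0],
        "churned": [0, 1, 0, 1, 0],
    }
)

# Monitor how many nulls are being adjusted, then keep the column real-valued.
print("nulls adjusted:", df["monthly_spend"].isna().sum())
df["monthly_spend"] = df["monthly_spend"].fillna(0)

model = LogisticRegression().fit(df[["monthly_spend"]], df["churned"])
print(model.predict_proba(df[["monthly_spend"]]))
```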


Question 270

You have an Oracle database deployed in a VM as part of a Virtual Private Cloud (VPC) network. You want to replicate and continuously synchronize 50 tables to BigQuery. You want to minimize the need to manage infrastructure. What should you do?

A. Create a Datastream service from Oracle to BigQuery, use a private connectivity configuration to the same VPC network, and a connection profile to BigQuery.
B. Create a Pub/Sub subscription to write to BigQuery directly. Deploy the Debezium Oracle connector to capture changes in the Oracle database, and sink to the Pub/Sub topic.
C. Deploy Apache Kafka in the same VPC network, use Kafka Connect Oracle Change Data Capture (CDC), and Dataflow to stream the Kafka topic to BigQuery.
D. Deploy Apache Kafka in the same VPC network, use Kafka Connect Oracle Change Data Capture (CDC), and the Kafka Connect Google BigQuery Sink Connector.
Suggested answer: A

Explanation:

Datastream is a serverless, scalable, and reliable service that enables you to stream data changes from Oracle and MySQL databases to Google Cloud services such as BigQuery, Cloud SQL, Google Cloud Storage, and Cloud Pub/Sub. Datastream captures and streams database changes using change data capture (CDC) technology. Datastream supports private connectivity to the source and destination systems using VPC networks. Datastream also provides a connection profile to BigQuery, which simplifies the configuration and management of the data replication.

Reference:

Datastream overview

Creating a Datastream stream

Using Datastream with BigQuery
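
As a rough, unverified sketch of one piece of option A with the google-cloud-datastream Python client (the project, region, and profile IDs are placeholders; a full setup also needs an Oracle connection profile with private connectivity and a Stream resource, which are omitted here):

```python
# Rough sketch only (placeholder names; not a complete Datastream setup): create the
# BigQuery-side connection profile that a Datastream stream would write to.
from google.cloud import datastream_v1

project_id, region = "my-project", "us-central1"
client = datastream_v1.DatastreamClient()
parent = f"projects/{project_id}/locations/{region}"

bq_profile = datastream_v1.ConnectionProfile(
    display_name="bq-destination",
    bigquery_profile=datastream_v1.BigQueryProfile(),
)

operation = client.create_connection_profile(
    parent=parent,
    connection_profile=bq_profile,
    connection_profile_id="bq-destination",
)
print(operation.result().name)
```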
