Google Professional Data Engineer Practice Test - Questions Answers, Page 31

You are designing a system that requires an ACID-compliant database. You must ensure that the system requires minimal human intervention in case of a failure. What should you do?

A. Configure a Cloud SQL for MySQL instance with point-in-time recovery enabled.
B. Configure a Cloud SQL for PostgreSQL instance with high availability enabled.
C. Configure a Bigtable instance with more than one cluster.
D. Configure a BigQuery table with a multi-region configuration.

Suggested answer: B

Explanation:

The best option for meeting both the ACID compliance and the minimal human intervention requirements is a Cloud SQL for PostgreSQL instance with high availability enabled. Cloud SQL for PostgreSQL is fully ACID compliant, whereas Bigtable only guarantees atomicity for single-row operations and BigQuery does not provide the transactional guarantees required of an ACID database. With high availability enabled, Cloud SQL automatically fails over to a standby instance if the primary goes down, so no manual failover is needed; point-in-time recovery on a Cloud SQL for MySQL instance, by contrast, requires manual intervention to restore data after a failure. A Cloud SQL for PostgreSQL instance with high availability therefore ensures availability and uptime without administrative effort.
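As a hedged illustration only (the project, instance name, region, and machine tier below are placeholders, not part of the question), high availability for a Cloud SQL for PostgreSQL instance is requested through the REGIONAL availability type, which provisions a standby in another zone and enables automatic failover:

```python
from googleapiclient import discovery  # pip install google-api-python-client

# Uses Application Default Credentials; project, instance name, and settings are placeholders.
sqladmin = discovery.build("sqladmin", "v1")

instance_body = {
    "name": "orders-db",                  # hypothetical instance name
    "databaseVersion": "POSTGRES_15",
    "region": "us-central1",
    "settings": {
        "tier": "db-custom-2-7680",
        "availabilityType": "REGIONAL",   # REGIONAL = standby in another zone, automatic failover
        "backupConfiguration": {"enabled": True},
    },
}

operation = sqladmin.instances().insert(project="my-project", body=instance_body).execute()
print(operation["name"])  # name of the long-running create operation
```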

You need to migrate a Redis database from an on-premises data center to a Memorystore for Redis instance. You want to follow Google-recommended practices and perform the migration for minimal cost, time, and effort. What should you do?

A. Make a secondary instance of the Redis database on a Compute Engine instance, and then perform a live cutover.
B. Write a shell script to migrate the Redis data, and create a new Memorystore for Redis instance.
C. Create a Dataflow job to read the Redis database from the on-premises data center, and write the data to a Memorystore for Redis instance.
D. Make an RDB backup of the Redis database, use the gsutil utility to copy the RDB file into a Cloud Storage bucket, and then import the RDB file into the Memorystore for Redis instance.

Suggested answer: D

Explanation:

The import and export feature uses the native RDB snapshot feature of Redis to import data into or export data out of a Memorystore for Redis instance. The use of the native RDB format prevents lock-in and makes it very easy to move data within Google Cloud or outside of Google Cloud. Import and export uses Cloud Storage buckets to store RDB files.

Reference: https://cloud.google.com/memorystore/docs/redis/import-export-overview
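A minimal sketch of the import step with the Memorystore for Redis Python client, assuming the RDB snapshot has already been copied to Cloud Storage with gsutil; the project, instance, and bucket names are hypothetical:

```python
from google.cloud import redis_v1  # pip install google-cloud-redis

client = redis_v1.CloudRedisClient()

# Fully qualified name of the target Memorystore instance (placeholder values).
instance_name = "projects/my-project/locations/us-central1/instances/my-redis"

# Import the RDB file previously uploaded with:
#   gsutil cp backup.rdb gs://my-redis-migration/backup.rdb
operation = client.import_instance(
    request={
        "name": instance_name,
        "input_config": {"gcs_source": {"uri": "gs://my-redis-migration/backup.rdb"}},
    }
)
operation.result()  # block until the import completes
```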

You want to create a machine learning model using BigQuery ML and create an endpoint for hosting the model using Vertex AI. This will enable the processing of continuous streaming data in near-real time from multiple vendors. The data may contain invalid values. What should you do?

A. Create a new BigQuery dataset and use streaming inserts to land the data from multiple vendors. Configure your BigQuery ML model to use the 'ingestion' dataset as the training data.
B. Use BigQuery streaming inserts to land the data from multiple vendors in the BigQuery dataset where your ML model is deployed.
C. Create a Pub/Sub topic and send all vendor data to it. Connect a Cloud Function to the topic to process the data and store it in BigQuery.
D. Create a Pub/Sub topic and send all vendor data to it. Use Dataflow to process and sanitize the Pub/Sub data and stream it to BigQuery.

Suggested answer: D

Explanation:

Dataflow provides a scalable and flexible way to process and clean the incoming data in real-time before loading it into BigQuery.
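A minimal Apache Beam (Python) sketch of this pattern, with hypothetical subscription, table, and schema names; the sanitize step stands in for whatever cleansing the vendor data actually requires:

```python
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Placeholder resource names for illustration.
SUBSCRIPTION = "projects/my-project/subscriptions/vendor-data-sub"
TABLE = "my-project:sales.vendor_events"


def sanitize(message: bytes):
    """Parse a Pub/Sub message and drop rows with invalid values."""
    row = json.loads(message.decode("utf-8"))
    try:
        row["amount"] = float(row["amount"])  # coerce to the expected type
    except (KeyError, TypeError, ValueError):
        return  # skip invalid rows
    yield row


options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadVendorData" >> beam.io.ReadFromPubSub(subscription=SUBSCRIPTION)
        | "Sanitize" >> beam.FlatMap(sanitize)
        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
            TABLE,
            schema="vendor:STRING, amount:FLOAT, event_time:TIMESTAMP",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        )
    )
```

The cleansed rows in BigQuery can then be used to train the BigQuery ML model and serve near-real-time predictions through the Vertex AI endpoint.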

You have a data processing application that runs on Google Kubernetes Engine (GKE). Containers need to be launched with their latest available configurations from a container registry. Your GKE nodes need to have GPUs, local SSDs, and 8 Gbps bandwidth. You want to efficiently provision the data processing infrastructure and manage the deployment process. What should you do?

A. Use Compute Engine startup scripts to pull container images, and use gcloud commands to provision the infrastructure.
B. Use GKE to autoscale containers, and use gcloud commands to provision the infrastructure.
C. Use Cloud Build to schedule a job using Terraform build to provision the infrastructure and launch with the most current container images.
D. Use Dataflow to provision the data pipeline, and use Cloud Scheduler to run the job.

Suggested answer: C

Explanation:

Cloud Build can run Terraform as a build step, so a single job provisions the GKE infrastructure as code (node pools with GPUs, local SSDs, and the required bandwidth) and then launches the containers with their most current images, keeping provisioning and deployment automated and repeatable.

Reference: https://cloud.google.com/architecture/managing-infrastructure-as-code
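As an illustrative sketch only (the builder image tag, deployment, and cluster names are assumptions, and the Terraform and Kubernetes configurations are presumed to come from the build's source, for example a repository attached to a Cloud Build trigger), submitting such a job programmatically could look like this:

```python
from google.cloud.devtools import cloudbuild_v1  # pip install google-cloud-build

client = cloudbuild_v1.CloudBuildClient()

# Step 1 provisions the GKE infrastructure (GPU / local-SSD node pools declared in Terraform);
# step 2 restarts the workload so it pulls the latest container images.
build = cloudbuild_v1.Build(
    steps=[
        cloudbuild_v1.BuildStep(
            name="hashicorp/terraform:1.7",  # public Terraform image (assumed tag)
            args=["apply", "-auto-approve"],
        ),
        cloudbuild_v1.BuildStep(
            name="gcr.io/cloud-builders/kubectl",
            args=["rollout", "restart", "deployment/data-processor"],  # hypothetical deployment
            env=[
                "CLOUDSDK_COMPUTE_REGION=us-central1",
                "CLOUDSDK_CONTAINER_CLUSTER=data-proc-cluster",  # hypothetical cluster
            ],
        ),
    ],
)

operation = client.create_build(project_id="my-project", build=build)
result = operation.result()  # wait for the build to finish
print(result.status)
```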

You issue a new batch job to Dataflow. The job starts successfully, processes a few elements, and then suddenly fails and shuts down. You navigate to the Dataflow monitoring interface where you find errors related to a particular DoFn in your pipeline. What is the most likely cause of the errors?

A. Exceptions in worker code
B. Job validation
C. Graph or pipeline construction
D. Insufficient permissions

Suggested answer: A

Explanation:

While your job is running, you might encounter errors or exceptions in your worker code. These errors generally mean that the DoFns in your pipeline code have generated unhandled exceptions, which result in failed tasks in your Dataflow job. Exceptions in user code (for example, your DoFn instances) are reported in the Dataflow monitoring interface.

Reference: https://cloud.google.com/dataflow/docs/guides/troubleshooting-your-pipeline#detect_an_exception_in_worker_code
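A short, hypothetical DoFn sketch of the difference between an unhandled exception (which eventually fails the job) and a dead-letter pattern that tolerates bad elements; the element format is made up:

```python
import apache_beam as beam


class ParseRecord(beam.DoFn):
    """An unhandled exception raised here would fail the bundle and, after retries,
    the whole batch job, which is what the monitoring interface reports. Catching
    the error and routing the bad element to a dead-letter output keeps the job alive."""

    DEAD_LETTER = "dead_letter"

    def process(self, element: str):
        try:
            record_id, value = element.split(",")
            yield {"id": record_id, "value": float(value)}
        except (ValueError, TypeError):
            # Emit the failing element to a side output instead of raising.
            yield beam.pvalue.TaggedOutput(self.DEAD_LETTER, element)


with beam.Pipeline() as p:
    results = (
        p
        | beam.Create(["a1,10.5", "a2,not-a-number"])
        | beam.ParDo(ParseRecord()).with_outputs(ParseRecord.DEAD_LETTER, main="parsed")
    )
    results.parsed | "LogGood" >> beam.Map(print)
    results.dead_letter | "LogBad" >> beam.Map(lambda e: print("bad:", e))
```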

You are developing a new deep learning model that predicts a customer's likelihood to buy on your ecommerce site. After running an evaluation of the model against both the original training data and new test data, you find that your model is overfitting the data. You want to improve the accuracy of the model when predicting new data. What should you do?

A. Increase the size of the training dataset, and increase the number of input features.
B. Increase the size of the training dataset, and decrease the number of input features.
C. Reduce the size of the training dataset, and increase the number of input features.
D. Reduce the size of the training dataset, and decrease the number of input features.

Suggested answer: B

Explanation:

An overfit model has learned patterns that are specific to its training set, so it performs well on the training data but poorly on new data. Training on a larger dataset and reducing the number of input features (lowering model complexity) both help the model generalize to unseen data.

Reference: https://machinelearningmastery.com/impact-of-dataset-size-on-deep-learning-model-skill-and-performance-estimates/
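Purely as an illustrative sketch (scikit-learn on synthetic data stands in for the real deep learning model and customer features), the "more data, fewer features" idea looks like this:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for purchase data: many features, only a few informative.
X, y = make_classification(n_samples=20_000, n_features=100, n_informative=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Keep only the 10 most predictive features (decrease input features) and train on
# the full, larger training split (increase training data).
model = make_pipeline(SelectKBest(f_classif, k=10), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

print("train accuracy:", model.score(X_train, y_train))
print("test accuracy: ", model.score(X_test, y_test))  # gap narrows as overfitting is reduced
```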

You are implementing workflow pipeline scheduling using open source-based tools and Google Kubernetes Engine (GKE). You want to use a Google managed service to simplify and automate the task. You also want to accommodate Shared VPC networking considerations. What should you do?

A. Use Dataflow for your workflow pipelines. Use Cloud Run triggers for scheduling.
B. Use Dataflow for your workflow pipelines. Use shell scripts to schedule workflows.
C. Use Cloud Composer in a Shared VPC configuration. Place the Cloud Composer resources in the host project.
D. Use Cloud Composer in a Shared VPC configuration. Place the Cloud Composer resources in the service project.

Suggested answer: D

Explanation:

Shared VPC requires that you designate a host project to which networks and subnetworks belong and a service project, which is attached to the host project. When Cloud Composer participates in a Shared VPC, the Cloud Composer environment is in the service project.

Reference: https://cloud.google.com/composer/docs/how-to/managing/configuring-shared-vpc
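A minimal sketch of the recommended setup, assuming hypothetical project, network, and subnet names: the environment is created in the service project, and the --network and --subnetwork values reference resources shared from the host project.

```python
import subprocess

SERVICE_PROJECT = "data-eng-service"  # service project attached to the host
HOST_PROJECT = "network-host"         # Shared VPC host project

# Create the Cloud Composer environment in the service project; the network and
# subnetwork URLs point at the Shared VPC resources owned by the host project.
subprocess.run(
    [
        "gcloud", "composer", "environments", "create", "workflow-orchestrator",
        "--project", SERVICE_PROJECT,
        "--location", "us-central1",
        "--network", f"projects/{HOST_PROJECT}/global/networks/shared-vpc",
        "--subnetwork", f"projects/{HOST_PROJECT}/regions/us-central1/subnetworks/composer-subnet",
    ],
    check=True,
)
```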

You are implementing a chatbot to help an online retailer streamline their customer service. The chatbot must be able to respond to both text and voice inquiries. You are looking for a low-code or no-code option, and you want to be able to easily train the chatbot to provide answers to keywords. What should you do?

A. Use the Speech-to-Text API to build a Python application in App Engine.
B. Use the Speech-to-Text API to build a Python application in a Compute Engine instance.
C. Use Dialogflow for simple queries and the Speech-to-Text API for complex queries.
D. Use Dialogflow to implement the chatbot, defining the intents based on the most common queries collected.

Suggested answer: D

Explanation:

Dialogflow is a conversational AI platform that allows for easy implementation of chatbots without needing to code. It has built-in integration for both text and voice input via APIs like Cloud Speech-to-Text. Defining intents and entity types allows you to map common queries and keywords to responses. This would provide a low/no-code way to quickly build and iteratively improve the chatbot capabilities.

Dialogflow is a natural language understanding platform that makes it easy to design and integrate a conversational user interface into your mobile app, web application, device, bot, interactive voice response system, and so on. Using Dialogflow, you can provide new and engaging ways for users to interact with your product. Dialogflow can analyze multiple types of input from your customers, including text or audio inputs (like from a phone or voice recording). It can also respond to your customers in a couple of ways, either through text or with synthetic speech.

Reference: https://cloud.google.com/dialogflow/docs
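As a hedged example (the project, session, and query values are placeholders), sending a customer's text inquiry to a Dialogflow ES agent with the Python client looks roughly like this; voice inquiries go through the same detect-intent API with an audio configuration or a telephony integration:

```python
from google.cloud import dialogflow_v2 as dialogflow  # pip install google-cloud-dialogflow


def detect_intent_text(project_id: str, session_id: str, text: str, language_code: str = "en"):
    """Send a customer's text inquiry to a Dialogflow agent and return the matched intent and reply."""
    client = dialogflow.SessionsClient()
    session = client.session_path(project_id, session_id)

    query_input = dialogflow.QueryInput(
        text=dialogflow.TextInput(text=text, language_code=language_code)
    )
    response = client.detect_intent(request={"session": session, "query_input": query_input})
    result = response.query_result
    return result.intent.display_name, result.fulfillment_text


# Example usage with placeholder IDs.
intent, reply = detect_intent_text("my-retail-project", "session-123", "Where is my order?")
print(intent, "->", reply)
```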

You are loading CSV files from Cloud Storage to BigQuery. The files have known data quality issues, including mismatched data types, such as STRINGS and INT64s in the same column, and inconsistent formatting of values such as phone numbers or addresses. You need to create the data pipeline to maintain data quality and perform the required cleansing and transformation. What should you do?

A. Use Data Fusion to transform the data before loading it into BigQuery.
B. Load the CSV files into a staging table with the desired schema, perform the transformations with SQL, and then write the results to the final destination table.
C. Create a table with the desired schema, load the CSV files into the table, and perform the transformations in place using SQL.
D. Use Data Fusion to convert the CSV files to a self-describing data format, such as AVRO, before loading the data to BigQuery.

Suggested answer: A

Explanation:

Data Fusion's advantages:

Visual interface: Offers a user-friendly interface for designing data pipelines without extensive coding, making it accessible to a wider range of users.

Built-in transformations: Includes a wide range of pre-built transformations to handle common data quality issues, such as:

Data type conversions

Data cleansing (e.g., removing invalid characters, correcting formatting)

Data validation (e.g., checking for missing values, enforcing constraints)

Data enrichment (e.g., adding derived fields, joining with other datasets)

Custom transformations: Allows for custom transformations using SQL or Java code for more complex cleaning tasks.

Scalability: Can handle large datasets efficiently, making it suitable for processing CSV files with potential data quality issues.

Integration with BigQuery: Integrates seamlessly with BigQuery, allowing for direct loading of transformed data.

You are building a data pipeline on Google Cloud. You need to prepare data using a casual method for a machine-learning process. You want to support a logistic regression model. You also need to monitor and adjust for null values, which must remain real-valued and cannot be removed. What should you do?

A. Use Cloud Dataprep to find null values in sample source data. Convert all nulls to 'none' using a Cloud Dataproc job.
B. Use Cloud Dataprep to find null values in sample source data. Convert all nulls to 0 using a Cloud Dataprep job.
C. Use Cloud Dataflow to find null values in sample source data. Convert all nulls to 'none' using a Cloud Dataprep job.
D. Use Cloud Dataflow to find null values in sample source data. Convert all nulls to 0 using a custom script.

Suggested answer: C