Google Professional Data Engineer Practice Test – Member Shared Questions, Page 15

Question 141

You launched a new gaming app almost three years ago. You have been uploading log files from the previous day to a separate Google BigQuery table with the table name format LOGS_yyyymmdd. You have been using table wildcard functions to generate daily and monthly reports for all time ranges.

Recently, you discovered that some queries that cover long date ranges are exceeding the limit of 1,000 tables and failing. How can you resolve this issue?

A.

Convert all daily log tables into date-partitioned tables

B.

Convert the sharded tables into a single partitioned table

C.

Enable query caching so you can cache data from previous months

D.

Create separate views to cover each month, and query from these views
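
To make the failure in this question concrete, here is a minimal sketch that counts how many daily shards a date range touches. It is plain Python, not a BigQuery client call. The date range and the helper name `shard_names` are assumptions for illustration; only the `LOGS_yyyymmdd` naming format and the 1,000-table limit come from the question.

```python
from datetime import date, timedelta

TABLE_LIMIT = 1000  # limit quoted in the question text


def shard_names(start: date, end: date, prefix: str = "LOGS_") -> list[str]:
    """Return one daily shard name per day in [start, end], inclusive."""
    days = (end - start).days + 1
    return [f"{prefix}{(start + timedelta(days=i)):%Y%m%d}" for i in range(days)]


# Hypothetical reporting range of a little under three years.
names = shard_names(date(2021, 1, 1), date(2023, 12, 1))
over_limit = len(names) > TABLE_LIMIT
print(len(names), "shards referenced; over limit:", over_limit)
```

Any query whose range expands to more shards than the limit hits the error described, however the range is written.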

Question 142

Your analytics team wants to build a simple statistical model to determine which customers are most likely to work with your company again, based on a few different metrics. They want to run the model on Apache Spark, using data housed in Google Cloud Storage, and you have recommended using Google Cloud Dataproc to execute this job. Testing has shown that this workload can run in approximately 30 minutes on a 15-node cluster, outputting the results into Google BigQuery. The plan is to run this workload weekly. How should you optimize the cluster for cost?

A.

Migrate the workload to Google Cloud Dataflow

B.

Use preemptible virtual machines (VMs) for the cluster

C.

Use a higher-memory node so that the job runs faster

D.

Use SSDs on the worker nodes so that the job can run faster

Question 143

Your company receives both batch- and stream-based event data. You want to process the data using Google Cloud Dataflow over a predictable time period.

However, you realize that in some instances data can arrive late or out of order. How should you design your Cloud Dataflow pipeline to handle data that is late or out of order?

A.

Set a single global window to capture all the data.

B.

Set sliding windows to capture all the lagged data.

C.

Use watermarks and timestamps to capture the lagged data.

D.

Ensure every datasource type (stream or batch) has a timestamp, and use the timestamps to define the logic for lagged data.
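
The terms in these options (event timestamps, windows, watermarks) can be illustrated with a small simulation. This is a plain-Python sketch, not the Cloud Dataflow or Apache Beam API. The function name `window_events`, the 60-second fixed window, and the watermark heuristic (largest event time seen minus an assumed maximum delay) are all assumptions for illustration.

```python
from collections import defaultdict


def window_events(events, window=60, max_delay=30):
    """Sum values per fixed event-time window, in arrival order.

    events: iterable of (event_time_seconds, value) pairs.
    The watermark is a heuristic estimate (assumption): the largest
    event time seen so far minus max_delay. An event whose window
    already ended at or before the watermark is treated as too late.
    """
    sums = defaultdict(int)
    dropped = []
    watermark = float("-inf")
    for event_time, value in events:
        window_start = (event_time // window) * window
        window_end = window_start + window
        if window_end <= watermark:
            dropped.append((event_time, value))  # arrived after window closed
        else:
            sums[window_start] += value
        watermark = max(watermark, event_time - max_delay)
    return dict(sums), dropped


# Hypothetical arrival order: the events at t=50 and t=10 arrive out of order.
arrivals = [(5, 1), (70, 1), (50, 1), (130, 1), (10, 1)]
sums, dropped = window_events(arrivals)
print(sums, dropped)
```

In this run the event at t=50 still lands in its window because the watermark has not passed the window end, while the event at t=10 arrives after the watermark has moved on and is set aside.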

Question 144

You have some data, which is shown in the graphic below. The two dimensions are X and Y, and the shade of each dot represents what class it is. You want to classify this data accurately using a linear algorithm.

[Figure for Question 144: scatter plot of the two classes over X and Y; image not available]

To do this you need to add a synthetic feature. What should the value of that feature be?

A.

X^2+Y^2

B.

X^2

C.

Y^2

D.

cos(X)
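
A synthetic feature is a new column computed from the existing ones, which a linear model can then use like any other input. The sketch below computes each of the four candidate features for a few points. The graphic is not reproduced here, so the sample points and the helper name `candidate_features` are hypothetical and do not represent the plotted data.

```python
import math


def candidate_features(x: float, y: float) -> dict[str, float]:
    """Compute each candidate synthetic feature from the options for one point."""
    return {
        "X^2+Y^2": x * x + y * y,
        "X^2": x * x,
        "Y^2": y * y,
        "cos(X)": math.cos(x),
    }


# Hypothetical points, not taken from the missing graphic.
points = [(0.5, 0.0), (0.0, -0.5), (3.0, 0.0), (0.0, 3.0)]
for x, y in points:
    print((x, y), candidate_features(x, y))
```

Comparing the printed values against the class of each point in the actual graphic shows which single feature separates the classes with one threshold.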

Question 145

You are integrating one of your internal IT applications and Google BigQuery, so users can query BigQuery from the application's interface. You do not want individual users to authenticate to BigQuery and you do not want to give them access to the dataset. You need to securely access BigQuery from your IT application.

What should you do?

A.

Create groups for your users and give those groups access to the dataset

B.

Integrate with a single sign-on (SSO) platform, and pass each user's credentials along with the query request

C.

Create a service account and grant dataset access to that account. Use the service account's private key to access the dataset

D.

Create a dummy user and grant dataset access to that user. Store the username and password for that user in a file on the file system, and use those credentials to access the BigQuery dataset

Question 146

You set up a streaming data insert into a Redis cluster via a Kafka cluster. Both clusters are running on Compute Engine instances. You need to encrypt data at rest with encryption keys that you can create, rotate, and destroy as needed.

What should you do?

A.

Create a dedicated service account, and use encryption at rest to reference your data stored in your Compute Engine cluster instances as part of your API service calls.

B.

Create encryption keys in Cloud Key Management Service. Use those keys to encrypt your data in all of the Compute Engine cluster instances.

C.

Create encryption keys locally. Upload your encryption keys to Cloud Key Management Service. Use those keys to encrypt your data in all of the Compute Engine cluster instances.

D.

Create encryption keys in Cloud Key Management Service. Reference those keys in your API service calls when accessing the data in your Compute Engine cluster instances.

Question 147

You are developing an application that uses a recommendation engine on Google Cloud. Your solution should display new videos to customers based on past views. Your solution needs to generate labels for the entities in videos that the customer has viewed. Your design must be able to provide very fast filtering suggestions based on data from other customer preferences on several TB of data. What should you do?

A.

Build and train a complex classification model with Spark MLlib to generate labels and filter the results. Deploy the models using Cloud Dataproc. Call the model from your application.

B.

Build and train a classification model with Spark MLlib to generate labels. Build and train a second classification model with Spark MLlib to filter results to match customer preferences. Deploy the models using Cloud Dataproc. Call the models from your application.

C.

Build an application that calls the Cloud Video Intelligence API to generate labels. Store data in Cloud Bigtable, and filter the predicted labels to match the user's viewing history to generate preferences.

D.

Build an application that calls the Cloud Video Intelligence API to generate labels. Store data in Cloud SQL, and join and filter the predicted labels to match the user's viewing history to generate preferences.

Question 148

You are selecting services to write and transform JSON messages from Cloud Pub/Sub to BigQuery for a data pipeline on Google Cloud. You want to minimize service costs. You also want to monitor and accommodate input data volume that will vary in size with minimal manual intervention. What should you do?

A.

Use Cloud Dataproc to run your transformations. Monitor CPU utilization for the cluster. Resize the number of worker nodes in your cluster via the command line.

B.

Use Cloud Dataproc to run your transformations. Use the diagnose command to generate an operational output archive. Locate the bottleneck and adjust cluster resources.

C.

Use Cloud Dataflow to run your transformations. Monitor the job system lag with Stackdriver. Use the default autoscaling setting for worker instances.

D.

Use Cloud Dataflow to run your transformations. Monitor the total execution time for a sampling of jobs. Configure the job to use non-default Compute Engine machine types when needed.

Question 149

Your infrastructure includes a set of YouTube channels. You have been tasked with creating a process for sending the YouTube channel data to Google Cloud for analysis. You want to design a solution that allows your worldwide marketing teams to perform ANSI SQL and other types of analysis on up-to-date YouTube channel log data. How should you set up the log data transfer into Google Cloud?

A.

Use Storage Transfer Service to transfer the offsite backup files to a Cloud Storage Multi-Regional storage bucket as a final destination.

B.

Use Storage Transfer Service to transfer the offsite backup files to a Cloud Storage Regional bucket as a final destination.

C.

Use BigQuery Data Transfer Service to transfer the offsite backup files to a Cloud Storage Multi-Regional storage bucket as a final destination.

D.

Use BigQuery Data Transfer Service to transfer the offsite backup files to a Cloud Storage Regional storage bucket as a final destination.

Question 150

You are designing storage for very large text files for a data pipeline on Google Cloud. You want to support ANSI SQL queries. You also want to support compression and parallel load from the input locations using Google recommended practices. What should you do?

A.

Transform text files to compressed Avro using Cloud Dataflow. Use BigQuery for storage and query.

B.

Transform text files to compressed Avro using Cloud Dataflow. Use Cloud Storage and BigQuery permanent linked tables for query.

C.

Compress text files to gzip using the Grid Computing Tools. Use BigQuery for storage and query.

D.

Compress text files to gzip using the Grid Computing Tools. Use Cloud Storage, and then import into Cloud Bigtable for query.
