Google Professional Data Engineer Practice Test - Questions Answers, Page 14

What is the recommended action for switching between SSD and HDD storage for your Google Cloud Bigtable instance?

A. create a third instance and sync the data from the two storage types via batch jobs
B. export the data from the existing instance and import the data into a new instance
C. run parallel instances where one is HDD and the other is SSD
D. the selection is final and you must resume using the same storage type
Suggested answer: B

Explanation:

When you create a Cloud Bigtable instance and cluster, your choice of SSD or HDD storage for the cluster is permanent. You cannot use the Google Cloud Platform Console to change the type of storage that is used for the cluster.

If you need to convert an existing HDD cluster to SSD, or vice-versa, you can export the data from the existing instance and import the data into a new instance. Alternatively, you can write a Cloud Dataflow or Hadoop MapReduce job that copies the data from one instance to another.

Reference: https://cloud.google.com/bigtable/docs/choosing-ssd-hdd
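
For a small table, the copy can even be done directly with the google-cloud-bigtable Python client. The sketch below is illustrative only; project, instance, and table names are placeholders, and for production-sized tables the Dataflow export/import route described above is the practical choice.

# Hypothetical sketch: copy rows from an HDD-backed instance to a new SSD-backed one.
# Project, instance, and table names are placeholders.
from google.cloud import bigtable

client = bigtable.Client(project="my-project", admin=True)
source_table = client.instance("hdd-instance").table("events")
target_table = client.instance("ssd-instance").table("events")

batch = []
for row in source_table.read_rows():              # stream every row from the source
    new_row = target_table.direct_row(row.row_key)
    for family, columns in row.cells.items():
        for qualifier, cells in columns.items():
            for cell in cells:                    # keep original values and timestamps
                new_row.set_cell(family, qualifier, cell.value, timestamp=cell.timestamp)
    batch.append(new_row)
    if len(batch) == 500:                         # write in batches
        target_table.mutate_rows(batch)
        batch = []
if batch:
    target_table.mutate_rows(batch)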

Topic 6, Main Questions Set C

You are training a spam classifier. You notice that you are overfitting the training data. Which three actions can you take to resolve this problem? (Choose three.)

A. Get more training examples
B. Reduce the number of training examples
C. Use a smaller set of features
D. Use a larger set of features
E. Increase the regularization parameters
F. Decrease the regularization parameters
Suggested answer: A, C, E

Explanation:

Overfitting is reduced by adding training examples, shrinking the feature set, and strengthening regularization. Enlarging the feature set or weakening regularization makes overfitting worse.
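
As an illustrative sketch only (not part of the question), two of the three remedies expressed with scikit-learn: feature selection to shrink the feature set, and a smaller C (stronger regularization) for a logistic-regression spam classifier. X_train and y_train are assumed to exist and to hold non-negative term-count features.

# Illustrative only: fewer features and stronger regularization for a spam classifier.
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

model = make_pipeline(
    SelectKBest(chi2, k=500),                  # C: use a smaller set of features
    LogisticRegression(C=0.1, max_iter=1000),  # E: smaller C means stronger regularization
)
model.fit(X_train, y_train)                    # A: gathering more training examples also helps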

You are implementing security best practices on your data pipeline. Currently, you are manually executing jobs as the Project Owner. You want to automate these jobs by taking nightly batch files containing non-public information from Google Cloud Storage, processing them with a Spark Scala job on a Google Cloud Dataproc cluster, and depositing the results into Google BigQuery.

How should you securely run this workload?

A. Restrict the Google Cloud Storage bucket so only you can see the files
B. Grant the Project Owner role to a service account, and run the job with it
C. Use a service account with the ability to read the batch files and to write to BigQuery
D. Use a user account with the Project Viewer role on the Cloud Dataproc cluster to read the batch files and write to BigQuery
Suggested answer: C

Explanation:

Automated jobs should follow the principle of least privilege: run them as a dedicated service account that can only read the batch files in Cloud Storage and write to BigQuery, rather than granting the broad Project Owner role or relying on an individual user account.
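
A hedged sketch of the pattern, simplified to a direct load: the job authenticates with a narrowly-scoped service account instead of Owner credentials. In the actual scenario the Dataproc cluster would be created with this service account attached; the key file, project, bucket, and table names below are placeholders.

# Sketch: the nightly job runs as a dedicated service account, not as a human Owner.
from google.oauth2 import service_account
from google.cloud import bigquery

creds = service_account.Credentials.from_service_account_file("etl-job-sa.json")
bq = bigquery.Client(project="my-project", credentials=creds)

job = bq.load_table_from_uri(
    "gs://nightly-batches/2018-01-01/results.csv",     # bucket: read-only for this account
    "my-project.analytics.nightly_results",            # dataset: write access only here
    job_config=bigquery.LoadJobConfig(autodetect=True),
)
job.result()  # fails with a permission error if the account is missing a required role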

You are using Google BigQuery as your data warehouse. Your users report that the following simple query is running very slowly, no matter when they run the query:

SELECT country, state, city FROM [myproject:mydataset.mytable] GROUP BY country

You check the query plan for the query and see the following output in the Read section of Stage:1 (query plan screenshot not reproduced here):

What is the most likely cause of the delay for this query?

A. Users are running too many concurrent queries in the system
B. The [myproject:mydataset.mytable] table has too many partitions
C. Either the state or the city columns in the [myproject:mydataset.mytable] table have too many NULL values
D. Most rows in the [myproject:mydataset.mytable] table have the same value in the country column, causing data skew
Suggested answer: D

Explanation:

The query is slow every time it runs, so concurrency is not the cause. When most rows share the same value in the GROUP BY column, the work for that key lands on a small number of workers, producing data skew that dominates the stage's execution time.
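
Skew of this kind shows up in the query plan as per-stage maximum read/compute times far above the averages. A hedged sketch of inspecting that with the BigQuery Python client follows; it uses standard SQL and groups on all selected columns so the statement is valid, and the table name is taken from the question.

# Sketch: compare average vs. maximum stage ratios; a large gap suggests data skew.
from google.cloud import bigquery

client = bigquery.Client()
job = client.query(
    "SELECT country, state, city FROM `myproject.mydataset.mytable` "
    "GROUP BY country, state, city"
)
job.result()  # wait for completion so the plan is populated

for stage in job.query_plan:
    print(
        stage.name,
        "compute avg/max:", stage.compute_ratio_avg, stage.compute_ratio_max,
        "read avg/max:", stage.read_ratio_avg, stage.read_ratio_max,
    )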

Your globally distributed auction application allows users to bid on items. Occasionally, users place identical bids at nearly identical times, and different application servers process those bids. Each bid event contains the item, amount, user, and timestamp. You want to collate those bid events into a single location in real time to determine which user bid first. What should you do?

A. Create a file on a shared file server and have the application servers write all bid events to that file. Process the file with Apache Hadoop to identify which user bid first.
B. Have each application server write the bid events to Cloud Pub/Sub as they occur. Push the events from Cloud Pub/Sub to a custom endpoint that writes the bid event information into Cloud SQL.
C. Set up a MySQL database for each application server to write bid events into. Periodically query each of those distributed MySQL databases and update a master MySQL database with bid event information.
D. Have each application server write the bid events to Google Cloud Pub/Sub as they occur. Use a pull subscription to pull the bid events using Google Cloud Dataflow. Give the bid for each item to the user in the bid event that is processed first.
Suggested answer: D

Explanation:

Collating the bids in real time calls for a streaming pipeline: each application server publishes bid events to Cloud Pub/Sub, and a Cloud Dataflow job with a pull subscription processes them as they arrive. Periodically querying distributed MySQL databases is batch-oriented and cannot determine the first bid in real time.
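
A minimal Apache Beam (Python) sketch of option D, assuming each Pub/Sub message is a JSON bid event with item, amount, user, and timestamp fields; the subscription path, window size, and field names are placeholders.

# Sketch: pull bid events from Pub/Sub and keep the earliest bid per item.
import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as p:
    (
        p
        | "Read bids" >> beam.io.ReadFromPubSub(
            subscription="projects/my-project/subscriptions/bid-events")
        | "Parse" >> beam.Map(json.loads)
        | "Window" >> beam.WindowInto(beam.window.FixedWindows(60))
        | "Key by item" >> beam.Map(lambda bid: (bid["item"], bid))
        | "Earliest bid" >> beam.CombinePerKey(
            lambda bids: min(bids, key=lambda b: b["timestamp"]))
        | "Emit winner" >> beam.Map(print)
    )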

Your organization has been collecting and analyzing data in Google BigQuery for 6 months. The majority of the data analyzed is placed in a time-partitioned table named events_partitioned. To reduce the cost of queries, your organization created a view called events, which queries only the last 14 days of data. The view is described in legacy SQL. Next month, existing applications will be connecting to BigQuery to read the events data via an ODBC connection. You need to ensure the applications can connect. Which two actions should you take? (Choose two.)

A. Create a new view over events using standard SQL
B. Create a new partitioned table using a standard SQL query
C. Create a new view over events_partitioned using standard SQL
D. Create a service account for the ODBC connection to use for authentication
E. Create a Google Cloud Identity and Access Management (Cloud IAM) role for the ODBC connection and share "events"
Suggested answer: C, D

Explanation:

The ODBC driver issues standard SQL, and standard SQL queries cannot reference a view defined in legacy SQL. Create a new standard SQL view directly over events_partitioned, and create a service account that the ODBC connection can use for authentication.
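
A hedged sketch of option C with the BigQuery Python client: a new 14-day view defined in standard SQL directly over events_partitioned (the view name and dataset are placeholders). The ODBC connection, authenticating as the service account from option D, can then query this view.

# Sketch: a standard SQL view over the partitioned table, limited to the last 14 days.
from google.cloud import bigquery

client = bigquery.Client()

view = bigquery.Table("myproject.mydataset.events_std")
view.view_query = """
SELECT *
FROM `myproject.mydataset.events_partitioned`
WHERE _PARTITIONTIME >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 14 DAY)
"""  # view_query defaults to standard SQL
client.create_table(view)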

You have enabled the free integration between Firebase Analytics and Google BigQuery. Firebase now automatically creates a new table daily in BigQuery in the format app_events_YYYYMMDD. You want to query all of the tables for the past 30 days in legacy SQL. What should you do?

A. Use the TABLE_DATE_RANGE function
B. Use the WHERE_PARTITIONTIME pseudo column
C. Use WHERE date BETWEEN YYYY-MM-DD AND YYYY-MM-DD
D. Use SELECT IF.(date >= YYYY-MM-DD AND date <= YYYY-MM-DD)
Suggested answer: A

Explanation:

Reference: https://cloud.google.com/blog/products/gcp/using-bigquery-and-firebase-analytics-to-understand-your-mobile-app
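
A hedged sketch of the legacy SQL pattern, run through the BigQuery Python client with legacy SQL enabled; the dataset name and event_name column are assumptions for illustration.

# Sketch: query the last 30 daily Firebase tables with legacy SQL's TABLE_DATE_RANGE.
from google.cloud import bigquery

client = bigquery.Client()
legacy_sql = """
SELECT event_name, COUNT(*) AS events
FROM TABLE_DATE_RANGE([mydataset.app_events_],
                      DATE_ADD(CURRENT_TIMESTAMP(), -30, 'DAY'),
                      CURRENT_TIMESTAMP())
GROUP BY event_name
"""
job = client.query(legacy_sql, job_config=bigquery.QueryJobConfig(use_legacy_sql=True))
for row in job:
    print(row.event_name, row.events)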

Your company is currently setting up data pipelines for their campaign. For all the Google Cloud Pub/Sub streaming data, one of the important business requirements is to be able to periodically identify the inputs and their timings during their campaign. Engineers have decided to use windowing and transformation in Google Cloud Dataflow for this purpose. However, when testing this feature, they find that the Cloud Dataflow job fails for all the streaming inserts. What is the most likely cause of this problem?

A. They have not assigned the timestamp, which causes the job to fail
B. They have not set the triggers to accommodate the data coming in late, which causes the job to fail
C. They have not applied a global windowing function, which causes the job to fail when the pipeline is created
D. They have not applied a non-global windowing function, which causes the job to fail when the pipeline is created
Suggested answer: D

Explanation:

An unbounded PCollection that is grouped or aggregated while still in the default global window, with the default trigger, causes an error when the pipeline is constructed. A non-global windowing function (or a non-default trigger) must be applied before the aggregation.
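
A minimal Beam (Python) illustration of the failure mode: without the WindowInto step, the GroupByKey below is rejected at pipeline construction time because the unbounded Pub/Sub stream is still in the global window with the default trigger. Topic name and element structure are placeholders.

# Sketch: a non-global windowing function applied before grouping an unbounded stream.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

with beam.Pipeline(options=PipelineOptions(streaming=True)) as p:
    (
        p
        | beam.io.ReadFromPubSub(topic="projects/my-project/topics/campaign-inputs")
        | beam.Map(lambda msg: ("campaign", msg))
        | beam.WindowInto(beam.window.FixedWindows(300))  # required: non-global window
        | beam.GroupByKey()
        | beam.Map(print)
    )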

You architect a system to analyze seismic data. Your extract, transform, and load (ETL) process runs as a series of MapReduce jobs on an Apache Hadoop cluster. The ETL process takes days to process a data set because some steps are computationally expensive. Then you discover that a sensor calibration step has been omitted. How should you change your ETL process to carry out sensor calibration systematically in the future?

A. Modify the transform MapReduce jobs to apply sensor calibration before they do anything else.
B. Introduce a new MapReduce job to apply sensor calibration to raw data, and ensure all other MapReduce jobs are chained after this.
C. Add sensor calibration data to the output of the ETL process, and document that all users need to apply sensor calibration themselves.
D. Develop an algorithm through simulation to predict variance of data output from the last MapReduce job based on calibration factors, and apply the correction to all data.
Suggested answer: B

Explanation:

A dedicated calibration job applied to the raw data, with all other MapReduce jobs chained after it, ensures calibration happens exactly once and that no downstream step can consume uncalibrated data.
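
An illustrative (non-MapReduce) sketch of the idea: calibration is a single first step over the raw readings, and every existing transform consumes its output. The calibration_factors structure and downstream_transforms helpers are hypothetical stand-ins for the real jobs.

# Illustrative only: a dedicated calibration step chained before all other transforms.
def calibrate(reading, calibration_factors):
    """Apply the per-sensor offset and gain before any other processing."""
    factors = calibration_factors[reading["sensor_id"]]
    reading = dict(reading)
    reading["value"] = (reading["value"] - factors["offset"]) * factors["gain"]
    return reading

def run_etl(raw_readings, calibration_factors, downstream_transforms):
    calibrated = [calibrate(r, calibration_factors) for r in raw_readings]  # new first job
    result = calibrated
    for transform in downstream_transforms:  # existing jobs, unchanged, chained after
        result = transform(result)
    return result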

An online retailer has built their current application on Google App Engine. A new initiative at the company mandates that they extend their application to allow their customers to transact directly via the application.

They need to manage their shopping transactions and analyze combined data from multiple datasets using a business intelligence (BI) tool. They want to use only a single database for this purpose.

Which Google Cloud database should they choose?

A. BigQuery
B. Cloud SQL
C. Cloud BigTable
D. Cloud Datastore
Suggested answer: B

Explanation:

Cloud SQL supports the ACID transactions needed for shopping transactions and exposes a standard SQL interface that BI tools can connect to, so a single database covers both requirements. Cloud Bigtable offers neither multi-row transactions nor a SQL interface for BI tools.

Reference: https://cloud.google.com/solutions/business-intelligence/
