Google Professional Data Engineer Practice Test - Questions & Answers, Page 16
List of questions
Question 151

You are developing an application on Google Cloud that will automatically generate subject labels for users' blog posts. You are under competitive pressure to add this feature quickly, and you have no additional developer resources. No one on your team has experience with machine learning. What should you do?
Question 152

You are designing storage for 20 TB of text files as part of deploying a data pipeline on Google Cloud.
Your input data is in CSV format. You want to minimize the cost of querying aggregate values for multiple users who will query the data in Cloud Storage with multiple engines. Which storage service and schema design should you use?
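No answer key is shown for this question, but the cost concern it raises usually points toward converting raw CSV to a columnar format (such as Parquet) before querying. As a pure-Python illustration of why that matters (the data and sizes below are made up for the sketch), a columnar layout lets an aggregate over one column read far fewer bytes than a full row-oriented CSV scan:

```python
import csv
import io

# Toy dataset: 3 columns, 1000 rows (row-oriented, like a CSV file).
rows = [{"user": f"u{i}", "country": "US", "spend": i * 0.5} for i in range(1000)]

# Row-oriented storage: an aggregate over one column still scans every byte.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["user", "country", "spend"])
writer.writeheader()
writer.writerows(rows)
csv_bytes = len(buf.getvalue().encode())

# Columnar storage: each column is stored contiguously, so SUM(spend)
# only needs to read the bytes of the 'spend' column.
spend_column = "\n".join(str(r["spend"]) for r in rows)
column_bytes = len(spend_column.encode())

total = sum(r["spend"] for r in rows)
print(f"CSV scan: {csv_bytes} bytes, columnar scan: {column_bytes} bytes, SUM(spend)={total}")
```

Engines that price queries by bytes scanned (BigQuery external tables among them) make this difference show up directly on the bill.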
Question 153

You are designing storage for two relational tables that are part of a 10-TB database on Google Cloud.
You want to support transactions that scale horizontally. You also want to optimize data for range queries on nonkey columns. What should you do?
Explanation:
Reference: https://cloud.google.com/solutions/data-lifecycle-cloud-platform
Question 154

Your financial services company is moving to cloud technology and wants to store 50 TB of financial time-series data in the cloud. This data is updated frequently, and new data will be streaming in all the time. Your company also wants to move their existing Apache Hadoop jobs to the cloud to get insights into this data.
Which product should they use to store the data?
Explanation:
Reference: https://cloud.google.com/bigtable/docs/schema-design-time-series
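The linked guide centers on row-key design for time-series data in Bigtable. As an illustrative sketch (the key layout below is one common pattern, not the only valid one, and the field names are made up), promoting identifying fields into the row key and reversing the timestamp makes the newest rows for a series sort first lexicographically:

```python
MAX_TS = 10**13  # ceiling larger than any millisecond timestamp we expect

def row_key(metric: str, series_id: str, ts_millis: int) -> str:
    # Field promotion: metric and series id go into the key so related
    # rows sort together; the reversed timestamp puts newest rows first,
    # so a prefix scan returns the most recent data without a full read.
    reversed_ts = MAX_TS - ts_millis
    return f"{metric}#{series_id}#{reversed_ts:013d}"

k_old = row_key("price", "EURUSD", 1_600_000_000_000)
k_new = row_key("price", "EURUSD", 1_700_000_000_000)
print(k_new < k_old)  # newest key sorts first
```

Spreading keys across many metric/series prefixes also avoids the write hotspotting that a plain timestamp-first key would cause.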
Question 155

An organization maintains a Google BigQuery dataset that contains tables with user-level data.
Explanation:
Reference: https://cloud.google.com/bigquery/docs/access-control
Question 156

Government regulations in your industry mandate that you have to maintain an auditable record of access to certain types of data.
Question 157

Your neural network model is taking days to train. You want to increase the training speed. What can you do?
Explanation:
Reference: https://towardsdatascience.com/how-to-increase-the-accuracy-of-a-neural-network-9f5d1c6f407d
Question 158

You are responsible for writing your company's ETL pipelines to run on an Apache Hadoop cluster.
The pipeline will require some checkpointing and splitting pipelines. Which method should you use to write the pipelines?
Question 159

Your company maintains a hybrid deployment with GCP, where analytics are performed on your anonymized customer data. The data is imported to Cloud Storage from your data center through parallel uploads to a data transfer server running on GCP. Management informs you that the daily transfers take too long and has asked you to fix the problem. You want to maximize transfer speeds. Which action should you take?
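One lever commonly discussed for this scenario (whether it is the intended answer depends on the option list, which is not reproduced here) is gsutil's parallel composite uploads, which split large files into chunks that upload concurrently. A sketch of the relevant `.boto` configuration fragment:

```
[GSUtil]
# Files above this size are split into components uploaded in parallel.
parallel_composite_upload_threshold = 150M
parallel_composite_upload_component_size = 50M
```

Combined with `gsutil -m` for cross-file parallelism, this tends to saturate available bandwidth on large transfers.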
Question 160

After migrating ETL jobs to run on BigQuery, you need to verify that the output of the migrated jobs is the same as the output of the original. You've loaded a table containing the output of the original job and want to compare the contents with output from the migrated job to show that they are identical. The tables do not contain a primary key column that would enable you to join them together for comparison.
What should you do?
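Within BigQuery this comparison is usually done with a hash-and-aggregate query over each table (for example `COUNT(*)` plus a checksum built from `FARM_FINGERPRINT(TO_JSON_STRING(t))`) rather than a join. As a local, self-contained illustration of the same idea (pure Python, made-up rows), an order-independent digest of each row set demonstrates equality without any key column:

```python
import hashlib

def row_hash(row):
    # Canonicalize a row (a tuple of values) to bytes and hash it.
    canon = "\x1f".join(map(str, row)).encode()
    return int.from_bytes(hashlib.sha256(canon).digest(), "big")

def table_digest(rows):
    # Order-independent: sum per-row hashes modulo 2**256, so two tables
    # match (up to hash collisions) exactly when they hold the same
    # multiset of rows, regardless of row order.
    return sum(row_hash(r) for r in rows) % (2**256)

original = [("alice", 10), ("bob", 20), ("bob", 20)]
migrated = [("bob", 20), ("alice", 10), ("bob", 20)]  # same rows, shuffled
changed  = [("alice", 10), ("bob", 21), ("bob", 20)]  # one value differs

print(table_digest(original) == table_digest(migrated))  # True
print(table_digest(original) == table_digest(changed))   # False
```

Summing hashes (instead of XORing them) keeps the digest sensitive to duplicate rows, which matters when neither table has a primary key.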