ExamGecko
Question 199 - Professional Data Engineer discussion


You need to create a data pipeline that copies time-series transaction data so that it can be queried from within BigQuery by your data science team for analysis. Every hour, thousands of transactions are updated with a new status. The size of the initial dataset is 1.5 PB, and it will grow by 3 TB per day. The data is heavily structured, and your data science team will build machine learning models based on this data. You want to maximize performance and usability for your data science team. Which two strategies should you adopt? Choose 2 answers.

A. Denormalize the data as much as possible.
B. Preserve the structure of the data as much as possible.
C. Use BigQuery UPDATE to further reduce the size of the dataset.
D. Develop a data pipeline where status updates are appended to BigQuery instead of updated.
E. Copy a daily snapshot of transaction data to Cloud Storage and store it as an Avro file. Use BigQuery's support for external data sources to query.
Suggested answer: A, E
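To illustrate the append-only pattern described in option D: instead of issuing BigQuery UPDATE statements for each status change, every change is written as a new row, and the current status is recovered at query time by taking the most recent row per transaction (in BigQuery this is typically a `ROW_NUMBER() OVER (PARTITION BY ...)` or `ARRAY_AGG(... ORDER BY ... LIMIT 1)` query). The sketch below simulates that reduction locally with hypothetical sample data; the field names (`txn_id`, `status`, `updated_at`) are illustrative assumptions, not part of the exam question.

```python
from datetime import datetime

# Hypothetical append-only log: each status change is appended as a new
# row rather than applied as an in-place UPDATE (option D's pattern).
rows = [
    {"txn_id": "t1", "status": "PENDING",   "updated_at": datetime(2024, 9, 18, 9, 0)},
    {"txn_id": "t2", "status": "PENDING",   "updated_at": datetime(2024, 9, 18, 9, 5)},
    {"txn_id": "t1", "status": "SETTLED",   "updated_at": datetime(2024, 9, 18, 10, 0)},
    {"txn_id": "t2", "status": "CANCELLED", "updated_at": datetime(2024, 9, 18, 11, 0)},
]

def latest_status(rows):
    """Reduce the append-only log to the current status per transaction,
    mirroring what a latest-row-per-key query would do in BigQuery."""
    latest = {}
    for row in rows:
        current = latest.get(row["txn_id"])
        if current is None or row["updated_at"] > current["updated_at"]:
            latest[row["txn_id"]] = row
    return {k: v["status"] for k, v in latest.items()}

print(latest_status(rows))  # {'t1': 'SETTLED', 't2': 'CANCELLED'}
```

Appending avoids BigQuery's DML overhead on a petabyte-scale table while preserving the full status history, which is often useful training data in itself.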
asked 18/09/2024
AHOPkos Varga