ExamGecko
Question 340 - Professional Data Engineer discussion


You work for a farming company. You have one BigQuery table named sensors, which is about 500 MB and contains the list of your 5000 sensors, with columns for id, name, and location. This table is updated every hour. Each sensor generates one metric every 30 seconds along with a timestamp, which you want to store in BigQuery. You want to run an analytical query on the data once a week for monitoring purposes. You also want to minimize costs. What data model should you use?

A.
1. Create a metrics column in the sensors table. 2. Set RECORD type and REPEATED mode for the metrics column. 3. Use an UPDATE statement every 30 seconds to add new metrics.
B.
1. Create a metrics column in the sensors table. 2. Set RECORD type and REPEATED mode for the metrics column. 3. Use an INSERT statement every 30 seconds to add new metrics.
C.
1. Create a metrics table partitioned by timestamp. 2. Create a sensorId column in the metrics table that points to the id column in the sensors table. 3. Use an INSERT statement every 30 seconds to append new metrics to the metrics table. 4. Join the two tables, if needed, when running the analytical query.
D.
1. Create a metrics table partitioned by timestamp. 2. Create a sensorId column in the metrics table that points to the id column in the sensors table. 3. Use an UPDATE statement every 30 seconds to append new metrics to the metrics table. 4. Join the two tables, if needed, when running the analytical query.
Suggested answer: C

Explanation:

For a farming company whose 5000 sensors each emit a metric every 30 seconds, the goal is to minimize costs while supporting weekly analytical queries. The best data model will effectively manage data storage, write frequency, and query performance.

Partitioned Metrics Table:

Creating a metrics table partitioned by timestamp optimizes query performance and storage costs.

Partitioning by timestamp allows for efficient querying, especially for time-based analyses.
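A minimal sketch of such a table in BigQuery DDL. The dataset name mydataset and the column names (sensorId, value, ts) are illustrative assumptions, not part of the original question:

```sql
-- Hypothetical schema: one row per reading, partitioned by day on the
-- metric timestamp so weekly queries scan only the relevant partitions.
CREATE TABLE mydataset.metrics (
  sensorId INT64,   -- references sensors.id (relationship is by convention)
  value    FLOAT64, -- the metric reading
  ts       TIMESTAMP
)
PARTITION BY DATE(ts)
OPTIONS (description = 'Per-sensor metrics, one row every 30 seconds');
```

With daily partitions, a query filtered to the last 7 days of ts reads roughly 7 partitions instead of the whole table, which is what keeps weekly monitoring cheap.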

Sensor ID Reference:

Including a sensorId column in the metrics table that points to the id column in the sensors table ensures data normalization.

This structure avoids redundancy and maintains a clear relationship between sensors and their metrics.
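BigQuery does not enforce referential integrity, but it does let you declare the relationship as an unenforced constraint, which can help the optimizer and documents the intent. A sketch, reusing the hypothetical mydataset tables from above:

```sql
-- Constraints are metadata only in BigQuery: NOT ENFORCED is required,
-- and no integrity checking happens at write time.
ALTER TABLE mydataset.sensors
  ADD PRIMARY KEY (id) NOT ENFORCED;

ALTER TABLE mydataset.metrics
  ADD CONSTRAINT fk_sensor
  FOREIGN KEY (sensorId) REFERENCES mydataset.sensors (id) NOT ENFORCED;
```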

Using INSERT Statements:

Using INSERT statements to append new metrics every 30 seconds is efficient and cost-effective.

INSERT operations are more suitable than UPDATE operations for adding new data entries, especially at high frequencies.
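An illustrative append of one reading (the values and names are placeholders). In practice, at one row per sensor per 30 seconds, the Storage Write API or streaming inserts may be preferable to individual DML statements, but the DML form shows the idea:

```sql
-- Append-only write: no scan of existing rows, unlike UPDATE,
-- which must locate and rewrite the rows it modifies.
INSERT INTO mydataset.metrics (sensorId, value, ts)
VALUES (42, 18.7, CURRENT_TIMESTAMP());
```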

Joining Tables for Analysis:

When running analytical queries, joining the partitioned metrics table with the sensors table as needed provides a comprehensive view of the data.

This approach leverages BigQuery's powerful JOIN capabilities while keeping the data model normalized and efficient.
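The weekly monitoring query could then look like the following sketch (metric semantics and column names are assumptions); the filter on ts is what lets BigQuery prune partitions and bill only for the week of data scanned:

```sql
-- Join the small sensors table to last week's partitions of metrics.
SELECT
  s.name,
  s.location,
  AVG(m.value) AS avg_value
FROM mydataset.metrics AS m
JOIN mydataset.sensors AS s
  ON m.sensorId = s.id
WHERE m.ts >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
GROUP BY s.name, s.location;
```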

Google Data Engineer Reference:

BigQuery Partitioned Tables

BigQuery Best Practices

Efficient Data Partitioning

BigQuery Data Modeling

Using this data model, the farming company can manage its sensor data effectively, minimize costs, and perform weekly analytical queries with high efficiency.

asked 18/09/2024
Justin Kim