ExamGecko
Question list
Search
Search

List of questions

Search

Related questions











Question 284 - Professional Data Engineer discussion

Report
Export

You are designing a real-time system for a ride hailing app that identifies areas with high demand for rides to effectively reroute available drivers to meet the demand. The system ingests data from multiple sources to Pub/Sub. processes the data, and stores the results for visualization and analysis in real-time dashboards. The data sources include driver location updates every 5 seconds and app-based booking events from riders. The data processing involves real-time aggregation of supply and demand data for the last 30 seconds, every 2 seconds, and storing the results in a low-latency system for visualization. What should you do?

A.
Group the data by using a tumbling window in a Dataflow pipeline, and write the aggregated data to Memorystore
Answers
A.
Group the data by using a tumbling window in a Dataflow pipeline, and write the aggregated data to Memorystore
B.
Group the data by using a hopping window in a Dataflow pipeline, and write the aggregated data to Memorystore
Answers
B.
Group the data by using a hopping window in a Dataflow pipeline, and write the aggregated data to Memorystore
C.
Group the data by using a session window in a Dataflow pipeline, and write the aggregated data to BigQuery.
Answers
C.
Group the data by using a session window in a Dataflow pipeline, and write the aggregated data to BigQuery.
D.
Group the data by using a hopping window in a Dataflow pipeline, and write the aggregated data to BigQuery.
Answers
D.
Group the data by using a hopping window in a Dataflow pipeline, and write the aggregated data to BigQuery.
Suggested answer: B

Explanation:

A hopping window is a type of sliding window that advances by a fixed period of time, producing overlapping windows. This is suitable for the scenario where the system needs to aggregate data for the last 30 seconds, every 2 seconds, and provide real-time updates. A Dataflow pipeline can implement the hopping window logic using Apache Beam, and process both streaming and batch data sources. Memorystore is a low-latency, in-memory data store that can serve the aggregated data to the visualization layer. BigQuery is not a good choice for this scenario, as it is not optimized for low-latency queries and frequent updates.

asked 18/09/2024
Angel Molina
41 questions
User
Your answer:
0 comments
Sorted by

Leave a comment first