Question 316 - Professional Data Engineer discussion

You are running a streaming pipeline with Dataflow and are using hopping windows to group the data as the data arrives. You noticed that some data is arriving late but is not being marked as late data, which is resulting in inaccurate aggregations downstream. You need to find a solution that allows you to capture the late data in the appropriate window. What should you do?

A. Change your windowing function to session windows to define your windows based on certain activity.
B. Change your windowing function to tumbling windows to avoid overlapping window periods.
C. Expand your hopping window so that the late data has more time to arrive within the grouping.
D. Use watermarks to define the expected data arrival window. Allow late data as it arrives.
Suggested answer: D

Explanation:

Watermarks are how a streaming pipeline tracks the progress of event time. A watermark is the pipeline's estimate that all data with timestamps up to a given point has arrived, and it is used to determine when a window can be closed and its results emitted. This differs from processing time, which follows the system clock of the workers. Grouping by event time gives more accurate aggregations because elements are grouped by when they occurred, but it requires the data source to provide reliable timestamps. Grouping by processing time is simpler, but its results shift with system delays or backlogs.
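To make the idea of an event-time watermark concrete, here is a minimal sketch of one common heuristic: take the largest event timestamp seen so far and subtract a fixed out-of-orderness bound. The class name `BoundedDelayWatermark` and the parameter `max_delay_s` are hypothetical, and this is not how Dataflow computes its watermark (that logic is source-specific); it only illustrates that a watermark follows event timestamps rather than the system clock.

```python
class BoundedDelayWatermark:
    """Toy watermark: max event timestamp seen minus an assumed delay bound."""

    def __init__(self, max_delay_s):
        # Assumption: elements arrive at most max_delay_s seconds out of order.
        self.max_delay_s = max_delay_s
        self.max_event_ts = None

    def observe(self, event_ts):
        """Record an element's event timestamp and return the new watermark."""
        if self.max_event_ts is None or event_ts > self.max_event_ts:
            self.max_event_ts = event_ts
        return self.current()

    def current(self):
        """Return the current watermark (negative infinity before any data)."""
        if self.max_event_ts is None:
            return float("-inf")
        return self.max_event_ts - self.max_delay_s


wm = BoundedDelayWatermark(max_delay_s=10)
for ts in (100, 95, 120):
    print(ts, "->", wm.observe(ts))
```

Note that the out-of-order element with timestamp 95 does not move the watermark backwards: a watermark only advances.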

By using watermarks, you can define the expected data arrival window for each windowing function. You can also specify how to handle late data, which is data that arrives after the watermark has passed the end of its window. You can either discard late data, which is the default in Apache Beam, or set an allowed lateness so that late data is still assigned to its window and the results are updated as it arrives. When you allow late data, triggers and the accumulation mode control when updated results are emitted and how they relate to results emitted earlier.
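The decision described above can be sketched as a three-way classification of an arriving element, given the end of its window, the current watermark, and an allowed lateness. The function name `classify_element` and its parameters are hypothetical, and the exact boundary conditions are simplified; in a real Beam pipeline the runner makes this decision based on the allowed lateness configured on the windowing transform.

```python
def classify_element(window_end, watermark, allowed_lateness_s):
    """Classify an element by comparing its window's end to the watermark.

    All arguments are event-time values in seconds (an assumption made
    for this sketch).
    """
    if watermark < window_end:
        # The window is still open: the element is on time.
        return "on_time"
    if watermark < window_end + allowed_lateness_s:
        # The window has closed, but is still within the lateness horizon:
        # the element is late and triggers an updated result.
        return "late_accepted"
    # The window has expired: the element is discarded.
    return "dropped"


for watermark in (50, 70, 90):
    print(watermark, classify_element(window_end=60, watermark=watermark,
                                      allowed_lateness_s=30))
```

With an allowed lateness of zero, the middle branch can never be taken, so every element that arrives after its window closes is dropped.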

In this case, using watermarks and allowing late data is the best solution to capture the late data in the appropriate window. Changing the windowing function to session windows or tumbling windows will not solve the problem of late data, as they still rely on watermarks to determine when to close the windows. Expanding the hopping window (called a sliding window in Apache Beam) might reduce the amount of late data, but it will also change the semantics of the windowing function and therefore the results.
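To see why the choice of windowing function does not address lateness, it helps to look at how an element is assigned to hopping windows. The sketch below uses a hypothetical helper, `hopping_windows`, with half-open windows `[start, end)` and times in seconds; both are assumptions for illustration. Assignment depends only on the element's event timestamp, so a late element still maps to the same windows, and whether it is counted depends on the watermark and allowed lateness rather than on the window shape.

```python
def hopping_windows(event_ts, size_s, period_s):
    """Return every [start, end) hopping window that contains event_ts.

    size_s is the window length and period_s is how often a new window
    starts; both are in seconds.
    """
    # Start of the most recent window that begins at or before event_ts.
    start = event_ts - (event_ts % period_s)
    windows = []
    # Walk back one period at a time while the window still covers event_ts.
    while start > event_ts - size_s:
        windows.append((start, start + size_s))
        start -= period_s
    return sorted(windows)


# A 60-second window starting every 30 seconds: each element lands in two windows.
print(hopping_windows(event_ts=65, size_s=60, period_s=30))
```

Setting `period_s` equal to `size_s` yields tumbling windows, where each element belongs to exactly one window.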

Streaming pipelines | Cloud Dataflow | Google Cloud

Windowing | Apache Beam

asked 18/09/2024
mohammed zakir