Question 348 - Professional Data Engineer discussion


A web server sends click events to a Pub/Sub topic as messages. The web server includes an eventTimestamp attribute in the messages, which is the time when the click occurred. You have a Dataflow streaming job that reads from this Pub/Sub topic through a subscription, applies some transformations, and writes the result to another Pub/Sub topic for use by the advertising department. The advertising department needs to receive each message within 30 seconds of the corresponding click occurrence, but they report receiving the messages late. Your Dataflow job's system lag is about 5 seconds, and the data freshness is about 40 seconds. Inspecting a few messages shows no more than a 1-second lag between their eventTimestamp attribute and their publishTime. What is the problem, and what should you do?

A.
The advertising department is causing delays when consuming the messages. Work with the advertising department to fix this.
B.
Messages in your Dataflow job are processed in less than 30 seconds, but your job cannot keep up with the backlog in the Pub/Sub subscription. Optimize your job or increase the number of workers to fix this.
C.
The web server is not pushing messages fast enough to Pub/Sub. Work with the web server team to fix this.
D.
Messages in your Dataflow job are taking more than 30 seconds to process. Optimize your job or increase the number of workers to fix this.
Suggested answer: B

Explanation:

To ensure that the advertising department receives messages within 30 seconds of the click occurrence, and given the reported system lag and data freshness metrics, the issue likely lies in the processing capacity of the Dataflow job rather than in any single message's processing time. Here's why option B is the best choice:

System Lag and Data Freshness:

The system lag of about 5 seconds indicates that, once a message is picked up, the Dataflow pipeline processes it quickly.

However, the data freshness of about 40 seconds shows that messages wait a long time before processing begins, which points to a backlog in the Pub/Sub subscription.

Backlog in Pub/Sub Subscription:

A backlog occurs when the rate of incoming messages exceeds the rate at which the Dataflow job can process them, causing delays.
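The reasoning above can be turned into a rough latency budget. This is a back-of-the-envelope sketch using the figures quoted in the question (not values read from a live job), and it assumes data freshness roughly decomposes into publish lag plus backlog wait plus in-pipeline time:

```python
# Illustrative latency budget built from the metrics quoted in the question.
publish_lag_s = 1      # observed gap between eventTimestamp and publishTime
system_lag_s = 5       # time a message spends inside the Dataflow pipeline
data_freshness_s = 40  # age of the oldest element not yet fully processed

# Approximate time a message sits in the Pub/Sub backlog before Dataflow
# starts processing it (assumed: freshness ~ publish lag + backlog + system lag):
backlog_wait_s = data_freshness_s - system_lag_s - publish_lag_s
print(backlog_wait_s)  # 34

# End-to-end delivery time versus the 30-second requirement:
sla_s = 30
print(data_freshness_s > sla_s)  # True -> messages arrive late
```

With roughly 34 seconds spent waiting in the backlog against only ~6 seconds of publish and processing time, the bottleneck is clearly the job's throughput, not its per-message latency.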

Optimizing the Dataflow Job:

To handle the incoming message rate, the Dataflow job needs to be optimized or scaled up by increasing the number of workers, ensuring it can keep up with the message inflow.

Steps to Implement:

Analyze the Dataflow Job:

Inspect the Dataflow job metrics to identify bottlenecks and inefficiencies.

Optimize Processing Logic:

Optimize the transformations and operations within the Dataflow pipeline to improve processing efficiency.

Increase Number of Workers:

Scale the Dataflow job by increasing the number of workers to handle the higher load, reducing the backlog.
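As one concrete way to carry out the scaling step, worker counts for a streaming Dataflow job can be bounded at launch time via pipeline options, or adjusted in flight with gcloud. This is a command sketch, not output from a real project: the job ID, region, and worker counts below are placeholders.

```shell
# At launch time (Beam Python flags), allow autoscaling up to more workers:
#   --autoscaling_algorithm=THROUGHPUT_BASED --max_num_workers=20

# For an already-running streaming job, adjust the autoscaling range in flight
# (JOB_ID, region, and worker counts are illustrative placeholders):
gcloud dataflow jobs update-options JOB_ID \
    --region=us-central1 \
    --min-num-workers=5 \
    --max-num-workers=20
```

Raising the autoscaling ceiling lets the job drain the subscription backlog, which should bring data freshness back under the 30-second requirement.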

References: Dataflow Monitoring; Scaling Dataflow Jobs

asked 18/09/2024
Benito Gonzalez