Most important concept

  • Tumbling/fixed window – https://youtu.be/0sUJA3tDFiY?si=fC_p2SpFoDZbhKyk
  • Hopping/sliding window – https://youtu.be/0sUJA3tDFiY?si=fC_p2SpFoDZbhKyk
  • Session window – https://youtu.be/vqiHXdrn3sU
  • Global window – https://youtu.be/vqiHXdrn3sU (batch data)

In stream processing, tumbling, hopping, session, and global windows are four different ways to group a continuous stream of data so it can be analyzed. Here’s a breakdown of each:

1. Tumbling/Fixed Windows – official link

  • Definition: Tumbling windows are fixed-size, non-overlapping, contiguous time intervals.
  • Example: If you define a tumbling window of 5 minutes, the stream will be divided into consecutive 5-minute intervals. Each event belongs to exactly one window.
  • Usage: Useful for aggregations where you want clear and distinct time-based boundaries, such as calculating the total sales every hour.
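The tumbling-window assignment described above can be sketched in plain Python. This is only an illustration, not the API of any streaming framework: the function names, the use of integer timestamps in seconds, and the sample events are all assumptions made for the example.

```python
from collections import defaultdict


def tumbling_window_start(timestamp, size):
    """Return the start of the fixed window [start, start + size) holding timestamp."""
    return timestamp - (timestamp % size)


def sum_per_tumbling_window(events, size):
    """Sum values per window. `events` is an iterable of (timestamp, value) pairs."""
    totals = defaultdict(int)
    for timestamp, value in events:
        # Each event lands in exactly one window.
        totals[tumbling_window_start(timestamp, size)] += value
    return dict(totals)


# Hypothetical events: (timestamp in seconds, sale amount)
events = [(10, 5), (299, 1), (300, 2), (610, 7)]
totals = sum_per_tumbling_window(events, size=300)  # 5-minute windows
print(totals)
```

Because the windows do not overlap, the event at second 299 and the event at second 300 fall into different windows even though they are one second apart.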

2. Hopping/Sliding Windows – official link

  • Definition: Hopping windows are fixed-size, but they can overlap. They are defined by a window size and a slide interval; windows overlap whenever the slide interval is smaller than the window size.
  • Example: If you have a window size of 5 minutes and a slide interval of 2 minutes, a new window starts every 2 minutes and overlaps the previous ones, so a single event can belong to more than one window.
  • Usage: Useful for scenarios where you need frequent updates or overlaps, such as moving averages or trend analysis.
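To make the overlap concrete, here is a minimal sketch in plain Python that lists every hopping window containing a given event. The function name, the integer-second timestamps, and the choice to align window starts to multiples of the slide interval are assumptions for illustration, not taken from any particular framework.

```python
def hopping_window_starts(timestamp, size, slide):
    """Return the start times of all windows [start, start + size) containing timestamp.

    Window starts are assumed to be aligned to multiples of `slide`.
    """
    starts = []
    # Latest window that has already started at or before the event.
    start = timestamp - (timestamp % slide)
    # Walk backwards while the window still covers the event.
    while start > timestamp - size:
        starts.append(start)
        start -= slide
    return sorted(starts)


# 5-minute windows that start every 2 minutes (values in seconds).
windows = hopping_window_starts(250, size=300, slide=120)
print(windows)
```

With a 300-second window and a 120-second slide, an event is covered by either two or three windows, depending on where it falls relative to the window starts.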

3. Session Windows – official link

  • Definition: Session windows group events that occur in proximity to each other. They have a gap duration parameter, which defines the maximum allowable time between events for them to be included in the same session.
  • Example: If the gap duration is set to 10 minutes, each event that arrives within 10 minutes of the previous event joins the same session, so a session keeps growing as long as events keep arriving. Once there’s a gap longer than 10 minutes, the next event starts a new session.
  • Usage: Useful for tracking user activity sessions or any natural grouping of events that occur close together in time.
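The gap rule above can be sketched in a few lines of plain Python. This is a simplified illustration: the function name and sample timestamps are hypothetical, the events are assumed to belong to a single user, and treating a gap of exactly the gap duration as "same session" follows the wording of the example rather than any specific framework, which may handle that boundary differently.

```python
def sessionize(timestamps, gap):
    """Group timestamps into sessions.

    A new session starts when the time since the previous event exceeds `gap`.
    """
    sessions = []
    for timestamp in sorted(timestamps):
        if sessions and timestamp - sessions[-1][-1] <= gap:
            # Close enough to the previous event: extend the current session.
            sessions[-1].append(timestamp)
        else:
            # First event, or the gap was too long: open a new session.
            sessions.append([timestamp])
    return sessions


# Hypothetical event times in minutes for one user, with a 10-minute gap.
sessions = sessionize([0, 4, 9, 25, 30, 50], gap=10)
print(sessions)
```

Unlike tumbling and hopping windows, the resulting sessions have different lengths, because their boundaries come from the data rather than from the clock.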

4. Global Window – official link

  • Definition: A global window encompasses the entire data stream without any boundaries, effectively treating the stream as a single, unbounded window.
  • Example: There are no subdivisions or intervals; all data is considered together.
  • Usage: Useful for aggregations over the entire data stream, such as computing overall statistics or when the stream processing application requires a complete view of all data.
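As a rough illustration of a global window, the plain-Python sketch below keeps one aggregate over everything seen so far. The function name and sample values are hypothetical. Because an unbounded stream never ends, the sketch emits a running result after every event; that output timing is an assumption made for the example, not something a global window prescribes.

```python
def global_running_stats(values):
    """Yield (count, total) over all values seen so far, once per event."""
    count = 0
    total = 0
    for value in values:
        count += 1
        total += value
        yield count, total


# With bounded (batch) data, the last result covers the whole data set.
stats = list(global_running_stats([5, 1, 2, 7]))
print(stats[-1])
```

There is no window start or end to compute here; every event updates the same single aggregate.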

Comparison Summary

  • Tumbling Windows: Non-overlapping, fixed-size intervals (e.g., hourly sales reports).
  • Hopping Windows: Overlapping, fixed-size intervals with a slide (e.g., 5-minute average updated every 2 minutes).
  • Session Windows: Variable size based on event proximity with gaps (e.g., user activity sessions).
  • Global Window: No boundaries, single window for all data (e.g., total number of events processed).

These different windowing strategies are chosen based on the nature of the data and the specific analysis or processing requirements.

Comparison of different windowing functions

Official documentation link: https://cloud.google.com/dataflow/docs/concepts/streaming-pipelines
