Real-Time Data Streaming: Why Batch Is No Longer Enough
A deep dive into when streaming matters, when it does not, and what it takes to build pipelines that move at the speed of your business
For most of the history of data engineering, batch processing was the default. You collect data throughout the day, run a job at night, and wake up to yesterday’s insights. That model powered analytics for decades. It still works for a lot of things.
But the expectations of modern products have shifted. When a user abandons a cart, the personalization engine should know immediately - not eight hours later. When a fraud signal appears in a transaction, the decision system needs to act in milliseconds - not in the next batch run.
What real-time streaming actually means
Real-time streaming doesn’t mean instant. It means low latency. Data flows continuously from source to destination - events are processed as they happen, not collected and processed later in bulk.
Batch processing is like reading yesterday’s newspaper. Streaming is like watching the news as it happens. Both are useful. The question is which one your business actually needs.
When streaming genuinely changes the business
Fraud detection and risk systems
Fraudulent transactions need to be caught before they complete, not in a report the next morning. Real-time streaming lets you build decision systems that evaluate every transaction against a model trained on live behavioral data.
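One concrete building block of such a system is a stateful, per-entity check over a sliding window. The sketch below is a toy illustration of that idea - a transaction-velocity rule - not a real fraud model; the class name, thresholds, and in-memory state are all hypothetical, and a production system would keep this state in a low-latency store.

```python
from collections import defaultdict, deque

class VelocityChecker:
    """Flags a card when it exceeds max_txns within a sliding window_s seconds.

    A toy, in-memory illustration of the kind of stateful per-entity
    check a streaming fraud pipeline evaluates on every event.
    """
    def __init__(self, max_txns=3, window_s=60):
        self.max_txns = max_txns
        self.window_s = window_s
        self.history = defaultdict(deque)  # card_id -> recent timestamps

    def is_suspicious(self, card_id, ts):
        q = self.history[card_id]
        q.append(ts)
        # Evict timestamps that have fallen out of the sliding window.
        while q and ts - q[0] > self.window_s:
            q.popleft()
        return len(q) > self.max_txns

checker = VelocityChecker(max_txns=3, window_s=60)
flags = [checker.is_suspicious("card-1", t) for t in (0, 10, 20, 30, 40)]
# The fourth and fifth transactions within 60s exceed the limit of 3.
```

The point is that the check runs inline with the event, against state that is seconds old - something a nightly batch job structurally cannot do.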
Personalization at scale
The recommendation model I worked on at Lululemon gave us a 12% uplift in sales conversions. A meaningful part of that came from freshness - the model could act on what a user had done in the current session, not just on their historical behavior. That recency required streaming infrastructure. In a companion piece, I wrote about why the data layer, not the model, is almost always the bottleneck.
The shape of a streaming consumer is not exotic — it’s a loop with careful failure handling:
# Pseudocode for a resilient Kafka-style consumer.
for event in consumer.poll(topic="cart_events", group_id="personalizer"):
    try:
        features = enrich(event)       # join with user profile, etc.
        score = model.score(features)  # low-latency inference
        publish("personalization_scores", {"user": event.user_id, "score": score})
        consumer.commit(event.offset)  # only after successful publish
    except TransientError:
        continue                       # will be redelivered
    except PoisonPillError as e:
        dead_letter.send(event, reason=str(e))
        consumer.commit(event.offset)  # skip forward, don't block the stream
Notice what is not glamorous about this: commit placement, dead-letter queues, and what exactly “at-least-once” means for a downstream consumer that might act twice on the same event. That operational surface is the real cost of streaming.
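Because the commit happens after the publish, a crash in between means the event is redelivered and published twice - that is at-least-once in practice. The usual answer is to make the downstream handler idempotent. Here is a minimal sketch of that pattern, assuming a hypothetical event shape with an `id` field; in production the seen-ID set would live in an external store with expiry, not process memory.

```python
class IdempotentApplier:
    """Applies each event's side effect at most once by tracking event IDs.

    A minimal sketch of how a downstream consumer tolerates the
    redelivery that at-least-once semantics implies.
    """
    def __init__(self):
        self.seen = set()
        self.balance = 0

    def apply(self, event):
        if event["id"] in self.seen:
            return False  # duplicate delivery; effect already applied
        self.seen.add(event["id"])
        self.balance += event["amount"]
        return True

applier = IdempotentApplier()
events = [
    {"id": "e1", "amount": 50},
    {"id": "e1", "amount": 50},  # redelivered after a crash-and-retry
    {"id": "e2", "amount": 25},
]
results = [applier.apply(e) for e in events]
# The redelivered e1 is skipped; balance reflects each event exactly once.
```

Idempotency pushes the "exactly-once" guarantee to where it is cheapest to provide: the effect, not the delivery.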
When batch is still the right answer
- Financial reporting where daily or monthly aggregates are the unit of analysis
- Model training pipelines where freshness is measured in days, not seconds
- Data warehouse loads where the downstream consumers only need daily snapshots
- Small-scale products where the engineering overhead of streaming outweighs the business benefit
The architecture considerations nobody talks about enough
The hardest part of streaming isn’t the technology - it’s the operational model. Streaming pipelines require monitoring, backpressure handling, dead-letter queues for failed events, and careful thought about exactly-once versus at-least-once processing semantics.
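Backpressure in particular is easy to hand-wave and painful to ignore. The toy below shows the underlying bounded-buffer principle with an in-process queue - a hypothetical stand-in, since real brokers and stream processors handle this at a different layer - where a producer that outruns the consumer hits an explicit, observable overflow instead of silently growing memory.

```python
import queue

# A bounded buffer: 5 slots, no blocking producer.
buf = queue.Queue(maxsize=5)
dropped = 0

for i in range(8):
    try:
        buf.put_nowait(i)  # producer does not wait; overflow is explicit
    except queue.Full:
        dropped += 1  # or block, sample, or shed load - a policy decision

# Only the first 5 events fit; the rest surface as drops you can monitor.
```

Whether you block, drop, or sample when the buffer fills is a business decision, not a framework default - which is exactly why it belongs in the design conversation, not in an incident review.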
Designing a data architecture or trying to figure out whether streaming is right for your use case? That’s exactly the kind of problem I work through in architecture sessions.