[Avg. reading time: 5 minutes]
Batch - Streaming - Micro-Batch
Batch Processing
Batch means collect first, process later.
- Works on large chunks of accumulated data
- High throughput, cheaper, simpler
- Results are not real-time
- Typically minutes, hours, or days delayed
Examples:
- Daily or weekly sales reports
- End-of-day stock portfolio reconciliation
- Monthly billing cycles
- ETL pipelines that refresh a data warehouse
Use batch when:
- Immediate action is not required
- A delay is acceptable
- You are working with large historical datasets
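The "collect first, process later" idea can be sketched in a few lines of Python. This is a minimal illustration, not a real pipeline; the `batch_process` function and the sample sales records are invented for the example.

```python
from collections import defaultdict

def batch_process(sales):
    """Process an accumulated batch of (day, amount) records in one pass."""
    totals = defaultdict(float)
    for day, amount in sales:
        totals[day] += amount
    return dict(totals)

# Collect first: records accumulate over the day...
accumulated = [("mon", 10.0), ("mon", 5.0), ("tue", 7.5)]

# ...process later: the whole chunk is handled at once.
report = batch_process(accumulated)
# report == {"mon": 15.0, "tue": 7.5}
```

Nothing happens until the batch runs, which is why results lag by however long the accumulation window is, but a single pass over a large chunk is cheap and simple.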
Stream Processing
Streaming means process events the moment they arrive.
- Low-latency (milliseconds to seconds)
- Continuous, event-by-event processing
- Ideal for real-time analytics and alerting
- Stateful systems maintain event history or running context
Examples:
- Stock price updates
- Fraud detection for credit cards
- Real-time gaming leaderboards
- IoT sensor monitoring
Use streaming when:
- You need instant reactions
- Delays cause risk, loss, or bad UX
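Event-by-event processing with running state can be sketched as a generator that reacts the moment each event arrives. This is a toy fraud-style check, not a production detector; the `stream_alerts` function, the per-card running mean, and the threshold rule are all assumptions made for illustration.

```python
def stream_alerts(events, factor=3.0):
    """Process each (card, amount) event on arrival, keeping running context per card.

    Emits an alert immediately when an amount exceeds `factor` times the
    card's running average so far.
    """
    state = {}  # card -> (count, running mean): the stateful context
    for card, amount in events:
        count, mean = state.get(card, (0, 0.0))
        if count > 0 and amount > factor * mean:
            yield (card, amount)  # react instantly, before seeing later events
        count += 1
        mean += (amount - mean) / count  # update the running mean incrementally
        state[card] = (count, mean)

events = [("A", 10.0), ("A", 11.0), ("A", 100.0)]
alerts = list(stream_alerts(events))
# alerts == [("A", 100.0)] -- flagged the moment it arrived
```

The key contrast with batch: the alert fires as the suspicious event is consumed, not after a scheduled job scans the whole day's data.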
Micro-Batch
Micro-batching = small batches processed very frequently.
- Latency: ~0.5 to a few seconds
- Not true real-time, but close
- Simpler than full streaming
- Common in systems like Spark Structured Streaming
Think of it as batch pretending to be streaming.
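The idea of chopping a stream into small, frequently processed windows can be sketched as follows. This is a simplified count-based window, whereas engines like Spark Structured Streaming typically trigger on time intervals; the `micro_batches` function and the windowed-sum "processing" step are assumptions for the example.

```python
from itertools import islice

def micro_batches(events, window_size=3):
    """Drain an event stream in small fixed-size windows, processing each
    window like a tiny batch job."""
    it = iter(events)
    while True:
        window = list(islice(it, window_size))  # collect one small batch
        if not window:
            break
        yield sum(window)  # "process" the window, here a simple windowed sum

results = list(micro_batches([1, 2, 3, 4, 5], window_size=2))
# results == [3, 7, 5] -- one output per small window
```

Each window is handled with ordinary batch logic, so the code stays simple; latency is bounded by the window size rather than by a single per-event loop.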
Examples
Fraud Detection (Streaming)
- Decision must be immediate
- Millisecond latency required
- Delay = financial loss
Payment Posting (Micro-Batch)
- Small delay is acceptable
- Updates can lag slightly
- No immediate risk
Monthly Statements (Batch)
- No urgency
- Process large volumes at once
- Cost-efficient
STREAMING   > Event         > Process > Output  (ms latency)
MICRO-BATCH > Small windows > Process > Output  (seconds)
BATCH       > Accumulate    > Process > Output  (minutes+)