Batch processing is an umbrella term for the various big data processing approaches based on MapReduce
A batch processing system processes as much data as possible at a regular cadence, as efficiently as possible
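As a minimal illustration of the MapReduce model these approaches build on, the sketch below counts words over a large dataset with PySpark; the input/output paths and job name are placeholders, not part of any specific pipeline.

  from pyspark.sql import SparkSession

  # Minimal MapReduce-style batch job: count word occurrences across a full dataset.
  # The input and output paths are placeholders.
  spark = SparkSession.builder.appName("batch-wordcount").getOrCreate()

  counts = (
      spark.sparkContext.textFile("hdfs:///data/logs/*.txt")  # read the whole dataset
           .flatMap(lambda line: line.split())                # map: emit one record per word
           .map(lambda word: (word, 1))
           .reduceByKey(lambda a, b: a + b)                   # reduce: sum the counts per word
  )
  counts.saveAsTextFile("hdfs:///data/output/word-counts")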

Lambda Architecture

When ETL based on MapReduce is used to process large amounts of data, the generated reports are several hours old
Because of this, it is difficult to get real-time information for some parameters, even when some loss of accuracy is acceptable

Cold Path (Historical data)

The batch layer stores all the data in its raw form and processes it in batches. The result of this processing is called a batch view
The batch layer writes the batch views to a serving layer, where the data can be queried efficiently
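A minimal sketch of such a batch-layer job in PySpark, assuming the raw data is stored as Parquet in an Azure Data Lake Storage account and the batch view is republished as another Parquet table; the storage paths, column names, and aggregation are hypothetical.

  from pyspark.sql import SparkSession
  from pyspark.sql import functions as F

  spark = SparkSession.builder.appName("batch-layer").getOrCreate()

  # Read the complete raw (master) dataset and recompute the batch view from scratch.
  raw = spark.read.parquet("abfss://lake@account.dfs.core.windows.net/raw/transactions/")

  batch_view = (
      raw.groupBy("customer_id")
         .agg(F.sum("amount").alias("total_spend"),
              F.count("*").alias("txn_count"))
  )

  # Publish the batch view to the serving layer (modelled here as an overwritten Parquet table).
  batch_view.write.mode("overwrite").parquet(
      "abfss://lake@account.dfs.core.windows.net/serving/customer_totals/")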

Hot Path/ Speed Layer (Real-time data)

This path processes the data in real time. It is designed for low-latency, fast processing at the cost of some accuracy
The results are stored as real-time views. The speed layer updates the serving layer by incrementally updating the data
The store backing the real-time views must therefore support random writes
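A rough sketch of a speed-layer job using Spark Structured Streaming, assuming events arrive on a Kafka topic; the broker address, topic, field names, and aggregation are hypothetical, and the console sink stands in for a random-write serving store.

  from pyspark.sql import SparkSession
  from pyspark.sql import functions as F

  spark = SparkSession.builder.appName("speed-layer").getOrCreate()

  # Read events as they arrive (requires the Spark Kafka connector on the classpath;
  # broker address and topic name are placeholders).
  events = (
      spark.readStream.format("kafka")
           .option("kafka.bootstrap.servers", "broker:9092")
           .option("subscribe", "transactions")
           .load()
  )

  # Parse the JSON payload and maintain an incremental per-customer aggregate.
  parsed = (
      events.selectExpr("CAST(value AS STRING) AS json")
            .select(F.json_tuple("json", "customer_id", "amount")
                     .alias("customer_id", "amount"))
  )
  realtime_view = parsed.groupBy("customer_id").agg(
      F.sum(F.col("amount").cast("double")).alias("recent_spend"))

  # "update" mode emits only the rows that changed in each micro-batch; the console sink
  # stands in for a random-write store (e.g. a key-value database) backing the real-time views.
  query = (
      realtime_view.writeStream
          .outputMode("update")
          .format("console")
          .start()
  )
  query.awaitTermination()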

NOTE

  • A drawback of the Lambda Architecture is that the processing logic is duplicated across the batch and speed layers. This increases the maintenance burden for more complex applications
  • Apache Spark is a commonly used and recommended processing engine in the Lambda Architecture

Cruise Example
Cold Path: Load the accumulated data into Azure when the ship is on land (docked)
Hot Path: Get information about the local “delta” transactions that have taken place

Kappa Architecture

In the Kappa Architecture there is only one processing block, the speed layer, which processes data using real-time analysis
Long-term storage and batch processing are also handled through the speed layer, by replaying the stored event stream

Kappa is an ideal approach when real-time processing is the priority
Kappa requires all the data streams and metrics to be incremental in nature
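A minimal sketch of a Kappa-style pipeline in Spark Structured Streaming, under the assumption that all events live in a Kafka log with long retention; the broker, topic, fields, and checkpoint path are placeholders. One streaming job serves both live processing and reprocessing, the latter by replaying the log from the earliest offset.

  from pyspark.sql import SparkSession
  from pyspark.sql import functions as F

  spark = SparkSession.builder.appName("kappa-pipeline").getOrCreate()

  # Single processing path: live traffic and historical reprocessing both flow through
  # the same streaming job. Reprocessing is done by restarting the job against a log
  # that retains full history, starting from the earliest offset. Names are placeholders.
  events = (
      spark.readStream.format("kafka")
           .option("kafka.bootstrap.servers", "broker:9092")
           .option("subscribe", "transactions")
           .option("startingOffsets", "earliest")  # replay the whole retained log
           .load()
  )

  # An incremental metric: a running total that is updated as each micro-batch arrives.
  metric = (
      events.selectExpr("CAST(value AS STRING) AS json")
            .select(F.get_json_object("json", "$.amount").cast("double").alias("amount"))
            .agg(F.sum("amount").alias("running_total"))
  )

  (metric.writeStream
         .outputMode("complete")   # the single aggregate row is kept up to date incrementally
         .format("console")
         .option("checkpointLocation", "/tmp/kappa-checkpoint")
         .start()
         .awaitTermination())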