bmap4j - Batch Management And Processing For Java

The bmap4j Framework - Parallel Processing

Parallel Processing with bmap4j

When large amounts of data have to be processed within a limited timeframe, it can help to have a WorkManager divide the work into tasks and distribute them to different workers, which can process them in parallel. This is where the WorkManager-Worker pattern comes in.

The WorkManager-Worker pattern defines the terms batch, record, slice and lot, which are described below.

Batch

A batch is the set of all records 1..R that have to be processed by a job (a record is also known as a simple business case, see Fundamentals). Often a batch is too large to be processed in one transaction. In that case, alternative strategies have to be applied.

Slice

One possible strategy is to break the batch into 1..S slices, that is, partial stacks. Each slice contains 1..M records. Summed over all slices, the number of records equals R again.
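The split into slices can be sketched as follows. This is a minimal illustration, not bmap4j code: the class `SlicePartitioner`, the method `partition` and the parameter `maxSliceSize` (the upper bound M) are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of the bmap4j API.
final class SlicePartitioner {

    private SlicePartitioner() {}

    /**
     * Splits the batch into consecutive slices of at most maxSliceSize records.
     * The original record order is preserved within and across slices.
     */
    static <T> List<List<T>> partition(List<T> batch, int maxSliceSize) {
        if (maxSliceSize < 1) {
            throw new IllegalArgumentException("maxSliceSize must be >= 1");
        }
        List<List<T>> slices = new ArrayList<>();
        for (int from = 0; from < batch.size(); from += maxSliceSize) {
            int to = Math.min(from + maxSliceSize, batch.size());
            // Copy the sub-range so each slice is independent of the source list.
            slices.add(new ArrayList<>(batch.subList(from, to)));
        }
        return slices;
    }
}
```

With 10 records and `maxSliceSize` set to 4, this yields three slices of 4, 4 and 2 records, which again add up to R = 10.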

A slice is usually processed in one transaction, so there is no need to start a separate transaction for every record. Depending on the processing time of each record, this increases efficiency.

For efficiency reasons we try to keep the number of records per slice as high as possible, but we have to be careful that the processing time of a slice does not exceed the transaction timeout of the transaction manager.

A slice can contain interdependent records, as the records contained in one slice are processed sequentially. It is, however, still important that the records within the slice are kept in the required processing order.

However, slices have to be independent from each other, because slices are processed in parallel and JMS does not provide a guarantee for a certain processing order.

Lot

Before the workers (MDBs) can process the slices, the slices have to be sent to the messaging system. This is best done under transaction control. The lot determines how many slices are written in one transaction.

A lot contains 1..N slices, which are transferred to the messaging system in one transaction.
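The effect of sending one lot per transaction can be sketched with an in-memory stand-in for a transacted messaging session. This is an assumption-laden illustration: `TransactedQueue` and its methods are hypothetical and are neither the JMS API nor the bmap4j API. They only mimic the behaviour that messages sent in a transaction become visible to consumers when the transaction is committed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for a transacted messaging session.
// It is neither the JMS API nor the bmap4j API.
final class TransactedQueue<T> {

    private final List<T> pending = new ArrayList<>(); // sent, not yet committed
    private final List<T> visible = new ArrayList<>(); // committed, available to workers

    void send(T slice) {
        pending.add(slice);
    }

    void commit() {
        visible.addAll(pending);
        pending.clear();
    }

    void rollback() {
        pending.clear();
    }

    List<T> visibleToWorkers() {
        return new ArrayList<>(visible);
    }

    /** Sends every lot in its own transaction: one commit per lot. */
    static <T> void sendLots(TransactedQueue<T> queue, List<List<T>> lots) {
        for (List<T> lot : lots) {
            try {
                for (T slice : lot) {
                    queue.send(slice);
                }
                queue.commit(); // workers can see this lot's slices only from here on
            } catch (RuntimeException e) {
                queue.rollback(); // discard the partially sent lot
                throw e;
            }
        }
    }
}
```

Until `commit()` is called, `visibleToWorkers()` does not return any slice of the current lot, so either all slices of a lot reach the workers or none do.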

The number of lots per batch is L. The sum of slices across all lots is S.
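How R, S and L relate can be illustrated with a small calculation. The sketch assumes that every slice except possibly the last is filled up to M records and every lot except possibly the last up to N slices. The class `LotPlan` and its methods are hypothetical names, not part of bmap4j.

```java
// Hypothetical helper for illustration only; not part of the bmap4j API.
final class LotPlan {

    private LotPlan() {}

    /** Ceiling division for a non-negative total and a positive chunk size. */
    static int ceilDiv(int total, int chunkSize) {
        if (total < 0 || chunkSize < 1) {
            throw new IllegalArgumentException("total must be >= 0 and chunkSize >= 1");
        }
        return total / chunkSize + (total % chunkSize == 0 ? 0 : 1);
    }

    /** S: number of slices for R records with at most M records per slice. */
    static int sliceCount(int records, int maxRecordsPerSlice) {
        return ceilDiv(records, maxRecordsPerSlice);
    }

    /** L: number of lots for S slices with at most N slices per lot. */
    static int lotCount(int slices, int maxSlicesPerLot) {
        return ceilDiv(slices, maxSlicesPerLot);
    }
}
```

For example, R = 10,000 records with M = 100 give S = 100 slices. With N = 25 slices per lot this results in L = 4 lots, while N = 100 puts all slices into a single lot.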

If a batch contains exactly one lot, it follows the all-or-nothing strategy: the workers don't start until all slices are placed in the JMS queue and the transaction is committed.

All-Slices Lot Strategy

If the worker of a batch program is idempotent, an automatic restart can be achieved even with more than one lot. In this case, each lot often contains exactly one slice.

One-Slice Lot Strategy
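What idempotency means for a worker can be sketched as follows: delivering the same slice a second time, for example after a restart, must not change the result. The class `IdempotentWorker`, the slice identifier and the in-memory bookkeeping are assumptions made for this example and not bmap4j code. A real worker would have to record processed slices persistently, in the same transaction as its results.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical worker for illustration only; not part of the bmap4j API.
final class IdempotentWorker {

    // Assumption: every slice carries a unique identifier.
    private final Set<String> processedSliceIds = new HashSet<>();
    private int recordsProcessed = 0;

    /**
     * Processes the slice unless it has been processed before.
     * Returns true if work was done, false if the call was a repeated delivery.
     */
    boolean process(String sliceId, List<String> records) {
        if (!processedSliceIds.add(sliceId)) {
            return false; // repeated delivery after a restart: nothing to do
        }
        // Placeholder for the business logic applied to each record in order.
        recordsProcessed += records.size();
        return true;
    }

    int recordsProcessed() {
        return recordsProcessed;
    }
}
```

Because a repeated slice is a no-op, the job can simply resend slices after a failure without knowing which ones already reached a worker.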

For efficiency reasons, a lot can contain several slices if a large number of slices has to be created.

N-Slices Lot Strategy

In practice, one will try to send all slices to the messaging system in a single lot, as this allows for an easy restart of the job. Only if this is not possible should one of the other strategies be used.