readers
To read records from a data source, you should register an implementation of the RecordReader
interface:
Job job = new JobBuilder()
.reader(new MyRecordReader(myDataSource))
.build();
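To illustrate what such an implementation looks like, here is a minimal in-memory reader. The interface below is a local stand-in for Easy Batch's `RecordReader` contract (open/readRecord/close), simplified to return plain `String`s instead of the library's `Record` objects so the sketch compiles on its own:

```java
import java.util.Iterator;
import java.util.List;

// Local stand-in for the RecordReader contract, simplified for illustration:
// the real interface lives in the easybatch-core module and returns Record objects.
interface SimpleRecordReader {
    void open() throws Exception;
    String readRecord() throws Exception; // null signals the end of the data source
    void close() throws Exception;
}

// A reader over an in-memory list, standing in for "myDataSource".
class MyRecordReader implements SimpleRecordReader {
    private final List<String> dataSource;
    private Iterator<String> iterator;

    MyRecordReader(List<String> dataSource) {
        this.dataSource = dataSource;
    }

    @Override
    public void open() {
        iterator = dataSource.iterator();
    }

    @Override
    public String readRecord() {
        return iterator.hasNext() ? iterator.next() : null;
    }

    @Override
    public void close() {
        // nothing to release for an in-memory source
    }
}
```

The engine calls `open()` once before reading, then `readRecord()` repeatedly until it returns `null`, and finally `close()` to release resources.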
There are several built-in record readers to read data from a variety of sources:
- flat files (delimited and fixed length)
- xml, json and yaml files
- MS Excel files
- in-memory strings
- databases
- JMS queues
- BlockingQueue and Iterable objects
- Java 8 streams
- and standard input
Here is a table of built-in readers and how to use them:
Data source | Reader | Record type | Module |
---|---|---|---|
String | StringRecordReader | StringRecord | easybatch-core |
Directory | FileRecordReader | FileRecord | easybatch-core |
Iterable | IterableRecordReader | GenericRecord | easybatch-core |
Standard input | StandardInputRecordReader | StringRecord | easybatch-core |
Java 8 Stream | StreamRecordReader | GenericRecord | easybatch-stream |
Flat file | FlatFileRecordReader | StringRecord | easybatch-flatfile |
MS Excel file | MsExcelRecordReader | MsExcelRecord | easybatch-msexcel |
Xml stream | XmlRecordReader | XmlRecord | easybatch-xml |
Xml file | XmlFileRecordReader | XmlRecord | easybatch-xml |
Json stream | JsonRecordReader | JsonRecord | easybatch-json |
Json file | JsonFileRecordReader | JsonRecord | easybatch-json |
Yaml stream | YamlRecordReader | YamlRecord | easybatch-yaml |
Yaml file | YamlFileRecordReader | YamlRecord | easybatch-yaml |
Relational database | JdbcRecordReader | JdbcRecord | easybatch-jdbc |
Relational database | JpaRecordReader | GenericRecord | easybatch-jpa |
Relational database | HibernateRecordReader | GenericRecord | easybatch-hibernate |
MongoDB | MongoDBRecordReader | MongoDBRecord | easybatch-mongodb |
BlockingQueue | BlockingQueueRecordReader | GenericRecord | easybatch-core |
JmsQueue | JmsQueueRecordReader | JmsRecord | easybatch-jms |
Sometimes, the data source may be temporarily unavailable. In this case, the record reader will fail to read data and the job will be aborted. The RetryableRecordReader can be used to retry reading data using a delegate RecordReader with a RetryPolicy:
Job job = new JobBuilder()
.reader(new RetryableRecordReader(unreliableDataSourceReader, new RetryPolicy(5, 1, SECONDS)))
.build();
This makes the reader retry at most 5 times, waiting one second between attempts. If the data source is still unreachable after 5 attempts, the job will be aborted.
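What RetryableRecordReader does amounts to a simple retry loop around the delegate. Here is a hand-rolled sketch of that pattern (an illustration, not the library's actual implementation):

```java
import java.util.concurrent.Callable;

// Sketch of the retry pattern applied by RetryableRecordReader: call a delegate,
// and on failure retry up to maxAttempts times with a fixed delay between attempts.
class RetrySketch {
    static <T> T readWithRetry(Callable<T> delegate, int maxAttempts, long delayMillis)
            throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return delegate.call();
            } catch (Exception e) {
                last = e; // data source unreachable: wait, then try again
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis);
                }
            }
        }
        throw last; // all attempts failed: at this point the job would be aborted
    }
}
```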
- The JdbcRecordReader reads records in chunks. For large data sets, you can set the maxRows and fetchSize parameters to prevent loading all data into memory.
- The JpaRecordReader loads all data fetched by the JPQL query into a java.util.List object. You should pay attention to large data sets with the JPQL query you specify to the JpaRecordReader. You can specify the maximum number of rows to fetch using the maxResults parameter.
- The HibernateRecordReader uses org.hibernate.ScrollableResults behind the scenes to stream records in chunks. You can specify the fetch size and the maximum number of rows to fetch using the fetchSize and maxResult parameters.
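The chunking idea behind the fetchSize and maxRows knobs can be sketched in plain Java, with an in-memory list standing in for a database cursor (this is an illustration of the technique, not the JdbcRecordReader code):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of chunked reading: instead of loading the whole result set at once,
// pull rows in pages of `fetchSize` and stop once `maxRows` rows have been read.
class ChunkedReadSketch {
    static List<Integer> read(List<Integer> table, int fetchSize, int maxRows) {
        List<Integer> result = new ArrayList<>();
        int offset = 0;
        while (result.size() < maxRows && offset < table.size()) {
            // fetch one page of at most fetchSize rows
            int end = Math.min(offset + fetchSize, table.size());
            for (int i = offset; i < end && result.size() < maxRows; i++) {
                result.add(table.get(i));
            }
            offset = end;
        }
        return result;
    }
}
```

With a real database, fetchSize bounds how many rows the driver keeps in memory at a time, while maxRows caps the total number of rows the job will read.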
It is possible to read data from multiple files using a MultiFileRecordReader. This assumes that all files have the same format. A MultiFileRecordReader reads files in sequence, and all records are passed to the processing pipeline as if they were read from the same file. There are 4 MultiFileRecordReaders: MultiFlatFileRecordReader, MultiXmlFileRecordReader, MultiJsonFileRecordReader and MultiYamlFileRecordReader, to read multiple flat, xml, json and yaml files respectively.
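The sequential behavior described above can be sketched as follows, with plain text lines standing in for records (an illustration of the pattern, not the library code):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Sketch of what a MultiFileRecordReader does: read same-format files in
// sequence and expose their records as one continuous stream. Here a "record"
// is a text line; the real readers produce typed records (XmlRecord, etc.).
class MultiFileSketch {
    static List<String> readAll(List<Path> files) throws IOException {
        List<String> records = new ArrayList<>();
        for (Path file : files) { // files are consumed in order
            records.addAll(Files.readAllLines(file));
        }
        return records;
    }
}
```

The processing pipeline downstream cannot tell where one file ends and the next begins, which is why all files must share the same format.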
Easy Batch is created by Mahmoud Ben Hassine with the help of some awesome contributors