
readers


To read records from a data source, you should register an implementation of the RecordReader interface:

Job job = new JobBuilder()
    .reader(new MyRecordReader(myDataSource))
    .build();
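
MyRecordReader in the snippet above stands for a custom reader. Here is a minimal sketch of such an implementation, assuming the open/readRecord/close contract of the RecordReader interface; the in-memory String array used as a data source is purely illustrative:

import java.util.Date;
import org.easybatch.core.reader.RecordReader;
import org.easybatch.core.record.Header;
import org.easybatch.core.record.Record;
import org.easybatch.core.record.StringRecord;

// Illustrative custom reader over an in-memory String array (not a built-in class)
public class MyRecordReader implements RecordReader {

    private String[] dataSource;
    private long currentIndex;

    public MyRecordReader(String[] dataSource) {
        this.dataSource = dataSource;
    }

    @Override
    public void open() throws Exception {
        // acquire resources here (open a file, a connection, etc.)
        currentIndex = 0;
    }

    @Override
    public Record readRecord() throws Exception {
        if (currentIndex >= dataSource.length) {
            return null; // null signals the end of the data source
        }
        Header header = new Header(currentIndex + 1, "in-memory array", new Date());
        return new StringRecord(header, dataSource[(int) currentIndex++]);
    }

    @Override
    public void close() throws Exception {
        // release resources here
    }
}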

There are several built-in record readers to read data from a variety of sources:

  • flat files (delimited and fixed length)
  • XML, JSON and YAML files
  • MS Excel files
  • in-memory strings
  • databases
  • JMS queues
  • BlockingQueue and Iterable objects
  • Java 8 streams
  • and standard input

Here is a table of the built-in readers, the record types they produce and the modules that provide them:

Data source            Reader                       Record type      Module
String                 StringRecordReader           StringRecord     easybatch-core
Directory              FileRecordReader             FileRecord       easybatch-core
Iterable               IterableRecordReader         GenericRecord    easybatch-core
Standard input         StandardInputRecordReader    StringRecord     easybatch-core
Java 8 Stream          StreamRecordReader           GenericRecord    easybatch-stream
Flat file              FlatFileRecordReader         StringRecord     easybatch-flatfile
MS Excel file          MsExcelRecordReader          MsExcelRecord    easybatch-msexcel
Xml stream             XmlRecordReader              XmlRecord        easybatch-xml
Xml file               XmlFileRecordReader          XmlRecord        easybatch-xml
Json stream            JsonRecordReader             JsonRecord       easybatch-json
Json file              JsonFileRecordReader         JsonRecord       easybatch-json
Yaml stream            YamlRecordReader             YamlRecord       easybatch-yaml
Yaml file              YamlFileRecordReader         YamlRecord       easybatch-yaml
Relational database    JdbcRecordReader             JdbcRecord       easybatch-jdbc
Relational database    JpaRecordReader              GenericRecord    easybatch-jpa
Relational database    HibernateRecordReader        GenericRecord    easybatch-hibernate
MongoDB                MongoDBRecordReader          MongoDBRecord    easybatch-mongodb
BlockingQueue          BlockingQueueRecordReader    GenericRecord    easybatch-core
JmsQueue               JmsQueueRecordReader         JmsRecord        easybatch-jms
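
For example, a FlatFileRecordReader (from the easybatch-flatfile module) is registered like any other reader. Here is a minimal sketch, where the file name tweets.csv is only a placeholder:

Job job = new JobBuilder()
    .reader(new FlatFileRecordReader(new File("tweets.csv"))) // a java.io.File pointing to the input file
    .build();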

Handling data reading failures

Sometimes, the data source may be temporarily unavailable. In this case, the record reader will fail to read data and the job will be aborted. The RetryableRecordReader can be used to retry reading data using a delegate RecordReader with a RetryPolicy.

Job job = new JobBuilder()
    .reader(new RetryableRecordReader(unreliableDataSourceReader, new RetryPolicy(5, 1, SECONDS)))
    .build();

This will make the reader retry at most 5 times, waiting one second between attempts. If the data source is still unreachable after 5 attempts, the job will be aborted.

Performance notes

  • The JdbcRecordReader reads records in chunks. For large data sets, you can set the maxRows and fetchSize parameters to avoid loading the entire data set into memory (see the sketch after this list).

  • The JpaRecordReader loads all data fetched by the JPQL query into a java.util.List object. Be careful with large data sets when writing the JPQL query you pass to the JpaRecordReader. You can limit the number of rows to fetch using the maxResults parameter.

  • The HibernateRecordReader uses org.hibernate.ScrollableResults behind the scenes to stream records in chunks. You can specify the fetch size and the maximum number of rows to fetch using the fetchSize and maxResult parameters.
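
For instance, a JdbcRecordReader could be configured before being registered. This is a minimal sketch; the setter names setMaxRows and setFetchSize are assumed here to correspond to the maxRows and fetchSize parameters mentioned above, and the tweet table is a placeholder:

JdbcRecordReader jdbcRecordReader = new JdbcRecordReader(dataSource, "select * from tweet");
jdbcRecordReader.setMaxRows(100000); // assumed setter for the maxRows parameter
jdbcRecordReader.setFetchSize(1000); // assumed setter for the fetchSize parameter

Job job = new JobBuilder()
    .reader(jdbcRecordReader)
    .build();

where dataSource is a javax.sql.DataSource pointing to the target database.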

Reading data from multiple files

It is possible to read data from multiple files using a MultiFileRecordReader. This assumes that all files have the same format. A MultiFileRecordReader reads files in sequence and all records are passed to the processing pipeline as if they were read from the same file. There are 4 MultiFileRecordReaders: MultiFlatFileRecordReader, MultiXmlFileRecordReader, MultiJsonFileRecordReader and MultiYamlFileRecordReader, to read multiple flat, XML, JSON and YAML files respectively.
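
Here is a minimal sketch of reading two flat files in sequence; the file names are placeholders, and the assumption that MultiFlatFileRecordReader accepts a java.util.List of files is illustrative:

List<File> files = Arrays.asList(new File("tweets-part1.csv"), new File("tweets-part2.csv"));

Job job = new JobBuilder()
    .reader(new MultiFlatFileRecordReader(files)) // assumed List<File> constructor
    .build();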
