Description
We have a use case where we are replacing an existing CSV writer (based on Apache Commons CSV) with Jackson. The old CSV writer was configured to write "special characters" such as CR and LF as \r and \n. Jackson does not support this; it adheres to RFC 4180, which defines no escape character. This causes a lot of pain for our customers, as the data we write often contains CR and LF characters.
Here is some test code:
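import java.io.PrintWriter;

import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.dataformat.csv.CsvFactory;
import com.fasterxml.jackson.dataformat.csv.CsvGenerator;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;

public class CsvTest {
    public static final void main(String... args) throws Exception {
        // Schema with backslash configured as the escape character
        CsvSchema schema = CsvSchema.emptySchema().withEscapeChar('\\');
        CsvFactory csvFactory = new SchemaAwareCsvFactory(schema);
        ObjectMapper csvMapper = new ObjectMapper(csvFactory);
        // One row containing a plain value, an LF, a double quote, and a comma
        String[] line = new String[] { "a", "\n", "\"", "," };
        csvMapper.writeValue(new PrintWriter(System.out), line);
    }

    // Is there actually a better way to set the schema for an ObjectMapper? This seems painful.
    public static class SchemaAwareCsvFactory extends CsvFactory {
        SchemaAwareCsvFactory(CsvSchema schema) {
            super();
            this._schema = schema;
        }
    }
}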
Which produces
a,"
","""",","
I can get it to produce
"a","
","\"",","
by adding
csvFactory.enable(CsvGenerator.Feature.ALWAYS_QUOTE_STRINGS);
csvFactory.enable(CsvGenerator.Feature.ESCAPE_QUOTE_CHAR_WITH_ESCAPE_CHAR);
But what I am actually looking for is
"a","\n","\"",","
There seems to be no way to get the generator (and probably also the parser) to write and read control characters as escape sequences. Having CR and LF within quotes is legal from the RFC 4180 point of view; however, most of the CSV that our systems produce gets parsed by legacy ("brain dead") tools that assume that every LF is a record separator.
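To make the target format concrete, here is a minimal sketch in plain Java of the row I would like to get. This is not Jackson API: the class `EscapedCsvSketch` and the methods `escapeField` and `formatRow` are hypothetical names used only for illustration. The sketch also assumes that a literal backslash should be doubled so that the output stays unambiguous, which is my assumption and not something the old writer is known to do.

```java
public class EscapedCsvSketch {

    // Hypothetical helper: quotes a single field and writes CR, LF, the
    // double quote, and the backslash itself as backslash escape sequences.
    public static String escapeField(String value) {
        StringBuilder sb = new StringBuilder(value.length() + 2);
        sb.append('"');
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '\r': sb.append("\\r"); break;   // CR becomes backslash + r
                case '\n': sb.append("\\n"); break;   // LF becomes backslash + n
                case '"':  sb.append("\\\""); break;  // quote becomes backslash + quote
                case '\\': sb.append("\\\\"); break;  // assumption: backslash is doubled
                default:   sb.append(c);
            }
        }
        sb.append('"');
        return sb.toString();
    }

    // Hypothetical helper: joins the escaped fields into one record,
    // without a trailing line separator.
    public static String formatRow(String... fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(escapeField(fields[i]));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Same row as in the Jackson test code above
        System.out.println(formatRow("a", "\n", "\"", ","));
    }
}
```

With the row from the test code, this prints the single line `"a","\n","\"",","`, so a line-oriented legacy tool never sees a raw LF inside a record.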
Apache Commons CSV has a nice summary on their Javadoc page for CSVFormat: https://commons.apache.org/proper/commons-csv/apidocs/org/apache/commons/csv/CSVFormat.html#DEFAULT (and below)
(god, CSV is such a mess. And that is the standard format for enterprise data???)