Unexpected quotes in header column name #237

Open
HongliBu opened this issue Dec 23, 2020 · 1 comment
@HongliBu
HongliBu commented Dec 23, 2020

Hi, I am using the library (dependency "com.fasterxml.jackson.dataformat:jackson-dataformat-csv:2.9.9") to write Java Map objects to CSV files, and I am seeing something odd: some of the generated header column names are wrapped in double quotes,
like this:
create_timestamp,detection_timestamp,document_major_version,duplicate_count,env,event_guid,event_timestamp,finding_source_id,finding_source_system,finding_timestamp,finding_unique_id,landing_timestamp,"record_processed_timestamp","security_event_manager_id",sem_state,update_timestamp

and my code is like this:

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.text.SimpleDateFormat;
import java.util.Map;
import java.util.TreeSet;

import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;

// Nested helper class; PublisherException is an application-specific exception (not shown).
private static class StreamHolder {
    private final FileOutputStream outputStream;
    private final File outputFile;
    private final CsvSchema csvSchema;
    private final CsvMapper csvMapper;

    StreamHolder(File outputFile, Map<String, Object> fields) {
        this.outputFile = outputFile;
        CsvSchema.Builder schemaBuilder = new CsvSchema.Builder();
        csvMapper = new CsvMapper();
        SimpleDateFormat df = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS");
        csvMapper.setDateFormat(df);
        // One STRING column per map key, in sorted order.
        for (String value : new TreeSet<>(fields.keySet())) {
            schemaBuilder.addColumn(value, CsvSchema.ColumnType.STRING);
        }
        csvSchema = schemaBuilder.build();
        try {
            this.outputStream = new FileOutputStream(outputFile);
        } catch (IOException e) {
            throw new PublisherException("Open stream on " + outputFile, e);
        }
    }

    void write(Map<String, Object> fields) throws IOException {
        // Write the header only with the first row, while the file is still empty.
        if (outputFile.length() == 0) {
            outputStream.write(csvMapper.writerFor(Map.class)
                    .with(csvSchema.withHeader()).writeValueAsBytes(fields));
        } else {
            outputStream.write(csvMapper.writerFor(Map.class)
                    .with(csvSchema.withoutHeader()).writeValueAsBytes(fields));
        }
    }
}

Here I define a class; an instance is created once and kept around to write data to the CSV file. I use addColumn to define the schema. When writing, I pass a Map<String, Object> fields as the input; if the file is empty I use withHeader(), otherwise withoutHeader().

The keys are all of type String, so why do some of them end up in double quotes?

@cowtowncoder
Member

Ok, first a note: 2.9.9 is a bit of an old version, so for issues it is typically good to check a more recent version; in this case either 2.11.4 or 2.12.0. But I assume there is no difference in behavior.
Since CSV allows quoting of any and all values (and requires it in some cases; users may prefer it in others as well), the behavior seen is legal. But I am guessing you find it unexpected that only 2 column names are quoted. I have not verified this, but I suspect it is because by default Jackson uses heuristics where:

  1. For short Strings, a check is made to see if quoting is needed; if not, the value is left unquoted (unless quoting is forced)
  2. For longer Strings, quoting occurs automatically: this avoids the cost of iterating over the whole String value to verify the need.

The definition of a "short" String is in the CsvEncoder class:

    /**
     * Also: only do check for optional quotes for short
     * values; longer ones will always be quoted.
     */
    final protected static int MAX_QUOTE_CHECK = 24;

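so Strings of roughly 24 characters or more are quoted by default. That matches your header: the two quoted names, security_event_manager_id and record_processed_timestamp, are 25 and 26 characters long, while every unquoted name is shorter.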
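
To make the heuristic concrete, here is a minimal stand-alone sketch of the decision described above. It is not Jackson's actual code: the class name `QuoteHeuristicSketch`, the helper `mayNeedQuotes`, the exact comparison `length > MAX_QUOTE_CHECK`, and the set of characters treated as special are all assumptions made for illustration. Only the threshold value 24 comes from `CsvEncoder`.

```java
class QuoteHeuristicSketch {
    // Threshold value as quoted from CsvEncoder.MAX_QUOTE_CHECK.
    static final int MAX_QUOTE_CHECK = 24;

    // Hypothetical helper: true if a value would be quoted under the
    // default (non-strict) heuristic as described in this thread.
    static boolean mayNeedQuotes(String value) {
        // Long values are quoted without looking at their content.
        // Assumption: the comparison is "longer than" the threshold.
        if (value.length() > MAX_QUOTE_CHECK) {
            return true;
        }
        // Short values are scanned for characters that require quoting.
        // Assumption: a simplified set of special characters.
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == ',' || c == '"' || c == '\n' || c == '\r') {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] names = {
            "sem_state",
            "security_event_manager_id",
            "record_processed_timestamp"
        };
        for (String name : names) {
            System.out.println(name + " (" + name.length() + " chars) quoted: "
                    + mayNeedQuotes(name));
        }
    }
}
```

Under these assumptions the two long column names from the report come out as quoted purely because of their length, not because of anything in their content.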

But you can force checking of all values by enabling

 CsvGenerator.Feature.STRICT_CHECK_FOR_QUOTING

after which the check is made for longer names too, and quoting is not used unless absolutely required (because the value contains characters such as linefeeds).
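
A minimal sketch of how this could be wired up, assuming jackson-dataformat-csv is on the classpath. In the CSV module the feature is defined on `CsvGenerator.Feature` and can be enabled on the `CsvMapper`. The class name `StrictQuotingExample`, the helper `writeRow`, and the sample values are hypothetical, and I have not run this against 2.9.9 specifically.

```java
import com.fasterxml.jackson.dataformat.csv.CsvGenerator;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;

import java.util.LinkedHashMap;
import java.util.Map;

class StrictQuotingExample {
    // Hypothetical helper: writes one row (with header) using strict quote checking.
    static String writeRow(Map<String, Object> fields) throws Exception {
        CsvMapper mapper = new CsvMapper();
        // Inspect every String regardless of length; quote only when needed.
        mapper.enable(CsvGenerator.Feature.STRICT_CHECK_FOR_QUOTING);

        // Build the schema from the map keys, as in the original report.
        CsvSchema.Builder builder = CsvSchema.builder();
        for (String name : fields.keySet()) {
            builder.addColumn(name, CsvSchema.ColumnType.STRING);
        }
        CsvSchema schema = builder.build().withHeader();

        return mapper.writerFor(Map.class).with(schema).writeValueAsString(fields);
    }

    public static void main(String[] args) throws Exception {
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("record_processed_timestamp", "2020-12-23T10:15:30");
        fields.put("sem_state", "open");
        System.out.print(writeRow(fields));
    }
}
```

With the feature enabled, long header names such as record_processed_timestamp should be written without quotes, at the cost of scanning each value in full. If you would rather have uniform output in the other direction, `CsvGenerator.Feature.ALWAYS_QUOTE_STRINGS` quotes every String value instead.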
