
fix(tools): unbreak bat for paren paths and silence library INFO noise #825

Merged
HTHou merged 4 commits into apache:develop from gengziyand:fix-tools-import-edge-cases on May 28, 2026

Conversation

@gengziyand
Contributor

  1. Drop the start /B /WAIT wrapper in csv/arrow/parquet bats so paths like events(1).csv no longer get truncated by cmd parenthesis grouping; pass Java exit code through explicitly.

  2. Silence Arrow allocator and Parquet/Hadoop CodecPool INFO logs in logback-cvs2tsfile.xml.

  3. Add BatScriptTest covering normal filename, paren filename, and absence of suppressed INFO noise.

ziyangeng added 4 commits May 18, 2026 20:16
…ndling

- Parquet/Arrow schema mode: validate source_columns against the source file by count and name; reject unnamed SKIP (name-based matching cannot resolve it). Previously mismatches silently produced all-null columns.

- TabletBuilder: lowercase the keys of tagDefaults and sourceColumnIndex to match the lowercase names returned by TableSchema.getColumnSchemas, match the time column case-insensitively, and use STRING (not TEXT) for tag MeasurementSchema.

- Document the new validation rules and clarify auto-mode time-column requirements, type-promotion conditions, and table-name sanitization in both README versions; add a note above the Schema Example that duplicate (device, timestamp) rows are not supported.

- Add SchemaValidationTest covering positive case plus column-count mismatches, unknown column name, and unnamed SKIP for both formats.

- Add four regression tests in TabletBuilderTest for mixed-case tag / source / time columns, mixed-case tag default values, and virtual tag DEFAULT values landing in the device id.
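
The case-insensitive matching described for TabletBuilder can be illustrated with a minimal sketch. It is not the project's actual code: the class and method names (`CaseInsensitiveKeys`, `lowercaseKeys`) are hypothetical, and it only shows the general idea of lowercasing lookup keys once so that they line up with schema names that are already lowercase.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class CaseInsensitiveKeys {
    // Hypothetical helper, not part of TsFile: returns a copy of the map
    // whose keys are lowercased. Locale.ROOT avoids locale-specific case
    // rules (for example the Turkish dotless i).
    static <V> Map<String, V> lowercaseKeys(Map<String, V> source) {
        Map<String, V> result = new HashMap<>();
        for (Map.Entry<String, V> entry : source.entrySet()) {
            // If two keys differ only by case, one of them overwrites the other.
            result.put(entry.getKey().toLowerCase(Locale.ROOT), entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> sourceColumnIndex = new HashMap<>();
        sourceColumnIndex.put("DeviceId", 0);
        sourceColumnIndex.put("Temperature", 1);

        Map<String, Integer> normalized = lowercaseKeys(sourceColumnIndex);
        // A lowercase schema name now resolves against the normalized map.
        System.out.println(normalized.get("deviceid"));
    }
}
```

Normalizing once at construction time keeps every later lookup a plain `get` call, at the cost of treating names that differ only by case as the same column.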
ImportSchemaParser.parse used FileReader, which decodes with the JVM default charset. On Chinese Windows running JDK <18 the default is GBK (Java 18+ defaults to UTF-8 per JEP 400), so a UTF-8 import.schema containing Chinese table or column names was mis-decoded and the garbled text propagated into the resulting TsFile.

Replace FileReader with InputStreamReader explicitly bound to StandardCharsets.UTF_8 so the file is read correctly regardless of JDK version or platform locale, mirroring how CsvSourceReader already opens its input.
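
A minimal sketch of the pattern described above, assuming nothing about the real ImportSchemaParser beyond what the commit message states. The class and method names (`Utf8SchemaReader`, `openUtf8`, `readFirstLine`) are hypothetical; only the JDK calls are real.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf8SchemaReader {
    // Hypothetical helper: opens the file with an explicit UTF-8 decoder
    // instead of relying on the JVM default charset, which is what
    // new FileReader(file) does.
    static BufferedReader openUtf8(Path path) throws IOException {
        return new BufferedReader(
            new InputStreamReader(Files.newInputStream(path), StandardCharsets.UTF_8));
    }

    // Reads the first line of the file, decoded as UTF-8.
    static String readFirstLine(Path path) throws IOException {
        try (BufferedReader reader = openUtf8(path)) {
            return reader.readLine();
        }
    }
}
```

On Java 11 and later, `new FileReader(file, StandardCharsets.UTF_8)` is an equivalent alternative; the InputStreamReader form also works on Java 8.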
Drop the start /B /WAIT wrapper in csv/arrow/parquet bats so paths like events(1).csv no longer get truncated by cmd parenthesis grouping; pass Java exit code through explicitly.

Silence Arrow allocator and Parquet/Hadoop CodecPool INFO logs in logback-cvs2tsfile.xml.

Add BatScriptTest covering normal filename, paren filename, and absence of suppressed INFO noise.
@HTHou merged commit e3cdf87 into apache:develop on May 28, 2026
14 checks passed
3 participants