fix(tools): unbreak bat for paren paths and silence library INFO noise by gengziyand · Pull Request #825 · apache/tsfile

gengziyand · 2026-05-27T07:07:55Z

Drop the start /B /WAIT wrapper in csv/arrow/parquet bats so paths like events(1).csv no longer get truncated by cmd parenthesis grouping; pass Java exit code through explicitly.
Silence Arrow allocator and Parquet/Hadoop CodecPool INFO logs in logback-cvs2tsfile.xml.
Add BatScriptTest covering normal filename, paren filename, and absence of suppressed INFO noise.

…ndling
- Parquet/Arrow schema mode: validate source_columns against the source file by count and name; reject unnamed SKIP (name-based matching cannot resolve it). Previously mismatches silently produced all-null columns.
- TabletBuilder: lowercase the keys of tagDefaults and sourceColumnIndex to match the lowercase names returned by TableSchema.getColumnSchemas, match the time column case-insensitively, and use STRING (not TEXT) for tag MeasurementSchema.
- Document the new validation rules and clarify auto-mode time-column requirements, type-promotion conditions, and table-name sanitization in both README versions; add a note above the Schema Example that duplicate (device, timestamp) rows are not supported.
- Add SchemaValidationTest covering positive case plus column-count mismatches, unknown column name, and unnamed SKIP for both formats.
- Add four regression tests in TabletBuilderTest for mixed-case tag / source / time columns, mixed-case tag default values, and virtual tag DEFAULT values landing in the device id.
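The count-and-name validation described in this commit can be pictured roughly as follows. This is only a sketch, not the code from the PR: the class `SourceColumnCheck`, the `validate` method, the use of `null` to stand for an unnamed SKIP, and the exact (case-sensitive) name comparison are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; none of these names come from the tsfile code base.
public class SourceColumnCheck {

  // declared:    names listed in source_columns; null stands in for an
  //              unnamed SKIP (assumed representation).
  // fileColumns: column names actually found in the Parquet/Arrow file.
  // Returns a list of problems; an empty list means the declaration matches.
  static List<String> validate(List<String> declared, List<String> fileColumns) {
    List<String> errors = new ArrayList<>();
    if (declared.size() != fileColumns.size()) {
      errors.add("column count mismatch: declared " + declared.size()
          + ", file has " + fileColumns.size());
    }
    Set<String> known = new HashSet<>(fileColumns);
    for (int i = 0; i < declared.size(); i++) {
      String name = declared.get(i);
      if (name == null) {
        // Name-based matching has nothing to resolve an unnamed SKIP against.
        errors.add("unnamed SKIP at position " + i + " cannot be matched by name");
      } else if (!known.contains(name)) {
        errors.add("unknown column: " + name);
      }
    }
    return errors;
  }

  public static void main(String[] args) {
    List<String> file = Arrays.asList("time", "device", "temp");
    // A matching declaration yields no errors.
    System.out.println(validate(Arrays.asList("time", "device", "temp"), file));
    // A misspelled name is reported instead of silently producing nulls.
    System.out.println(validate(Arrays.asList("time", "device", "tmp"), file));
  }
}
```

Failing fast with an explicit error list is the point of the sketch: a mismatch surfaces before any data is written rather than showing up later as all-null columns.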

ImportSchemaParser.parse used FileReader, which decodes with the JVM default charset. On Chinese Windows running JDK <18 the default is GBK (Java 18+ defaults to UTF-8 per JEP 400), so a UTF-8 import.schema containing Chinese table or column names was mis-decoded and the garbled text propagated into the resulting TsFile. Replace FileReader with InputStreamReader explicitly bound to StandardCharsets.UTF_8 so the file is read correctly regardless of JDK version or platform locale, mirroring how CsvSourceReader already opens its input.
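The reader swap described above can be sketched as below. This is a minimal illustration, not the actual ImportSchemaParser code: the class `Utf8SchemaRead`, the helper `readLinesUtf8`, and the `table_name=` line written to the temporary file are all made up for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not taken from the tsfile code base.
public class Utf8SchemaRead {

  // Reads every line of the file as UTF-8. Unlike new FileReader(file),
  // the charset is explicit, so the JVM default (e.g. GBK) is never consulted.
  static List<String> readLinesUtf8(Path file) throws IOException {
    List<String> lines = new ArrayList<>();
    try (InputStream in = Files.newInputStream(file);
        BufferedReader reader =
            new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
      String line;
      while ((line = reader.readLine()) != null) {
        lines.add(line);
      }
    }
    return lines;
  }

  public static void main(String[] args) throws IOException {
    Path tmp = Files.createTempFile("import", ".schema");
    try {
      // Unicode escapes keep this source file itself encoding-neutral;
      // the escaped characters form a three-character Chinese table name.
      String content = "table_name=\u6e29\u5ea6\u8868\n";
      Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));
      List<String> lines = readLinesUtf8(tmp);
      // The decoded line matches what was written, whatever the platform locale.
      System.out.println(lines.get(0).equals("table_name=\u6e29\u5ea6\u8868"));
    } finally {
      Files.deleteIfExists(tmp);
    }
  }
}
```

On Java 11 and later, `new FileReader(file, StandardCharsets.UTF_8)` is an equivalent alternative; the `InputStreamReader` form also works on Java 8.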

ziyangeng added 4 commits May 18, 2026 20:16

Fix tools import edge cases

eb228ec

CritasWang approved these changes May 27, 2026

HTHou merged commit e3cdf87 into apache:develop May 28, 2026
14 checks passed

HTHou merged 4 commits into apache:develop from gengziyand:fix-tools-import-edge-cases
