fix(tools): unbreak bat for paren paths and silence library INFO noise#825
Merged
added 4 commits
May 18, 2026 20:16
…ndling
- Parquet/Arrow schema mode: validate source_columns against the source file by count and name; reject unnamed SKIP (name-based matching cannot resolve it). Previously mismatches silently produced all-null columns.
- TabletBuilder: lowercase the keys of tagDefaults and sourceColumnIndex to match the lowercase names returned by TableSchema.getColumnSchemas, match the time column case-insensitively, and use STRING (not TEXT) for tag MeasurementSchema.
- Document the new validation rules and clarify auto-mode time-column requirements, type-promotion conditions, and table-name sanitization in both README versions; add a note above the Schema Example that duplicate (device, timestamp) rows are not supported.
- Add SchemaValidationTest covering positive case plus column-count mismatches, unknown column name, and unnamed SKIP for both formats.
- Add four regression tests in TabletBuilderTest for mixed-case tag / source / time columns, mixed-case tag default values, and virtual tag DEFAULT values landing in the device id.
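The key-lowercasing fix in TabletBuilder can be illustrated with a minimal sketch. This is not the project's code: the class `LowercaseKeysDemo`, the helper `lowercaseKeys`, and the sample column names are hypothetical, and only the idea (normalize map keys so that lookups by lowercase schema names succeed) comes from the commit message.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class LowercaseKeysDemo {

    // Hypothetical helper: copy a map, lowercasing every key with a fixed
    // locale so the result does not depend on the platform locale.
    public static <V> Map<String, V> lowercaseKeys(Map<String, V> in) {
        Map<String, V> out = new HashMap<>();
        for (Map.Entry<String, V> e : in.entrySet()) {
            out.put(e.getKey().toLowerCase(Locale.ROOT), e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed input: column names exactly as the user wrote them (mixed case).
        Map<String, Integer> sourceColumnIndex = new HashMap<>();
        sourceColumnIndex.put("DeviceId", 0);
        sourceColumnIndex.put("Temperature", 1);

        Map<String, Integer> normalized = lowercaseKeys(sourceColumnIndex);

        // A lookup by lowercase name misses in the raw map and hits in the normalized one.
        System.out.println(sourceColumnIndex.get("deviceid")); // prints null
        System.out.println(normalized.get("deviceid"));        // prints 0
    }
}
```

Using `Locale.ROOT` is a design choice in this sketch, not something the commit states; it avoids locale-specific case mappings such as the Turkish dotless i.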
ImportSchemaParser.parse used FileReader, which decodes with the JVM default charset. On Chinese Windows running JDK <18 the default is GBK (Java 18+ defaults to UTF-8 per JEP 400), so a UTF-8 import.schema containing Chinese table or column names was mis-decoded and the garbled text propagated into the resulting TsFile. Replace FileReader with InputStreamReader explicitly bound to StandardCharsets.UTF_8 so the file is read correctly regardless of JDK version or platform locale, mirroring how CsvSourceReader already opens its input.
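A minimal sketch of the charset fix described above, assuming nothing about the real parser beyond what the commit says. The class `SchemaCharsetDemo`, the helper `readAll`, and the sample text are hypothetical; `InputStreamReader` with `StandardCharsets.UTF_8` is the standard JDK API the commit names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class SchemaCharsetDemo {

    // Hypothetical helper: read a whole file, decoding its bytes with an
    // explicitly chosen charset instead of the JVM default.
    public static String readAll(Path file, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new BufferedReader(
                new InputStreamReader(Files.newInputStream(file), charset))) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Two Chinese characters written as Unicode escapes so this source
        // file compiles the same way under any source encoding.
        String name = "\u6e29\u5ea6";
        Path file = Files.createTempFile("import", ".schema");
        Files.write(file, name.getBytes(StandardCharsets.UTF_8));

        // Explicit UTF-8 decoding round-trips on every platform.
        System.out.println(readAll(file, StandardCharsets.UTF_8).equals(name));

        // Decoding the same bytes as GBK, which is what a default-charset
        // reader would do on the affected setup, yields different text.
        if (Charset.isSupported("GBK")) {
            System.out.println(readAll(file, Charset.forName("GBK")).equals(name));
        }
        Files.delete(file);
    }
}
```

The GBK branch is guarded by `Charset.isSupported` because trimmed-down runtimes may not ship that charset.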
Drop the start /B /WAIT wrapper in csv/arrow/parquet bats so paths like events(1).csv no longer get truncated by cmd parenthesis grouping; pass Java exit code through explicitly. Silence Arrow allocator and Parquet/Hadoop CodecPool INFO logs in logback-cvs2tsfile.xml. Add BatScriptTest covering normal filename, paren filename, and absence of suppressed INFO noise.
CritasWang
approved these changes
May 27, 2026