[fix][schema] Illegal character '$' in record#25193
[fix][schema] Illegal character '$' in record#25193mattisonchao wants to merge 4 commits intomasterfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR addresses a regression introduced by the Avro upgrade (1.12.0) where Protobuf-derived Avro schemas can fail parsing due to $ appearing in generated record names, by introducing a more permissive Avro NameValidator and adding tests to cover the scenario.
Changes:
- Add a custom Avro
NameValidator(CompatibleNameValidator) to allow$in Avro record/field names during schema validation. - Add unit tests for the validator’s behavior and a reproduction test using a generated Protobuf schema.
- Introduce a new Protobuf message (
DataRecord.proto) used by the tests.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/validator/StructSchemaDataValidator.java | Uses a custom Avro NameValidator to accept $ during schema parsing/validation. |
| pulsar-broker/src/test/java/org/apache/pulsar/broker/service/schema/validator/SchemaDataValidatorTest.java | Adds tests for the new validator and a Protobuf-based reproduction. |
| pulsar-broker/src/main/proto/DataRecord.proto | Adds a Protobuf schema used to generate Avro with nested-type $ names for testing. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
...c/main/java/org/apache/pulsar/broker/service/schema/validator/StructSchemaDataValidator.java
Show resolved
Hide resolved
...c/main/java/org/apache/pulsar/broker/service/schema/validator/StructSchemaDataValidator.java
Show resolved
Hide resolved
...src/test/java/org/apache/pulsar/broker/service/schema/validator/SchemaDataValidatorTest.java
Show resolved
Hide resolved
...src/test/java/org/apache/pulsar/broker/service/schema/validator/SchemaDataValidatorTest.java
Show resolved
Hide resolved
|
|
||
| try { | ||
| Schema.Parser avroSchemaParser = new Schema.Parser(); | ||
| Schema.Parser avroSchemaParser = new Schema.Parser(COMPATIBLE_NAME_VALIDATOR); |
There was a problem hiding this comment.
The Pulsar code base includes new Schema.Parser() in many locations. Wouldn't we have to use the custom name validator in all cases?
There was a problem hiding this comment.
Yes, I'll change them also.
There was a problem hiding this comment.
Some factory class to pulsar-common? There's no avro dependency currently, but I guess it could justified.
There was a problem hiding this comment.
after the evaluation. I think this PR should focus solely on fixing the issue and eliminating breaking changes in the broker versions. To align the usage. We could try another PR on the client side to support it.
There was a problem hiding this comment.
pulsar-common is also used on broker side, so how could it be separated? I would assume that it would make sense to cover both client + broker to use the COMPATIBLE_NAME_VALIDATOR in the avro schema parser in all locations. It should be a task that Claude Code could handle without much intervention.
There was a problem hiding this comment.
Oh yes, pulsar-common doesn't currently depend on avro libraries (which could be a useful thing) so it would require a solution to share the name validator across broker and client.
There was a problem hiding this comment.
I think it's fine to handle broker-side first. Please rename the PR title so that it captures this detail.
Motivation
After the PR #24617. We upgraded to a newer Avro version, which introduced breaking changes for our use case.
The discussion context is here: https://lists.apache.org/thread/3xx7rx8571bbxc6c5s2ldbnmr06c42x3
This PR is only working to avoid breaking existing users. For the root cause of the issue, we should fix it in the Avro protobuf.
Modifications
CompatibleNameValidatorinstead of the default UTF_VALIDATOR to allow '$' to pass our validation.Verifying this change
Documentation
doc-not-needed