Skip to content

[fix][schema] Illegal character '$' in record#25193

Open
mattisonchao wants to merge 4 commits intomasterfrom
fixes.compatible.avro
Open

[fix][schema] Illegal character '$' in record#25193
mattisonchao wants to merge 4 commits intomasterfrom
fixes.compatible.avro

Conversation

@mattisonchao
Copy link
Member

Motivation

After the PR #24617. We upgraded to a newer Avro version, which introduced breaking changes for our use case.

The discussion context is here: https://lists.apache.org/thread/3xx7rx8571bbxc6c5s2ldbnmr06c42x3

This PR is only working to avoid breaking existing users. For the root cause of the issue, we should fix it in the Avro protobuf.

Modifications

  • Support a CompatibleNameValidator instead of the default UTF_VALIDATOR to allow '$' to pass our validation.
  • Add a test to reproduce the issue.
  • Add a test for CompatibleNameValidator

Verifying this change

  • Make sure that the change passes the CI checks.

Documentation

  • doc-not-needed

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses a regression introduced by the Avro upgrade (1.12.0) where Protobuf-derived Avro schemas can fail parsing due to $ appearing in generated record names, by introducing a more permissive Avro NameValidator and adding tests to cover the scenario.

Changes:

  • Add a custom Avro NameValidator (CompatibleNameValidator) to allow $ in Avro record/field names during schema validation.
  • Add unit tests for the validator’s behavior and a reproduction test using a generated Protobuf schema.
  • Introduce a new Protobuf message (DataRecord.proto) used by the tests.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File Description
pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/validator/StructSchemaDataValidator.java Uses a custom Avro NameValidator to accept $ during schema parsing/validation.
pulsar-broker/src/test/java/org/apache/pulsar/broker/service/schema/validator/SchemaDataValidatorTest.java Adds tests for the new validator and a Protobuf-based reproduction.
pulsar-broker/src/main/proto/DataRecord.proto Adds a Protobuf schema used to generate Avro with nested-type $ names for testing.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.


try {
Schema.Parser avroSchemaParser = new Schema.Parser();
Schema.Parser avroSchemaParser = new Schema.Parser(COMPATIBLE_NAME_VALIDATOR);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Pulsar code base includes new Schema.Parser() in many locations. Wouldn't we have to use the custom name validator in all cases?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'll change them also.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some factory class to pulsar-common? There's no avro dependency currently, but I guess it could justified.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after the evaluation. I think this PR should focus solely on fixing the issue and eliminating breaking changes in the broker versions. To align the usage. We could try another PR on the client side to support it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pulsar-common is also used on broker side, so how could it be separated? I would assume that it would make sense to cover both client + broker to use the COMPATIBLE_NAME_VALIDATOR in the avro schema parser in all locations. It should be a task that Claude Code could handle without much intervention.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes, pulsar-common doesn't currently depend on avro libraries (which could be a useful thing) so it would require a solution to share the name validator across broker and client.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine to handle broker-side first. Please rename the PR title so that it captures this detail.

@mattisonchao mattisonchao marked this pull request as ready for review February 20, 2026 01:53
@mattisonchao mattisonchao reopened this Feb 20, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

Comments