Refactor: Improve space insertion logic for Pinyin conversion #29

sunshineplan · 2025-02-14T05:49:40Z

The previous approach to adding spaces was overly mechanical. It inserted spaces indiscriminately, without considering the surrounding characters, which produced unexpected spaces in the output (for example, next to punctuation).

This commit makes the space insertion logic context-aware. It now checks whether the adjacent characters belong to the unicode.Punct or unicode.Symbol categories, and it inserts a space only when neither neighbor is punctuation or a symbol. This removes the separate replacement step that previously stripped the redundant spaces added by the mechanical approach.
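The rule above can be sketched in Go as follows. This is a minimal illustration, not the code from this PR. It assumes the converter first produces a slice of segments (pinyin syllables plus untouched non-Han text), and `joinSegments` and `isPunctOrSymbol` are hypothetical names. Only `unicode.Punct`, `unicode.Symbol` and the other standard-library calls are real. The sketch also skips the space when either neighbor is already whitespace, which is an extra assumption beyond what the description states.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// isPunctOrSymbol reports whether r is in the unicode.Punct or unicode.Symbol range tables.
func isPunctOrSymbol(r rune) bool {
	return unicode.Is(unicode.Punct, r) || unicode.Is(unicode.Symbol, r)
}

// joinSegments (hypothetical helper) joins converted segments with a single
// space, but only when neither neighboring character is punctuation or a symbol.
func joinSegments(segs []string) string {
	var b strings.Builder
	for _, seg := range segs {
		if seg == "" {
			continue
		}
		if b.Len() > 0 {
			prev, _ := utf8.DecodeLastRuneInString(b.String()) // last rune written so far
			next, _ := utf8.DecodeRuneInString(seg)            // first rune of this segment
			// Assumption: also skip the space if either side is already whitespace.
			if !isPunctOrSymbol(prev) && !isPunctOrSymbol(next) &&
				!unicode.IsSpace(prev) && !unicode.IsSpace(next) {
				b.WriteByte(' ')
			}
		}
		b.WriteString(seg)
	}
	return b.String()
}

func main() {
	// Hypothetical per-character conversion of 《你好》，世界!
	segs := []string{"《", "ni", "hao", "》", "，", "shi", "jie", "!"}

	// Roughly the mechanical approach: a space between every segment.
	fmt.Println(strings.Join(segs, " ")) // 《 ni hao 》 ， shi jie !

	// Context-aware approach: no space next to punctuation or symbols.
	fmt.Println(joinSegments(segs)) // 《ni hao》，shi jie!
}
```

A rule this simple also suppresses the space after ASCII punctuation such as `,` (`shi jie,ni hao`). The description does not say whether the actual change handles that case differently.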

Additionally, the "allowed characters" setting has been removed, so all content from the original text now appears in the Pinyin output. This prevents the loss of characters such as the book title marks 《》 and accented French letters, which the character filter previously excluded.

Optimize spacing rules

9617ee0

vcaesar added the enhancement label Feb 17, 2025


2 participants
