@shreeMahadikGit

Fixes this issue:

#20225

Contributor

@calixteman calixteman left a comment

Unfortunately, it won't work with a PDF containing "Hello. World" and a query equal to "o. w" (FYI, this works in Acrobat).
I think a fix could be to allow optional whitespace around each group of punctuation signs, which means not having [ ]* between consecutive punctuation signs.
That would require updating SPECIAL_CHARS_REG_EXP, but you'd have to take care with the cases of . and ?.
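
To make the suggestion above concrete, here is a minimal sketch of one way it could look. It is not the pdf.js implementation: `buildQueryPattern` and `TOKEN_REG_EXP` are hypothetical names introduced only for illustration, and the sketch assumes that a "group of punctuation signs" means a maximal run of `\p{P}` characters plus the regex metacharacters `+ ^ $ |`, which are symbols rather than punctuation in Unicode.

```javascript
// Hypothetical sketch, not the actual pdf.js code.
// Group 1: a run of punctuation (plus the metacharacters + ^ $ |,
// which \p{P} does not cover); group 2: a run of whitespace.
const TOKEN_REG_EXP = /([\p{P}+^$|]+)|(\s+)/gu;

function buildQueryPattern(query) {
  return query.replace(TOKEN_REG_EXP, (match, punctuationRun) => {
    if (punctuationRun !== undefined) {
      // Escape regex metacharacters inside the run, then allow optional
      // spaces around the whole run, never between its characters.
      const escaped = punctuationRun.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
      return `[ ]*${escaped}[ ]*`;
    }
    // A run of whitespace in the query must match at least one space.
    return "[ ]+";
  });
}

// Case-insensitive match of a query against some page text.
function matchesQuery(text, query) {
  return new RegExp(buildQueryPattern(query), "iu").test(text);
}
```

With this sketch, `matchesQuery("Hello. World", "o. w")` succeeds, while a query of `?!` does not match text where a space separates the two signs, because no `[ ]*` is emitted inside the run.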

@shreeMahadikGit
Author

shreeMahadikGit commented Oct 16, 2025

Handled the suggested test case. Also attaching the PDF with the test case:

Morse (1) (1).pdf

 const DIACRITICS_REG_EXP = /\p{M}+/gu;
 const SPECIAL_CHARS_REG_EXP =
-  /([.*+?^${}()|[\]\\])|(\p{P})|(\s+)|(\p{M})|(\p{L})/gu;
+  /([*+^${}()|[\]\\])|((?:[.?]|\p{P})+)|(\s+)|(\p{M})|(\p{L})/gu;
Contributor

Why don't you want to capture "." or "?"?

Author

I’m still capturing them, just in the punctuation-run group (p2), so that consecutive punctuation is treated as one token. That lets us add spaces only around the run, never inside it.
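
As a rough illustration of the punctuation-run idea, the sketch below runs the proposed SPECIAL_CHARS_REG_EXP from this PR over a query and labels each match by the capture group that fired. The `classifyTokens` helper and its label strings are hypothetical and exist only for this example. They are not part of the patch.

```javascript
// The regex is copied from the proposed change in this PR.
const SPECIAL_CHARS_REG_EXP =
  /([*+^${}()|[\]\\])|((?:[.?]|\p{P})+)|(\s+)|(\p{M})|(\p{L})/gu;

// Hypothetical helper: report which capture group each token falls into.
function classifyTokens(query) {
  const labels = ["metachar", "punctuation-run", "whitespace", "diacritic", "letter"];
  const tokens = [];
  for (const match of query.matchAll(SPECIAL_CHARS_REG_EXP)) {
    // match[1]..match[5] correspond to p1..p5; exactly one is defined.
    const groupIndex = match.slice(1).findIndex(group => group !== undefined);
    tokens.push({ text: match[0], kind: labels[groupIndex] });
  }
  return tokens;
}

console.log(classifyTokens("o. w"));
console.log(classifyTokens("a?!b"));
```

In the second query, `?!` comes back as a single punctuation-run token, which is what allows spaces to be added around the run as a whole instead of around each sign.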

-return `[ ]*\\${p1}[ ]*`;
+// Escaped metacharacters like . * + ? ...
+// Allow spaces around them ONLY if the user typed spaces.
+return queryHasWhitespace ? `[ ]*\\${p1}[ ]*` : `\\${p1}`;
Contributor

Why do you want to do that?

-return `[ ]*${p2}[ ]*`;
+// Punctuation: allow optional spaces ONLY if the user typed spaces.
+// p2 is a *run* of punctuation; escape it as a whole.
+const escapedRun = escapeForRegex(p2);
Contributor

You just have to replace the . and the ? with \. and \?

+// Punctuation: allow optional spaces ONLY if the user typed spaces.
+// p2 is a *run* of punctuation; escape it as a whole.
+const escapedRun = escapeForRegex(p2);
+return queryHasWhitespace ? `[ ]*${escapedRun}[ ]*` : `${escapedRun}`;
Contributor

Again, I don't understand why you want to distinguish between hasWhitespace and !hasWhitespace.

@calixteman
Contributor

@shreeMahadikGit do you plan to work on this patch or not?

@shreeMahadikGit
Author

Hi @calixteman,

First of all, apologies for the late reply. I found a few more issues with the solution I implemented.

As you can see in the attached screenshot, the search isn’t matching the correct characters. I’ll need to do some additional work on my solution to address this.

I have refactored the code according to your comments and removed most of the unnecessary parts as well, but I’ve discovered a few more issues in the current implementation that need to be fixed.

If you think someone else can address this more quickly, please feel free to reassign the issue.

Screenshot 2025-10-31 at 12 37 24 AM

@calixteman
Contributor

Thank you very much for your contribution, but I've addressed the issue myself:
#20456

@calixteman calixteman closed this Nov 21, 2025