Fix case_insensitive_matching_strategy-with special characters #45

zhaoyang868686 · 2025-02-20T16:01:27Z

Hi guys,

There is a bug in case_insensitive_matching_strategy.

It uses

text = re.sub(anonymized, original, text, flags=re.IGNORECASE)

to replace anonymized with original in text.

For phone numbers that start with "+", such as "+1-235-234-8740x164", the call becomes:

text = re.sub(pattern='+1-235-234-8740x164', repl='XXX', string='XXX', flags=re.IGNORECASE)

The first parameter, pattern, is always interpreted as a regular expression, so a string that starts with "+" is parsed as a quantifier and raises an error.

In regular expressions, "+" makes the resulting RE match one or more repetitions of the preceding RE, but in the phone number nothing precedes the "+".
-> re.error: nothing to repeat at position 0
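
The failure can be reproduced with the standard `re` module alone, without the anonymizer. This is a minimal sketch; the variable names (`phone`, `raw_error`, `escaped_result`) and the sample sentence are illustrative and do not come from the codebase.

```python
import re

phone = "+1-235-234-8740x164"  # sample value taken from the report above

# Passing the raw string as a pattern fails: "+" is a quantifier
# and there is nothing before it to repeat.
try:
    re.sub(phone, "XXX", "some text", flags=re.IGNORECASE)
    raw_error = None
except re.error as exc:
    raw_error = str(exc)

# re.escape makes every regex metacharacter literal, so the pattern compiles
# and still matches case-insensitively (note the uppercase "X" in the text).
escaped_result = re.sub(
    re.escape(phone), "XXX", "call +1-235-234-8740X164 now", flags=re.IGNORECASE
)
```

Here `raw_error` holds the "nothing to repeat" message, while the escaped call replaces the number as intended.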

How to reproduce:

anonymizer = PresidioReversibleAnonymizer()
anonymizer._deanonymizer_mapping.update(new_mapping={'PHONE_NUMBER': {'+1-235-234-8740x164': '12345678'}})
anonymizer.deanonymize(text_to_deanonymize='some text', deanonymizer_matching_strategy=case_insensitive_matching_strategy)

How to fix:

    for entity_type in deanonymizer_mapping:
        for anonymized, original in deanonymizer_mapping[entity_type].items():
            # Use regular expressions for case-insensitive matching and replacing
            text = re.sub(pattern=re.escape(pattern=anonymized),
                          repl=original,
                          string=text,
                          flags=re.IGNORECASE)
    return text
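
For reference, the same idea as a self-contained function that can be run outside the library. This is a sketch, not the library's implementation: the function name `case_insensitive_replace` and the sample mapping are hypothetical. It also differs from the patch above in one respect, which is my own assumption about what is desirable: it passes a callable as `repl`, because `re.sub` processes backslash escapes in a string replacement but uses a callable's return value literally.

```python
import re
from typing import Dict


def case_insensitive_replace(text: str, mapping: Dict[str, Dict[str, str]]) -> str:
    """Replace each anonymized value with its original, ignoring case."""
    for entity_type in mapping:
        for anonymized, original in mapping[entity_type].items():
            text = re.sub(
                re.escape(anonymized),     # match the anonymized value literally
                lambda _match: original,   # callable repl: backslashes in `original` stay literal
                text,
                flags=re.IGNORECASE,
            )
    return text


# Hypothetical mapping in the same shape as the one used in the reproduction.
sample_mapping = {"PHONE_NUMBER": {"+1-235-234-8740x164": "12345678"}}
result = case_insensitive_replace("Call +1-235-234-8740X164", sample_mapping)
```

With the sample mapping, `result` is `"Call 12345678"`. Whether the callable replacement is wanted in the library itself is a separate decision; the patch in this PR only addresses the pattern side.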

zhaoyang868686 added 5 commits February 20, 2025 16:45

Fix case_insensitive_matching_strategy-with special characters

b3de953

Fix case_insensitive_matching_strategy-with special characters.

format code

996ce5c

format code

679adc5

format code

9c6475e

format code

7e7c9a5
