safe identifier characters should include all Unicode alphanumerics #4

@jba

The documentation for safe identifiers says "alphanumeric" characters are allowed, but the implementation supports only ASCII letters and digits (plus hyphen and underscore). Unless there are security issues with Unicode characters, they should be supported.

Motivation: documentation sites that want to use language identifiers as (part of) fragments for easy navigation, e.g. https://pkg.go.dev.

The change I'm suggesting would be from

var onlyAlphanumericsOrHyphenPattern = regexp.MustCompile(`^[-_a-zA-Z0-9]*$`)

(https://github.com/google/safehtml/blob/v0.0.2/identifier.go#L49)
to

var onlyAlphanumericsOrHyphenPattern = regexp.MustCompile(`^[-_\pL\pN]*$`)
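
To illustrate the difference, here is a minimal sketch that runs both patterns against a few sample strings. The variable names `asciiPattern` and `unicodePattern` and the sample identifiers are made up for this example; only the two regular expressions come from the text above.

```go
package main

import (
	"fmt"
	"regexp"
)

// Current pattern, as quoted above: ASCII letters, ASCII digits, '-' and '_'.
var asciiPattern = regexp.MustCompile(`^[-_a-zA-Z0-9]*$`)

// Proposed pattern, as quoted above: any Unicode letter (\pL) or
// Unicode number (\pN), plus '-' and '_'.
var unicodePattern = regexp.MustCompile(`^[-_\pL\pN]*$`)

func main() {
	// Hypothetical identifiers, chosen only for illustration.
	samples := []string{"Reader", "Größe", "日本語", "a b", "x<y"}
	for _, id := range samples {
		fmt.Printf("%q ascii=%t unicode=%t\n",
			id, asciiPattern.MatchString(id), unicodePattern.MatchString(id))
	}
}
```

Both patterns still reject whitespace and HTML metacharacters such as `<`. Note that `\pN` covers every Unicode number category, not just decimal digits, so characters like `²` or `½` would also be accepted. If that is broader than intended, `\p{Nd}` could be used instead.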
