Contributor

@dorien-er dorien-er commented Oct 9, 2025

Changelog

Make gene name sanitization optional:
it is recommended for removing version suffixes from Ensembl IDs (e.g. ENSMUSG00000017167.6), but not for gene names with splice variants (e.g. AL627309.1), where the suffix is part of the name.
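
To illustrate the distinction, here is a minimal sketch that applies the Ensembl pattern from this PR's diff to both example identifiers. The helper name `strip_ensembl_versions` and its `sanitize` flag are hypothetical; they only stand in for the optional behaviour described above and are not the component's actual interface.

```python
import re

# Pattern as it appears in the diff: "ENS", any characters, an eleven-digit
# number, optionally followed by ".version_number".
ENSEMBL_PATTERN = re.compile(r"^(ENS.*\d{11})(?:\.\d+)?$")


def strip_ensembl_versions(gene_names, sanitize=True):
    """Hypothetical helper: drop the version suffix from Ensembl IDs only."""
    if not sanitize:
        # Sanitization switched off: return the names unchanged.
        return list(gene_names)
    # Names that do not match the pattern are returned as they are.
    return [ENSEMBL_PATTERN.sub(r"\1", name) for name in gene_names]


names = ["ENSMUSG00000017167.6", "AL627309.1"]
sanitized = strip_ensembl_versions(names)
untouched = strip_ensembl_versions(names, sanitize=False)
```

In this sketch only the Ensembl ID loses its `.6` suffix. A broader rule that strips any trailing `.N` would also truncate AL627309.1, which is presumably the case the opt-out is meant to protect.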

Note: #1083 needs to be merged first

Issue ticket number and link

Closes #xxxx (Replace xxxx with the GitHub issue number)

Checklist before requesting a review

  • I have performed a self-review of my code

  • Conforms to the Contributor's guide

  • Check the correct box. Does this PR contain:

    • Breaking changes
    • New functionality
    • Major changes
    • Minor changes
    • Documentation
    • Bug fixes
  • Proposed changes are described in the CHANGELOG.md

  • CI tests succeed!

# then an eleven digit number, optionally followed by .version_number
ensembl_pattern = re.compile(r"^(ENS.*\d{11})(?:\.\d+)?$")

return [
Member

Would it be possible to use the index as input here and call index.to_series().str.startswith()?
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.startswith.html
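
A minimal sketch of what this suggestion could look like, assuming the gene identifiers live in a pandas `Index`. The example index and the variable names are made up for illustration and are not taken from the PR.

```python
import pandas as pd

# Hypothetical gene index mixing Ensembl IDs and a gene name.
index = pd.Index(["ENSMUSG00000017167.6", "AL627309.1", "ENSG00000139618"])

# Boolean Series, labelled by the index itself, marking entries that
# start with the literal prefix "ENS".
is_ensembl_like = index.to_series().str.startswith("ENS")

# Use the plain boolean array to select the matching index entries.
ensembl_like = index[is_ensembl_like.to_numpy()]
```

Note that `startswith` checks only the prefix, so it is less strict than the regular expression in the diff, which also requires the eleven-digit number.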

Contributor Author

Can we rule out that there are gene names (not Ensembl IDs) that start with ENS?

Member

I think you can invert the boolean mask returned by startswith using ~
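
For reference, a minimal sketch of inverting such a mask, assuming a pandas `Series` of identifiers. The example values and variable names are illustrative only.

```python
import pandas as pd

# Hypothetical identifiers: one Ensembl ID and two gene names.
names = pd.Series(["ENSMUSG00000017167.6", "AL627309.1", "Gapdh"])

# True where the identifier starts with "ENS".
starts_with_ens = names.str.startswith("ENS")

# "~" flips every boolean, selecting the entries that do NOT start with "ENS".
not_ens = names[~starts_with_ens]
```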

Contributor Author

All good, it's adjusted now!

3 participants