Unique variants per VCF #1246

Open
davmlaw opened this issue Jan 31, 2025 · 1 comment

davmlaw commented Jan 31, 2025

Ideally we'd like to do this:

class CohortGenotype(models.Model):
    class Meta:
        # Django only reads unique_together from the inner Meta class
        unique_together = ("collection", "variant")

But that caused a conflict when normalizing a variant turned it into a duplicate of another record. We no longer do rmdup in variant normalization, as it makes the bcftools old variant record disappear (vcf_preprocess line ~116)

Perhaps we can remove duplicates ourselves, as they are going to be right after each other (the file is sorted)
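
A minimal sketch of that idea, assuming each record can be reduced to a comparable key such as a `(chrom, pos, ref, alt)` tuple. The function name and the tuple layout are hypothetical and not taken from the project's code. It keeps the first record of each run of consecutive equal keys, so it only catches duplicates that really are adjacent:

```python
from itertools import groupby


def drop_adjacent_duplicates(records, key=None):
    """Yield the first record of each run of consecutive records with the same key.

    Hypothetical helper: `records` is any iterable in file order, and `key`
    maps a record to whatever identifies a variant (identity if None).
    """
    for _, run in groupby(records, key=key):
        yield next(run)


# Illustrative records only: (chrom, pos, ref, alt)
records = [
    ("1", 100, "A", "T"),
    ("1", 100, "A", "T"),  # adjacent duplicate, dropped
    ("1", 200, "G", "C"),
]
unique = list(drop_adjacent_duplicates(records))
```

Because `groupby` only compares neighbouring items, a duplicate separated from its twin by even one other record passes through untouched.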

@davmlaw davmlaw self-assigned this Jan 31, 2025
davmlaw added a commit that referenced this issue Jan 31, 2025
davmlaw added a commit that referenced this issue Jan 31, 2025

davmlaw commented Jan 31, 2025

Had the problem again: dupe variants were 2 apart in the array, so I started using a hash check instead
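
A rough sketch of what a hash check can look like, assuming the variant identity fits in a hashable key such as a `(chrom, pos, ref, alt)` tuple. The helper name and the key layout are illustrative assumptions, not the code from the referenced commits. A set of keys already seen catches duplicates no matter how far apart they are:

```python
def drop_duplicates(records, key=None):
    """Yield each record the first time its key is seen, in original order.

    Hypothetical helper: `key` maps a record to a hashable variant identity
    (the record itself if None).
    """
    seen = set()
    for record in records:
        k = record if key is None else key(record)
        if k in seen:
            continue  # duplicate of an earlier record, skip it
        seen.add(k)
        yield record


# Illustrative records only: the duplicates are 2 apart, not adjacent
records = [
    ("1", 100, "A", "T"),
    ("1", 100, "A", "G"),
    ("1", 100, "A", "T"),
]
unique = list(drop_duplicates(records))
```

The trade-off is memory: the set grows with the number of distinct variants seen by that process, and it knows nothing about records handled by another process.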

TODO: there is a very slight chance that dupes could be split across multiple processors (one dupe is at the end of one split file, the other at the start of the next). Should raise an issue to fix that
