Precomputed Genome level hasfrag #2

Open
ejarmand opened this issue Jan 30, 2025 · 1 comment

Comments

@ejarmand

Hi @muntakimrafi, I wanted to create an issue to follow up on our discussion on Bluesky.

Basically, it would be really useful to provide files with precomputed pairwise leakage for all genomic elements. I imagine tiles of ~200 bp or so, for popular references (hg38, mm10, T2T, GRCm39).

Ideally there could also be a tiled whole genome pipeline.
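
A minimal sketch of what the tiling step could look like, assuming fixed-size, non-overlapping 200 bp windows in BED-style coordinates (0-based, half-open). The function name `tile_genome`, the `step` parameter, and the toy chromosome sizes are all hypothetical and only there for illustration; real sizes would come from the reference's own index.

```python
from typing import Dict, Iterator, Tuple


def tile_genome(
    chrom_sizes: Dict[str, int], window: int = 200, step: int = 200
) -> Iterator[Tuple[str, int, int]]:
    """Yield (chrom, start, end) windows, 0-based and half-open (BED style).

    The last window of each chromosome is truncated at the chromosome end.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    for chrom, size in chrom_sizes.items():
        for start in range(0, size, step):
            yield chrom, start, min(start + window, size)


# Toy sizes (hypothetical), not taken from any real assembly.
toy_sizes = {"chrA": 450, "chrB": 200}
for chrom, start, end in tile_genome(toy_sizes):
    print(f"{chrom}\t{start}\t{end}")
```

Setting `step` smaller than `window` would give overlapping tiles, at the cost of proportionally more elements to compare.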

If it's useful I'm happy to contribute some development time to parts of this task.

@muntakimrafi
Collaborator

Hello @ejarmand, continuing from our discussion on Blue sky

This would be an enormous amount of computation. But I think we should do it if we want to create splits for the genome with the least possible leakage.

The first step would be to run blastn genome-wide. I am thinking of creating multiple databases (one per chromosome) and running the blastn_array module with each chromosome as the query set (@bkiyota just pushed it to the repo). This way we can modularize the computation and divide it among multiple interested parties.
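
To make the per-chromosome split concrete, here is a rough sketch of how the work could be enumerated and shared out. It does not use the blastn_array module, whose interface is not described in this thread; it only builds plain BLAST+ command strings (`makeblastdb` and `blastn` with their standard options). The directory layout (`fasta/`, `tiles/`, `db/`, `hits/`), the party names, the round-robin assignment, and the choice to run every ordered query/database pair are all assumptions for illustration.

```python
from itertools import product
from typing import Dict, List, Tuple

Job = Tuple[str, str]  # (query chromosome, database chromosome)


def plan_jobs(chroms: List[str]) -> List[Job]:
    """One job per ordered (query, database) chromosome pair."""
    return list(product(chroms, chroms))


def makeblastdb_command(chrom: str) -> str:
    """Command string for building one nucleotide database per chromosome."""
    # Paths are hypothetical placeholders.
    return f"makeblastdb -in fasta/{chrom}.fa -dbtype nucl -out db/{chrom}"


def blastn_command(query_chrom: str, db_chrom: str) -> str:
    """Command string for one query chromosome against one database."""
    # -outfmt 6 requests tabular output; paths are hypothetical placeholders.
    return (
        f"blastn -query tiles/{query_chrom}.fa -db db/{db_chrom} "
        f"-outfmt 6 -out hits/{query_chrom}_vs_{db_chrom}.tsv"
    )


def assign_jobs(jobs: List[Job], parties: List[str]) -> Dict[str, List[Job]]:
    """Distribute jobs round-robin so each party gets a similar count."""
    assignment: Dict[str, List[Job]] = {party: [] for party in parties}
    for i, job in enumerate(jobs):
        assignment[parties[i % len(parties)]].append(job)
    return assignment


chroms = ["chr1", "chr2", "chr3"]  # toy subset
jobs = plan_jobs(chroms)
for party, party_jobs in assign_jobs(jobs, ["lab_a", "lab_b"]).items():
    for query_chrom, db_chrom in party_jobs:
        print(party, blastn_command(query_chrom, db_chrom))
```

Round-robin balances job counts, not runtime: chromosomes differ a lot in length, so a real split would probably want to weight jobs by query and database size.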
