I am phasing a large (n=172k) sample of parents and offspring (some duos, some trios), but I only need the phased genotypes for the offspring.
I start by running the phasing jobs (with the chromosomes split into chunks) using shapeit5 with --pedigree on an HPC cluster. The initial whole-sample output files are written to a local scratch filesystem, where they are deleted immediately after the job finishes. Within the same job, I then use bcftools view -S to subset these results to just the offspring, and save that smaller results chunk on the cluster's network filesystem, where files persist past the end of the job.
After all jobs have finished running, I try to use ligate with the --pedigree flag on the offspring-only results chunks. Despite using --pedigree, it detects the offspring samples as non-scaffolded, haplotype order gets swapped, and sometimes chunks from the maternal and paternal haplotypes are incorrectly combined as if they were in phase.
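To make the symptom concrete, here is a toy Python sketch of the consistency check that each offspring should pass at the sites shared by two adjacent chunks. It is only an illustration under stated assumptions, not a description of how ligate works internally: the function name `overlap_orientation` and the representation of phased genotypes as `(first_hap_allele, second_hap_allele)` tuples are hypothetical, and the genotypes in the example are made up.

```python
from typing import List, Tuple

# Hypothetical representation: one phased genotype per overlapping site,
# stored as (allele on first haplotype, allele on second haplotype).
Genotype = Tuple[int, int]


def overlap_orientation(prev_chunk: List[Genotype], next_chunk: List[Genotype]) -> str:
    """Compare one sample's phased genotypes at sites shared by two adjacent chunks.

    Returns "same" if the haplotype order agrees at most informative sites,
    "swapped" if it is reversed at most of them, and "ambiguous" otherwise.
    """
    same = 0
    swapped = 0
    for (a1, a2), (b1, b2) in zip(prev_chunk, next_chunk):
        if a1 == a2 or b1 == b2:
            continue  # homozygous sites carry no information about phase
        if (a1, a2) == (b1, b2):
            same += 1
        elif (a1, a2) == (b2, b1):
            swapped += 1
    if same > swapped:
        return "same"
    if swapped > same:
        return "swapped"
    return "ambiguous"


# Made-up genotypes for one offspring at four overlapping sites.
prev_overlap = [(0, 1), (1, 0), (0, 0), (1, 0)]
next_overlap_ok = [(0, 1), (1, 0), (0, 0), (1, 0)]
next_overlap_flipped = [(1, 0), (0, 1), (0, 0), (0, 1)]

print(overlap_orientation(prev_overlap, next_overlap_ok))       # same
print(overlap_orientation(prev_overlap, next_overlap_flipped))  # swapped
```

If the pedigree scaffold fixed the haplotype order in every chunk, each offspring should come out as "same" in the input chunks, and the ligated output should keep that order rather than flipping it.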
Is the behavior of ligate for a file where 100% of the samples are scaffolded equivalent to bcftools concat -a -d all, or would there still be a reason to prefer ligate? If there is still a reason to prefer ligate, is there a way to get it to treat offspring as scaffolded (e.g. refrain from swapping haplotypes around) even when the parents are no longer in the data?