Merge pull request #1461 from ComparativeGenomicsToolkit/collapse
[Experimental] Allow self-alignment in pangenome reference
glennhickey authored Sep 25, 2024
2 parents 728fc9e + 1a207fe commit b282b58
Showing 10 changed files with 210 additions and 32 deletions.
20 changes: 10 additions & 10 deletions build-tools/downloadPangenomeTools
@@ -45,7 +45,7 @@ fi
cd ${pangenomeBuildDir}
git clone https://github.com/lh3/minimap2.git
cd minimap2
- git checkout 581f2d7123f651d04c9e103b1c7ea8f7051e909a
+ git checkout v2.28
# hack in flags support
sed -i Makefile -e 's/CFLAGS=/CFLAGS+=/'
make -j ${numcpu} ${MINI_ARM_FLAGS}
@@ -60,7 +60,7 @@ fi
cd ${pangenomeBuildDir}
git clone https://github.com/lh3/minigraph.git
pushd minigraph
- git checkout v0.20
+ git checkout v0.21
# hack in flags support
sed -i Makefile -e 's/CFLAGS=/CFLAGS+=/'
make -j ${numcpu}
@@ -75,7 +75,7 @@ fi
cd ${pangenomeBuildDir}
git clone https://github.com/lh3/gfatools.git
cd gfatools
- git checkout v0.5
+ git checkout c31be8a62efc6bdea4576029f7fbe84f345a6eed
# hack in flags support
sed -i Makefile -e 's/CFLAGS=/CFLAGS+=/'
make -j ${numcpu}
@@ -176,7 +176,7 @@ fi
cd ${pangenomeBuildDir}
git clone https://github.com/ComparativeGenomicsToolkit/cactus-gfa-tools.git
cd cactus-gfa-tools
- git checkout 0c17bc4ae9a7cf174fa40805cde7f8f1f6de8225
+ git checkout 1121e370880ee187ba2963f0e46e632e0e762cc5
make -j ${numcpu}
if [[ $STATIC_CHECK -ne 1 || $(ldd paf2lastz | grep so | wc -l) -eq 0 ]]
then
@@ -235,7 +235,7 @@ fi

# hal2vg
cd ${pangenomeBuildDir}
- wget -q https://github.com/ComparativeGenomicsToolkit/hal2vg/releases/download/v1.1.7/hal2vg
+ wget -q https://github.com/ComparativeGenomicsToolkit/hal2vg/releases/download/v1.1.8/hal2vg
chmod +x hal2vg
if [[ $STATIC_CHECK -ne 1 || $(ldd hal2vg | grep so | wc -l) -eq 0 ]]
then
@@ -245,7 +245,7 @@ else
fi
# clip-vg
cd ${pangenomeBuildDir}
- wget -q https://github.com/ComparativeGenomicsToolkit/hal2vg/releases/download/v1.1.7/clip-vg
+ wget -q https://github.com/ComparativeGenomicsToolkit/hal2vg/releases/download/v1.1.8/clip-vg
chmod +x clip-vg
if [[ $STATIC_CHECK -ne 1 || $(ldd clip-vg | grep so | wc -l) -eq 0 ]]
then
@@ -255,7 +255,7 @@ else
fi
# halRemoveDupes
cd ${pangenomeBuildDir}
- wget -q https://github.com/ComparativeGenomicsToolkit/hal2vg/releases/download/v1.1.7/halRemoveDupes
+ wget -q https://github.com/ComparativeGenomicsToolkit/hal2vg/releases/download/v1.1.8/halRemoveDupes
chmod +x halRemoveDupes
if [[ $STATIC_CHECK -ne 1 || $(ldd halRemoveDupes | grep so | wc -l) -eq 0 ]]
then
@@ -265,7 +265,7 @@ else
fi
# halMergeChroms
cd ${pangenomeBuildDir}
- wget -q https://github.com/ComparativeGenomicsToolkit/hal2vg/releases/download/v1.1.7/halMergeChroms
+ wget -q https://github.com/ComparativeGenomicsToolkit/hal2vg/releases/download/v1.1.8/halMergeChroms
chmod +x halMergeChroms
if [[ $STATIC_CHECK -ne 1 || $(ldd halMergeChroms | grep so | wc -l) -eq 0 ]]
then
@@ -276,7 +276,7 @@ fi

# halUnclip
cd ${pangenomeBuildDir}
- wget -q https://github.com/ComparativeGenomicsToolkit/hal2vg/releases/download/v1.1.7/halUnclip
+ wget -q https://github.com/ComparativeGenomicsToolkit/hal2vg/releases/download/v1.1.8/halUnclip
chmod +x halUnclip
if [[ $STATIC_CHECK -ne 1 || $(ldd halUnclip | grep so | wc -l) -eq 0 ]]
then
@@ -287,7 +287,7 @@ fi

# filter-paf-deletions
cd ${pangenomeBuildDir}
- wget -q https://github.com/ComparativeGenomicsToolkit/hal2vg/releases/download/v1.1.7/filter-paf-deletions
+ wget -q https://github.com/ComparativeGenomicsToolkit/hal2vg/releases/download/v1.1.8/filter-paf-deletions
chmod +x filter-paf-deletions
if [[ $STATIC_CHECK -ne 1 || $(ldd filter-paf-deletions | grep so | wc -l) -eq 0 ]]
then
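This commit bumps the same release tag (v1.1.7 to v1.1.8) in six separate hal2vg-suite `wget` lines. As a hedged sketch only, the tag could be kept in one place. `HAL2VG_TAG`, `HAL2VG_RELEASES` and `hal2vg_urls` are hypothetical names that the script does not define, and the tool list simply mirrors the six binaries downloaded above.

```shell
#!/bin/sh
# Hypothetical names: HAL2VG_TAG, HAL2VG_RELEASES and hal2vg_urls are not
# defined by downloadPangenomeTools; this only illustrates keeping one tag.
HAL2VG_TAG="v1.1.8"
HAL2VG_RELEASES="https://github.com/ComparativeGenomicsToolkit/hal2vg/releases/download"

# Print the release URL of each hal2vg-suite binary for the tag given in $1.
hal2vg_urls() {
    tag="$1"
    for tool in hal2vg clip-vg halRemoveDupes halMergeChroms halUnclip filter-paf-deletions; do
        printf '%s/%s/%s\n' "$HAL2VG_RELEASES" "$tag" "$tool"
    done
}
```

For example, `hal2vg_urls "$HAL2VG_TAG" | xargs -n 1 wget -q` would fetch all six binaries; each one would still need the `chmod +x` and the `ldd` static check that the script applies per tool.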
4 changes: 4 additions & 0 deletions doc/pangenome.md
@@ -133,6 +133,10 @@ The `--reference` option must be used to select a "reference" sample. This samp

It is therefore extremely important that the reference sample's assembly be **chromosome** scale. If there are many small contigs in addition to chromosomes in the reference assembly, then please consider specifying the chromosomes with `--refContigs`. If you still want to keep the other contigs, add `--otherContig chrOther` (see explanation below).

### Self-Alignment and the Collapse Option

The `--collapse` option, added as an experimental prototype in [v2.9.1](https://github.com/ComparativeGenomicsToolkit/cactus/releases/tag/v2.9.1), can be used to incorporate self-alignments into the pangenome, *including the reference sample*. This produces a more compact graph in which, for example, tandem duplications are represented as cycles (like PGGB) rather than as insertions (like minigraph). It also drops the invariant (see above) that the reference path be acyclic: with this option, the reference path can contain cycles. The `--collapse` option is implemented by running `minimap2`, with the options found in the `minimapCollapseOptions` parameter of the configuration XML, to align each input *contig* with itself. These self-alignments are fed into Cactus alongside the usual sequence-to-minigraph alignments.
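
The paragraph above describes aligning each contig with itself using `minimapCollapseOptions`. The following is a hypothetical dry-run sketch, not Cactus's implementation: it only prints `samtools faidx` and `minimap2` commands, one pair per contig, using the default option string from the configuration XML (`-c -D -xasm5 -m500`, where `-D` tells minimap2 to skip self and dual mappings). The function name `self_align_cmds` and the per-contig file names are assumptions.

```shell
#!/bin/sh
# Assumed default, copied from minimapCollapseOptions in cactus_progressive_config.xml.
COLLAPSE_OPTS="-c -D -xasm5 -m500"

# Hypothetical helper: print commands that align each contig with itself.
# $1 = FASTA file; remaining args = contig names. If no names are given,
# they are read from column 1 of the samtools index "$1.fai".
self_align_cmds() {
    fasta="$1"; shift
    if [ "$#" -eq 0 ]; then
        set -- $(cut -f 1 "$fasta.fai")
    fi
    for contig in "$@"; do
        # extract one contig, then use it as both target and query
        printf 'samtools faidx %s %s > %s.fa\n' "$fasta" "$contig" "$contig"
        printf 'minimap2 %s %s.fa %s.fa > %s.self.paf\n' "$COLLAPSE_OPTS" "$contig" "$contig" "$contig"
    done
}
```

Piping the output to `sh` (for example `self_align_cmds genome.fa | sh`, after building `genome.fa.fai` with `samtools faidx genome.fa`) would run the alignments; how Cactus actually distributes this work is not shown here.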

#### Multiple Reference Samples

The `--reference` option can accept multiple samples (separated by spaces). Samples specified after the first are clipped as usual, but end up as "reference-sense" paths in the vg/gbz output. They can also be used as the basis for VCF output: VCF files based on them can be created with the `--vcfReference` option.
12 changes: 10 additions & 2 deletions src/cactus/cactus_progressive_config.xml
@@ -320,6 +320,9 @@
<!-- minigraphMapOptions: flags to pass to minigraph for mapping -->
<!-- minigraphConstructOptions: flags to pass to minigraph for construction -->
<!-- minigraphConstructBatchSize: break minigraph construction into chunks of this size. this allows recovering from a failure during graph construction, and avoiding really long argument lists for each job. -->
+ <!-- minimapCollapseOptions: options for minimap2 when using it for self-alignments. Note these were inspired in part by https://github.com/lh3/minimap2/issues/106 -->
+ <!-- collapse: Toggles self-alignment logic. Supported values are "none" and "all". Experimental values "reference" and "nonref" can also be tried. -->
+ <!-- maxCollapseDistanceRatio: filter out self-alignments that are more than this value times the alignment length apart. Setting to 0 would only accept tandem (or overlapping) alignments. Raising it allows alignments further apart. Set to -1 to completely disable. -->
<!-- minigraphSortInput: Method used to determine input order. Valid values are "mash", "none" / "0" which refer to (decreasing) mash distance to reference, or no sorting (order in seqfile), respectively -->
<!-- minMAPQ: ignore minigraph alignments with mapping quality less than this -->
<!-- minGAFBlockLength: ignore minigraph alignments with block length less than this -->
@@ -334,13 +337,17 @@
<!-- delFilter: any deletions implied by split-read mappings greater than this are removed from the paf (by removing all lines of the smallest block bordering deletion)-->
<!-- delFilterThreshold: only remove deletion if it costs < delFilterThreshold * deletion-size matches. must be in range (0, 1] -->
<!-- delFilterQuerySizeThreshold: deletion removed if the supporting query contig has length < delFilterQuerySizeThreshold * deletion-size -->
- <!-- minIdentity: ignore PAF lines whose identity (col 10 / col 11) is less than this -->
+ <!-- minIdentity: ignore PAF lines whose identity (col 10 / col 11) is less than this -->
+ <!-- minScore: ignore PAF lines with score (AS:i) less than this (added to filter out negative-scoring minimap2 lines) -->
<!-- removeMinigraphFromPAF: replace all minigraph contigs with transitive alignments in cactus-align-->
<!-- cpu: use up to this many cpus for each minigraph command. -->
<graphmap
assemblyName="_MINIGRAPH_"
minigraphMapOptions="-c -xasm"
minigraphConstructOptions="-c -xggs"
+ minimapCollapseOptions="-c -D -xasm5 -m500"
+ collapse="none"
+ maxCollapseDistanceRatio="5"
minigraphConstructBatchSize="50"
minigraphSortInput="mash"
minMAPQ="5"
@@ -354,7 +361,8 @@
delFilter="10000000"
delFilterThreshold="0.01"
delFilterQuerySizeThreshold="2"
- minIdentity="0.5"
+ minIdentity="0.5"
+ minScore="1"
removeMinigraphFromPAF="0"
cpu="6"
/>
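The `maxCollapseDistanceRatio` comment above describes a distance filter on self-alignments but does not spell out how distance and length are measured. The sketch below is an assumption-laden illustration, not the Cactus code. It treats a line's distance as the gap between its query interval (PAF columns 3-4) and target interval (columns 8-9), and its length as the query interval length. Under that reading, a ratio of 0 keeps only touching or overlapping (tandem) pairs and a negative ratio keeps everything, matching the comment's description of `0` and `-1`. `filter_self_paf` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical maxCollapseDistanceRatio-style filter for self-alignment PAF.
# ASSUMPTIONS (not taken from Cactus): distance = gap between the query
# interval (cols 3-4) and the target interval (cols 8-9); length = query
# interval length (col 4 - col 3). $1 = ratio; negative keeps every line.
filter_self_paf() {
    awk -v ratio="$1" '
    BEGIN { FS = OFS = "\t"; ratio += 0 }
    {
        qs = $3 + 0; qe = $4 + 0; ts = $8 + 0; te = $9 + 0
        len = qe - qs
        # gap = latest start minus earliest end; <= 0 means touching/overlapping
        hi = (qs > ts) ? qs : ts
        lo = (qe < te) ? qe : te
        gap = hi - lo
        if (gap < 0) gap = 0
        # print the unmodified PAF line if it passes the ratio test
        if (ratio < 0 || gap <= ratio * len) print
    }'
}
```

With the default `maxCollapseDistanceRatio="5"`, a 100 bp self-alignment whose copy starts 500 bp after it ends would be kept, while one 501 bp away would be dropped (again, under the assumptions above). Example: `filter_self_paf 5 < chr1.self.paf > chr1.self.filtered.paf`.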