Merge pull request #8 from uclahs-cds/nwiltsie-regroup-modules
Rearrange processes and modules, add SV workflow for Delly2
nwiltsie authored Aug 7, 2024
2 parents be6926f + c9fde8c commit ca2b0ec
Showing 27 changed files with 1,003 additions and 345 deletions.
57 changes: 6 additions & 51 deletions CHANGELOG.md
@@ -1,5 +1,5 @@
# Changelog
All notable changes to the pipeline-name pipeline.
All notable changes to the StableLift pipeline.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).

@@ -8,58 +8,13 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm
---

## [Unreleased]
### Added
- Add `sample_id` extraction from BAM
- Add template input YAMLs
- Add pipeline-Nexflow-config as submodule and redirect set_resources_allocation
- Add pipeline-Nextflow-module as submodule
- Additional out of memory exit code
- Pipeline release action
- Template for NFTest testing results in PR template
- Enable dependabot
- Add example PlantUML image to README
- Add workflow to build documentation
- Add workflows to run Nextflow configuration tests

### Changed
- Switch resource limit checks to external scripts
- Update links in on-prem Confluence to point to cloud-based Confluence
- Fix `CODEOWNERS` file
- Use `schema.check_path` for `workDir` validation
- Add `Discussions` and `Contributors` to the Table of Contents in `README.md`
- Update from DSL1 to DSL2
- Standardize config structure
- Restructure repo so main script is main.nf
- Reorganize contributors and metadata
- Reorganize PR template so description is at top
- Update automatic node detection to allow for F2 detection
- Update Issue Template
- Standardize input/output/parameter structure in README
- Avoid modification of input parameter `output_dir`
- Create default docker container registry parameter for tools
- Use `methods.setup_process_afterscript()` to capture log files

---

## [1.0.0] - YYYY-MM-DD
### Added
- For new features.
- Added item 1.

### Changed
- For changes in existing functionality.
- Changed item 1.

### Deprecated
- For soon-to-be removed features.
- Add workflow for SNV callers (Mutect2, HaplotypeCaller, Strelka2, Muse2, SomaticSniper)
- Add workflow for SV caller (Delly2)
- Add pipeline diagram

### Removed
- For now removed features.
- Removed item 1.

### Fixed
- For any bug fixes.
- Fixed item 1.
### Changed

### Security
- In case of vulnerabilities.
- Sort VCF after liftover in SV branch
51 changes: 46 additions & 5 deletions Dockerfile
@@ -1,19 +1,60 @@
ARG R_VERSION=4.3.1
ARG R_VERSION="4.3.1"

ARG LIBBZ2_VERSION="1.0.8-*"
ARG LIBCURL_VERSION="7.81.0-*"
ARG LIBLZMA_VERSION="5.2.5-*"
ARG LIBXML2_VERSION="2.9.13+dfsg-*"
ARG PYTHON_VERSION="3.10.6-*"
ARG ZLIB_VERSION="1:1.2.11.dfsg-*"
ARG RLIBDIR="/usr/local/stablelift-R"

FROM rocker/r-ver:${R_VERSION} AS build

ARG LIBBZ2_VERSION
ARG LIBCURL_VERSION
ARG LIBLZMA_VERSION
ARG LIBXML2_VERSION
ARG ZLIB_VERSION

# Install build-time dependencies
RUN apt-get update \
&& apt-get install -y --no-install-recommends \
libbz2-dev=${LIBBZ2_VERSION} \
libcurl4-openssl-dev=${LIBCURL_VERSION} \
liblzma-dev=${LIBLZMA_VERSION} \
libxml2-dev=${LIBXML2_VERSION} \
zlib1g-dev=${ZLIB_VERSION} \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*

ARG BIOC_VERSION="3.18"
ENV BIOC_VERSION=${BIOC_VERSION}

ARG RLIBDIR
ENV RENV_PATHS_CACHE ${RLIBDIR}/.cache

RUN mkdir -p ${RENV_PATHS_CACHE}

WORKDIR ${RLIBDIR}

COPY docker/install-stablelift.R /tmp
RUN Rscript /tmp/install-stablelift.R

# renv prints to stdout, so we need to change directories
WORKDIR /
RUN echo ".libPaths( c( .libPaths(), \"/usr/local/stablelift-R/renv/library/R-4.3/$(Rscript -e "cat(unname(unlist(R.version['platform'])))")\" ) )" >> /usr/local/lib/R/etc/Rprofile.site

FROM rocker/r-ver:${R_VERSION}

# Overwrite the site library with just the desired packages. By default rocker
# only bundles docopt and littler in that directory.
COPY --from=build /tmp/userlib /usr/local/lib/R/site-library
ARG RLIBDIR
COPY --from=build ${RLIBDIR} ${RLIBDIR}
COPY --from=build \
/usr/local/lib/R/etc/Rprofile.site \
/usr/local/lib/R/etc/Rprofile.site

# Install python (required for argparse). The version is not important, but
# let's pin it for stability.
ARG PYTHON_VERSION=3.10.6-1~22.04
ARG PYTHON_VERSION

RUN apt-get update \
&& apt-get install -y --no-install-recommends \
4 changes: 1 addition & 3 deletions README.md
@@ -39,9 +39,7 @@ If you are using the UCLA Azure cluster, please use the [submission script](http

## Flow Diagram

A directed acyclic graph of your pipeline. The [PlantUML](https://plantuml.com/) code defining this diagram is version-controlled in the [docs/](./docs/) folder, and a [GitHub Action](https://github.com/uclahs-cds/tool-PlantUML-action) automatically regenerates the SVG image when that file is changed.

![Pipeline Graph](./docs/pipeline-flow.svg)
![Pipeline Graph](./docs/pipeline.mmd.svg)

---

2 changes: 1 addition & 1 deletion config/default.config
@@ -22,7 +22,7 @@ params {
gatk_version = '4.2.4.1'
pipeval_version = '5.0.0-rc.3'
samtools_version = '1.20'
stablelift_version = 'branch-nwiltsie-bootstrap' // FIXME
stablelift_version = 'branch-nwiltsie-regroup-modules' // FIXME

docker_image_bcftools = "${-> params.docker_container_registry}/bcftools-score:${params.bcftools_version}"
docker_image_bedtools = "${-> params.docker_container_registry}/bedtools:${params.bedtools_version}"
2 changes: 1 addition & 1 deletion config/methods.config
@@ -9,7 +9,7 @@ methods {
set_output_dir = {
def date = new Date().format("yyyyMMdd'T'HHmmss'Z'", TimeZone.getTimeZone('UTC'))

params.output_dir_base = "${params.output_dir}/${manifest.name}-${manifest.version}/${params.sample_id.replace(' ', '_')}"
params.output_dir_base = "${params.output_dir}/${manifest.name}-${manifest.version}/${params.sample_id.replace(' ', '_')}/StableLift-${manifest.version}"
params.log_output_dir = "${params.output_dir_base}/log-${manifest.name}-${manifest.version}-${date}"
}
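
The path construction in `set_output_dir` can be traced with a minimal Python sketch. It mirrors the string templates from the changed line, including the new trailing `StableLift-<version>` component. The function name `build_output_dirs` and all example values are hypothetical; the real logic lives in the Groovy closure above.

```python
from datetime import datetime, timezone


def build_output_dirs(output_dir, manifest_name, manifest_version, sample_id, now=None):
    """Mirror the string templates used by set_output_dir (illustrative only)."""
    # Same layout as Groovy's "yyyyMMdd'T'HHmmss'Z'" in UTC.
    now = now or datetime.now(timezone.utc)
    date = now.strftime("%Y%m%dT%H%M%SZ")

    # Spaces in the sample ID are replaced with underscores, as in the config.
    safe_sample_id = sample_id.replace(" ", "_")

    # The commit appends a "StableLift-<version>" directory to the base path.
    output_dir_base = (
        f"{output_dir}/{manifest_name}-{manifest_version}/"
        f"{safe_sample_id}/StableLift-{manifest_version}"
    )
    log_output_dir = (
        f"{output_dir_base}/log-{manifest_name}-{manifest_version}-{date}"
    )
    return output_dir_base, log_output_dir


if __name__ == "__main__":
    # Hypothetical values; the real ones come from params and the manifest.
    base, log = build_output_dirs("/tmp/out", "pipeline", "0.1.0", "my sample")
    print(base)
    print(log)
```

Passing a fixed `now` makes the log directory name reproducible, which is convenient when checking the layout by hand.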

11 changes: 11 additions & 0 deletions config/schema.yaml
@@ -3,6 +3,17 @@ sample_id:
type: 'String'
required: true
help: 'sample id supplied from input yaml'
variant_caller:
type: 'String'
required: true
help: 'Tool used to call structural or somatic variants'
choices:
- Mutect2
- HaplotypeCaller
- Strelka2
- Muse2
- SomaticSniper
- Delly2
save_intermediate_files:
type: 'Bool'
required: true
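
The effect of the new `variant_caller` schema entry (a required string limited to six choices) can be illustrated with a small Python sketch. This is not how the pipeline validates parameters; the actual check is performed by the pipeline's own schema tooling, and the function name `validate_variant_caller` is hypothetical.

```python
# Allowed values, copied from the `choices` list in config/schema.yaml.
VARIANT_CALLER_CHOICES = (
    "Mutect2",
    "HaplotypeCaller",
    "Strelka2",
    "Muse2",
    "SomaticSniper",
    "Delly2",
)


def validate_variant_caller(params):
    """Return the caller name if it satisfies the schema entry, else raise."""
    # `required: true` means the key must be present.
    if "variant_caller" not in params:
        raise ValueError("variant_caller is required")

    value = params["variant_caller"]

    # `type: 'String'` means non-string values are rejected.
    if not isinstance(value, str):
        raise TypeError("variant_caller must be a string")

    # `choices` restricts the value to the listed tools (case-sensitive here).
    if value not in VARIANT_CALLER_CHOICES:
        allowed = ", ".join(VARIANT_CALLER_CHOICES)
        raise ValueError(f"variant_caller must be one of: {allowed}")

    return value


if __name__ == "__main__":
    print(validate_variant_caller({"variant_caller": "Delly2"}))
```

The sketch treats the comparison as case-sensitive, which is an assumption; the schema itself does not say whether `delly2` would be accepted.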
9 changes: 8 additions & 1 deletion config/template.config
@@ -36,7 +36,14 @@ params {
chain_file = "/hot/ref/tool-specific-input/liftOver/hg19ToHg38.over.chain"

// FIXME How to describe this file?
repeat_bed = "/hot/ref/database/RepeatMasker-v3.0.1/processed/GRCh38/GRCh38_RepeatMasker_intervals.bed"
repeat_bed = "/hot/ref/database/RepeatMasker-3.0.1/processed/GRCh38/GRCh38_RepeatMasker_intervals.bed"

// SV files
// FIXME Should this be bundled?
header_contigs = "/hot/code/nkwang/GitHub/uclahs-cds/project-method-AlgorithmEvaluation-BNCH-000142-GRCh37v38/report/manuscript/publish/GRCh38-vcf-header-contigs.txt"

// FIXME Should this be bundled?
gnomad_rds = "/hot/code/nkwang/GitHub/uclahs-cds/project-method-AlgorithmEvaluation-BNCH-000142-GRCh37v38/report/manuscript/publish/data/gnomad.v4.0.sv.Rds"
}

// Setup the pipeline config. DO NOT REMOVE THIS LINE!
40 changes: 18 additions & 22 deletions docker/install-stablelift.R
@@ -1,26 +1,22 @@
# Install the remotes package to the library
install.packages('remotes', lib = .Library)
install.packages('renv', lib = .Library)

# Make a temporary directory to hold all of the installed packages
localdir <- '/tmp/userlib'
dir.create(localdir)
options(
renv.settings.bioconductor.version = Sys.getenv('BIOC_VERSION')
)

dependencies <- c(
'ROCR' = '1.0-11',
'argparse' = '2.2.2',
'caret' = '6.0-94',
'data.table' = '1.14.8',
'doParallel' = '1.0.17',
'foreach' = '1.5.2',
'ranger' = '0.15.1',
'vcfR' = '1.14.0'
renv::init(
bare = TRUE,
bioconductor = Sys.getenv('BIOC_VERSION')
)

# Unfortunately, this will install the dependencies multiple times
for (name in names(dependencies)) {
remotes::install_version(
name,
unname(dependencies[name]),
lib = localdir
)
}
renv::install(c(
'[email protected]',
'[email protected]',
'[email protected]',
'[email protected]',
'[email protected]',
'[email protected]',
'[email protected]',
'[email protected]',
'bioc::[email protected]'
))
79 changes: 79 additions & 0 deletions docs/pipeline.mmd
@@ -0,0 +1,79 @@
%%{init: {"flowchart": {"htmlLabels": false}} }%%

flowchart TD

classDef input fill:#ffffb3
classDef output fill:#b3de69
classDef gatk fill:#bebada
classDef bcftools fill:#fdb462
classDef R fill:#8dd3c7
classDef linux fill:#fb8072

subgraph legend ["`**Legend**`"]
direction RL
subgraph nodes ["`**Nodes**`"]
input[["Input File"]]:::input
input_node(["Parameterized Input"]):::input
output[["Output file"]]:::output
end

subgraph processes ["`**Processes**`"]
gatk_docker[GATK]:::gatk
bcftools_docker[bcftools]:::bcftools
r_docker[Rscript]:::R
linux_docker[Generic Linux]:::linux
end
end

legend
~~~ input_vcf[["Input VCF"]]:::input
--> pipeval:::linux
--> sv_vs_snv{{Variant Type?}}

sv_vs_snv ------> r_liftover
header_contigs .-> r_liftover
chain_file2 ..-> r_liftover
gnomad_rds .-> r_extract_sv

subgraph SV ["`**SV**`"]
%% Other input files
header_contigs([header_contigs]):::input
chain_file2([chain_file]):::input
gnomad_rds([gnomad_rds]):::input

r_liftover[liftover-Delly2-vcf.R]:::R
---> r_extract_sv[extract-VCF-features-SV.R]:::R

end

chain_file .-> bcftools_liftover
sv_vs_snv --> bcftools_liftover

subgraph SNV ["`**SNV**`"]
funcotator_sources([funcotator_sources]):::input
chain_file([chain_file]):::input
repeat_bed([repeat_bed]):::input

bcftools_liftover[bcftools +liftover]:::bcftools
---> gatk_func[gatk Funcotator]:::gatk
--> bcftools_annotate["`bcftools annotate*RepeatMasker*`"]:::bcftools
--> bcftools_annotate2["`bcftools annotate*Trinucleotide*`"]:::bcftools
--> r_extract_snv[extract-VCF-features.R]:::R
end

funcotator_sources .-> gatk_func
repeat_bed .-> bcftools_annotate

joinpaths{ }
r_extract_snv --> joinpaths
r_extract_sv --> joinpaths
joinpaths ---> r_predict_stability

subgraph Predict Stability ["`&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;**Predict Stability**`"]
r_predict_stability[predict-liftover-stability.R]:::R
--> bcftools_annotate3["`bcftools annotate*Stability*`"]:::bcftools

rf_model([rf_model]):::input .-> r_predict_stability
end

bcftools_annotate3 --> output_vcfs[["Output VCFs"]]:::output
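
The branching in the flowchart can be summarized as a short Python sketch that lists the steps a VCF passes through. The step names are taken from the diagram; the assumption that the "Variant Type?" decision follows the `variant_caller` parameter (Delly2 for SV, every other listed caller for SNV) is inferred from the changelog entries, and the function name `plan_steps` is hypothetical.

```python
# Caller groupings as described in the changelog (SV: Delly2, SNV: the rest).
SV_CALLERS = {"Delly2"}
SNV_CALLERS = {"Mutect2", "HaplotypeCaller", "Strelka2", "Muse2", "SomaticSniper"}

# Steps inside each subgraph of the diagram, in order.
SV_STEPS = ["liftover-Delly2-vcf.R", "extract-VCF-features-SV.R"]
SNV_STEPS = [
    "bcftools +liftover",
    "gatk Funcotator",
    "bcftools annotate (RepeatMasker)",
    "bcftools annotate (Trinucleotide)",
    "extract-VCF-features.R",
]

# Both branches rejoin for the "Predict Stability" subgraph.
COMMON_STEPS = ["predict-liftover-stability.R", "bcftools annotate (Stability)"]


def plan_steps(variant_caller):
    """Return the ordered list of steps for a given caller (illustrative)."""
    if variant_caller in SV_CALLERS:
        branch = SV_STEPS
    elif variant_caller in SNV_CALLERS:
        branch = SNV_STEPS
    else:
        raise ValueError(f"unknown variant caller: {variant_caller}")
    # Input validation (pipeval) runs before the branch point.
    return ["pipeval"] + branch + COMMON_STEPS


if __name__ == "__main__":
    for caller in ("Delly2", "Mutect2"):
        print(caller, "->", " | ".join(plan_steps(caller)))
```

The sketch only captures ordering. The dotted edges in the diagram (for example `chain_file`, `repeat_bed`, `gnomad_rds`, and `rf_model`) are auxiliary inputs to individual steps and are not modeled here.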
1 change: 1 addition & 0 deletions docs/pipeline.mmd.svg