Releases: openpipelines-bio/openpipeline
openpipelines 0.7.1
NEW FUNCTIONALITY
- `integrate/scvi`: use `nvcr.io/nvidia/pytorch:22.09-py3` as base container to enable GPU acceleration.
- `integrate/scvi`: add `--model_output` to save the model.
- `workflows/ingestion/cellranger_mapping`: Added `output_type` to output the filtered Cell Ranger data as h5mu, not the converted raw 10xh5 output.
- Several components: added `--output_compression` argument to set the compression of output .h5mu files.
- `workflows/full_pipeline` and `workflows/integration`: Added `leiden_resolution` argument to control the coarseness of the clustering.
- Added `--rna_theta` and `--rna_harmony_theta` to the full and integration pipelines, respectively, to tune the diversity clustering penalty parameter for harmony integration.
BUG FIXES
- `mapping/cellranger_multi`: Fix an issue where using a directory as value for `--input` would cause `AttributeError`.
- `workflows/integration`: `init_pos` is no longer set to the integration layer (e.g. `X_pca_integrated`).
- `dimred/pca`: fix `variance` slot containing a second copy of the variance ratio matrix and not the variances.
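
The `dimred/pca` fix above rests on the difference between a component's variance and its variance ratio. A minimal sketch of that relationship, assuming the ratio is taken relative to the total over the listed components; the helper name `variance_ratio` and the numbers are hypothetical and are not taken from the component's code:

```python
def variance_ratio(variances):
    """Return each component's share of the summed variance."""
    total = sum(variances)
    if total == 0:
        raise ValueError("total variance is zero")
    return [v / total for v in variances]


# Hypothetical per-component variances (illustrative values only).
variances = [4.0, 3.0, 1.0]
ratios = variance_ratio(variances)

# The two slots hold different quantities: raw variances vs. fractions of the total.
print(variances)
print(ratios)
```

Storing `ratios` in both slots, as the bug did, loses the absolute scale of the components.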
MINOR CHANGES
- `integration` and `full` workflows: do not run harmony integration when `obs_covariates` is not provided.
- Add `highmem` label to `dimred/pca` component.
- Remove disabled `convert/from_csv_to_h5mu` component.
- Update to Viash 0.7.1.
- Several components: update to scanpy 1.9.2.
- `process_10xh5/filter_10xh5`: speed up build by using `eddelbuettel/r2u:22.04` base container.
MAJOR CHANGES
- `dataflow/concat`: Renamed `--compression` to `--output_compression`.
openpipelines 0.7.0
MAJOR CHANGES
- Removed `bin` folder. As of viash 0.6.4, a `_viash.yaml` file can be included in the root of a repository to set common viash options for the project. These options were previously covered in the `bin/init` script, but this new feature of viash makes its use unnecessary. `viash` and `nextflow` should now be installed in a directory that is included in your `$PATH`.
MINOR CHANGES
- `filter/do_filter`: raise an error instead of printing a warning when providing a column for `var_filter` or `obs_filter` that doesn't exist.
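
The behaviour change in `filter/do_filter` can be pictured as follows. This is only a sketch of the "fail early instead of warning" pattern; the function name `check_filter_columns` and the exception type are assumptions, not the component's actual implementation:

```python
def check_filter_columns(available_columns, requested_columns):
    """Raise if any requested filter column is absent, instead of warning."""
    missing = [col for col in requested_columns if col not in available_columns]
    if missing:
        raise ValueError(f"Filter column(s) not found: {', '.join(missing)}")
    return list(requested_columns)


# Hypothetical column names, for illustration only.
obs_columns = ["filter_with_counts", "sample_id"]
print(check_filter_columns(obs_columns, ["filter_with_counts"]))
```

Failing early means a misspelled column name stops the run rather than silently producing unfiltered output.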
BUG FIXES
- `workflows/full_pipeline`: Fix setting .var output column for filter_with_hvg.
- Fix running `mapping/cellranger_multi` without passing all references.
- `filter/filter_with_scrublet`: now sets `use_approx_neighbors` to `False` to avoid using `annoy` because it fails on processors that are missing the AVX-512 instruction sets.
- `workflows`: Updated `WorkflowHelper` to newer version that allows applying defaults when calling a subworkflow from another workflow.
- Several components: pin matplotlib to <3.7 to fix scanpy compatibility (see scverse/scanpy#2411).
- `workflows`: fix a bug where running a subworkflow from a workflow would cause the parent config to be read instead of the subworkflow config.
- `correction/cellbender_remove_background`: Fix description of input for cellbender_remove_background.
- `filter/do_filter`: resolved an issue where the .obs column instead of the .var column was being logged when filtering using the .var column.
- `workflows/rna_singlesample` and `workflows/prot_singlesample`: Correctly set var and obs columns while filtering with counts.
- `filter/do_filter`: removed the default input value for `var_filter` argument.
- `workflows/full_pipeline` and `workflows/integration`: fix PCA not using highly variable genes filter.
openpipelines 0.6.2
NEW FUNCTIONALITY
- `workflows/full_pipeline`: added `filter_with_hvg_obs_batch_key` argument for batched detection of highly variable genes.
- `workflows/rna_multisample`: added `filter_with_hvg_obs_batch_key`, `filter_with_hvg_flavor` and `filter_with_hvg_n_top_genes` arguments.
- `qc/calculate_qc_metrics`: Add basic statistics: `pct_dropout`, `num_zero_obs`, `obs_mean` and `total_counts` are added to .var. `num_nonzero_vars`, `pct_{var_qc_metrics}`, `total_counts_{var_qc_metrics}`, `pct_of_counts_in_top_{top_n_vars}_vars` and `total_counts` are included in .obs.
- `workflows/multiomics/rna_multisample` and `workflows/multiomics/full_pipeline`: add `qc/calculate_qc_metrics` component to workflow.
- `workflows/multiomics/prot_singlesample`: Processing unimodal single-sample CITE-seq data.
- `workflows/multiomics/rna_singlesample` and `workflows/multiomics/full_pipeline`: Add filtering arguments to pipeline.
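
As an illustration of the basic statistics listed for `qc/calculate_qc_metrics`, the sketch below computes a few of them on a small dense count matrix. The definitions are assumptions based on the natural reading of the metric names (for example, `pct_dropout` as the percentage of observations with a zero count); the component's exact definitions are not stated here, and the function names are hypothetical:

```python
def var_qc_metrics(counts):
    """Per-variable statistics for a dense obs x var count matrix (list of rows)."""
    n_obs = len(counts)
    metrics = []
    for column in zip(*counts):  # iterate over variables (columns)
        num_zero_obs = sum(1 for value in column if value == 0)
        total_counts = sum(column)
        metrics.append({
            "pct_dropout": 100.0 * num_zero_obs / n_obs,
            "num_zero_obs": num_zero_obs,
            "obs_mean": total_counts / n_obs,
            "total_counts": total_counts,
        })
    return metrics


def obs_qc_metrics(counts):
    """Per-observation statistics for the same matrix."""
    return [
        {
            "num_nonzero_vars": sum(1 for value in row if value != 0),
            "total_counts": sum(row),
        }
        for row in counts
    ]


# Illustrative 4 observations x 3 variables matrix (made-up counts).
counts = [
    [0, 2, 1],
    [0, 0, 3],
    [4, 0, 0],
    [0, 2, 0],
]
for row in var_qc_metrics(counts):
    print(row)
for row in obs_qc_metrics(counts):
    print(row)
```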
MINOR CHANGES
- `convert/from_bdrhap_to_h5mu`: bump R version to 4.2.
- `process_10xh5/filter_10xh5`: bump R version to 4.2.
- `dataflow/concat`: include path of file in error message when reading a mudata file fails.
- `mapping/cellranger_multi`: write cellranger console output to a `cellranger_multi.log` file.
BUG FIXES
- `mapping/htseq_count_to_h5mu`: Fix a bug where reading in the gtf file caused `AttributeError`.
- `dataflow/concat`: the `--input_id` is no longer required when `--mode` is not `move`.
- `filter/filter_with_hvg`: no longer tries to use `--varm_name` to set non-existent metadata when running with `--flavor seurat_v3`, which was causing `KeyError`.
- `filter/filter_with_hvg`: Enforce that `n_top_genes` is set when `flavor` is set to 'seurat_v3'.
- `filter/filter_with_hvg`: Improve error message when trying to use 'cell_ranger' as `flavor` and passing unfiltered data.
- `mapping/cellranger_multi` now applies `gex_chemistry`, `gex_secondary_analysis`, `gex_generate_bam`, `gex_include_introns` and `gex_expect_cells`.
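
The `n_top_genes` rule for `filter/filter_with_hvg` amounts to a dependency between two arguments. A minimal sketch of such a check, with a hypothetical function name and return shape; it is not the component's code:

```python
def validate_hvg_arguments(flavor, n_top_genes=None):
    """Reject the 'seurat_v3' flavor when n_top_genes is missing."""
    if flavor == "seurat_v3" and n_top_genes is None:
        raise ValueError("n_top_genes must be set when flavor is 'seurat_v3'")
    return {"flavor": flavor, "n_top_genes": n_top_genes}


print(validate_hvg_arguments("seurat_v3", n_top_genes=2000))
print(validate_hvg_arguments("seurat"))
```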
openpipeline 0.6.1
BUG FIXES
- `src/filter/filter_with_counts`: Fix an issue where mitochondrial genes were being detected in .var_names, which contain Ensembl IDs instead of gene symbols in the pipelines. The solution was to add a `--var_gene_names` argument, which selects the .var column to check against a regex (`--mitochondrial_gene_regex`).
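
To see why the regex has to run on a gene-symbol column rather than on the IDs, consider the sketch below. The default pattern, the helper name `flag_mitochondrial` and the example rows are assumptions chosen for illustration, not values taken from the component:

```python
import re


def flag_mitochondrial(gene_names, mitochondrial_gene_regex=r"^[mM][tT]-"):
    """Return one boolean per gene name: does it match the mitochondrial regex?"""
    pattern = re.compile(mitochondrial_gene_regex)
    return [bool(pattern.search(name)) for name in gene_names]


# Illustrative data: IDs as they would appear in .var_names,
# and symbols as they would appear in a separate .var column.
var_names = ["ENSG00000198888", "ENSG00000141510"]
gene_symbols = ["MT-ND1", "TP53"]

print(flag_mitochondrial(var_names))     # matching on IDs finds nothing
print(flag_mitochondrial(gene_symbols))  # matching on symbols finds MT-ND1
```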
openpipeline 0.6.0
NEW FUNCTIONALITY
- `workflows/full_pipeline`: add `filter_with_hvg_var_output` argument.
- `dimred/pca`: Add `--overwrite` and `--var_input` arguments.
- `src/transform/clr`: Perform CLR normalization on CITE-seq data.
- `workflows/ingestion/cellranger_multi`: Run Cell Ranger multi and convert the output to .h5mu.
- `filter/remove_modality`: Remove a single modality from a MuData file.
- `mapping/star_align`: Align `.fastq` files using STAR.
- `mapping/star_align_v273a`: Align `.fastq` files using STAR v2.7.3a.
- `mapping/star_build_reference`: Create a STAR reference index.
- `mapping/cellranger_multi`: Align fastq files using Cell Ranger multi.
- `mapping/samtools_sort`: Sort and (optionally) index alignments.
- `mapping/htseq_count`: Quantify gene expression for subsequent testing for differential expression.
- `mapping/htseq_count_to_h5mu`: Convert one or more HTSeq outputs to a MuData file.
- Added `convert/from_cellranger_multi_to_h5mu` component.
MAJOR CHANGES
- `convert/from_velocyto_to_h5mu`: Moved to `velocity/velocyto_to_h5mu`. It also now accepts an optional `--input_h5mu` argument, to allow directly reading the RNA velocity data into a `.h5mu` file containing the other modalities.
- `resources_test/cellranger_tiny_fastq`: Include RNA velocity computations as part of the script.
- `mapping/cellranger_mkfastq`: remove `--memory` and `--cpu` arguments, as resource management is automatically provided by viash.
MINOR CHANGES
- Several components: use `gzip` compression for writing .h5mu files.
- Default value for `obs_covariates` argument of full pipeline is now `sample_id`.
- Set the `tag` directive of all Nextflow components to '$id'.
BUG FIXES
- Keep data for modalities that are not specifically enabled when running full pipeline.
- Fix many components thanks to Viash 0.6.4, which causes errors to be thrown when input and output files are defined but not found.
openpipeline 0.5.1
BREAKING CHANGES
- `reference/make_reference`: Input files changed from `type: string` to `type: file` to allow Nextflow to cache the input files fetched from URL.
- Several components (except `from_h5ad_to_h5mu`): the `--modality` arguments no longer accept multiple values.
- Remove outdated `resources_test_scripts`.
- `convert/from_h5mu_to_seurat`: Disabled because MuDataSeurat is currently broken, see https://github.com/PMBio/MuDataSeurat/issues/9.
- `integrate/harmony`: Disabled because it is currently not functioning and the alternative, harmonypy, is used in the workflows.
- `dataflow/concat`: Renamed `--sample_names` to `--input_id` and moved the ability to add sample id and to join the sample ids with the observation names to `metadata/add_id`.
- Moved `dataflow/concat`, `dataflow/merge` and `dataflow/split_modalities` to a new namespace: `dataflow`.
- Moved `workflows/conversion/conversion` to `workflows/ingestion/conversion`.
NEW FUNCTIONALITY
- `metadata/add_id`: Add an id to a column in .obs. Also allows joining the id to the .obs_names.
- `workflows/ingestion/make_reference`: A generic component to build a transcriptomics reference into one of many formats.
- `integrate/scvi`: Performs scvi integration.
- `integrate/add_metadata`: Add a csv containing metadata to the .obs or .var field of a mudata file.
- `DataFlowHelper.nf`: Added `passthroughMap`. Usage:

      include { passthroughMap as pmap } from "./DataFlowHelper.nf"

      workflow {
        Channel.fromList([["id", [input: "foo"], "passthrough"]])
          | pmap{ id, data ->
            [id, data + [arg: 10]]
          }
      }

  Note that in the example above, using a regular `map` would result in an exception being thrown, that is, "Invalid method invocation `call` with arguments".

  The equivalent with a regular `map()` would be:

      workflow {
        Channel.fromList([["id", [input: "foo"], "passthrough"]])
          | map{ tup ->
            def (id, data) = tup
            [id, data + [arg: 10]] + tup.drop(2)
          }
      }

- `correction/cellbender_remove_background`: Eliminating technical artifacts from high-throughput single-cell RNA sequencing data.
- `workflows/ingestion/cellranger_postprocessing`: Add post-processing of h5mu files created from Cell Ranger data.
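
For readers less familiar with Nextflow, the semantics described for `passthroughMap` (transform the id and data of each channel event, keep any trailing elements untouched) can be mirrored in plain Python. This is an analogy only; `passthrough_map` is a hypothetical name and the sketch says nothing about how the Nextflow helper is implemented:

```python
def passthrough_map(events, func):
    """Apply func to (id, data) of each event and re-attach the remaining elements."""
    result = []
    for event in events:
        event_id, data, *passthrough = event
        new_id, new_data = func(event_id, data)
        result.append([new_id, new_data, *passthrough])
    return result


events = [["id", {"input": "foo"}, "passthrough"]]
updated = passthrough_map(events, lambda i, d: (i, {**d, "arg": 10}))
print(updated)
```

The callback only ever sees the first two elements, which is what keeps it independent of how many passthrough values a given channel carries.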
MAJOR CHANGES
- `workflows/utils/DataFlowHelper.nf`: Added helper functions `setWorkflowArguments()` and `getWorkflowArguments()` to split the data field of a channel event into a hashmap. Example usage:

      | setWorkflowArguments(
        pca: [ "input": "input", "obsm_output": "obsm_pca" ],
        integration: [ "obs_covariates": "obs_covariates", "obsm_input": "obsm_pca" ]
      )
      | getWorkflowArguments("pca")
      | pca
      | getWorkflowArguments("integration")
      | integration

- `mapping/cellranger_count`: Allow passing both directories as well as individual fastq.gz files as inputs.
- `convert/from_10xh5_to_h5mu`: Allow reading in QC metrics, use gene ids as `.obs_names` instead of gene symbols.
- `workflows/conversion`: Update pipeline to use the latest practices and to get it to a working state.
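
The idea behind `setWorkflowArguments()` and `getWorkflowArguments()`, splitting one flat set of workflow arguments into per-component argument maps, can be sketched in Python. The sketch assumes that each mapping reads as `component argument -> workflow argument`; that reading, the function names and the example values are assumptions, not a description of the Nextflow helpers:

```python
def set_workflow_arguments(data, **mappings):
    """Split a flat argument dict into one argument dict per component.

    Each mapping is {component_argument: key_in_data}; keys absent from
    data are skipped.
    """
    return {
        component: {arg: data[key] for arg, key in mapping.items() if key in data}
        for component, mapping in mappings.items()
    }


def get_workflow_arguments(split, component):
    """Select the arguments destined for a single component."""
    return split[component]


# Hypothetical workflow-level arguments.
data = {"input": "foo.h5mu", "obsm_pca": "X_pca", "obs_covariates": ["sample_id"]}
split = set_workflow_arguments(
    data,
    pca={"input": "input", "obsm_output": "obsm_pca"},
    integration={"obs_covariates": "obs_covariates", "obsm_input": "obsm_pca"},
)
print(get_workflow_arguments(split, "pca"))
print(get_workflow_arguments(split, "integration"))
```

Note how the single workflow argument `obsm_pca` feeds both the output slot of one step and the input slot of the next.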
MINOR CHANGES
- `dimred/umap`: Streamline UMAP parameters by adding `--obsm_output` parameter to allow choosing the output `.obsm` slot.
- `workflows/multiomics/integration`: Added arguments for tuning the various output slots of the integration pipeline, namely `--obsm_pca`, `--obsm_integrated`, `--uns_neighbors`, `--obsp_neighbor_distances`, `--obsp_neighbor_connectivities`, `--obs_cluster`, `--obsm_umap`.
- Switch to Viash 0.6.1.
- `filter/subset_h5mu`: Add `--modality` argument, export to VDSL3, add unit test.
- `dataflow/split_modalities`: Also output modality types in a separate csv.
BUG FIXES
- `convert/from_bd_to_10x_molecular_barcode_tags`: Replaced UTF8 characters with ASCII. OpenJDK 17 or lower might throw the following exception when trying to read a UTF8 file: `java.nio.charset.MalformedInputException: Input length = 1`.
- `dataflow/concat`: Overriding sample name in .obs no longer raises `AttributeError`.
- `dataflow/concat`: Fix false positives when checking for conflicts in .obs and .var when using `--mode move`.
openpipeline 0.5.0
Major redesign of the integration and multiomic workflows. Current list of workflows:
- `ingestion/bd_rhapsody`: A generic pipeline for running BD Rhapsody WTA or Targeted mapping, with support for AbSeq, VDJ and/or SMK.
- `ingestion/cellranger_mapping`: A pipeline for running Cell Ranger mapping.
- `ingestion/demux`: A generic pipeline for running bcl2fastq, bcl-convert or Cell Ranger mkfastq.
- `multiomics/rna_singlesample`: Processing unimodal single-sample RNA transcriptomics data.
- `multiomics/rna_multisample`: Processing unimodal multi-sample RNA transcriptomics data.
- `multiomics/integration`: A pipeline for demultiplexing multimodal multi-sample RNA transcriptomics data.
- `multiomics/full_pipeline`: A pipeline to analyse multiple multiomics samples.
BREAKING CHANGES
- Many components: Renamed `.var["gene_ids"]` and `.var["feature_types"]` to `.var["gene_id"]` and `.var["feature_type"]`.
DEPRECATED
- `convert/from_10xh5_to_h5ad` and `convert/from_bdrhap_to_h5ad`: Removed h5ad based components.
- `mapping/bd_rhapsody_wta` and `workflows/ingestion/bd_rhapsody_wta`: Deprecated in favour of the more generic `mapping/bd_rhapsody` and `workflows/ingestion/bd_rhapsody` pipelines.
- `convert/from_csv_to_h5mu`: Disable until it is needed again.
- `integrate/concat`: Deprecated `"concat"` option for `--other_axis_mode`.
NEW COMPONENTS
- `graph/bbknn`: Batch balanced KNN.
- `transform/scaling`: Scale data to unit variance and zero mean.
- `mapping/bd_rhapsody`: Added generic component for running the BD Rhapsody WTA or Targeted analysis, with support for AbSeq, VDJ and/or SMK.
- `integrate/harmony` and `integrate/harmonypy`: Run a Harmony integration analysis (R-based and Python-based, respectively).
- `integrate/scanorama`: Use Scanorama to integrate different experiments.
- `reference/make_reference`: Download a transcriptomics reference and preprocess it (adding ERCC spikeins and filtering with a regex).
- `reference/build_bdrhap_reference`: Compile a reference into a STAR index in the format expected by BD Rhapsody.
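
The operation named for `transform/scaling`, scaling to zero mean and unit variance, is the standard z-score transform applied per feature. A minimal sketch on a single feature vector, assuming the population standard deviation is used (the component may use a different estimator) and mapping constant features to zeros:

```python
from statistics import mean, pstdev


def scale(values):
    """Scale one feature to zero mean and unit variance (population std)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # Constant feature: centring gives zeros and there is nothing to rescale.
        return [0.0 for _ in values]
    return [(value - mu) / sigma for value in values]


scaled = scale([1.0, 2.0, 3.0, 4.0, 5.0])
print(scaled)
```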
NEW WORKFLOWS
- `workflows/ingestion/bd_rhapsody`: Added generic workflow for running the BD Rhapsody WTA or Targeted analysis, with support for AbSeq, VDJ and/or SMK.
- `workflows/multiomics/full_pipeline`: Implement pipeline for processing multiple multiomics samples.
NEW FUNCTIONALITY
- `convert/from_bdrhap_to_h5mu`: Added support for being able to deal with WTA, Targeted, SMK, AbSeq and VDJ data.
- `integrate/concat`: Added `"move"` option to `--other_axis_mode`, which allows merging `.obs` and `.var` by only keeping elements of the matrices which are the same in each of the samples, moving the conflicting values to `.varm` or `.obsm`.
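
The `"move"` behaviour described above (keep what agrees across samples, set aside what conflicts) can be sketched on plain dictionaries. This is a deliberately simplified model with one value per column per sample; the function name `merge_annotations` and the data are hypothetical, and the real component works on annotation matrices rather than dicts:

```python
def merge_annotations(samples):
    """Split annotation columns into agreeing and conflicting ones.

    samples maps a sample id to {column: value}. Columns with the same
    value in every sample are kept; the others are moved to a separate
    mapping that records the value per sample.
    """
    columns = sorted({col for annotation in samples.values() for col in annotation})
    merged, conflicts = {}, {}
    for col in columns:
        per_sample = {sid: annotation.get(col) for sid, annotation in samples.items()}
        values = list(per_sample.values())
        if all(value == values[0] for value in values):
            merged[col] = values[0]
        else:
            conflicts[col] = per_sample
    return merged, conflicts


samples = {
    "s1": {"gene_symbol": "TP53", "highly_variable": True},
    "s2": {"gene_symbol": "TP53", "highly_variable": False},
}
merged, conflicts = merge_annotations(samples)
print(merged)
print(conflicts)
```

In this model `merged` plays the role of the combined `.var` or `.obs`, and `conflicts` the role of the per-sample values moved to `.varm` or `.obsm`.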
MAJOR CHANGES
- Multiple components: Update to anndata 0.8 with mudata 0.2.0. This means that the format of the `.h5mu` files has changed.
- `multiomics/rna_singlesample`: Move transformation counts into layers instead of overwriting `.X`.
- Updated to Viash 0.6.0.
MINOR CHANGES
- `velocity/velocyto`: Allow configuring memory and parallelisation.
- `cluster/leiden`: Add `--obsp_connectivities` parameter to allow choosing the output slot.
- `workflows/multiomics/rna_singlesample`, `workflows/multiomics/rna_multisample` and `workflows/multiomics/integration`: Allow choosing the output paths.
- `neighbors/bbknn` and `neighbors/find_neighbors`: Add parameters for choosing the input/output slots.
- `dimred/pca` and `dimred/umap`: Add parameters for choosing the input/output slots.
- `integrate/concat`: Optimize concat performance by adding multiprocessing and refactoring functions.
- `workflows/multimodal_integration`: Add `obs_covariates` argument to pipeline.
BUG FIXES
- Several components: Revert using slim versions of containers because they do not provide the tools to run nextflow with trace capabilities.
- `integrate/concat`: Fix an issue where joining boolean values caused `TypeError`.
- `workflows/multiomics/rna_multisample`, `workflows/multiomics/rna_singlesample` and `workflows/multiomics/integration`: Use nextflow trace reporting when running integration tests.
openpipeline 0.4.1
BUG FIXES
- `workflows/ingestion/bd_rhapsody_wta`: use ':' as a separator for multiple input files and fix integration tests.
MINOR CHANGES
- Several components: pin mudata and scanpy dependencies so that anndata version <0.8.0 is used.
openpipeline 0.4.0
NEW FUNCTIONALITY
- `convert/from_bdrhap_to_h5mu`: Merge one or more BD rhapsody outputs into an h5mu file.
- `split/split_modalities`: Split the modalities from a single .h5mu multimodal sample into separate .h5mu files.
- `integrate/concat`: Combine data from multiple samples together.
MINOR CHANGES
- `mapping/bd_rhapsody_wta`: Update to BD Rhapsody 1.10.1.
- `mapping/bd_rhapsody_wta`: Add parameters for overriding the minimum RAM & cores. Add `--dryrun` parameter.
- Switch to Viash 0.5.14.
- `convert/from_bdrhap_to_h5mu`: Update to BD Rhapsody 1.10.1.
- `resources_test/bdrhap_5kjrt`: Add subsampled BD rhapsody datasets to test pipeline with.
- `resources_test/bdrhap_ref_gencodev40_chr1`: Add subsampled reference to test BD rhapsody pipeline with.
- `integrate/merge`: Merge several unimodal .h5mu files into one multimodal .h5mu file.
- Updated several python docker images to slim version.
- `mapping/cellranger_count_split`: update container from ubuntu focal to ubuntu jammy.
- `download/sync_test_resources`: update AWS cli tools from 2.7.11 to 2.7.12 by updating docker image.
- `download/download_file`: now uses bash container instead of python.
- `mapping/bd_rhapsody_wta`: Use squashed docker image in which log4j issues are resolved.
BUG FIXES
- `workflows/utils/WorkflowHelper.nf`: Renamed `utils.nf` to `WorkflowHelper.nf`.
- `workflows/utils/WorkflowHelper.nf`: Fix error message when required parameter is not specified.
- `workflows/utils/WorkflowHelper.nf`: Added helper functions:
  - `readConfig`: Read a Viash config from a yaml file.
  - `viashChannel`: Create a channel from the Viash config and the params object.
  - `helpMessage`: Print a help message and exit.
- `mapping/bd_rhapsody_wta`: Update picard to 2.27.3.
DEPRECATED
- `convert/from_bdrhap_to_h5ad`: Deprecated in favour of `convert/from_bdrhap_to_h5mu`.
- `convert/from_10xh5_to_h5ad`: Deprecated in favour of `convert/from_10xh5_to_h5mu`.
openpipeline 0.3.1
NEW FUNCTIONALITY
- `bin/port_from_czbiohub_utilities.sh`: Added helper script to import components and pipelines from `czbiohub/utilities`.
Imported components from czbiohub/utilities:
- `demux/cellranger_mkfastq`: Demultiplex raw sequencing data.
- `mapping/cellranger_count`: Align fastq files using Cell Ranger count.
- `mapping/cellranger_count_split`: Split 10x Cell Ranger output directory into separate output fields.
Imported workflows from czbiohub/utilities:
- `workflows/1_ingestion/cellranger`: Use Cell Ranger to preprocess 10x data.
- `workflows/1_ingestion/cellranger_demux`: Use cellranger demux to demultiplex sequencing BCL output to FASTQ.
- `workflows/1_ingestion/cellranger_mapping`: Use cellranger count to align 10x fastq files to a reference.
MINOR CHANGES
- Fix `interactive/run_cirrocumulus` script raising `NotImplementedError` caused by using `MuData.var_names_make_unique()` on each modality instead of on the whole `MuData` object.
- Fix `transform/normalize_total` and `interactive/run_cirrocumulus` component build missing a hdf5 dependency.
- `interactive/run_cellxgene`: Updated container to ubuntu:focal because the previous container shipped python3.6, which cellxgene no longer supports.
- `mapping/bd_rhapsody_wta`: Set `--parallel` to true by default.
- `mapping/bd_rhapsody_wta`: Translate Bash script into Python.
- `download/sync_test_resources`: Add `--dryrun`, `--quiet`, and `--delete` arguments.
- `convert/from_h5mu_to_seurat`: Use `eddelbuettel/r2u:22.04` docker container in order to speed up builds by downloading precompiled R packages.
- `mapping/cellranger_count`: Use 5Gb for testing (to adhere to github CI runner memory constraints).
- `convert/from_bdrhap_to_h5ad`: change test data to output from `mapping/bd_rhapsody_wta` after reducing the BD Rhapsody test data size.
- Various `config.vsh.yaml`s: Renamed `values:` to `choices:`.
- `download/download_file` and `transfer/publish`: Switch base container from `bash:5.1` to `python:3.10`.
- `mapping/bd_rhapsody_wta`: Make sure procps is installed.
BUG FIXES
- `mapping/bd_rhapsody_wta`: Use a smaller test dataset to reduce test time and make sure that the Github Action runners do not run out of disk space.
- `download/sync_test_resources`: Disable the use of the Amazon EC2 instance metadata service to make script work on Github Actions runners.
- `convert/from_h5mu_to_seurat`: Fix unit test requiring Seurat by using native R functions to test the Seurat object instead.
- `mapping/cellranger_count` and `bcl_demux/cellranger_mkfastq`: cellranger uses the `--parameter=value` formatting instead of `--parameter value` to set command line arguments.
- `mapping/cellranger_count`: `--nosecondary` is no longer always applied.
- `mapping/bd_rhapsody_wta`: Added workaround for bug in Viash 0.5.12 where triple single quotes are incorrectly escaped (viash-io/viash#139).
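
Two of the `mapping/cellranger_count` fixes above concern how command line arguments are assembled: values are passed as `--parameter=value`, and boolean flags such as `--nosecondary` are only added when requested. A small sketch of that assembly logic; the helper name `format_cli_arguments` and the example values are hypothetical and this is not the component's code:

```python
def format_cli_arguments(parameters):
    """Build an argument list using --parameter=value, with bare boolean flags."""
    args = []
    for name, value in parameters.items():
        if value is None or value is False:
            continue  # unset values and disabled flags are omitted entirely
        if value is True:
            args.append(f"--{name}")
        else:
            args.append(f"--{name}={value}")
    return args


# Placeholder values for illustration.
parameters = {
    "id": "sample1",
    "transcriptome": "/path/to/reference",
    "nosecondary": False,
}
print(["cellranger", "count", *format_cli_arguments(parameters)])
```

Returning a list rather than a joined string keeps each argument intact when it is handed to a process launcher such as `subprocess.run`.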
DEPRECATED
- `bcl_demux/cellranger_mkfastq`: Duplicate of `demux/cellranger_mkfastq`.