Releases: openpipelines-bio/openpipeline
openpipelines.bio v3.0.0
BREAKING CHANGES
- transfer/publish: remove component after deprecating it in 2.1.0 (PR #1019).
- Removed split_h5mu_train_test component (PR #1020).
- tar_extract has been deprecated and will be removed in openpipeline 4.0 (PR #1014). Use vsh://toolbox/bgzip instead.
- compress_h5mu: rename compression argument to output_compression (PR #1017, PR #1018).
- delimit_fraction: remove unused layer argument (PR #1018).
- download_file has been deprecated and will be removed in openpipeline 3.0 (PR #1015).
- scarches: Loading of legacy models no longer assumes the model to be based on SCANVI. An argument (reference_class) was added which needs to be set in this case (PR #1035).
- convert/from_h5mu_to_seurat has been deprecated and will be removed in openpipeline 4.0. Use convert/from_h5mu_or_h5ad_to_seurat instead (PR #1046).
NEW FUNCTIONALITY
- liana: enabled jobs to be run in parallel and added two new arguments: consensus_opts, de_method (PR #1039).
- from_h5mu_or_h5ad_to_seurat: converts an h5ad file or a single modality from an h5mu file to a Seurat object (PR #1046).
EXPERIMENTAL
Warning: These experimental features are subject to change in future releases.
- Added from_h5mu_or_h5ad_to_tiledb component (PR #1034).
- Added differential_expression/create_pseudobulk: Generation of pseudobulk samples from single-cell transcriptomics data, creating bulk-like expression profiles suitable for differential expression analysis with methods designed for bulk data (PR #1042).
- Added annotate/singler: Cell type annotation using SingleR (PR #1051).
- Added tiledb/move_mudata_obsm_to_tiledb (PR #1065).
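As a rough illustration of what pseudobulk generation means, the sketch below sums per-cell count vectors into one profile per group (for example, per sample). It is a minimal pure-Python sketch that assumes a dense cells-by-genes count matrix; the function name pseudobulk is hypothetical, and this is not the implementation of differential_expression/create_pseudobulk, which may also filter cells or groups.

```python
def pseudobulk(counts, groups):
    """Sum per-cell count vectors into one bulk-like profile per group.

    counts: list of per-cell count lists (cells x genes), assumed dense.
    groups: one group label (e.g. a sample ID) per cell.
    """
    if len(counts) != len(groups):
        raise ValueError("need exactly one group label per cell")
    profiles = {}
    for row, group in zip(counts, groups):
        if group not in profiles:
            profiles[group] = [0] * len(row)
        # Add this cell's counts gene by gene to its group's running total.
        profiles[group] = [total + value for total, value in zip(profiles[group], row)]
    return profiles


cells = [[1, 0, 2], [3, 1, 0], [0, 5, 1]]
print(pseudobulk(cells, ["sample_a", "sample_a", "sample_b"]))
# {'sample_a': [4, 1, 2], 'sample_b': [0, 5, 1]}
```

Summing raw counts, rather than averaging normalized values, keeps each profile in count space, which is what bulk count-based methods such as DESeq2 or edgeR expect as input.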
MAJOR CHANGES
- mapping/cellranger_*: Upgrade CellRanger to v9.0 (PR #992 and #1006).
- leiden: bump base container to 3.13 (PR #1030).
- scanvi, scarches, scvi and totalvi: bump scvi-tools to 1.3.1 and base image to nvcr.io/nvidia/pytorch:25.05-py3 (PR #1035).
- lianapy: update liana to 1.5.0 (PR #1039).
MINOR CHANGES
- velocyto: pin base container to python:3.10-slim-bookworm (PR #1063).
- mapping/cellranger_multi: The output from Cell Ranger is now displayed while Cell Ranger is running (PR #1045).
- Remove workflows directory (PR #993). The workflows which were at one point in this directory were all deprecated and moved to src/workflows.
- Move output file compression argument for AnnData and MuData files to a base config file (src/base/h5_compression_argument.yaml) (PR #1017).
- Add missing descriptions to components and arguments (PR #1018).
- Add scope to component and workflow configurations (see https://viash.io/reference/config/scope.html) (PR #1013 and #1032).
- workflows/multiomics/process_samples: Add optional --skip_scrublet_doublet_detection flag to bypass Scrublet doublet detection. Scrublet doublet detection runs by default and can now be optionally disabled (PR #1049).
- Nextflow runner: use resourceLimits directive in the labels config to set a global limit on the memory (PR #1060).
BUG FIXES
- cellranger_multi: Fix error when running Cell Ranger without any computational resources specified (PR #1056).
- Bump viash to 0.9.4. This adds support for Nextflow versions starting from major version 25.01 and fixes an issue where passing an integer to an argument with type: double resulted in an error (PR #1016).
- Fix running neighbors_leiden_umap workflow with -stub enabled (PR #1026).
- Add missing CUDA enabled jaxlib to components that use scvi-tools (scanvi, scarches, scvi and totalvi) (PR #1028).
- leiden: fix issue where the logging system was shut down prematurely after the calculations were done (PR #1030).
- Added missing gpu label to scarches component (PR #1027).
- conversion/from_cellranger_multi_to_h5mu: fix conversion to MuData for experiments that combine probe barcodes with other feature barcodes (e.g. Antibody Capture and CRISPR Guide Capture) (PR #1062).
openpipelines.bio v2.1.2
openpipelines.bio v2.1.1
openpipelines.bio v2.0.1
OpenPipelines.bio v1.0.5
OpenPipelines.bio v2.1.0
BREAKING CHANGES
- Deprecation of metadata/duplicate_obs and metadata/duplicate_var components (PR #952).
- Deprecation of workflows/annotation/scgpt_integration_knn component (PR #952).
- annotate/scanvi: Remove scarches functionality from this component, as it is already covered in integrate/scarches (PR #986).
NEW FUNCTIONALITY
- dataflow/concatenate_h5mu: add modality parameter (PR #977).
- filter_with_scrublet: add expected_doublet_rate, stdev_doublet_rate, n_neighbors and sim_doublet_ratio arguments (PR #974).
- feature_annotation/align_query_reference: Added a component to align a query and reference dataset (PR #948, #958, #972).
- workflows/qc/qc workflow: Added ribosomal gene detection (PR #961).
- workflows/rna/rna_singlesample, workflows/multiomics/process_samples workflows: Added ribosomal gene detection (PR #968).
- scanvi: enable CUDA acceleration (PR #969).
- workflows/annotation/scvi_knn workflow: Cell-type annotation based on scVI integration followed by KNN label transfer (PR #954).
- convert/from_h5ad_to_seurat: Add component to convert from h5ad to Seurat (PR #980).
- workflows/annotation/scanvi_scarches workflow: Cell-type annotation based on scANVI integration and annotation with scArches for reference mapping (PR #898).
- integrate/scarches: Implemented functionality to align the query dataset with the model registry and extend functionality to predict labels for scANVI models (PR #898).
- workflows/annotation/harmony_knn workflow: Cell-type annotation based on Harmony integration with KNN label transfer (PR #836).
- from_cellranger_multi_to_h5mu: add support for custom modality (PR #982).
- integrate/scvi: Enable passing any .var field for gene name information instead of .var index, using the --var_gene_names parameter (PR #986).
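Ribosomal gene detection in QC usually means flagging genes by name and reporting the share of each cell's counts that falls on those genes. The sketch below assumes human gene symbols, where ribosomal protein genes start with RPS or RPL; the prefixes, the function names and the percentage metric are assumptions for illustration, not the rule the workflows apply, which may differ or be configurable.

```python
RIBO_PREFIXES = ("RPS", "RPL")  # assumed human ribosomal protein gene prefixes


def flag_ribosomal(gene_names):
    """Return a boolean mask marking genes whose symbol starts with a ribosomal prefix."""
    return [name.upper().startswith(RIBO_PREFIXES) for name in gene_names]


def pct_counts_ribosomal(counts, gene_names):
    """Percentage of each cell's total counts that fall on ribosomal genes."""
    mask = flag_ribosomal(gene_names)
    result = []
    for row in counts:
        total = sum(row)
        ribo = sum(value for value, is_ribo in zip(row, mask) if is_ribo)
        # Cells without any counts get 0.0 instead of a division by zero.
        result.append(100.0 * ribo / total if total else 0.0)
    return result


genes = ["RPS3", "ACTB", "RPL10", "MT-CO1"]
print(flag_ribosomal(genes))  # [True, False, True, False]
print(pct_counts_ribosomal([[2, 5, 3, 0], [0, 0, 0, 0]], genes))  # [50.0, 0.0]
```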
MAJOR CHANGES
- Several components: when a component processes a single modality, only that modality is read into memory (PR #944).
- The transfer/publish component is deprecated and will be removed in a future major release (PR #941).
MINOR CHANGES
- Bump viash to 0.9.3 (PR #995).
- Several workflows: refactor neighbors, leiden and UMAP into a separate subworkflow (PR #942 and PR #949).
- grep_annotation_column and subset_obsp: Fix compatibility for SciPy (PR #945).
- popv: Pin numpy<2 after new release of scvi-tools (PR #946).
- Various components (scgpt and annotate): Add resource labels (PR #947, PR #950).
- feature_annotation/highly_variable_features_scanpy: Enable calculation of HVG on a subset of genes (PR #957, PR #959).
- integrate/scvi, integrate/totalvi and integrate/scarches: update base image to nvcr.io/nvidia/pytorch:24.12-py3, pin scvi-tools version to 1.1.5, unpin jax and jaxlib version (PR #970).
- annotate/celltypist: Enable passing any layer with log normalized counts, enforce checking whether counts are log normalized (PR #971).
- process_10xh5/filter_10xh5: update container base to Ubuntu 24.04 (PR #983).
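The log-normalization check mentioned for annotate/celltypist can be approximated with a common heuristic: counts that were normalized to a fixed total per cell and then log1p-transformed return to that total after expm1. The sketch below assumes a target sum of 10,000, which is what CellTypist expects for its own input; the function name and the tolerance are hypothetical, and the component's actual check may differ.

```python
import math


def looks_log1p_normalized(rows, target_sum=1e4, rel_tol=1e-3):
    """Heuristic check that each cell was normalized to target_sum and then log1p-transformed."""
    for row in rows:
        if any(value < 0 for value in row):
            return False  # log1p of non-negative counts is never negative
        # Undo log1p and compare the per-cell total with the expected target.
        total = sum(math.expm1(value) for value in row)
        if not math.isclose(total, target_sum, rel_tol=rel_tol):
            return False
    return True


raw = [[5, 0, 3], [1, 1, 0]]
normalized = [[math.log1p(2500.0), math.log1p(7500.0)], [math.log1p(10000.0), 0.0]]
print(looks_log1p_normalized(raw))         # False
print(looks_log1p_normalized(normalized))  # True
```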
BUG FIXES
- Fix -stub runs (PR #1000).
- cluster/leiden: Fix an issue where insufficient shared memory (size of /dev/shm) causes the processing to hang.
- utils/subset_vars: Convert .var column used for subsetting of dtype "boolean" to dtype "bool" when it doesn't contain NaN values (PR #959).
- resources_test_scripts/annotation_test_data.sh: Add a layer to the annotation reference dataset with log normalized counts (PR #960).
- annotate/celltypist: Fix missing values in annotation column caused by index misalignment (PR #976).
- workflows/annotation/scgpt_annotation and workflows/integrate/scgpt_leiden: Parameterization of HVG flavor with default method cell_ranger instead of seurat_v3 (PR #979).
- dataflow/merge: Resolved an issue where merging two MuData objects with overlapping var or obs columns sometimes resulted in an unsupported nullable dtype, for instance when merging pd.IntegerDtype and pd.FloatDtype (PR #990). These columns are now correctly cast to their native numpy dtypes before writing.
- workflows/annotation/harmony_knn: Only process RNA modality in the workflow (PR #988).
- Documentation CI: Fix building the documentation using CI (PR #1003).
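The dataflow/merge and utils/subset_vars fixes both come down to replacing pandas nullable (extension) dtypes with plain numpy dtypes. A minimal pandas sketch of that idea follows; the function name to_native_numpy is hypothetical, and turning integer columns with missing values into float64 is an assumption about how missing values are handled, not necessarily what the components do.

```python
import pandas as pd


def to_native_numpy(series: pd.Series) -> pd.Series:
    """Cast pandas extension integer, float and boolean columns to numpy dtypes."""
    dtype = series.dtype
    if not isinstance(dtype, pd.api.extensions.ExtensionDtype):
        return series  # already backed by a numpy dtype
    has_missing = bool(series.isna().any())
    if dtype.kind in "iu":
        # numpy integers cannot hold missing values, so fall back to float64 with NaN.
        return series.astype("float64") if has_missing else series.astype(dtype.numpy_dtype)
    if dtype.kind == "f":
        return series.astype("float64")  # pd.NA becomes NaN
    if dtype.kind == "b" and not has_missing:
        return series.astype(bool)  # "boolean" -> "bool" only when nothing is missing
    return series  # anything else (e.g. categorical, string) is left untouched


print(to_native_numpy(pd.Series([1, 2], dtype="Int64")).dtype)  # int64
print(to_native_numpy(pd.Series([True, False], dtype="boolean")).dtype)  # bool
```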
OpenPipelines.bio v2.1.0-rc.2
BUG FIXES
- Fix -stub runs (PR #1000).
OpenPipelines.bio v2.1.0-rc.1
BREAKING CHANGES
- Deprecation of metadata/duplicate_obs and metadata/duplicate_var components (PR #952).
- Deprecation of workflows/annotation/scgpt_integration_knn component (PR #952).
- annotate/scanvi: Remove scarches functionality from this component, as it is already covered in integrate/scarches (PR #986).
NEW FUNCTIONALITY
- dataflow/concatenate_h5mu: add modality parameter (PR #977).
- filter_with_scrublet: add expected_doublet_rate, stdev_doublet_rate, n_neighbors and sim_doublet_ratio arguments (PR #974).
- feature_annotation/align_query_reference: Added a component to align a query and reference dataset (PR #948, #958, #972).
- workflows/qc/qc workflow: Added ribosomal gene detection (PR #961).
- workflows/rna/rna_singlesample, workflows/multiomics/process_samples workflows: Added ribosomal gene detection (PR #968).
- scanvi: enable CUDA acceleration (PR #969).
- workflows/annotation/scvi_knn workflow: Cell-type annotation based on scVI integration followed by KNN label transfer (PR #954).
- convert/from_h5ad_to_seurat: Add component to convert from h5ad to Seurat (PR #980).
- workflows/annotation/scanvi_scarches workflow: Cell-type annotation based on scANVI integration and annotation with scArches for reference mapping (PR #898).
- integrate/scarches: Implemented functionality to align the query dataset with the model registry and extend functionality to predict labels for scANVI models (PR #898).
- workflows/annotation/harmony_knn workflow: Cell-type annotation based on Harmony integration with KNN label transfer (PR #836).
- from_cellranger_multi_to_h5mu: add support for custom modality (PR #982).
- integrate/scvi: Enable passing any .var field for gene name information instead of .var index, using the --var_gene_names parameter (PR #986).
MAJOR CHANGES
- Several components: when a component processes a single modality, only that modality is read into memory (PR #944).
- The transfer/publish component is deprecated and will be removed in a future major release (PR #941).
MINOR CHANGES
- Bump viash to 0.9.3 (PR #995).
- Several workflows: refactor neighbors, leiden and UMAP into a separate subworkflow (PR #942 and PR #949).
- grep_annotation_column and subset_obsp: Fix compatibility for SciPy (PR #945).
- popv: Pin numpy<2 after new release of scvi-tools (PR #946).
- Various components (scgpt and annotate): Add resource labels (PR #947, PR #950).
- feature_annotation/highly_variable_features_scanpy: Enable calculation of HVG on a subset of genes (PR #957, PR #959).
- integrate/scvi, integrate/totalvi and integrate/scarches: update base image to nvcr.io/nvidia/pytorch:24.12-py3, pin scvi-tools version to 1.1.5, unpin jax and jaxlib version (PR #970).
- annotate/celltypist: Enable passing any layer with log normalized counts, enforce checking whether counts are log normalized (PR #971).
- process_10xh5/filter_10xh5: update container base to Ubuntu 24.04 (PR #983).
BUG FIXES
- cluster/leiden: Fix an issue where insufficient shared memory (size of /dev/shm) causes the processing to hang.
- utils/subset_vars: Convert .var column used for subsetting of dtype "boolean" to dtype "bool" when it doesn't contain NaN values (PR #959).
- resources_test_scripts/annotation_test_data.sh: Add a layer to the annotation reference dataset with log normalized counts (PR #960).
- annotate/celltypist: Fix missing values in annotation column caused by index misalignment (PR #976).
- workflows/annotation/scgpt_annotation and workflows/integrate/scgpt_leiden: Parameterization of HVG flavor with default method cell_ranger instead of seurat_v3 (PR #979).
- dataflow/merge: Resolved an issue where merging two MuData objects with overlapping var or obs columns sometimes resulted in an unsupported nullable dtype (e.g. merging pd.IntegerDtype and pd.FloatDtype). These columns are now correctly cast to their native numpy dtypes before writing (PR #990).
- workflows/annotation/harmony_knn: Only process RNA modality in the workflow (PR #988).
OpenPipelines.bio v1.0.4
OpenPipelines.bio v2.0.0
BREAKING CHANGES
- velocity/scvelo: update scvelo to 0.3.3, which also removes support for using loom input files. The component now uses a MuData object as input. Several arguments were added to support selecting different inputs from the MuData file: counts_layer, modality, layer_spliced, layer_unspliced, layer_ambiguous. An output_h5mu argument has been added (PR #932).
- src/annotate/onclass and src/annotate/celltypist: The input parameters for the gene name layers of the input datasets have been updated to --input_var_gene_names and --reference_var_gene_names (PR #919).
- Several components under src/scgpt (cross_check_genes, tokenize_pad, binning) now process the input (query) datasets differently. Instead of subsetting datasets based on genes in the model vocabulary and/or highly variable genes, these components require an input .var column with a boolean mask specifying this information. The results are written back to the original input data, preserving the dataset structure (PR #832).
- query/cellxgene_census: The default output layer has been changed from .layers["counts"] to .X to be more aligned with the standard OpenPipelines format (PR #933). Use argument --output_layer_counts counts to revert the behaviour to the previous default.
- Added cell multiplexing support to the from_cellranger_multi_to_h5mu component and the cellranger_multi workflow. For the from_cellranger_multi_to_h5mu component, the output argument now requires a value containing a wildcard character *, which will be replaced by the sample ID to form the final output file names. Additionally, a sample_csv argument is added to the from_cellranger_multi_to_h5mu component which describes the sample name per output file. No change is required for the output_h5mu argument from the cellranger_multi workflow; the workflow will just emit multiple events in case of a multiplexed run, one for each sample. The IDs of the events (and default output file names) are set by --sample_ids (in case of cell multiplexing), or (as before) by the user-provided id for the input (PR #803 and PR #902).
- demux/bcl_convert: update BCL Convert from 3.10 to 4.2 (PR #774).
- demux/cellranger_mkfastq, mapping/cellranger_count, mapping/cellranger_multi and reference/build_cellranger_reference: update cellranger to 8.0.1 (PR #774 and PR #811).
- Replaced --disable_library_compatibility_check with --check_library_compatibility in the mapping/cellranger_multi component and the ingestion/cellranger_multi workflow (PR #818).
- lianapy: bumped version to 1.3.0 (PR #827 and PR #862). Additionally, groupby is now a required argument.
- concat: this component was deprecated and has now been removed, use concatenate_h5mu instead (PR #796).
- The workflows folder in the root of the project no longer contains symbolic links to the built workflows in target. Using any workflow that was previously linked in this directory will now result in an error which will indicate the location of the workflow to be used instead (PR #796).
- XGBoost: bump version to 2.0.3 (PR #646).
- Several components: update anndata to 0.11.1 and mudata to 0.3.1 (PR #645 and PR #901), and scanpy to 1.10.4 (PR #901).
- filter/filter_with_hvg: this component was deprecated and has now been removed. Use feature_annotation/highly_variable_features_scanpy instead (PR #843).
- dataflow/concat: this component was deprecated and has now been removed. Use dataflow/concatenate_h5mu instead (PR #857).
- convert/from_h5mu_to_seurat: bump seurat to latest version (PR #850).
- workflows/ingestion/bd_rhapsody: Upgrade BD Rhapsody 1.x to 2.x, thereby changing the interface of the workflow (PR #846).
- mapping/bd_rhapsody: Upgrade BD Rhapsody 1.x to 2.x, thereby changing the interface of the workflow (PR #846).
- reference/make_bdrhap_reference: Upgrade BD Rhapsody 1.x to 2.x, thereby changing the interface of the workflow (PR #846).
- reference/build_star_reference: Rename mapping/star_build_reference to reference/build_star_reference (PR #846).
- reference/cellranger_mkgtf: Rename reference/mkgtf to reference/cellranger_mkgtf (PR #846).
- labels_transfer/xgboost: Align interface with new annotation workflow:
  - Store label probabilities instead of uncertainties
  - Take .h5mu format as an input instead of .h5ad
- reference/build_cellranger_arc_reference: a default value of "output" is now specified for the argument --genome, in line with the reference/build_cellranger_reference component. Additionally, providing a value for --organism is no longer required and its default value of Homo Sapiens has been removed (PR #864).
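The output naming rule for multiplexed runs of from_cellranger_multi_to_h5mu amounts to string substitution: the * in the output value is replaced by each sample ID. The sketch below only illustrates that documented rule; the helper name, the choice to replace every * occurrence and the error message are assumptions, not the component's code.

```python
def expand_output_pattern(pattern, sample_ids):
    """Map each sample ID to an output file name by substituting it for the '*' wildcard."""
    if "*" not in pattern:
        raise ValueError(f"output pattern {pattern!r} must contain a '*' wildcard")
    # Assumption: every '*' in the pattern is replaced by the sample ID.
    return {sample_id: pattern.replace("*", sample_id) for sample_id in sample_ids}


print(expand_output_pattern("output_*.h5mu", ["sample1", "sample2"]))
# {'sample1': 'output_sample1.h5mu', 'sample2': 'output_sample2.h5mu'}
```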
NEW FUNCTIONALITY
Important
Workflows from the workflows/annotation and workflows/integration/scgpt_leiden namespaces, plus their newly implemented dependencies, are not yet considered to be part of the stable public API. Their functionality and interface may be subject to change.
- velocyto_to_h5mu: now writes counts to .X (PR #932).
- qc/calculate_atac_qc_metrics: new component for calculating ATAC QC metrics (PR #868).
- workflows/annotation/scgpt_integration_knn workflow: Cell-type annotation based on scGPT integration with KNN label transfer (PR #875).
- CI: Use params.resources_test in test workflows in order to point to an alternative location (e.g. a cache) (PR #889).
- Added demux/cellranger_atac_mkfastq component: demultiplex raw sequencing data for ATAC experiments (PR #726).
- process_samples, process_batches and rna_multisample workflows: added functionality to scale the log-normalized gene expression data to unit variance and zero mean. The scaled data will be output to a different layer and the representation with reduced dimensions will be created and stored in addition to the non-scaled data (PR #733).
- transform/scaling: add --input_layer and --output_layer arguments (PR #733).
- CI: added checking of mudata contents for multiple workflows (PR #783).
- Added multiple arguments to the cellranger_multi workflow in order to maintain feature parity with the mapping/cellranger_multi component (PR #803).
- convert/from_cellranger_to_h5mu: add support for antigen analysis.
- Added reference/build_cellranger_reference component: build reference file compatible with ATAC and ATAC+GEX experiments (PR #726).
- demux/bcl_convert: add support for no lane splitting (PR #804).
- reference/cellranger_mkgtf component: Added cellranger mkgtf as a standalone component (PR #771).
- scgpt/cross_check_genes component: Added a gene-model cross check component for scGPT (PR #758).
- scgpt/embedding component: Added scGPT embedding component (PR #761).
- scgpt/tokenize_pad component: Added scGPT padding and tokenization component (PR #754).
- scgpt/binning component: Added a scGPT pre-processing binning component (PR #765).
- workflows/integration/scgpt_leiden workflow: scGPT integration followed by Leiden clustering (PR #794).
- scgpt/cell_type_annotation component: Added scGPT cell type annotation component (PR #798).
- resources_test_scripts/scGPT.sh: Added script to include scGPT test resources (PR #800).
- transform/clr component: Added the option to set the axis along which to apply CLR. The axis can also be overridden at the workflow level (PR #767).
- annotate/celltypist component: Added a CellTypist annotation component (PR #825).
- dataflow/split_h5mu component: Added a component to split a single h5mu file into multiple h5mu files based on the values of an .obs column (PR #824).
- workflows/test_workflows/ingestion components and workflows/ingestion: Added standalone components for integration testing of ingestion workflows (PR #801).
- workflows/ingestion/make_reference: Add additional arguments passed through to the STAR and BD Rhapsody reference components (PR #846).
- annotate/random_forest_annotation component: Added a random forest cell type annotation component (PR #848).
- dataflow/concatenate_h5mu: data from .uns, both originating from the global and per-modality slots, is now retained in the final concatenated output object. Additionally, added the uns_merge_mode argument in order to tune the behavior when conflicting keys are detected across samples (PR #859).
- dimred/densmap component: Added a densMAP dimensionality reduction component (PR #748).
- annotate/scanvi component: Added a component to annotate cells using scANVI (PR #833).
- transform/bpcells_regress_out component: Added a component to regress out effects of confounding variables in the count matrix using BPCells (PR #863).
- transform/regress_out: Allow providing 'input' and 'output' layers for scanpy regress_out functionality (PR #863).
- workflows/ingestion/make_reference: add possibility to build CellRanger ARC references. Added --motifs_file, --non_nuclear_contigs and --output_cellranger_arc arguments (PR #864).
- Test resources (reference_gencodev41_chr1): switch reference genome for CellRanger to ARC variant (PR #864).
- Added transform/tfidf component: normalize ATAC data with TF-IDF (PR #870).
- Added dimred/lsi component (PR #552).
- metadata/duplicate_obs component: Added a component to make a copy from one .obs field or index to another .obs field within...