Releases: openpipelines-bio/openpipeline
OpenPipelines.bio v0.12.3
BUG FIXES
- dataflow/concat: Fix an issue where joining columns with different datatypes caused a TypeError (PR #619).
- dataflow/concat: Fix a TypeError when using mode 'move' and a column with conflicting metadata does not exist across all samples (PR #631).
Full Changelog: 0.12.2...0.12.3
OpenPipelines.bio v0.12.2
BUG FIXES
- dataflow/concat and dataflow/concatenate_h5mu: Fix an issue where using --mode move on samples with non-overlapping features would cause var_names to become unaligned to the data (PR #653).
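The fix above is about keeping var_names attached to the right values when samples do not share the same features. As an illustration only, the pure-Python sketch below aligns each sample's per-feature values to the union of feature names by name rather than by position, filling in None where a sample lacks a feature. The function name align_to_union and the dict-based layout are hypothetical and are not the dataflow/concat implementation.

```python
from typing import Dict, List, Optional, Tuple


def align_to_union(
    samples: Dict[str, Dict[str, float]],
) -> Tuple[List[str], Dict[str, List[Optional[float]]]]:
    """Align per-sample feature values to the union of all feature names.

    Hypothetical helper: each sample maps feature name -> value. Values are
    looked up by name, so they stay attached to the correct feature even
    when samples have non-overlapping features.
    """
    union: List[str] = []
    seen = set()
    for features in samples.values():
        for name in features:
            if name not in seen:  # keep first-seen order of feature names
                seen.add(name)
                union.append(name)
    aligned = {
        sample_id: [features.get(name) for name in union]  # None if missing
        for sample_id, features in samples.items()
    }
    return union, aligned


names, aligned = align_to_union(
    {"s1": {"geneA": 1.0, "geneB": 2.0}, "s2": {"geneB": 5.0, "geneC": 7.0}}
)
```

Aligning positionally instead (for example, zipping each sample's values against the union list) is exactly the kind of mistake that leaves names and data out of step.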
OpenPipelines.bio v0.11.1
BUG FIXES
- dataflow/concat: Fix an issue where using --mode move on samples with non-overlapping features would cause var_names to become unaligned to the data (PR #653).
OpenPipelines.bio v0.12.1
BUG FIXES
- rna_singlesample: Fix the filtering parameter values min_counts, max_counts, min_genes_per_cell, max_genes_per_cell and min_cells_per_gene not being passed to the filter_with_counts component (PR #614).
- prot_singlesample: Fix the filtering parameter values min_counts, max_counts, min_proteins_per_cell, max_proteins_per_cell and min_cells_per_protein not being passed to the filter_with_counts component (PR #614).
OpenPipelines.bio v0.12.0
BREAKING CHANGES
The detection of mitochondrial genes has been revisited in order to remove its interdependency with count filtering and QC metric calculation.
Implementing these changes required breaking some existing functionality:
- filter/filter_with_counts: removed --var_gene_names, --mitochondrial_gene_regex, --var_name_mitochondrial_genes, --min_fraction_mito and --max_fraction_mito (PR #585).
- workflows/prot_singlesample: removed --min_fraction_mito and --max_fraction_mito because regex-based detection of mitochondrial genes is not possible (PR #585).
- The fraction of counts that originated from mitochondrial genes used to be written to an .obs column whose name was derived from pct_ suffixed by the name of the mitochondrial gene column. The --obs_name_mitochondrial_fraction argument is introduced to change the destination column, and the default prefix has changed from pct_ to fraction_ (PR #585).
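Downstream scripts that read the mitochondrial fraction column need to account for the new name. The sketch below models the naming rule described above. The function mito_fraction_column is hypothetical, and it assumes the new default is the fraction_ prefix followed by the same mitochondrial gene column name that the old pct_ prefix was combined with.

```python
from typing import Optional


def mito_fraction_column(
    var_name_mitochondrial_genes: str,
    obs_name_mitochondrial_fraction: Optional[str] = None,
    legacy: bool = False,
) -> str:
    """Hypothetical: name of the .obs column holding the mitochondrial fraction."""
    if legacy:
        # Before v0.12.0: "pct_" followed by the mitochondrial gene column name.
        return "pct_" + var_name_mitochondrial_genes
    if obs_name_mitochondrial_fraction is not None:
        # v0.12.0+: --obs_name_mitochondrial_fraction sets the column explicitly.
        return obs_name_mitochondrial_fraction
    # Assumed v0.12.0+ default: "fraction_" followed by the same column name.
    return "fraction_" + var_name_mitochondrial_genes
```

For example, a column previously read as pct_mitochondrial_genes would, under this assumption, now be found at fraction_mitochondrial_genes unless --obs_name_mitochondrial_fraction is set.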
NEW FUNCTIONALITY
- workflows/qc: A pipeline to add basic QC statistics to a MuData object (PR #585).
- workflows/rna_singlesample: added --obs_name_mitochondrial_fraction and made sure that the values from --max_fraction_mito and --min_fraction_mito are bound between 0 and 1 (PR #585).
- Added filter/delimit_fraction: Turns an annotation column containing values between 0 and 1 into a boolean column based on thresholds (PR #585).
- Added metadata/grep_annotation_column: Perform a regex lookup on a column from the annotation matrices .obs or .var (PR #585).
- workflows/full_pipelines: added --obs_name_mitochondrial_fraction argument (PR #585).
- workflows/prot_multisample: added --var_qc_metrics and --top_n_vars arguments (PR #585).
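Together, these components split mitochondrial filtering into three independent steps: flag genes by regex, compute the per-cell fraction, then threshold that fraction. The pure-Python sketch below mirrors that flow on plain lists instead of MuData .var/.obs columns. The function names echo the components, but their signatures are hypothetical. The ^MT- pattern (a common human gene naming convention) and the choice to give empty cells a fraction of 0 are also assumptions.

```python
import re
from typing import List, Sequence


def grep_annotation_column(values: Sequence[str], pattern: str) -> List[bool]:
    """Flag every entry of an annotation column that matches the regex."""
    regex = re.compile(pattern)
    return [regex.search(value) is not None for value in values]


def mito_fraction(counts: Sequence[float], is_mito: Sequence[bool]) -> float:
    """Fraction of one cell's counts coming from flagged (mitochondrial) genes."""
    total = sum(counts)
    if total == 0:
        return 0.0  # assumption: an empty cell gets a fraction of 0
    return sum(c for c, flag in zip(counts, is_mito) if flag) / total


def delimit_fraction(
    fractions: Sequence[float], min_fraction: float = 0.0, max_fraction: float = 1.0
) -> List[bool]:
    """Turn values in [0, 1] into a boolean column using inclusive thresholds."""
    for bound in (min_fraction, max_fraction):
        if not 0.0 <= bound <= 1.0:  # thresholds must be bound between 0 and 1
            raise ValueError(f"threshold {bound} is not between 0 and 1")
    result = []
    for value in fractions:
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"value {value} is not between 0 and 1")
        result.append(min_fraction <= value <= max_fraction)
    return result


gene_names = ["MT-CO1", "ACTB", "MT-ND1", "GAPDH"]
is_mito = grep_annotation_column(gene_names, r"^MT-")
cells = [[10, 50, 5, 35], [60, 20, 10, 10]]  # raw counts per gene, per cell
fractions = [mito_fraction(cell, is_mito) for cell in cells]
keep = delimit_fraction(fractions, max_fraction=0.2)
```

Here the first cell has 15 of 100 counts from MT- genes and passes a 0.2 maximum, while the second has 70 of 100 and does not.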
MINOR CHANGES
OpenPipelines.bio v0.11.0
BREAKING CHANGES
- Nextflow VDSL3: set simplifyOutput to False by default. This implies that components and workflows will output a hashmap with a sole "output" entry when there is only one output (PR #563).
- integrate/scvi: rename the model_output argument to output_model in order to align with the scvi_leiden workflow. This also fixes a bug with the workflow where the argument did not function (PR #562).
MINOR CHANGES
- dataflow/concat: reduce memory consumption when using --other_axis_mode move by processing only one annotation matrix (.var, .obs) at a time (PR #569).
- convert/from_h5ad_to_h5mu, convert/from_h5mu_to_h5ad, dimred/pca, dimred/umap, filter/filter_with_counts, filter/filter_with_hvg, filter/remove_modality, filter/subset_h5mu, integrate/scanorama, transform/delete_layer and transform/log1p: update python to 3.9 (PR #572).
- integrate/scarches: update base image, scvi-tools and pandas to nvcr.io/nvidia/pytorch:23.09-py3, ~=1.0.3 and ~=2.1.0 respectively (PR #572).
- integrate/totalvi: update python to 3.9 and scvi-tools to ~=1.0.3 (PR #572).
- correction/cellbender_remove_background: change base image to nvcr.io/nvidia/cuda:11.8.0-devel-ubuntu22.04 and downgrade MuData to 0.2.1 because it is the oldest version that uses python 3.7 (PR #575).
- Several integration workflows: prevent leiden from being executed when no resolutions are provided (PR #583).
- dataflow/concat: bump pandas to ~=2.1.1 and reduce memory consumption by only reading one modality into memory at a time (PR #568).
- annotate/popv: bump jax and jaxlib to 0.4.10, scanpy to 1.9.4, scvi to 1.0.3 and pin ml-dtypes to < 0.3.0 (PR #565).
- velocity/scvelo: pin matplotlib to < 3.8.0 (PR #566).
- mapping/multi_star: pin multiqc to 1.15.0 (PR #566).
- mapping/bd_rhapsody: pin pandas version to <2 (PR #563).
- query/cellxgene_census: replaced label singlecpu with label midcpu.
- query/cellxgene_census: avoid creating MuData object in memory by writing the modality directly to disk (PR #558).
- integrate/scvi: use midcpu label instead of singlecpu (PR #561).
BUG FIXES
- transform/clr: raise an error when CLR fails to return the requested output (PR #579).
- correction/cellbender_remove_background: fix missing helper functionality when using Fusion (PR #575).
- convert/from_bdrhap_to_h5mu: Avoid "TypeError: Can't implicitly convert non-string objects to strings" by using categorical dtypes when a string column contains NA values (PR #563).
- qc/calculate_qc_metrics: fix calculating mitochondrial gene related QC metrics when only or no mitochondrial genes were found (PR #564).
OpenPipelines.bio v0.10.0
BREAKING CHANGES
- workflows/full_pipeline: removed --prot_min_fraction_mito and --prot_max_fraction_mito (PR #451).
- workflows/rna_multisample and workflows/prot_multisample: Removed concatenation from these pipelines. The input for these pipelines is now a single MuData file that contains data for multiple samples. If you wish to use these pipelines on multiple single-sample MuData files, you can run the dataflow/concat component on them first. This also means that the ability to add IDs to multiple single-sample MuData files prior to concatenation is no longer required, hence the removal of --add_id_to_obs, --sample_id, --add_id_obs_output, and --add_id_make_observation_keys_unique (PR #475).
- The scvi pipeline was renamed to scvi_leiden because leiden clustering was added to the pipeline (PR #499).
- Upgrade correction/cellbender_remove_background from CellBender v0.2 to CellBender v0.3.0 (PR #523). Between these versions, several arguments related to the slots of the output file have been changed.
MAJOR CHANGES
- Several components: update anndata to 0.9.3 and mudata to 0.2.3 (PR #423).
- Base resources assigned to a process without any labels are now 1 CPU and 2 GB (PR #518).
- Updated to Viash 0.7.5 (PR #513).
- Removed deprecated variant: vdsl3 tags (PR #513).
- Removed unused version: dev (PR #513).
- multiomics/integration/harmony_leiden: Refactored data flow (PR #513).
- ingestion/bd_rhapsody: Refactored data flow (PR #513).
- query/cellxgene_census: increased returned metadata content, revised query option, added filtering strategy and refactored functionality (PR #520).
- Refactor loggers using the setup_logger() helper function (PR #534).
- Refactor unittest tests to pytest tests (PR #534).
MINOR CHANGES
- Add resource labels to several components (PR #518).
- full_pipeline: default value for --var_qc_metrics is now the combined values specified for --mitochondrial_gene_regex and --filter_with_hvg_var_output.
- dataflow/concat: reduce memory consumption by only reading one modality at a time (PR #474).
- Components that use CellRanger, BCL Convert or bcl2fastq: updated from Ubuntu 20.04 to Ubuntu 22.04 (PR #494).
- Components that use CellRanger: updated Picard to 2.27.5 (PR #494).
- interprete/liana: Update lianapy to 0.1.9 (PR #497).
- qc/multiqc: add unit tests (PR #502).
- reference/build_cellranger_reference: add unit tests (PR #506).
- reference/build_bd_rhapsody_reference: add unit tests (PR #504).
NEW FUNCTIONALITY
- Added compression/compress_h5mu component (PR #530).
- Resource management: when a process exits with a status code between 137 and 140, retry the process with increased memory requirements. Memory scales by multiplying the base memory assigned to the process by the attempt number (PR #518 and PR #527).
- integrate/scvi: Add --n_hidden_nodes, --n_dimensions_latent_space, --n_hidden_layers, --dropout_rate, --dispersion, --gene_likelihood, --use_layer_normalization, --use_batch_normalization, --encode_covariates, --deeply_inject_covariates and --use_observed_lib_size parameters.
- filter/filter_with_counts: add --var_name_mitochondrial_genes argument to store a boolean array corresponding to the detected mitochondrial genes.
- full_pipeline and rna_singlesample pipelines: add --var_name_mitochondrial_genes, --var_gene_names and --mitochondrial_gene_regex arguments to specify mitochondrial gene detection behaviour.
- integrate/scvi: Add --obs_labels, --obs_size_factor, --obs_categorical_covariate and --obs_continuous_covariate arguments (PR #496).
- Added var_qc_metrics_fill_na_value argument to calculate_qc_metrics (PR #477).
- Added multiomics/multisample pipeline to run multisample processing followed by the integration setup. It is considered an entrypoint into the full pipeline which skips the single-sample processing. The idea is to allow a re-run of these steps after a sample has already been processed by the full_pipeline. Keep in mind that samples that are provided as input to this pipeline are processed separately and are not concatenated. Hence, the input should be a concatenated sample (PR #475).
- Added multiomics/integration/bbknn_leiden workflow (PR #456).
- workflows/prot_multisample and workflows/full_pipelines: add basic QC statistics to prot modality (PR #485).
- mapping/cellranger_multi: Add tests for the mapping of CRISPR Guide Capture data (PR #494).
- convert/from_cellranger_multi_to_h5mu: add perturbation_efficiencies_by_feature and perturbation_efficiencies_by_feature information to the .uns slot of the gdo modality (PR #494).
- convert/from_cellranger_multi_to_h5mu: add feature_reference information to the MuData object. Information is split between the modalities. For example, CRISPR Guide Capture information is added to the .uns slot of the gdo modality, while Antibody Capture information is added to the .uns slot of prot (PR #494).
- Added multiomics/integration/totalvi_leiden pipeline (PR #500).
- Added totalVI component (PR #386).
- workflows/full_pipeline: Add pca_overwrite argument (PR #511).
- Add main_build_viash_hub action to build, tag, and push components and docker images for viash-hub.com (PR #480).
- integration/bbknn_leiden: Update state management to fromState/toState (PR #538).
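The retry rule from the resource management entry can be modelled in a few lines. Exit status 137 is 128 + 9, which means the process was killed with SIGKILL. That is what typically happens when a container exceeds its memory limit. In a Nextflow config, this kind of rule is usually written with an errorStrategy closure and task.attempt. The Python below models only the arithmetic. It assumes "between 137 and 140" is inclusive, and its function names are hypothetical.

```python
# Assumption: "between 137 and 140" includes both ends.
RETRYABLE_EXIT_CODES = range(137, 141)


def should_retry(exit_status: int) -> bool:
    """True for the exit codes that trigger a retry with more memory."""
    return exit_status in RETRYABLE_EXIT_CODES


def memory_for_attempt(base_memory_gb: float, attempt: int) -> float:
    """Base memory multiplied by the (1-based) attempt number."""
    if attempt < 1:
        raise ValueError("attempt numbers start at 1")
    return base_memory_gb * attempt


# With the 2 GB base for unlabelled processes, three attempts would
# request 2, 4 and 6 GB.
plan = [memory_for_attempt(2, attempt) for attempt in (1, 2, 3)]
```

Scaling linearly with the attempt number keeps early retries cheap while still giving a process that repeatedly runs out of memory progressively more headroom.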
DOCUMENTATION
- images: Added images for various concepts, such as a sample, a cell, RNA, ADT, ATAC, VDJ (PR #515).
- multiomics/rna_singlesample: Add image for workflow (PR #515).
- multiomics/rna_multisample: Add image for workflow (PR #515).
- multiomics/prot_singlesample: Add image for workflow (PR #515).
- multiomics/prot_multisample: Add image for workflow (PR #515).
BUG FIXES
- Fix an issue with workflows/multiomics/scanorama_leiden where the --output argument doesn't work as expected (PR #509).
- Fix an issue with workflows/full_pipeline not correctly caching previous runs (PR #460).
- Fix incorrect namespaces of the integration pipelines (PR #464).
- Fix an issue in several workflows where the --output argument would not work (PR #476).
- integration/harmony_leiden and integration/scanorama_leiden: Fix an issue where the prefix of the columns that store the leiden clusters was hardcoded to leiden, instead of adapting to the value for --obs_cluster (PR #482).
- velocity/velocyto: Resolve symbolic link before checking whether the transcriptome is a gzip (PR #484).
- workflows/integration/scanorama_leiden: fix an issue where --obsm_input, --obs_batch, --batch_size, --sigma, --approx, --alpha and --knn were not working because they were not passed through to the scanorama component (PR #487).
- workflows/integration/scanorama_leiden: fix leiden being calculated on the wrong embedding because the --obsm_input argument was not correctly set to the output embedding of scanorama (PR #487).
- mapping/cellranger_multi: Fix an issue where modalities did not have the proper name (PR #494).
- metadata/add_uns_to_obs: Fix "KeyError: 'ouput_compression'" error (PR #501).
- neighbors/bbknn: Fix --input not being a required argument (PR #518).
- Create correction/cellbender_remove_background_v0.2 for legacy CellBender v0.2 format (PR #523).
- integrate/scvi: Ensure output has the same dimensionality as the input (PR #524).
- mapping/bd_rhapsody: Fix --dryrun argument not working (PR #534).
- qc/multiqc: Fix component not working for multiple inputs (PR #537). Also converted the Bash script to a Python script.
- neighbors/bbknn: Fix --uns_output, --obsp_distances and --obsp_connectivities not being processed correctly (PR #538).
OpenPipelines.bio v0.10.1
MINOR CHANGES
BUG FIXES
- integration/bbknn_leiden: Set leiden clustering parameter to multiple (#542, PR #545).
- integration/scvi_leiden: Fix component name in Viash config (PR #547).
- integration/harmony_leiden: Pass --uns_neighbors argument to umap (PR #548).
- Add a workaround for a bug where resources aren't available when using Nextflow Fusion by including setup_logger, subset_vars and compress_h5mu in the script itself (PR #549).
0.9.0
Openpipelines 0.9.0
BREAKING CHANGES
Running the integration in the full_pipeline was deemed impractical because a plethora of integration methods exist, which in turn interact with other functionality (like clustering). This generates a large number of possible use cases which one pipeline cannot easily cover. Instead, each integration method will be split into its own pipeline, and the full_pipeline will prepare for integration by performing steps that are required by many integration methods. Therefore, the following changes were made:
- workflows/full_pipeline: harmony integration and leiden clustering are removed from the pipeline.
- Added initialize_integration to run calculations that output information commonly required by the integration methods. This pipeline runs PCA, nearest neighbours and UMAP. This pipeline is run as a subpipeline at the end of full_pipeline.
- Added leiden_harmony integration pipeline: run harmony integration followed by neighbour calculations and leiden clustering. Also runs umap on the result.
- Removed the integration pipeline.
The old behavior of the full_pipeline can be obtained by running full_pipeline followed by the leiden_harmony pipeline.
- The crispr and hashing modalities have been renamed to gdo and hto respectively (PR #392).
- Updated Viash to 0.7.4 (PR #390).
- cluster/leiden: Output is now stored into .obsm instead of .obs (PR #431).
NEW FUNCTIONALITY
- cluster/leiden and integration/harmony_leiden: allow running leiden multiple times with multiple resolutions (PR #431).
- workflows/full_pipeline: PCA, nearest neighbours and UMAP are now calculated for the prot modality (PR #396).
- transform/clr: added output_layer argument (PR #396).
- workflows/integration/scvi: Run scvi integration followed by neighbour calculations and run umap on the result (PR #396).
- mapping/cellranger_multi and workflows/ingestion/cellranger_multi: Added --vdj_inner_enrichment_primers argument (PR #417).
- metadata/move_obsm_to_obs: Move a matrix from an .obsm slot into .obs (PR #431).
- integrate/scvi: add validity checks for non-normalized input, obs and vars before proceeding to training (PR #429).
- schemas: Added schema files for authors (PR #436).
- schemas: Added schema file for Viash configs (PR #436).
- schemas: Refactor author import paths (PR #436).
- schemas: Added schema file for file format specification files (PR #437).
- query/cellxgene_census: Query Cellxgene census component and save the results to a MuData file (PR #433).
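Because cluster/leiden now stores its output in .obsm and can run at multiple resolutions, metadata/move_obsm_to_obs is the step that brings such a matrix back into .obs as ordinary columns. The sketch below models that step in pure Python. It assumes a dict-of-columns layout and a hypothetical <prefix>_<column> naming scheme, and neither is taken from the component itself. It only builds the new .obs and leaves removal of the .obsm slot to the caller.

```python
from typing import Dict, List

Column = List[object]


def move_obsm_to_obs(
    obs: Dict[str, Column], obsm_slot: Dict[str, Column], prefix: str
) -> Dict[str, Column]:
    """Hypothetical: copy each column of an .obsm slot into .obs as <prefix>_<column>."""
    new_obs = dict(obs)  # do not mutate the caller's .obs
    for column, values in obsm_slot.items():
        name = f"{prefix}_{column}"
        if name in new_obs:
            raise ValueError(f"column {name!r} already exists in .obs")
        new_obs[name] = list(values)
    return new_obs


obs = {"sample_id": ["a", "a", "b"]}
leiden = {"0.25": ["0", "0", "1"], "1.0": ["0", "1", "2"]}  # one column per resolution
merged = move_obsm_to_obs(obs, leiden, prefix="leiden")
```

Refusing to overwrite an existing column is a deliberate choice in this sketch, so that a second run cannot silently replace earlier clustering results.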
MAJOR CHANGES
- report/mermaid: Now uses mermaid-cli to generate images instead of creating a request to mermaid.ink. New --output_format, --width, --height and --background_color arguments were added (PR #419).
- All components that used python as base container: use the slim version to reduce container image size (PR #427).
MINOR CHANGES
- integrate/scvi: update scvi to 1.0.0 (PR #448).
- mapping/multi_star: Added --min_success_rate, which causes the component to fail when the fraction of successfully processed samples is below the given threshold (PR #408).
- correction/cellbender_remove_background and transform/clr: update muon to 0.1.5 (PR #428).
- ingestion/cellranger_postprocessing: split integration tests into several workflows (PR #425).
- schemas: Add schema file for author yamls (PR #436).
- mapping/multi_star, mapping/star_build_reference and mapping/star_align: update STAR from 2.7.10a to 2.7.10b (PR #441).
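The --min_success_rate check in mapping/multi_star amounts to comparing the share of successful samples with a threshold. The sketch below assumes the component fails when the rate is strictly below the threshold. The function check_success_rate and its list-of-booleans input are hypothetical.

```python
from typing import Sequence


def check_success_rate(sample_succeeded: Sequence[bool], min_success_rate: float) -> float:
    """Hypothetical: return the success rate, or raise if it is below the threshold."""
    if not sample_succeeded:
        raise ValueError("no samples were processed")
    rate = sum(sample_succeeded) / len(sample_succeeded)
    if rate < min_success_rate:  # assumption: equal to the threshold still passes
        raise RuntimeError(
            f"only {rate:.0%} of samples succeeded; {min_success_rate:.0%} required"
        )
    return rate
```

For instance, three successes out of four samples give a rate of 0.75, which passes a threshold of 0.5, while one success out of three would make the run fail.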
BUG FIXES
- annotate/popv: Fix concat issue when the input data has multiple layers (#395, PR #397).
- annotate/popv: Fix indexing issue when the MuData object contains non-overlapping modalities (PR #405).
- mapping/multi_star: Fix issue where the temp dir could not be created when group_id contains slashes (PR #406).
- mapping/multi_star_to_h5mu: Use glob to look for count files recursively (PR #408).
- annotate/popv: Pin PopV, jax and jaxlib versions (PR #415).
- integrate/scvi: the max_epochs argument is no longer required since it has a default value (PR #396).
- workflows/full_pipeline: fix make_observation_keys_unique parameter not being correctly passed to the add_id component, causing "ValueError: Observations are not unique across samples" during execution of the concat component (PR #422).
- annotate/popv: now sets aprox to False to avoid using annoy in scanorama, because it fails on processors that are missing the AVX-512 instruction sets, causing an "Illegal instruction (core dumped)" error.
- workflows/full_pipeline: Avoid adding sample names to observation IDs twice (PR #457).
0.8.0
openpipelines 0.8.0
BREAKING CHANGES
- workflows/full_pipeline: Fixed inconsistencies in argument naming (#372):
  - rna_min_vars_per_cell was renamed to rna_min_genes_per_cell
  - rna_max_vars_per_cell was renamed to rna_max_genes_per_cell
  - prot_min_vars_per_cell was renamed to prot_min_proteins_per_cell
  - prot_max_vars_per_cell was renamed to prot_max_proteins_per_cell
- velocity/scvelo: bump anndata from <0.8 to 0.9.
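Existing parameter files for workflows/full_pipeline that still use the old names need to be updated. The sketch below shows one way to do that for parameters loaded into a dict, for example from a YAML params file. The mapping is taken from the renames listed above, but migrate_params itself is a hypothetical helper.

```python
from typing import Any, Dict

# Old -> new argument names for workflows/full_pipeline (#372).
RENAMED_ARGUMENTS = {
    "rna_min_vars_per_cell": "rna_min_genes_per_cell",
    "rna_max_vars_per_cell": "rna_max_genes_per_cell",
    "prot_min_vars_per_cell": "prot_min_proteins_per_cell",
    "prot_max_vars_per_cell": "prot_max_proteins_per_cell",
}


def migrate_params(params: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical: rename outdated keys, refusing to merge old and new names."""
    migrated: Dict[str, Any] = {}
    for key, value in params.items():
        new_key = RENAMED_ARGUMENTS.get(key, key)
        if new_key in migrated:
            raise ValueError(f"{key!r} and {new_key!r} are both set")
        migrated[new_key] = value
    return migrated
```

Raising on a clash, rather than letting one value win, avoids silently discarding a threshold when a file sets both the old and the new name.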
NEW FUNCTIONALITY
- Added an extra label, veryhighmem, mostly for cellranger_multi with a large number of samples.
- Added multiomics/prot_multisample pipeline.
- Added clr functionality to prot_multisample pipeline.
- Added interpret/lianapy: Enables the use of any combination of ligand-receptor methods and resources, and their consensus.
- filter/filter_with_scrublet: Add --allow_automatic_threshold_detection_fail: when scrublet fails to detect doublets, the component will now put NA in the output columns.
- workflows/full_pipeline: Allow not setting the sample ID to the .obs column of the MuData object.
- workflows/rna_multisample: Add the ID of the sample to the .obs column of the MuData object.
- correction/cellbender_remove_background: add obsm_latent_gene_encoding parameter to store the latent gene representation.
BUG FIXES
- transform/clr: fix anndata object instead of matrix being stored as a layer in the output MuData, resulting in a NoneTypeError object after reading the .layers back in.
- dataflow/concat and dataflow/merge: fixed a bug where boolean values were cast to their string representation.
- workflows/full_pipeline: fix running pipeline with -stub.
- Fixed an issue where passing a remote file URI (for example http:// or s3://) as param_list caused "No such file" errors.
- workflows/full_pipeline: Fix incorrectly named filtering arguments (#372).
- correction/cellbender_remove_background: add obsm_latent_gene_encoding parameter to store the latent gene representation.
MINOR CHANGES
- integrate/scarches, integrate/scvi and correction/cellbender_remove_background: Update base container to nvcr.io/nvidia/pytorch:22.12-py3
- integrate/scvi: add gpu label for nextflow platform.
- integrate/scvi: use cuda enabled jax install.
- convert/from_cellranger_multi_to_h5mu, dataflow/concat and dataflow/merge: update pandas to 2.0.0
- dataflow/concat and dataflow/merge: Boolean and integer columns are now represented by the BooleanArray and IntegerArray dtypes in order to allow storing NA values.
- interpret/lianapy: use the latest development release (commit 11156ddd0139a49dfebdd08ac230f0ebf008b7f8) of lianapy in order to fix compatibility with numpy 1.24.x.
- filter/filter_with_hvg: Add error when specified input layer cannot be found in input data.
- workflows/multiomics/full_pipeline: publish the output from sample merging to allow running different integrations.