Spectral library matcher

A script based on the matchms library that calculates spectral similarity measures between two MGF files (usually a query file and a library file). A library of in silico generated spectra of natural products is available here: https://doi.org/10.5281/zenodo.5607185

Requirements

With uv

uv venv
uv sync

and activate it

source .venv/bin/activate

With conda

You can create a conda environment with environment.yml.

conda env create -f environment.yml

and activate it

conda activate spectral_lib_matcher

With docker

docker build -t spectrallibmatcher .

docker run -it --rm -v $PWD:/app spectrallibmatcher bash

Run the tests to check that everything works:

docker run -it --rm -v $PWD:/app spectrallibmatcher bash --login scripts/run_tests

Running the spectral matching

python src/processor.py [-h] [-g] [-o file.out] [--parent_mz_tolerance [-p]] [--msms_mz_tolerance [-m]] [--min_score [-s]] [--similarity_method [-z]] [--min_peaks [-k]] [-c] [-v] query.mgf database.mgf [database.mgf ...]

positional arguments:

  query.mgf       the source MGF file or GNPS job ID (if -g == True)
  database.mgf    the database(s), in MGF or binary format

optional arguments:

  -h, --help      show this help message and exit
  -g              specifies that GNPS is the source of the query_file
  -o file.out     output file
  --parent_mz_tolerance [-p], -p [-p]
                  tolerance for the parent ion (MS) (default 0.01)
  --msms_mz_tolerance [-m], -m [-m]
                  tolerance for the MS/MS ions (default 0.01)
  --min_score [-s], -s [-s]
                  minimal score to consider (default 0.2)
  --similarity_method [-z], -z [-z]
                  similarity method used to perform spectral matching (default ModifiedCosine); the available similarity methods are listed at https://matchms.readthedocs.io/en/latest/api/matchms.similarity.html#submodules
  --min_peaks [-k], -k [-k]
                  minimal number of peaks to consider (default 6)
  -c              additional cleaning step on the database file
  -v              print additional details to stdout

Command line example:

python src/processor.py -v -o data/annotations.tsv -p 0.01 -m 0.01 -s 0.2 -k 6 -z ModifiedCosine data/query.mgf data/spectral_lib.mgf

Using the -g argument, you can alternatively pass a GNPS job ID to download the spectral file directly:

python src/processor.py -v -g -o data/annotations.tsv -p 0.01 -m 0.01 -s 0.2 -k 6 -z ModifiedCosine d7a9cacf9ccd4510a04d119ab1561ea5 data/spectral_lib.mdbl

If you want to compare two MGF files without structural annotation, use the --index true argument to match indices (feature_ids) instead:

python src/processor.py -v -o data/annotations.tsv -p 0.01 -m 0.01 -s 0.2 -k 6 -z ModifiedCosine data/query.mgf data/spectral_lib.mgf -i true
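To make the -p, -m, -s and -k values used in the commands above more concrete, here is a minimal sketch of how such thresholds could act on a pair of spectra. It is not the code of src/processor.py and it does not call matchms: it uses a plain greedy cosine score rather than ModifiedCosine, and the function names cosine_greedy and keep_match are hypothetical. It also assumes that the parent ion tolerance works as a pre-filter and that min_peaks counts matched peaks; the actual script may interpret both differently.

```python
import math


def cosine_greedy(query, reference, tolerance=0.01):
    """Greedy cosine score between two peak lists of (mz, intensity) tuples.

    Returns (score, number_of_matched_peaks).
    """
    # Collect every peak pair whose m/z values agree within the tolerance.
    candidates = []
    for i, (mz_q, int_q) in enumerate(query):
        for j, (mz_r, int_r) in enumerate(reference):
            if abs(mz_q - mz_r) <= tolerance:
                candidates.append((int_q * int_r, i, j))

    # Assign pairs greedily, largest intensity product first,
    # using each peak at most once.
    candidates.sort(reverse=True)
    used_q, used_r = set(), set()
    dot, n_matches = 0.0, 0
    for product, i, j in candidates:
        if i in used_q or j in used_r:
            continue
        used_q.add(i)
        used_r.add(j)
        dot += product
        n_matches += 1

    norm_q = math.sqrt(sum(inten ** 2 for _, inten in query))
    norm_r = math.sqrt(sum(inten ** 2 for _, inten in reference))
    if norm_q == 0 or norm_r == 0:
        return 0.0, 0
    return dot / (norm_q * norm_r), n_matches


def keep_match(query, reference, parent_mz_q, parent_mz_r,
               parent_mz_tolerance=0.01, msms_mz_tolerance=0.01,
               min_score=0.2, min_peaks=6):
    """Decide whether a query/reference pair would be reported."""
    # Assumption: the parent ion tolerance (-p) pre-filters candidates.
    if abs(parent_mz_q - parent_mz_r) > parent_mz_tolerance:
        return False
    score, n_matches = cosine_greedy(query, reference, msms_mz_tolerance)
    # Assumption: -k counts matched peaks, not peaks per spectrum.
    return score >= min_score and n_matches >= min_peaks


if __name__ == "__main__":
    spectrum = [(100.0, 1.0), (200.0, 0.5), (300.0, 0.2)]
    print(cosine_greedy(spectrum, spectrum, tolerance=0.01))
```

Under these assumptions, tightening -m reduces the number of peak pairs that can match, while -s and -k only decide which scored pairs are kept.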

Using binary libraries

To speed up the matching, especially when you always use the same library, you can use specially crafted binary libraries. Under the hood, a binary library is a Python pickle object with a custom header on the file (the header is needed because we found some MGF files that pickle would unmarshal).
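The idea of a pickle payload behind a custom header can be sketched as follows. This is only an illustration: the MAGIC bytes and the save_library and load_library names are hypothetical, and the header actually written by src/binary_library_builder.py may look quite different.

```python
import pickle

# Hypothetical header bytes, chosen for this example only.
MAGIC = b"EXAMPLE-SPECLIB\n"


def save_library(spectra, path):
    """Write the header first, then the pickled spectra."""
    with open(path, "wb") as handle:
        handle.write(MAGIC)
        pickle.dump(spectra, handle, protocol=pickle.HIGHEST_PROTOCOL)


def load_library(path):
    """Refuse to unpickle anything that does not start with the header."""
    with open(path, "rb") as handle:
        if handle.read(len(MAGIC)) != MAGIC:
            raise ValueError(f"{path} is not a binary library")
        # The file position is now just past the header.
        return pickle.load(handle)
```

Checking the header before calling pickle.load means a plain MGF file is rejected up front instead of being handed to the unpickler. As with any pickle file, only load binary libraries that come from a source you trust.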

Create a binary library

python src/binary_library_builder.py -v -o data/spectral_lib.mdbl data/spectral_lib.mgf

Use a binary library

There is nothing special to do: the processor automatically detects whether your library is an MGF file or a binary library.

python src/processor.py -v -o data/annotations.tsv -p 0.01 -m 0.01 -s 0.2 -k 6 -z ModifiedCosine data/query.mgf data/spectral_lib.mdbl
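Automatic detection of this kind can be done by peeking at the first bytes of the file. The sketch below assumes a hypothetical header (MAGIC) and a deliberately minimal MGF reader (read_mgf); the real processor relies on matchms and on its own header, so treat the names and the parsing rules here as illustrative only.

```python
import pickle

# Hypothetical header bytes, not the ones used by the project.
MAGIC = b"EXAMPLE-SPECLIB\n"


def read_mgf(path):
    """Minimal MGF reader: one dict per BEGIN IONS / END IONS block."""
    spectra, current = [], None
    with open(path, "r", encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            if line == "BEGIN IONS":
                current = {"metadata": {}, "peaks": []}
            elif line == "END IONS":
                if current is not None:
                    spectra.append(current)
                current = None
            elif current is None or not line:
                continue  # outside a block, or an empty line
            elif "=" in line and not line[0].isdigit():
                key, value = line.split("=", 1)
                current["metadata"][key.lower()] = value
            else:
                # Peak line: m/z and intensity, extra columns ignored.
                mz, intensity = line.split()[:2]
                current["peaks"].append((float(mz), float(intensity)))
    return spectra


def load_any_library(path):
    """Dispatch on the file header: binary library or plain MGF."""
    with open(path, "rb") as handle:
        if handle.read(len(MAGIC)) == MAGIC:
            return pickle.load(handle)
    return read_mgf(path)
```

With this kind of dispatch, the same command line works for both formats and the file extension does not matter.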

For the h4ck3rs

You can also use Spec2Vec and MS2DeepScore. To do so, you'll need to either train your own models or get them from Zenodo as indicated in the respective repositories (and store them in models/). This part is experimental and we won't offer support for it.

Citations

Depending on which parts you used, do not forget to cite:

matchms: https://doi.org/10.21105/joss.02411
spec2vec: https://doi.org/10.1371/journal.pcbi.1008724
ms2deepscore: https://doi.org/10.1186/s13321-021-00558-4
