Using SpikeFlow with yeast genomes #23

hideo1 · 2024-12-31T00:12:14Z

Hello Davide,

I'm currently working on a ChIP-seq project and would love to use SpikeFlow. However, I noticed that the supported genomes listed are:
Endogenous: mm9, mm10, hg19, hg38
Exogenous (spike-in): dm3, dm6, mm10, mm9, hg19, hg38, hs1

For my project, the conditions are as follows:
Endogenous: Schizosaccharomyces pombe (fission yeast)
Exogenous (spike-in): Saccharomyces cerevisiae (budding yeast)

Could you please guide me on what modifications I need to make in order to use these organisms with SpikeFlow? As I’m relatively new to bioinformatics, any detailed instructions would be greatly appreciated.

Thank you in advance for your help!

Best regards,
Hideo

DavideBrex · 2025-01-03T18:50:28Z

Dear Hideo,
Sorry for not getting back to you sooner; I was on holiday with limited access to my PC.

Regarding your question: SpikeFlow retrieves the genome FASTA files from the UCSC Genome Browser. However, the genome of Schizosaccharomyces pombe is not available on UCSC; only S. cerevisiae is (see the UCSC downloads page: https://hgdownload.soe.ucsc.edu/downloads.html).

To run SpikeFlow with these genomes, you will need to generate the bowtie2 index yourself and provide its path in config.yaml, where the relevant comments read:

#otherwise provide the path to the folder containing the bowtie index (and add the name of the index after the path)
#PLEASE NOTE: the index must be created with the same version of bowtie2 as the one installed in the conda environment (2.5.3)
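A minimal sketch of how the version requirement above could be checked before building the index. It assumes `bowtie2 --version` prints a first line ending in `version X.Y.Z`; the helper name `parse_bowtie2_version` is made up for this example and is not part of SpikeFlow.

```shell
#!/usr/bin/env bash
set -euo pipefail

required_version="2.5.3"   # version quoted in the config.yaml comment

# Print the first "X.Y.Z" that follows the word "version" on standard input.
# (Hypothetical helper; assumes the usual "... version 2.5.3" output format.)
parse_bowtie2_version() {
    awk '!seen && match($0, /version [0-9]+(\.[0-9]+)*/) {
             print substr($0, RSTART + 8, RLENGTH - 8)   # skip "version "
             seen = 1
         }'
}

if command -v bowtie2 >/dev/null 2>&1; then
    found="$(bowtie2 --version | parse_bowtie2_version)"
    if [ "$found" = "$required_version" ]; then
        echo "bowtie2 $found matches the required version"
    else
        echo "bowtie2 $found differs from required $required_version" >&2
    fi
else
    echo "bowtie2 not found on PATH" >&2
fi
```

Run it in the same environment you will use for `bowtie2-build`, so that the version checked is the one that actually creates the index.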

To generate a bowtie2 index, you must first retrieve the FASTA files of both Schizosaccharomyces pombe and S. cerevisiae.
Then:

Add a prefix to the exogenous genome's chromosome names to distinguish them from those of the endogenous genome:

awk '/^>/ {sub(/^>/, ">EXO_")} 1' sacCer3.fa > sacCer3.EXO.fa

Then merge endogenous and exogenous genomes:

cat ASM294v3.fa sacCer3.EXO.fa > ASM294v3_sacCer3.EXO.merged.fa

And finally, build the index:

bowtie2-build --threads 10 ASM294v3_sacCer3.EXO.merged.fa {output_path}/index_ref
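Before building the index, it can be worth confirming that the merged FASTA contains the expected number of EXO_-prefixed headers. The sketch below repeats the prefix and merge steps on two tiny made-up FASTA files in a temporary directory; the function names and the toy sequences are assumptions for illustration only.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Prefix every header of the spike-in FASTA with EXO_ and append the result
# to the endogenous FASTA. Arguments: endogenous.fa spikein.fa merged.fa
# Note: the endogenous file should end with a newline, otherwise its last
# sequence line and the first spike-in header would be joined by cat.
merge_with_spikein() {
    local endo="$1" exo="$2" merged="$3"
    # Only lines starting with ">" are changed; sequence lines pass through.
    awk '/^>/ { sub(/^>/, ">EXO_") } 1' "$exo" > "${merged}.exo.tmp"
    cat "$endo" "${merged}.exo.tmp" > "$merged"
    rm -f "${merged}.exo.tmp"
}

# Header counts; "|| true" because grep exits non-zero when the count is 0.
count_all_headers() { grep -c '^>' "$1" || true; }
count_exo_headers() { grep -c '^>EXO_' "$1" || true; }

# Toy demonstration with made-up sequences.
workdir="$(mktemp -d)"
printf '>I\nACGT\n>II\nGGCC\n' > "$workdir/endo.fa"
printf '>chrI\nTTAA\n>chrM\nCCGG\n' > "$workdir/exo.fa"

merge_with_spikein "$workdir/endo.fa" "$workdir/exo.fa" "$workdir/merged.fa"
echo "total headers:    $(count_all_headers "$workdir/merged.fa")"
echo "spike-in headers: $(count_exo_headers "$workdir/merged.fa")"
```

On the real files, the spike-in header count should equal the number of sequences in sacCer3.fa, and the remaining headers should match those in ASM294v3.fa.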

Once this is done, you should be able to run the pipeline without problems. The only step that will fail is the peak annotation, since SpikeFlow only imports the R annotation libraries for human and mouse. You can disable the peak annotation by removing these lines in common.smk: https://github.com/DavideBrex/SpikeFlow/blob/65b421b53be8dc1dc95125704398fd6d94016b9e/workflow/rules/common.smk#L405

I hope this helps! Let me know if you have any doubts.

Best,

Davide

hideo1 · 2025-01-04T04:51:56Z

Dear Davide,

Thank you so much for your kind instructions. I will give it a try!

Best regards,
Hideo


hideo1 · 2025-01-04T09:24:07Z

Hi Davide,

Sorry for bothering you again.

I'm encountering an error while testing the workflow on my Mac (Apple M4 Max).

Is it really possible that all four of these packages (py2bit, macs2, epic2, deeptoolsintervals) are unavailable from both conda-forge and bioconda?

I suspect I might be missing a step or have configured something incorrectly. Any insights or suggestions on how to resolve this issue would be greatly appreciated.

Thanks in advance for your time and assistance.

Best regards,

Hideo
—-
(snakemake) Mac:SpikeFlow-1.3.0 hideo$ snakemake --cores 10 --software-deployment-method conda
Workflow defines that rule get_reference_genome is eligible for caching between workflows (use the --cache argument to enable this).
Workflow defines that rule create_bowtie_index is eligible for caching between workflows (use the --cache argument to enable this).
Workflow defines that rule get_spike_genome is eligible for caching between workflows (use the --cache argument to enable this).
Assuming unrestricted shared filesystem usage.
host: Mac.lan
Building DAG of jobs...
Your conda installation is not configured to use strict channel priorities. This is however important for having robust and correct environments (for details, see https://conda-forge.org/docs/user/tipsandtricks.html). Please consider to configure strict priorities by executing 'conda config --set channel_priority strict'.
Creating conda environment workflow/envs/various.yaml...
Downloading and installing remote packages.
CreateCondaEnvironmentException:
Could not create conda environment from /Users/hideo/Projects/SpikeFlow-1.3.0/workflow/rules/../envs/various.yaml:
Command:
conda env create --quiet --no-default-packages --file "/Users/hideo/Projects/SpikeFlow-1.3.0/.snakemake/conda/6398709402edf13d6faf088041528a53_.yaml" --prefix "/Users/hideo/Projects/SpikeFlow-1.3.0/.snakemake/conda/6398709402edf13d6faf088041528a53_"
Output:
Channels:

conda-forge
bioconda
Platform: osx-arm64
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... failed

PackagesNotFoundError: The following packages are not available from current channels:

py2bit
macs2
epic2
deeptoolsintervals

Current channels:

To search for alternate channels that may provide the conda package you're
looking for, navigate to

https://anaconda.org

and use the search bar at the top of the page.

DavideBrex · 2025-01-04T12:36:44Z

Hi Hideo,

The error occurs because those four packages (py2bit, macs2, epic2, deeptoolsintervals) aren't built for the osx-arm64 platform, which is the one used by Apple Silicon chips such as the M1 and M4 Max. The Bioconda and conda-forge channels do not provide binaries for every platform, and some packages are only available for linux-64 or osx-64. You can check this yourself by searching on anaconda.org:

pysam supports osx-arm64 (see https://anaconda.org/bioconda/pysam, under Installers).
deeptoolsintervals does not (see https://anaconda.org/bioconda/deeptoolsintervals).
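As a rough pre-flight check, the conda platform name for the current machine can be derived from `uname` before any environment is created. This is only a sketch: the function name `conda_subdir` is made up for the example, and the mapping covers just the four platforms listed in the code.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Map `uname -s` and `uname -m` output to the conda platform (subdir) name.
# Hypothetical helper; anything not listed here is reported as "unknown".
conda_subdir() {
    case "$1/$2" in
        Linux/x86_64)  echo "linux-64" ;;
        Linux/aarch64) echo "linux-aarch64" ;;
        Darwin/x86_64) echo "osx-64" ;;
        Darwin/arm64)  echo "osx-arm64" ;;
        *)             echo "unknown" ;;
    esac
}

subdir="$(conda_subdir "$(uname -s)" "$(uname -m)")"
echo "conda platform: $subdir"

if [ "$subdir" != "linux-64" ]; then
    echo "Warning: some workflow dependencies may have no builds for $subdir" >&2
fi
```

With network access, something like `conda search -c bioconda --subdir osx-arm64 deeptoolsintervals` should report whether a given channel has a build for that platform.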

SpikeFlow was originally developed to run on remote servers or cluster systems, which are Linux-based. This is because the alignment step with bowtie2 requires several GB of RAM (for mouse and human samples) and usually cannot be performed on a local machine. I am afraid I do not currently have the time to work on a Mac-compatible version of SpikeFlow, so, if possible, I suggest you switch to a Linux-based OS.

Alternatively, you can install a Linux virtual machine on your Mac and try to run the workflow from there.

Best,

Davide

hideo1 · 2025-01-04T12:43:56Z

Hi Davide,

Thank you so much for your prompt response! I will look into the Linux virtual machine option then.

Thanks again,
Hideo


DavideBrex self-assigned this Jan 3, 2025

DavideBrex added the "enhancement" label Jan 3, 2025

DavideBrex added the "help wanted" label and removed the "enhancement" label Jan 3, 2025
