Using SpikeFlow with yeast genomes #23

Open
hideo1 opened this issue Dec 31, 2024 · 5 comments
@hideo1

hideo1 commented Dec 31, 2024

Hello Davide,

I'm currently working on a ChIP-seq project and would love to use SpikeFlow. However, I noticed that the supported genomes listed are:
Endogenous: mm9, mm10, hg19, hg38
Exogenous (spike-in): dm3, dm6, mm10, mm9, hg19, hg38, hs1

For my project, the conditions are as follows:
Endogenous: Schizosaccharomyces pombe (fission yeast)
Exogenous (spike-in): Saccharomyces cerevisiae (budding yeast)

Could you please guide me on what modifications I need to make in order to use these organisms with SpikeFlow? As I’m relatively new to bioinformatics, any detailed instructions would be greatly appreciated.

Thank you in advance for your help!

Best regards,
Hideo

@DavideBrex DavideBrex self-assigned this Jan 3, 2025
@DavideBrex DavideBrex added the enhancement New feature or request label Jan 3, 2025
@DavideBrex
Owner

Dear Hideo,
Sorry for not getting back to you sooner; I was on holiday with limited access to my PC.

Regarding your question, SpikeFlow retrieves genome FASTA files from the UCSC Genome Browser. However, the Schizosaccharomyces pombe genome is not available on UCSC; only S. cerevisiae is (see here).

To run SpikeFlow, you will need to generate the Bowtie 2 index yourself and provide its path in config.yaml, as these comments in that file explain:

#otherwise provide the path to the folder containing the bowtie index (and add the name of the index after the path)
#PLEASE NOTE: the index must be created with the same version of bowtie2 as the one installed in the conda environment (2.5.3)

To generate a bowtie2 index, you must first retrieve the FASTA files of both Schizosaccharomyces pombe and S. cerevisiae.
Then:

Add a prefix to the exogenous genome's chromosome names to distinguish them from those of the endogenous genome:

awk '/^>/ {sub("^>", ">EXO_")} 1' sacCer3.fa > sacCer3.EXO.fa

Then merge endogenous and exogenous genomes:

cat ASM294v3.fa sacCer3.EXO.fa > ASM294v3_sacCer3.EXO.merged.fa

And finally, build the index:

bowtie2-build --threads 10 ASM294v3_sacCer3.EXO.merged.fa {output_path}/index_ref
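
Before building the index, it can be worth confirming that the prefixing and merging did what was intended. The following is a minimal sketch, not part of SpikeFlow: the function names `prefix_exogenous` and `merge_genomes` are hypothetical, and the file names are assumed to follow the commands above.

```shell
# Sketch only: helper names are hypothetical, not SpikeFlow commands.

# Prefix every FASTA header of the spike-in genome with "EXO_".
# Sequence lines are passed through unchanged.
prefix_exogenous() {
    awk '/^>/ { sub(/^>/, ">EXO_") } 1' "$1"
}

# Concatenate the endogenous genome and the prefixed spike-in genome,
# then print how many sequences come from each origin.
merge_genomes() {
    endo="$1"
    exo_prefixed="$2"
    merged="$3"
    cat "$endo" "$exo_prefixed" > "$merged"
    # grep -c exits non-zero when the count is 0, hence "|| true".
    total=$(grep -c '^>' "$merged" || true)
    exo=$(grep -c '^>EXO_' "$merged" || true)
    echo "endogenous=$((total - exo)) exogenous=$exo"
}
```

For example, `prefix_exogenous sacCer3.fa > sacCer3.EXO.fa` followed by `merge_genomes ASM294v3.fa sacCer3.EXO.fa ASM294v3_sacCer3.EXO.merged.fa` prints the two sequence counts; if the exogenous count is 0, the prefix step did not run on the right file. The `bowtie2-build` call is then run on the merged file exactly as shown above.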

Once this is done, you should be able to run the pipeline without problems. The only step that will fail is peak annotation, since SpikeFlow only imports R annotation libraries for human and mouse. You can disable peak annotation by removing these lines in common.smk.

I hope this helps! Let me know if you have any questions.

Best,

Davide

@DavideBrex DavideBrex added help wanted Extra attention is needed and removed enhancement New feature or request labels Jan 3, 2025
@hideo1
Author

hideo1 commented Jan 4, 2025 via email

@hideo1
Author

hideo1 commented Jan 4, 2025

Hi Davide,

Sorry for bothering you again.

I'm encountering an error while testing the workflow on my Mac (Apple M4 Max).

Is it possible that these four packages (py2bit, macs2, epic2, deeptoolsintervals) are not available from conda-forge and bioconda?

I suspect I might be missing a step or configuring something incorrectly. Any insights or suggestions on how to resolve this issue would be greatly appreciated.

Thanks in advance for your time and assistance.

Best regards,

Hideo
---
(snakemake) Mac:SpikeFlow-1.3.0 hideo$ snakemake --cores 10 --software-deployment-method conda
Workflow defines that rule get_reference_genome is eligible for caching between workflows (use the --cache argument to enable this).
Workflow defines that rule create_bowtie_index is eligible for caching between workflows (use the --cache argument to enable this).
Workflow defines that rule get_spike_genome is eligible for caching between workflows (use the --cache argument to enable this).
Assuming unrestricted shared filesystem usage.
host: Mac.lan
Building DAG of jobs...
Your conda installation is not configured to use strict channel priorities. This is however important for having robust and correct environments (for details, see https://conda-forge.org/docs/user/tipsandtricks.html). Please consider to configure strict priorities by executing 'conda config --set channel_priority strict'.
Creating conda environment workflow/envs/various.yaml...
Downloading and installing remote packages.
CreateCondaEnvironmentException:
Could not create conda environment from /Users/hideo/Projects/SpikeFlow-1.3.0/workflow/rules/../envs/various.yaml:
Command:
conda env create --quiet --no-default-packages --file "/Users/hideo/Projects/SpikeFlow-1.3.0/.snakemake/conda/6398709402edf13d6faf088041528a53_.yaml" --prefix "/Users/hideo/Projects/SpikeFlow-1.3.0/.snakemake/conda/6398709402edf13d6faf088041528a53_"
Output:
Channels:
 - conda-forge
 - bioconda
Platform: osx-arm64
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... failed

PackagesNotFoundError: The following packages are not available from current channels:

  - py2bit
  - macs2
  - epic2
  - deeptoolsintervals

Current channels:

To search for alternate channels that may provide the conda package you're
looking for, navigate to

https://anaconda.org

and use the search bar at the top of the page.

@DavideBrex
Owner

Hi Hideo,

The error occurs because those four packages (py2bit, macs2, epic2, deeptoolsintervals) aren't built for the osx-arm64 platform, which is the one used by Apple Silicon chips such as the M1 and M4 Max. The Bioconda and conda-forge channels do not provide binaries for every platform, and some packages are only available for the linux-64 or osx-64 architectures. You can check this yourself by searching on Anaconda/Bioconda:

  • pysam supports osx-arm64 (see here, under Installers).
  • deeptoolsintervals does not (see here).
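
The same check can be sketched from the command line with `conda search`, which accepts a `--subdir` option to query a specific platform. This is an illustrative helper, not part of SpikeFlow: the function name `check_platform` and the `CONDA` override variable are assumptions introduced here.

```shell
# Sketch only: check_platform and the CONDA variable are hypothetical.
# Usage: check_platform <subdir> <package> [<package> ...]
# Prints, for each package, whether conda-forge or bioconda offers
# a build for the given platform (e.g. osx-arm64, osx-64, linux-64).
check_platform() {
    subdir="$1"
    shift
    for pkg in "$@"; do
        # conda search exits non-zero when no matching package is found.
        if "${CONDA:-conda}" search --override-channels \
               -c conda-forge -c bioconda \
               --subdir "$subdir" "$pkg" >/dev/null 2>&1; then
            echo "$pkg: available"
        else
            echo "$pkg: missing"
        fi
    done
}
```

Running `check_platform osx-arm64 py2bit macs2 epic2 deeptoolsintervals` lists each package as available or missing for Apple Silicon; repeating it with `linux-64` shows the platform the workflow was developed for. Note that a network failure would also be reported as missing, since the sketch only looks at the exit status.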

SpikeFlow was originally developed to run on remote servers or cluster systems, which are Linux-based. This is because the alignment step with bowtie2 requires several GB of RAM (for mouse and human samples) and usually cannot be performed on a local machine. I am afraid I do not currently have the time to work on a Mac-compatible version of SpikeFlow, so, if possible, I suggest you switch to a Linux-based OS.

Alternatively, you can install a Linux virtual machine on your Mac and try to run the workflow from there.

Best,

Davide

@hideo1
Author

hideo1 commented Jan 4, 2025 via email
