Using SpikeFlow with yeast genomes #23
Comments
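Dear Hideo,
Regarding your question: SpikeFlow retrieves the genome FASTA files from the UCSC Genome Browser. However, the genome of Schizosaccharomyces pombe is not available on UCSC; only S. cerevisiae is (see here <https://hgdownload.soe.ucsc.edu/downloads.html>).
To run SpikeFlow, you will need to generate the bowtie2 index yourself and provide its path (check the config.yaml):
#otherwise provide the path to the folder containing the bowtie index (and add the name of the index after the path)
#PLEASE NOTE: the index must be created with the same version of bowtie2 as the one installed in the conda environment (2.5.3)
To generate a bowtie2 index, first retrieve the FASTA files of both Schizosaccharomyces pombe and S. cerevisiae. Then add a prefix to the exogenous genome's chromosome names to distinguish them from those of the endogenous genome:
awk 'match($0, "^>") {{sub("^>", ">EXO_")}} 1' sacCer3.fa > sacCer3.EXO.fa
Then merge the endogenous and exogenous genomes:
cat ASM294v3.fa sacCer3.EXO.fa > ASM294v3_sacCer3.EXO.merged.fa
And finally, build the index:
bowtie2-build --threads 10 ASM294v3_sacCer3.EXO.merged.fa {output_path}/index_ref
Once this is done, you should be able to run the pipeline without problems. The only step that will fail is the peak annotation, since SpikeFlow imports R libraries for human and mouse only. You can disable the peak annotation by removing these lines <https://github.com/DavideBrex/SpikeFlow/blob/65b421b53be8dc1dc95125704398fd6d94016b9e/workflow/rules/common.smk#L405> in common.smk.
I hope this helps! Let me know if you have any doubts.
Best,
Davide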
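As a sanity check on the prefix-and-merge steps described above, the sketch below runs them on two tiny made-up FASTA files and counts the resulting headers. The file names (endo.fa, exo.fa) and sequences are hypothetical stand-ins for the real S. pombe and S. cerevisiae genomes, and the awk expression is a simplified form that should behave the same as the one in the reply. It does not require bowtie2.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Toy stand-ins for the real genomes (hypothetical file names and sequences).
workdir="$(mktemp -d)"
printf '>I\nACGTACGT\n>II\nGGCCGGCC\n' > "$workdir/endo.fa"
printf '>chrI\nTTAATTAA\n>chrM\nCCGGCCGG\n' > "$workdir/exo.fa"

# Prefix every header line of the exogenous genome with EXO_.
awk '/^>/ { sub(/^>/, ">EXO_") } 1' "$workdir/exo.fa" > "$workdir/exo.EXO.fa"

# Concatenate the endogenous and the prefixed exogenous sequences.
cat "$workdir/endo.fa" "$workdir/exo.EXO.fa" > "$workdir/merged.fa"

# Count headers: total, exogenous (EXO_-prefixed), and duplicated names.
total=$(grep -c '^>' "$workdir/merged.fa")
exo=$(grep -c '^>EXO_' "$workdir/merged.fa")
dups=$(grep '^>' "$workdir/merged.fa" | sort | uniq -d | wc -l | tr -d ' ')

echo "headers: total=$total exogenous=$exo duplicated=$dups"
```

On the toy data this prints `headers: total=4 exogenous=2 duplicated=0`. On the real merged FASTA, the exogenous count should equal the number of sequences in sacCer3.fa and the duplicated count should be zero. After building the index, `bowtie2-inspect -n` on the index prefix should list the same sequence names.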
Dear Davide,
Thank you so much for your kind instruction.
I will give it a try!
Best regards,
Hideo
On 2025/01/04 at 3:50, Davide Bressan <***@***.***> wrote:
Dear Hideo,
Sorry for not getting back to you sooner, I was on holiday with limited access to my PC.
Regarding your question: SpikeFlow retrieves the genome FASTA files from the UCSC Genome Browser. However, the genome of Schizosaccharomyces pombe is not available on UCSC; only S. cerevisiae is (see here <https://hgdownload.soe.ucsc.edu/downloads.html>).
To run SpikeFlow, you will need to generate the bowtie2 index yourself and provide its path (check the config.yaml):
#otherwise provide the path to the folder containing the bowtie index (and add the name of the index after the path)
#PLEASE NOTE: the index must be created with the same version of bowtie2 as the one installed in the conda environment (2.5.3)
To generate a bowtie2 index, first retrieve the FASTA files of both Schizosaccharomyces pombe and S. cerevisiae.
Then add a prefix to the exogenous genome's chromosome names to distinguish them from those of the endogenous genome:
awk 'match($0, "^>") {{sub("^>", ">EXO_")}} 1' sacCer3.fa > sacCer3.EXO.fa
Then merge endogenous and exogenous genomes:
cat ASM294v3.fa sacCer3.EXO.fa > ASM294v3_sacCer3.EXO.merged.fa
And finally, build the index:
bowtie2-build --threads 10 ASM294v3_sacCer3.EXO.merged.fa {output_path}/index_ref
Once this is done, you should be able to run the pipeline without problems. The only step that will fail is the peak annotation, since SpikeFlow imports R libraries for human and mouse only. You can disable the peak annotation by removing these lines <https://github.com/DavideBrex/SpikeFlow/blob/65b421b53be8dc1dc95125704398fd6d94016b9e/workflow/rules/common.smk#L405> in common.smk.
I hope this helps! Let me know if you have any doubts.
Best,
Davide
Hi Davide,
Sorry for bothering you again. I'm encountering an error while testing the workflow on my Mac (Apple M4 Max). Is it possible that all four of these packages (py2bit, macs2, epic2, deeptoolsintervals) are unavailable from conda-forge and bioconda? I suspect I might be missing a step or configuring something incorrectly.
Any insights or suggestions on how to resolve this issue would be greatly appreciated. Thanks in advance for your time and assistance.
Best regards,
Hideo
PackagesNotFoundError: The following packages are not available from current channels:
Current channels: To search for alternate channels that may provide the conda package you're
and use the search bar at the top of the page.
Hi Hideo,
The error occurs because those four packages (py2bit, macs2, epic2, deeptoolsintervals) aren't built for the osx-arm64 platform, which is the one used by Apple Silicon chips such as the M1 and M4 Max. The Bioconda and conda-forge channels do not provide binaries for every platform, and some packages are only available for the linux-64 or osx-64 architectures. You can check this yourself by searching on anaconda.org/bioconda:
- pysam supports osx-arm64 (see here <https://anaconda.org/bioconda/pysam>, under Installers)
- deeptoolsintervals does not (see here <https://anaconda.org/bioconda/deeptoolsintervals>)
SpikeFlow was originally developed to run on remote servers or cluster systems, which are Linux-based. This is because the alignment step with bowtie2 requires several GB of RAM (for mouse and human samples) and usually cannot be performed on a local machine. I am afraid I do not have the time right now to work on a Mac-compatible version of SpikeFlow, so, if possible, I suggest you switch to a Linux-based OS.
Alternatively, you can install a Linux virtual machine on your Mac and try to run the workflow from there.
Best,
Davide
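To see which conda platform a given machine falls under, something like the following sketch may help. The helper name `conda_subdir` is hypothetical (not part of conda or SpikeFlow); it only maps the output of `uname` to the platform names used on anaconda.org, and covers just the four combinations listed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: map `uname -s` / `uname -m` output to the conda
# platform (subdir) name. Only a few common combinations are covered.
conda_subdir() {
  local os="$1" arch="$2"
  case "$os/$arch" in
    Darwin/arm64)   echo "osx-arm64" ;;
    Darwin/x86_64)  echo "osx-64" ;;
    Linux/x86_64)   echo "linux-64" ;;
    Linux/aarch64)  echo "linux-aarch64" ;;
    *)              echo "unknown" ;;
  esac
}

subdir="$(conda_subdir "$(uname -s)" "$(uname -m)")"
echo "This machine resolves conda packages for: $subdir"

# With conda installed, availability for that platform could then be queried, e.g.:
#   conda search --subdir "$subdir" -c bioconda deeptoolsintervals
```

On an Apple Silicon Mac this should report osx-arm64, which matches the platform for which the four packages are missing.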
Hi Davide,
Thank you so much for your prompt response!
I will look into the linux virtual machine option then.
Thanks again,
Hideo
On 2025/01/04 at 21:37, Davide Bressan <***@***.***> wrote:
Hi Hideo,
The error occurs because those four packages (py2bit, macs2, epic2, deeptoolsintervals) aren't built for the osx-arm64 platform, which is the one used by Apple Silicon chips such as the M1 and M4 Max. The Bioconda and conda-forge channels do not provide binaries for every platform, and some packages are only available for the linux-64 or osx-64 architectures. You can check this yourself by searching on anaconda.org/bioconda:
pysam supports osx-arm64 (see here <https://anaconda.org/bioconda/pysam>, under Installers).
deeptoolsintervals does not (see here <https://anaconda.org/bioconda/deeptoolsintervals>).
SpikeFlow was originally developed to run on remote servers or cluster systems, which are Linux-based. This is because the alignment step with bowtie2 requires several GB of RAM (for mouse and human samples) and usually cannot be performed on a local machine. I am afraid I do not have the time right now to work on a Mac-compatible version of SpikeFlow, so, if possible, I suggest you switch to a Linux-based OS.
Alternatively, you can install a Linux virtual machine on your Mac and try to run the workflow from there.
Best,
Davide
Hello Davide,
I'm currently working on a ChIP-seq project and would love to use SpikeFlow. However, I noticed that the supported genomes listed are:
Endogenous: mm9, mm10, hg19, hg38
Exogenous (spike-in): dm3, dm6, mm10, mm9, hg19, hg38, hs1
For my project, the conditions are as follows:
Endogenous: Schizosaccharomyces pombe (fission yeast)
Exogenous (spike-in): Saccharomyces cerevisiae (budding yeast)
Could you please guide me on what modifications I need to make in order to use these organisms with SpikeFlow? As I’m relatively new to bioinformatics, any detailed instructions would be greatly appreciated.
Thank you in advance for your help!
Best regards,
Hideo