-
Notifications
You must be signed in to change notification settings - Fork 11
/
Copy pathbefore-you-start.qmd
292 lines (216 loc) · 11.6 KB
/
before-you-start.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
# Before you Start {.unnumbered}
The summer school course that this textbook is derived from was designed to be as practical as possible. This means that most of the chapters are designed to act as a walkthrough to guide you through the steps on how to generate and analyse data for each of the major steps of an ancient metagenomics project.
The summer school utilised cloud computing to provide a consistent computing platform for all participants, however all tools and data demonstrated are open-source and publicly available. We describe here to approximately recreate the computing platform used during the summer schools.
## Basic requirements
::: {.callout-warning}
Bioinformatics often involve large computing resource requirements! While we aim to make example data and processing as efficient as possible, we cannot guarantee that they will all be able to work on standard laptops or desktop computing - most likely due to memory/RAM requirements. As a guide, the cloud nodes used during the summer school had 16 cores and 32 GB of RAM.
:::
To following the practical chapters of this text book, you will require:
- A unix based operating system (e.g., Linux, MacOS, or possibly Windows with Linux Subsystem - however the latter has not be tested )
- A corresponding Unix terminal
- An internet connection
- A web browser
- A [`conda`](https://docs.conda.io/en/latest/miniconda.html) installation with [`bioconda`](https://bioconda.github.io/#usage) configured.
- Conda is a very popular package manager for installing software in bioinformatics. `bioconda` is currently the most popular distribution source of bioinformatics software for conda.
For each chapter, if it requires pre-prepared data, the top of the page will have a box called 'Self guided: chapter envionment setup' that link to a `.tar` archive.
This contain the raw data will be available to download for following the chapter
Furthermore, the same box will describe how to use a conda `.yml` file that specifies the software environment for that chapter will also be available for you to install.
See the rest of this page on how to install `conda` (if not already available to you), and also how to create `conda` software environments.
## Software Environments
Before loading the environment for the exercises, the software environment will need to be created using the `.yml` with the instructions below, and then activated. A list of the software in each chapter's environment can be found in the [Appendix](appendix.html#conda-environments).
If you've not yet installed `conda`, please follow the instructions in the box below.
::: {.callout-note title="Quick guide to installing conda" collapse=true}
These instructions have been tested on Ubuntu 22.04, but should apply to most Linux operating systems. For OSX you may need to download a different file from [here](https://www.google.com/search?channel=fs&client=ubuntu-sn&q=miniconda).
- Change directory to somewhere suitable for installing a few gigabytes of software, e.g. `mkdir ~/bin/ && cd ~/bin/`
- Download miniconda
```bash
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
```
- Run the install script
```bash
bash bash Miniconda3-latest-Linux-x86_64.sh
```
- Review license
- Agree to license
- Make sure to install miniconda to the correct directory! e.g. `/home/<YOUR_USER>/bin/miniconda3`
- Yes to running `conda init`
- Copy the `conda config` command
- Close the terminal (e.g. with `exit` or <kbd>ctrl</kbd> + <kbd>d</kbd>)
- Open the terminal again and run the command you copied (i.e., `conda config --set auto_activate_base false`)
- Exit and open the terminal again
- Type `conda --version` to check conda is installed and working
- Set up bioconda
```bash
conda config --add channels bioconda
conda config --add channels conda-forge
```
:::
Once `conda` is installed and `bioconda` configured, at the beginning of each chapter, to create the `conda` environment from the `yml` file, you will need to carry out the following steps.
1. Download and unpack the `conda` env file the top of the chapter by right clicking on the link and pressing 'save as'. Once uncompressed, change into the directory.
2. Then you can run the following conda command to install the software into it's dedicated environment
```bash
conda env create -f /<PATH/<TO>/<DOWNLOADED_FILE>.yml
```
::: {.callout-note}
You only have to run the environment creation once for each chapter/environment!
:::
3. Follow the instructions as prompted. Once created, you can see a list of installed environments with
```bash
conda env list
```
4. To load the relevant environment, you can run
```bash
conda activate <NAME_OF_ENVIRONMENT>
```
3. Once finished with the chapter, you can deactivate the environment with
```bash
conda deactivate
```
To reuse the environment, just run step 4 and 5 as necessary.
::: {.callout-tip}
To delete a conda software environment, run `conda remove --name <NAME_OF_ENV> --all -y`
:::
::: {.callout-note}
If at any point you have issues of running out of space on your machine, you can first try running
```bash
conda clean --all
```
And you can answer 'yes' to all the prompts to remove unused packages and caches.
If you are still having issues, you can remove the environments that you don't need any more with the following.
```
conda env list
```
To list all your existing conda environments, and then delete the directory of the unneeded environment with `rm`.
```bash
rm -r /<path>/<to>/<conda_install>/envs/<environment_name>
```
:::
## Additional Software {.unnumbered}
For some chapters you may need the following software/and or data manually installed, which are not available on `bioconda`:
### Introduction to the command line
- [rename](https://www.gnu.org/software/libc/manual/html_node/Renaming-Files.html) (if not already installed, e.g. on OSX)
```bash
sudo apt install rename
```
### _De novo_ assembly
- [MetaWRAP](https://github.com/bxlab/metaWRAP)
```bash
conda create -n metawrap-env python=2.7
conda activate metawrap-env
conda install -c bioconda biopython=1.68 bwa=0.7.17 maxbin2=2.2.7 metabat2 samtools=1.9 checkm-genome=1.0.12
cd /<path>/<to>/denovo-assembly
git clone https://github.com/bxlab/metaWRAP.git
## don't forget to update path/to!
export PATH=$PATH:/<path>/<to>/metaWRAP/bin
```
::: {.callout-warning}
If you close your terminal halfway through the chapter, when opening the terminal again to continue the chapter, you MUST re-export the path to the metaWRAP bin directory to have access to the software!
:::
### Functional Profiling
- [HUMAnN3](https://github.com/biobakery/humann) UniRef database (where the functional providing conda environment is already activated - see the Functional Profiling chapter for more details)
```bash
humann3_databases --download uniref uniref90_ec_filtered_diamond /<path>/<to>/functional-profiling/humann3_db
```
### Authentication
- [pip](https://pip.pypa.io/en/stable/installation/)
- [metaDMG](https://github.com/metaDMG-dev/metaDMG-cpp)
- Make a conda environment file called `metadmg.yml` containing the following.
```yaml
name: metaDMG
channels:
- conda-forge
- bioconda
dependencies:
- conda-forge::python=3.9.15
- bioconda::htslib=1.17
- conda-forge::eigen=3.4.0
- conda-forge::cxx-compiler=1.5.2
- conda-forge::c-compiler=1.5.2
- conda-forge::gsl=2.7
- conda-forge::iminuit=2.17.0
- conda-forge::numpyro=0.10.1
- conda-forge::joblib=1.2.0
- conda-forge::numba=0.56.2
- conda-forge::flatbuffers=22.9.24
- conda-forge::psutil=5.9.4
```
- Create the environment with
```bash
conda env create -f metadmg.yml
```
- Change to the chapter's directory
```bash
cd /<path>/<to>/authentication
```
- Activate the new environment
```bash
conda activate metaDMG
```
- Clone the latest version of metaDMG, and compile
```bash
git clone https://github.com/metaDMG-dev/metaDMG-cpp.git
cd metaDMG-cpp
make clean && make CPPFLAGS="-L${CONDA_PREFIX}/lib -I${CONDA_PREFIX}/include" HTSSRC=systemwide -j 8
```
- Install some patches, and some additional modules
```bash
pip install git+https://github.com/metaDMG-dev/metaDMG-core #@stopiferrors_branch
pip install metaDMG[viz]
```
- Deactivate the conda environment
```bash
conda deactivate
```
- Make the metaDMG executable available in the environment
```bash
export PATH="$PATH:/<path>/<to>/authentication/metaDMG-cpp/"
```
::: {.callout-warning}
If you close your terminal halfway through the chapter, when opening the terminal again to continue the chapter, you MUST re-export the path to the metaDMG-cpp directory to have access to the software!
:::
### Phylogenomics
- [Tempest](http://tree.bio.ed.ac.uk/download.html?name=tempest&id=102&num=3) (v1.5.3)
- It is also recommended to assign the following `bash` variable so you can access the tool without the full path
```bash
cd /<path>/<to>/phylogenomics
tar -xvf TempEst_v1.5.3.tgz
cd TempEst_V1.5.3
export PATH="$PATH:/<path>/<to>/phylogenomics/TempEst_v1.5.3/bin/"
```
- If you get an error like `Exception in thread "main" java.lang.UnsatisfiedLinkError: Can't load library: /usr/lib/jvm/java-11-openjdk-amd64/lib/libawt_xawt.so`, make sure you have Java installed e.g.
```bash
sudo apt install openjdk-11-jdk
```
- [MEGAX](https://www.megasoftware.net) (v11.0.11)
::: {.callout-warning}
If you close your terminal halfway through the chapter, when opening the terminal again to continue the chapter, you MUST re-export the path to the Tempest bin directory to have access to the software!
:::
### Ancient metagenomic pipelines
- [Docker](https://www.docker.com/) (installation method will vary depending on your OS)
- [Standard install](https://docs.docker.com/engine/install/ubuntu/)
- Linux-nerd install
```bash
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg
echo \
"deb [arch="$(dpkg --print-architecture)" signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
"$(. /etc/os-release && echo "$VERSION_CODENAME")" stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
## May need to do a reboot or something here
sudo groupadd docker
sudo usermod -aG docker $USER
newgrp docker
sudo reboot ## will kick you out, but it'll be back in a minute or two
```
- [aMeta](https://github.com/NBISweden/aMeta) (make sure you've already downloaded the data directory as per the [chapter instructions](ancient-metagenomic-pipelines.qmd))
```bash
cd /<path>/<to>/ancient-metagenomic-pipelines/
git clone https://github.com/NBISweden/aMeta
cd aMeta
## We have to patch the environment to use an old version of Snakemake as aMeta is not compatible with the latest version
sed -i 's/snakemake-minimal>=5.18/snakemake <=6.3.0/' workflow/envs/environment.yaml
conda env create -f workflow/envs/environment.yaml
```