forked from elifesciences-publications/HAWK
HAWK version 0.9.8
# HAWK
Hitting associations with k-mers
## Installation
To install HAWK, run the following commands, where X.Y.Z is the version number:
```
tar xf hawk-X.Y.Z-beta.tar
cd hawk-X.Y.Z-beta
make
```
## Prerequisites
JELLYFISH (modified version available in supplements)
EIGENSTRAT (modified version available in supplements)
R (with foreach and doParallel packages)
ABYSS
## Counting k-mers
The first step in the pipeline is to count the k-mers in each sample, find the
total number of k-mers per sample, discard k-mers that appear only once in a
sample, and sort the k-mers. The k-mer file contains one line per k-mer present,
and each line contains an integer representing the k-mer and its count, separated
by a space. The integer representation encodes each base as 0 for 'A',
1 for 'C', 2 for 'G' and 3 for 'T'.
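
As an illustration of the integer representation described above, the following Python sketch encodes a k-mer as a base-4 number and parses one line of a k-mer count file. It is not part of HAWK. The function names are hypothetical, and the digit order (first base as the most significant digit) is an assumption, since the text only gives the per-base codes.

```python
# Sketch only: encode k-mers with A=0, C=1, G=2, T=3.
# ASSUMPTION: the first base is the most significant base-4 digit;
# the README does not state the digit order used by the modified JELLYFISH.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def encode_kmer(kmer):
    """Return the integer representation of a k-mer string."""
    value = 0
    for base in kmer.upper():
        # Shift the running value by one base-4 digit, then add the new base.
        value = value * 4 + CODE[base]
    return value


def decode_kmer(value, k):
    """Return the k-mer string of length k for an integer representation."""
    bases = []
    for _ in range(k):
        bases.append(BASES[value % 4])  # the least significant digit is the last base
        value //= 4
    return "".join(reversed(bases))


def parse_count_line(line):
    """Split one 'integer count' line into a (kmer_int, count) pair."""
    kmer_int, count = line.split()
    return int(kmer_int), int(count)


if __name__ == "__main__":
    print(encode_kmer("ACGT"))          # 0*64 + 1*16 + 2*4 + 3 = 27
    print(decode_kmer(27, 4))           # ACGT
    print(parse_count_line("27 5\n"))   # (27, 5)
```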
K-mer counting can be done using the modified version of JELLYFISH
provided in the 'supplements' folder with HAWK. All of the steps mentioned
above can be performed by installing this version of JELLYFISH and then
running the script 'countKmers' in 'supplements' with the necessary modifications.
This writes the names of the sorted k-mer count files to 'sorted_files.txt'
and the total k-mer counts of the samples to 'total_kmers.txt'.
## Running HAWK
Copy the 'sorted_files.txt' and 'total_kmers.txt' files corresponding to the samples
into a folder, together with a file named 'gwas_info.txt'. This file has one line
per sample with three tab-separated columns: the sample ID, the sex (M, F or U for
male, female or unknown) and the Case/Control status of the sample. For example:
```
SRR3050845 U Control
SRR3050846 U Case
SRR3050847 U Control
```
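
Because 'gwas_info.txt' must be tab-separated with exactly three columns, it can help to check the file before starting the pipeline. The Python sketch below is a hypothetical helper, not part of HAWK. It assumes the status column is spelled exactly 'Case' or 'Control', as in the example above. Whether HAWK itself accepts other spellings is not stated here.

```python
import io

# ASSUMPTION: these are the only accepted values, matching the example above.
VALID_SEX = {"M", "F", "U"}
VALID_STATUS = {"Case", "Control"}


def read_gwas_info(handle):
    """Parse lines of 'sample_id<TAB>sex<TAB>status' and validate each field."""
    samples = []
    for line_no, line in enumerate(handle, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(
                f"line {line_no}: expected 3 tab-separated columns, got {len(fields)}"
            )
        sample_id, sex, status = fields
        if sex not in VALID_SEX:
            raise ValueError(f"line {line_no}: sex must be M, F or U, got {sex!r}")
        if status not in VALID_STATUS:
            raise ValueError(
                f"line {line_no}: status must be Case or Control, got {status!r}"
            )
        samples.append((sample_id, sex, status))
    return samples


if __name__ == "__main__":
    # In practice, pass open("gwas_info.txt") instead of this in-memory example.
    example = io.StringIO(
        "SRR3050845\tU\tControl\nSRR3050846\tU\tCase\nSRR3050847\tU\tControl\n"
    )
    rows = read_gwas_info(example)
    cases = sum(1 for _, _, status in rows if status == "Case")
    print(f"{len(rows)} samples, {cases} case(s), {len(rows) - cases} control(s)")
```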
Copy the scripts 'runHawk' and 'runAbyss' into the folder and run
```
./runHawk
```
The k-mers significantly associated with cases and with controls are written to
'case_kmers.fasta' and 'control_kmers.fasta' respectively, which can then be assembled by running
```
./runAbyss
```
The assembled sequences will be in 'case_abyss.25_49.fasta' and
'control_abyss.25_49.fasta' respectively.