Commit a386816
Romulo Antao committed
Dependencies added
Phylip dependencies added to the project (OSX files)
1 parent cc978fc commit a386816

709 files changed
+117599 -2 lines changed

README.md

+41 -1
@@ -1 +1,41 @@
-#ncd_python
+How to use ncd.py
+
+"python ncd.py data_folder compressor"
+"compressors available: zlib or bz2"
+"example: python ncd.py data/animals/ zlib"
+
+
+This program uses the Normalized Compression Distance to evaluate the degree of similarity between the mtDNA of different animals, and from that infers their genetic similarity.
+
+The results produced by the ncd.py script are in the format required for post-processing in Phylip (http://evolution.genetics.washington.edu/phylip.html).
+
+
+To generate a phylogenetic tree from the example dataset using the zlib compressor, follow this procedure:
+
+1) Execute the script
+
+python ncd.py data/animals/ zlib
+
+The output file, "q.phy", is an upper-triangular matrix
+
+2) Execute the Phylip program neighbor
+
+./phylip-3.695/neighbor
+
+3) Choose "q.phy" as the source and select Upper Triangular Data Matrix (R). Process the file and the program will output:
+
+Output written on file "outfile"
+Tree written on file "outtree"
+Done.
+
+4) Execute the script "name_replace.py" to insert the animal names into the outtree file
+
+python name_replace.py
+
+5) Execute the Phylip program drawtree
+
+./phylip-3.695/drawtree
+
+6) Choose "outtree" as the input file and "font1" as the font. Select option (B) so that branch lengths are not used, then produce the output file.
+
+7) The resulting phylogenetic tree will be saved in the newly created "plotfile"
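
As a rough illustration of what step 1 of the README produces, the sketch below computes pairwise Normalized Compression Distances, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)) with C the compressed length, and formats them as an upper-triangular Phylip distance matrix. It is not the code of ncd.py, whose body this view does not show. The names ncd, distance_matrix and format_upper_triangular are hypothetical, and the layout (taxon count on the first line, names padded to 10 characters, each row listing only the distances to later taxa) is assumed from Phylip's documented distance-matrix format.

```python
import bz2
import zlib
from itertools import combinations

# The two compressors the README lists as available.
COMPRESSORS = {"zlib": zlib.compress, "bz2": bz2.compress}


def ncd(x: bytes, y: bytes, compress=zlib.compress) -> float:
    """Normalized Compression Distance between two byte strings."""
    cx = len(compress(x))
    cy = len(compress(y))
    cxy = len(compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)


def distance_matrix(samples: dict, compressor: str = "zlib"):
    """Return sorted names and a dict of NCD values for each unordered pair."""
    compress = COMPRESSORS[compressor]
    names = sorted(samples)
    dist = {
        (a, b): ncd(samples[a], samples[b], compress)
        for a, b in combinations(names, 2)
    }
    return names, dist


def format_upper_triangular(names, dist) -> str:
    """Format distances in the assumed upper-triangular Phylip layout."""
    lines = [f"{len(names):5d}"]
    for i, name in enumerate(names):
        # Each row holds only the distances to the taxa that follow it.
        row = "".join(f" {dist[(name, other)]:.6f}" for other in names[i + 1:])
        # Names are cut and padded to 10 characters (assumption, see above).
        lines.append(f"{name[:10]:<10}{row}")
    return "\n".join(lines) + "\n"
```

Cutting names to 10 characters can make two long names collide, which may be one reason the workflow writes short labels first and restores the animal names afterwards with name_replace.py; that reading is a guess, since the script itself is not shown here.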

font1

+193
@@ -0,0 +1,193 @@
+CA 501 21 18 28
+-1956 1135 -1956 2735 -1442 2442 -12835
+CB 502 21 21 28
+-1456 1435 -1456 2356 2655 2754 2852 2850 2748 2647
+2346 -1446 2346 2645 2744 2842 2839 2737 2636 2335
+1435 -13135
+CC 503 21 21 28
+-2851 2753 2555 2356 1956 1755 1553 1451 1348 1343
+1440 1538 1736 1935 2335 2536 2738 2840 -13135
+CD 504 21 21 28
+-1456 1435 -1456 2156 2455 2653 2751 2848 2843 2740
+2638 2436 2135 1435 -13135
+CE 505 21 19 28
+-1456 1435 -1456 2756 -1446 2246 -1435 2735 -12935
+CF 506 21 18 28
+-1456 1435 -1456 2756 -1446 2246 -12835
+CG 507 21 21 28
+-2851 2753 2555 2356 1956 1755 1553 1451 1348 1343
+1440 1538 1736 1935 2335 2536 2738 2840 2843 -2343
+2843 -13135
+CH 508 21 22 28
+-1456 1435 -2856 2835 -1446 2846 -13235
+CI 509 21 8 28
+-1456 1435 -11835
+CJ 510 21 16 28
+-2256 2240 2137 2036 1835 1635 1436 1337 1240 1242
+-12635
+CK 511 21 21 28
+-1456 1435 -2856 1442 -1947 2835 -13135
+CL 512 21 17 28
+-1456 1435 -1435 2635 -12735
+CM 513 21 24 28
+-1456 1435 -1456 2235 -3056 2235 -3056 3035 -13435
+CN 514 21 22 28
+-1456 1435 -1456 2835 -2856 2835 -13235
+CO 515 21 22 28
+-1956 1755 1553 1451 1348 1343 1440 1538 1736 1935
+2335 2536 2738 2840 2943 2948 2851 2753 2555 2356
+1956 -13235
+CP 516 21 21 28
+-1456 1435 -1456 2356 2655 2754 2852 2849 2747 2646
+2345 1445 -13135
+CQ 517 21 22 28
+-1956 1755 1553 1451 1348 1343 1440 1538 1736 1935
+2335 2536 2738 2840 2943 2948 2851 2753 2555 2356
+1956 -2239 2833 -13235
+CR 518 21 21 28
+-1456 1435 -1456 2356 2655 2754 2852 2850 2748 2647
+2346 1446 -2146 2835 -13135
+CS 519 21 20 28
+-2753 2555 2256 1856 1555 1353 1351 1449 1548 1747
+2345 2544 2643 2741 2738 2536 2235 1835 1536 1338
+-13035
+CT 520 21 16 28
+-1856 1835 -1156 2556 -12635
+CU 521 21 22 28
+-1456 1441 1538 1736 2035 2235 2536 2738 2841 2856
+-13235
+CV 522 21 18 28
+-1156 1935 -2756 1935 -12835
+CW 523 21 24 28
+-1256 1735 -2256 1735 -2256 2735 -3256 2735 -13435
+CX 524 21 20 28
+-1356 2735 -2756 1335 -13035
+CY 525 21 18 28
+-1156 1946 1935 -2756 1946 -12835
+CZ 526 21 20 28
+-2756 1335 -1356 2756 -1335 2735 -13035
+Ca 601 21 19 28
+-2549 2535 -2546 2348 2149 1849 1648 1446 1343 1341
+1438 1636 1835 2135 2336 2538 -12935
+Cb 602 21 19 28
+-1456 1435 -1446 1648 1849 2149 2348 2546 2643 2641
+2538 2336 2135 1835 1636 1438 -12935
+Cc 603 21 18 28
+-2546 2348 2149 1849 1648 1446 1343 1341 1438 1636
+1835 2135 2336 2538 -12835
+Cd 604 21 19 28
+-2556 2535 -2546 2348 2149 1849 1648 1446 1343 1341
+1438 1636 1835 2135 2336 2538 -12935
+Ce 605 21 18 28
+-1343 2543 2545 2447 2348 2149 1849 1648 1446 1343
+1341 1438 1636 1835 2135 2336 2538 -12835
+Cf 606 21 12 28
+-2056 1856 1655 1552 1535 -1249 1949 -12235
+Cg 607 21 19 28
+-2549 2533 2430 2329 2128 1828 1629 -2546 2348 2149
+1849 1648 1446 1343 1341 1438 1636 1835 2135 2336
+2538 -12935
+Ch 608 21 19 28
+-1456 1435 -1445 1748 1949 2249 2448 2545 2535 -12935
+Ci 609 21 8 28
+-1356 1455 1556 1457 1356 -1449 1435 -11835
+Cj 610 21 10 28
+-1556 1655 1756 1657 1556 -1649 1632 1529 1328 1128
+-12035
+Ck 611 21 17 28
+-1456 1435 -2449 1439 -1843 2535 -12735
+Cl 612 21 8 28
+-1456 1435 -11835
+Cm 613 21 30 28
+-1449 1435 -1445 1748 1949 2249 2448 2545 2535 -2545
+2848 3049 3349 3548 3645 3635 -14035
+Cn 614 21 19 28
+-1449 1435 -1445 1748 1949 2249 2448 2545 2535 -12935
+Co 615 21 19 28
+-1849 1648 1446 1343 1341 1438 1636 1835 2135 2336
+2538 2641 2643 2546 2348 2149 1849 -12935
+Cp 616 21 19 28
+-1449 1428 -1446 1648 1849 2149 2348 2546 2643 2641
+2538 2336 2135 1835 1636 1438 -12935
+Cq 617 21 19 28
+-2549 2528 -2546 2348 2149 1849 1648 1446 1343 1341
+1438 1636 1835 2135 2336 2538 -12935
+Cr 618 21 13 28
+-1449 1435 -1443 1546 1748 1949 2249 -12335
+Cs 619 21 17 28
+-2446 2348 2049 1749 1448 1346 1444 1643 2142 2341
+2439 2438 2336 2035 1735 1436 1338 -12735
+Ct 620 21 12 28
+-1556 1539 1636 1835 2035 -1249 1949 -12235
+Cu 621 21 19 28
+-1449 1439 1536 1735 2035 2236 2539 -2549 2535 -12935
+Cv 622 21 16 28
+-1249 1835 -2449 1835 -12635
+Cw 623 21 22 28
+-1349 1735 -2149 1735 -2149 2535 -2949 2535 -13235
+Cx 624 21 17 28
+-1349 2435 -2449 1335 -12735
+Cy 625 21 16 28
+-1249 1835 -2449 1835 1631 1429 1228 1128 -12635
+Cz 626 21 17 28
+-2449 1335 -1349 2449 -1335 2435 -12735
+C0 700 21 20 28
+-1956 1655 1452 1347 1344 1439 1636 1935 2135 2436
+2639 2744 2747 2652 2455 2156 1956 -13035
+C1 701 21 20 28
+-1652 1853 2156 2135 -13035
+C2 702 21 20 28
+-1451 1452 1554 1655 1856 2256 2455 2554 2652 2650
+2548 2345 1335 2735 -13035
+C3 703 21 20 28
+-1556 2656 2048 2348 2547 2646 2743 2741 2638 2436
+2135 1835 1536 1437 1339 -13035
+C4 704 21 20 28
+-2356 1342 2842 -2356 2335 -13035
+C5 705 21 20 28
+-2556 1556 1447 1548 1849 2149 2448 2646 2743 2741
+2638 2436 2135 1835 1536 1437 1339 -13035
+C6 706 21 20 28
+-2653 2555 2256 2056 1755 1552 1447 1442 1538 1736
+2035 2135 2436 2638 2741 2742 2645 2447 2148 2048
+1747 1545 1442 -13035
+C7 707 21 20 28
+-2756 1735 -1356 2756 -13035
+C8 708 21 20 28
+-1856 1555 1453 1451 1549 1748 2147 2446 2644 2742
+2739 2637 2536 2235 1835 1536 1437 1339 1342 1444
+1646 1947 2348 2549 2651 2653 2555 2256 1856 -13035
+C9 709 21 20 28
+-2649 2546 2344 2043 1943 1644 1446 1349 1350 1453
+1655 1956 2056 2355 2553 2649 2644 2539 2336 2035
+1835 1536 1438 -13035
+C. 710 21 10 28
+-1537 1436 1535 1636 1537 -12035
+C, 711 21 10 28
+-1636 1535 1436 1537 1636 1634 1532 1431 -12035
+C: 712 21 10 28
+-1549 1448 1547 1648 1549 -1537 1436 1535 1636 1537
+-12035
+C; 713 21 10 28
+-1549 1448 1547 1648 1549 -1636 1535 1436 1537 1636
+1634 1532 1431 -12035
+C! 714 21 10 28
+-1556 1542 -1537 1436 1535 1636 1537 -12035
+C? 715 21 18 28
+-1351 1352 1454 1555 1756 2156 2355 2454 2552 2550
+2448 2347 1945 1942 -1937 1836 1935 2036 1937 -12835
+C/ 720 21 22 28
+-3060 1228 -13235
+C( 721 21 14 28
+-2160 1958 1755 1551 1446 1442 1537 1733 1930 2128
+-12435
+C) 722 21 14 28
+-1360 1558 1755 1951 2046 2042 1937 1733 1530 1328
+-12435
+C- 724 21 26 28
+-1444 3244 -13635
+C* 728 21 16 28
+-1850 1838 -1347 2341 -2347 1341 -12635
+C 699 21 16 28
+-12635
+
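
For anyone who needs to inspect font1 programmatically, the records above appear to follow a simple shape: a header line starting with C and the glyph character, followed by four integers, then one or more lines of integers. The sketch below only groups the file into such records. It assumes that shape holds for the whole file and does not try to interpret the header fields or the stroke values, whose meaning is not stated in this commit. GlyphRecord and parse_font_records are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GlyphRecord:
    char: str                 # character right after the leading "C"
    header: List[int]         # the integers on the header line, uninterpreted
    values: List[int] = field(default_factory=list)  # following integer lines


def parse_font_records(text: str) -> List[GlyphRecord]:
    """Group a font1-style text into per-glyph records (purely syntactic)."""
    records: List[GlyphRecord] = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines such as the trailing one
        if line.startswith("C"):
            # The second character is the glyph itself; it may be a space.
            numbers = [int(tok) for tok in line[2:].split()]
            records.append(GlyphRecord(char=line[1], header=numbers))
        elif records:
            # Integer lines belong to the most recent header.
            records[-1].values.extend(int(tok) for tok in line.split())
    return records
```

Calling parse_font_records(open("font1").read()) would then give one record per glyph, which is enough to check, for example, which characters the font covers.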

ncd.py

+1 -1
@@ -32,7 +32,7 @@ def ncd_calc(glob_files, compressor = "zlib", level = 6, precompressed = 0):
     from itertools import combinations
     file_datas = {}
     files = glob(glob_files+'*')
-
+
     for filec in files:
         file_datas[filec] = open(filec).read()
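
The context lines of this hunk read each matched file with open(filec).read() and never close the handle explicitly. A minimal sketch of the same loading step follows, assuming Python 3, where zlib.compress and bz2.compress expect bytes rather than text. load_files is a hypothetical helper, not a function from ncd.py, and skipping directories is an added assumption.

```python
from glob import glob
from pathlib import Path


def load_files(glob_prefix: str) -> dict:
    """Read every regular file matching glob_prefix + '*' as bytes."""
    file_datas = {}
    for name in sorted(glob(glob_prefix + "*")):
        path = Path(name)
        if path.is_file():
            # read_bytes opens, reads and closes the file in one call.
            file_datas[name] = path.read_bytes()
    return file_datas
```

With the README's example dataset this would be called as load_files("data/animals/"), matching how the hunk appends '*' to the folder argument.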
