Fix documentation on domain architecture creation
add sample files
Damianos Melidis committed May 8, 2020
1 parent 523a4a8 commit 0c8bcc2
Showing 3 changed files with 202 additions and 2 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -34,12 +34,12 @@ The main dependencies are listed below:
* Select the output domain annotation **type**: overlap, non overlapping or non redundant. Then set if *GAP* domain is also added to annotations.
Change folder/files paths appropriately and uncomment the first section in [main.py](code/main.py)
* Parse domain hits per protein running `main.py`
- * id_domains_type.tab file will be created; a sample of the first 100 lines of the full file, for non overlapping with *GAP*, is saved at [sample](domain_architecture_creation/id_domains_no_overlap_gap_sample_100.tab)
+ * id_domains_type.tab file will be created; a sample of the first 100 lines of the full file, for non overlapping with *GAP*, is saved at [sample file](domain_architecture_creation/id_domains_no_overlap_gap_sample_100.tab)

3. Get domain architecture corpus
* Change folder/files paths appropriately and uncomment the first section in [main.py](code/main.py)
* run `main.py`
- * domains_corpus_type.txt file will be created; sample of the first 100 line of the full file, for non overlapping with *GAP*, is saved at [sample](domain_architecture_creation/domains_corpus_no_overlap_gap_sample_100.txt)
+ * domains_corpus_type.txt file will be created; sample of the first 100 line of the full file, for non overlapping with *GAP*, is saved at [sample file](domain_architecture_creation/domains_corpus_no_overlap_gap_sample_100.txt)

## Intrinsic evaluation - WIP
Data and example running experiments for:
100 changes: 100 additions & 0 deletions domain_architecture_creation/domains_corpus_no_overlap_gap_sample_100.txt
@@ -0,0 +1,100 @@
IPR015424
IPR039421
IPR036640 IPR003439
IPR036291
IPR006426
GAP IPR029055 GAP
IPR034466 GAP
GAP IPR029044 GAP
IPR003696 GAP IPR003696 IPR038152
IPR001036
IPR009241
IPR025161
IPR009057
IPR025668
IPR012337
IPR038965
IPR038717
IPR032694
IPR009000 IPR004160
IPR036397
IPR006291
IPR027417
IPR025668
IPR006119 GAP
IPR025161
IPR009057 IPR011075
IPR027806
IPR009057
GAP IPR025528
IPR009057
IPR011707 IPR008972
IPR010921
IPR036388
IPR025668
IPR025668
IPR012337 GAP
IPR027417
IPR025668
GAP IPR000160
IPR009057
GAP IPR031726
IPR009057
GAP IPR027805
IPR036249
IPR039426
IPR027417 GAP
IPR025161 IPR002559
GAP IPR001387
IPR025668
IPR009057
IPR009057
IPR008972
IPR009057
IPR025668
IPR029058
IPR002559 GAP
IPR038390
GAP IPR027417
IPR036397
GAP IPR001296
IPR027806
IPR002559
IPR003220 GAP IPR005063
IPR025246 GAP
GAP IPR027805 GAP
IPR023205
IPR025161
IPR001584
IPR012337 GAP
GAP IPR025948 IPR012337
IPR005538
IPR039554
IPR029016 IPR036388
IPR012337
GAP IPR003715 GAP
GAP IPR032807 GAP IPR005702
IPR036388
IPR006119 IPR006120
IPR039060
IPR002622
IPR025161
IPR008878
IPR036388
IPR017511
IPR038717
IPR025668
IPR011547 GAP IPR036513
IPR008878
IPR036397
IPR006121
IPR000653
IPR003361
IPR002514 GAP
IPR025246 GAP IPR001584
IPR036188
IPR036766
IPR005888
GAP IPR001584
IPR030389
IPR015424
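
As a rough illustration of how a corpus in this format could be consumed, the sketch below counts how often each token (InterPro id or `GAP`) occurs. It assumes one protein per line with space-separated tokens, as the sample above suggests; the function name `domain_frequencies` is hypothetical and not part of the repository.

```python
from collections import Counter


def domain_frequencies(lines):
    """Count token occurrences across corpus lines (one protein per line)."""
    counts = Counter()
    for line in lines:
        # Assumption: tokens within a line are separated by whitespace.
        counts.update(line.split())
    return counts


# A few lines copied from the sample corpus above.
sample_lines = ["IPR036640 IPR003439", "GAP IPR029055 GAP", "IPR034466 GAP"]
freqs = domain_frequencies(sample_lines)
```

For a file on disk, the open file handle can be passed directly, since iterating over it yields lines.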
100 changes: 100 additions & 0 deletions domain_architecture_creation/id_domains_no_overlap_gap_sample_100.tab
@@ -0,0 +1,100 @@
uniprot_id interpro_ids evidence_db_ids
A0A000 IPR015424 SSF53383
A0A001 IPR039421 PTHR24221
A0A002 IPR036640 IPR003439 G3DSA:1.20.1560.10 PS50893
A0A003 IPR036291 SSF51735
A0A004 IPR006426 PIRSF001589
A0A006 GAP IPR029055 GAP gap_no_evid G3DSA:3.60.20.10 gap_no_evid
A0A007 IPR034466 GAP SFLDG01123 gap_no_evid
A0A008 GAP IPR029044 GAP gap_no_evid SSF53448 gap_no_evid
A0A009 IPR003696 GAP IPR003696 IPR038152 PF02543 gap_no_evid PF02543 G3DSA:3.90.870.20
A0A009DWE1 IPR001036 PTHR32063
A0A009DWF5 IPR009241 PF05973
A0A009DWF8 IPR025161 PF13340
A0A009DWH9 IPR009057 SSF46689
A0A009DWI3 IPR025668 PF13612
A0A009DWJ5 IPR012337 SSF53098
A0A009DWL0 IPR038965 PTHR42648
A0A009DWL5 IPR038717 PF13358
A0A009DWM1 IPR032694 PTHR34820
A0A009DWN1 IPR009000 IPR004160 SSF50447 PF03143
A0A009DWP3 IPR036397 G3DSA:3.30.420.10
A0A009DWW6 IPR006291 TIGR01387
A0A009DWX0 IPR027417 SSF52540
A0A009DWZ5 IPR025668 PF13737
A0A009DXE7 IPR006119 GAP SM00857 gap_no_evid
A0A009DXK7 IPR025161 PF13340
A0A009DXM2 IPR009057 IPR011075 SSF46689 PF16925
A0A009DXP2 IPR027806 PF13359
A0A009DXR6 IPR009057 SSF46689
A0A009DXU1 GAP IPR025528 gap_no_evid PF14384
A0A009DXW5 IPR009057 SSF46689
A0A009DXY7 IPR011707 IPR008972 PF07732 G3DSA:2.60.40.420
A0A009DY31 IPR010921 SSF48295
A0A009DY47 IPR036388 G3DSA:1.10.10.10
A0A009DYC8 IPR025668 PF13737
A0A009DYF2 IPR025668 PF13737
A0A009DYI3 IPR012337 GAP SSF53098 gap_no_evid
A0A009DYX5 IPR027417 SSF52540
A0A009DZ29 IPR025668 PF13737
A0A009DZ65 GAP IPR000160 gap_no_evid SM00267
A0A009DZ91 IPR009057 SSF46689
A0A009DZA9 GAP IPR031726 gap_no_evid PF15864
A0A009DZN6 IPR009057 SSF46689
A0A009DZV4 GAP IPR027805 gap_no_evid PF13613
A0A009E034 IPR036249 SSF52833
A0A009E0A3 IPR039426 PTHR30069
A0A009E0P1 IPR027417 GAP SSF52540 gap_no_evid
A0A009E0R3 IPR025161 IPR002559 PF13340 PF01609
A0A009E0R9 GAP IPR001387 gap_no_evid SM00530
A0A009E0T7 IPR025668 PF13612
A0A009E0W0 IPR009057 SSF46689
A0A009E0W4 IPR009057 SSF46689
A0A009E0Y6 IPR008972 G3DSA:2.60.40.420
A0A009E130 IPR009057 SSF46689
A0A009E1R1 IPR025668 PF13737
A0A009E233 IPR029058 G3DSA:3.40.50.1820
A0A009E282 IPR002559 GAP PF01609 gap_no_evid
A0A009E2B1 IPR038390 G3DSA:1.20.58.1000
A0A009E2X0 GAP IPR027417 gap_no_evid SSF52540
A0A009E3I5 IPR036397 G3DSA:3.30.420.10
A0A009E3M2 GAP IPR001296 gap_no_evid PF00534
A0A009E3R0 IPR027806 PF13359
A0A009E3R5 IPR002559 PF01609
A0A009E4H8 IPR003220 GAP IPR005063 PF03811 gap_no_evid PF03400
A0A009E4L8 IPR025246 GAP PF13936 gap_no_evid
A0A009E4X5 GAP IPR027805 GAP gap_no_evid PF13613 gap_no_evid
A0A009E5D3 IPR023205 PIRSF001488
A0A009E5S5 IPR025161 PF13340
A0A009E5V4 IPR001584 PS50994
A0A009E6A1 IPR012337 GAP SSF53098 gap_no_evid
A0A009E6I2 GAP IPR025948 IPR012337 gap_no_evid PF13276 SSF53098
A0A009E6U8 IPR005538 PTHR33931
A0A009E753 IPR039554 PF13744
A0A009E759 IPR029016 IPR036388 G3DSA:3.30.450.40 G3DSA:1.10.10.10
A0A009E7J5 IPR012337 SSF53098
A0A009E7X4 GAP IPR003715 GAP gap_no_evid PF02563 gap_no_evid
A0A009E7X8 GAP IPR032807 GAP IPR005702 gap_no_evid PF13807 gap_no_evid TIGR01007
A0A009E8H7 IPR036388 G3DSA:1.10.10.10
A0A009E8Q3 IPR006119 IPR006120 PS51736 PF02796
A0A009E8Z6 IPR039060 PTHR40455
A0A009E921 IPR002622 PF01710
A0A009E935 IPR025161 PF13340
A0A009E947 IPR008878 PTHR36455
A0A009E952 IPR036388 G3DSA:1.10.10.10
A0A009E983 IPR017511 PTHR32303:SF4
A0A009E995 IPR038717 PF13358
A0A009E9A3 IPR025668 PF13586
A0A009E9E8 IPR011547 GAP IPR036513 PF00916 gap_no_evid G3DSA:3.30.750.24
A0A009E9F6 IPR008878 PTHR36455
A0A009E9G3 IPR036397 G3DSA:3.30.420.10
A0A009E9H9 IPR006121 PS50846
A0A009E9M2 IPR000653 PIRSF000390
A0A009E9M7 IPR003361 PIRSF015689
A0A009E9Q4 IPR002514 GAP PF01527 gap_no_evid
A0A009E9S2 IPR025246 GAP IPR001584 PF13936 gap_no_evid PS50994
A0A009E9U8 IPR036188 G3DSA:3.50.50.60
A0A009EAB1 IPR036766 G3DSA:1.10.3880.10
A0A009EAE1 IPR005888 TIGR01181
A0A009EAJ8 GAP IPR001584 gap_no_evid PS50994
A0A009EAK7 IPR030389 PS51711
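
The rows of this sample appear to line up with the corpus sample: each corpus line matches the `interpro_ids` column of the corresponding row. A minimal sketch of that relationship follows. It assumes the three columns are tab-separated and that ids within a column are space-separated, which this rendered view cannot confirm. The helpers `read_domain_rows` and `to_corpus_line` are hypothetical names, not functions from `main.py`.

```python
import csv
import io


def read_domain_rows(handle):
    """Yield (uniprot_id, interpro_ids, evidence_ids) per data row."""
    # Assumption: columns are separated by tabs.
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # skip the header row
    for row in reader:
        if len(row) != 3:
            continue  # ignore blank or malformed lines
        uniprot_id, interpro, evidence = row
        # Assumption: ids inside a column are separated by spaces.
        yield uniprot_id, interpro.split(), evidence.split()


def to_corpus_line(interpro_ids):
    """Join the domain ids of one protein into a single corpus line."""
    return " ".join(interpro_ids)


# Two rows from the sample above, with tabs assumed between columns.
sample = (
    "uniprot_id\tinterpro_ids\tevidence_db_ids\n"
    "A0A000\tIPR015424\tSSF53383\n"
    "A0A006\tGAP IPR029055 GAP\tgap_no_evid G3DSA:3.60.20.10 gap_no_evid\n"
)
rows = list(read_domain_rows(io.StringIO(sample)))
corpus_lines = [to_corpus_line(ids) for _, ids, _ in rows]
```

Each domain keeps its position, so the n-th entry of `interpro_ids` pairs with the n-th entry of `evidence_db_ids`, and `GAP` pairs with `gap_no_evid`.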
