Commit 91d1474

Merge pull request #23 from RNAcentral/import-ena-with-python
Import ena with python
2 parents 7d80bda + 24d72ef commit 91d1474

55 files changed: 154,637 additions and 202 deletions

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -67,3 +67,4 @@ data/Homo_sapiens.GRCh38.87.chromosome.*.dat
 data/Mus_musculus.GRCm38.87.chromosome.*.dat
 data/pub/
 luigi.cfg
+logging.cfg

data/ena/anticodon-in-note.embl

Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
1+
rel_std_pro_04_r133.ncr
2+
ID CP000102.1:337323..337406:tRNA; SV 1; linear; genomic DNA; STD; PRO; 84 BP.
3+
XX
4+
PA CP000102.1
5+
XX
6+
PR Project:PRJNA15579;
7+
XX
8+
DT 06-JAN-2006 (Rel. 86, Created)
9+
DT 15-MAY-2014 (Rel. 120, Last updated, Version 6)
10+
XX
11+
DE Methanosphaera stadtmanae DSM 3091 tRNA-Leu
12+
XX
13+
KW .
14+
XX
15+
OS Methanosphaera stadtmanae DSM 3091
16+
OC Archaea; Euryarchaeota; Methanobacteria; Methanobacteriales;
17+
OC Methanobacteriaceae; Methanosphaera.
18+
XX
19+
RN [1]
20+
RX DOI; 10.1128/JB.188.2.642-658.2006.
21+
RX PUBMED; 16385054.
22+
RA Fricke W.F., Seedorf H., Henne A., Kruer M., Liesegang H., Hedderich R.,
23+
RA Gottschalk G., Thauer R.K.;
24+
RT "The genome sequence of Methanosphaera stadtmanae reveals why this human
25+
RT intestinal archaeon is restricted to methanol and H2 for methane formation
26+
RT and ATP synthesis";
27+
RL J. Bacteriol. 188(2):642-658(2006).
28+
XX
29+
RN [2]
30+
RA Fricke W.F., Seedorf H., Henne A., Kruer M., Liesegang H., Hedderich R.,
31+
RA Gottschalk G., Thauer R.K.;
32+
RT ;
33+
RL Submitted (05-AUG-2005) to the INSDC.
34+
RL Institute of Microbiology and Genetics, Georg August University Goettingen,
35+
RL Goettingen Genomics Laboratory, Grisebach Str 8, Goettingen D-37077,
36+
RL Germany
37+
XX
38+
DR MD5; 5caa01d86f18febcf564b491596b89dd.
39+
DR BioSample; SAMN02604180.
40+
XX
41+
FH Key Location/Qualifiers
42+
FH
43+
FT source 1..84
44+
FT /organism="Methanosphaera stadtmanae DSM 3091"
45+
FT /strain="DSM 3091"
46+
FT /mol_type="genomic DNA"
47+
FT /db_xref="taxon:339860"
48+
FT tRNA CP000102.1:337323..337406
49+
FT /operon="trnA"
50+
FT /locus_tag="Msp_0274"
51+
FT /product="tRNA-Leu"
52+
FT /note="codon recognized: CUA"
53+
XX
54+
SQ Sequence 84 BP; 13 A; 26 C; 31 G; 14 T; 0 other;
55+
gcgggggtgc ccgagctggc caaaggggac aggcttagga cctgttggcg taggcctccc 60
56+
agggttcgaa tccctgctcc cgca 84
57+
//
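
The fixture above stores the codon only as free text in a `/note` qualifier (`codon recognized: CUA`). The following is a minimal sketch, not the importer from this pull request, of how such a note could be turned into a codon and its pairing anticodon. The function names and the regular expression are illustrative assumptions, and the reverse-complement step assumes standard Watson-Crick pairing.

```python
import re

# Hypothetical helpers for illustration; not taken from the RNAcentral code base.
NOTE_PATTERN = re.compile(r'/note="codon recognized:\s*([ACGUT]{3})"', re.IGNORECASE)
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def codon_from_feature_lines(lines):
    """Return the codon named in a 'codon recognized' note, or None."""
    for line in lines:
        if not line.startswith("FT"):
            continue  # only feature-table lines carry qualifiers
        match = NOTE_PATTERN.search(line)
        if match:
            # Normalise to upper-case RNA letters.
            return match.group(1).upper().replace("T", "U")
    return None


def anticodon_for(codon):
    """Reverse-complement an RNA codon to get the pairing anticodon."""
    return codon.translate(COMPLEMENT)[::-1]


feature_lines = [
    'FT   tRNA            CP000102.1:337323..337406',
    'FT                   /product="tRNA-Leu"',
    'FT                   /note="codon recognized: CUA"',
]
codon = codon_from_feature_lines(feature_lines)
print(codon, anticodon_for(codon))
```

Whether the anticodon should be stored as the codon itself or as its reverse complement is a design decision for the importer; the sketch only shows that both are recoverable from the note.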
Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
1+
ID HG975378.1:1..299:ncRNA; SV 1; linear; transcribed RNA; STD; HUM; 299 BP.
2+
XX
3+
PA HG975378.1
4+
XX
5+
PR Project:PRJEB6238;
6+
XX
7+
DT 07-MAY-2014 (Rel. 120, Created)
8+
DT 03-MAR-2015 (Rel. 124, Last updated, Version 2)
9+
XX
10+
DE Homo sapiens (human) Small nucleolar RNA 7SL
11+
XX
12+
KW RNAcentral; TPA; TPA:specialist_db.
13+
XX
14+
OS Homo sapiens (human)
15+
OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
16+
OC Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae;
17+
OC Homo.
18+
XX
19+
RN [1]
20+
RP 1-299
21+
RG lncRNAdb and RNAcentral
22+
RA ;
23+
RT ;
24+
RL Submitted (30-APR-2014) to the INSDC.
25+
XX
26+
RN [2]
27+
RX DOI; 10.1093/nar/gkq1138.
28+
RX PUBMED; 21112873.
29+
RA Amaral P.P., Clark M.B., Gascoigne D.K., Dinger M.E., Mattick J.S.;
30+
RT "lncRNAdb: a reference database for long noncoding RNAs";
31+
RL Nucleic Acids Res. 39(Database issue):D146-D151(2011).
32+
XX
33+
DR lncRNAdb; 140; 7SL.
34+
DR MD5; 020711a90d35bb197e29e085595dd52e.
35+
XX
36+
CC lncRNAdb; 140; 7SL.
37+
CC
38+
CC Specialist DB : lncRNAdb (The long non-coding RNA database)
39+
CC URL : http://www/lncrnadb.org/
40+
XX
41+
AH LOCAL_SPAN PRIMARY_IDENTIFIER PRIMARY_SPAN COMP
42+
AS 1-299 X04248.1 1-299
43+
XX
44+
FH Key Location/Qualifiers
45+
FH
46+
FT source 1..299
47+
FT /organism="Homo sapiens"
48+
FT /mol_type="transcribed RNA"
49+
FT /db_xref="taxon:9606"
50+
FT ncRNA HG975378.1:1..299
51+
FT /gene="RN7SL1"
52+
FT /product="Small nucleolar RNA 7SL"
53+
FT /note="biotype:SRP_RNA"
54+
FT /note="ECO:0000305"
55+
FT /note="SO:0000590"
56+
FT /note="GO:0006617"
57+
FT /note="GO:0048501"
58+
FT /experiment="EXISTENCE: lncRNAdb literature review [PMID:
59+
FT 12244299,6196367,6181418,6802847,3403542,6084597,10924331,
60+
FT 7528809,7529207,1704372,20610725,18617187,17881443,
61+
FT 17164479,8389475,10834842,10684931,15611297,20668672,
62+
FT 911771,6209580]"
63+
FT /ncRNA_class="SRP_RNA"
64+
XX
65+
SQ Sequence 299 BP; 56 A; 83 C; 105 G; 55 T; 0 other;
66+
gccgggcgcg gtggcgcgtg cctgtagtcc cagctactcg ggaggctgag gctggaggat 60
67+
cgcttgagtc caggagttct gggctgtagt gcgctatgcc gatcgggtgt ccgcactaag 120
68+
ttcggcatca atatggtgac ctcccgggag cgggggacca ccaggttgcc taaggagggg 180
69+
tgaaccggcc caggtcggaa acggagcagg tcaaaactcc cgtgctgatc agtagtggga 240
70+
tcgcgcctgt gaatagccac tgcactccag cctgggcaac atagcgagac cccgtctct 299
71+
//
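
Each record's `SQ` header states the sequence length and per-base counts, which gives a cheap sanity check when loading fixtures like these. The sketch below is an assumption about how such a check might look (the function name is hypothetical); it uses the 84 BP tRNA record from `data/ena/anticodon-in-note.embl` as sample input.

```python
import re
from collections import Counter

# Matches headers such as "SQ   Sequence 84 BP; 13 A; 26 C; 31 G; 14 T; 0 other;"
SQ_HEADER = re.compile(
    r"^SQ\s+Sequence (\d+) BP; (\d+) A; (\d+) C; (\d+) G; (\d+) T; (\d+) other;"
)


def sq_header_matches(header, sequence_lines):
    """Check that an SQ header agrees with the sequence lines that follow it."""
    match = SQ_HEADER.match(header)
    if match is None:
        return False
    expected = tuple(int(value) for value in match.groups())
    # Keep letters only: this drops the spaces and the trailing position numbers.
    residues = "".join(
        ch for line in sequence_lines for ch in line.lower() if ch.isalpha()
    )
    counts = Counter(residues)
    other = len(residues) - sum(counts[base] for base in "acgt")
    observed = (len(residues), counts["a"], counts["c"], counts["g"], counts["t"], other)
    return observed == expected


header = "SQ   Sequence 84 BP; 13 A; 26 C; 31 G; 14 T; 0 other;"
sequence_lines = [
    "     gcgggggtgc ccgagctggc caaaggggac aggcttagga cctgttggcg taggcctccc        60",
    "     agggttcgaa tccctgctcc cgca                                               84",
]
print(sq_header_matches(header, sequence_lines))
```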

data/ena/function.embl

Lines changed: 159 additions & 0 deletions
@@ -0,0 +1,159 @@
1+
ID EU410654.1:1..92:ncRNA; SV 1; linear; genomic DNA; STD; PLN; 92 BP.
2+
XX
3+
PA EU410654.1
4+
XX
5+
DT 19-FEB-2008 (Rel. 94, Created)
6+
DT 04-JUN-2008 (Rel. 96, Last updated, Version 2)
7+
XX
8+
DE Chlamydomonas reinhardtii CrCD10e snoRNA
9+
XX
10+
KW .
11+
XX
12+
OS Chlamydomonas reinhardtii
13+
OC Eukaryota; Viridiplantae; Chlorophyta; Chlorophyceae; Chlamydomonadales;
14+
OC Chlamydomonadaceae; Chlamydomonas.
15+
XX
16+
RN [1]
17+
RP 1-92
18+
RX DOI; 10.1534/genetics.107.086025.
19+
RX PUBMED; 18493037.
20+
RA Chen C.L., Chen C.J., Vallon O., Huang Z.P., Zhou H., Qu L.H.;
21+
RT "Genomewide analysis of box C/D and box H/ACA snoRNAs in Chlamydomonas
22+
RT reinhardtii reveals an extensive organization into intronic gene clusters";
23+
RL Genetics 179(1):21-30(2008).
24+
XX
25+
RN [2]
26+
RP 1-92
27+
RA Chen C.-L., Chen C.-J., Vallon O., Huang Z.-P., Zhou H., Qu L.-H.;
28+
RT ;
29+
RL Submitted (18-JAN-2008) to the INSDC.
30+
RL Zhongshan University, Key Laboratory of Gene Engineering of the Ministry of
31+
RL Education, Biotechnology Research Center, Guangzhou 510275, People's
32+
RL Republic of China
33+
XX
34+
DR MD5; 5c4a1c47bfd172e4474c757c89f6ce6c.
35+
XX
36+
CC Determined from Chlamydomonas reinhardtii genome in GenBank
37+
CC Accession Number ABCN00000000.
38+
XX
39+
FH Key Location/Qualifiers
40+
FH
41+
FT source 1..92
42+
FT /organism="Chlamydomonas reinhardtii"
43+
FT /mol_type="genomic DNA"
44+
FT /db_xref="taxon:3055"
45+
FT ncRNA EU410654.1:1..92
46+
FT /product="CrCD10e snoRNA"
47+
FT /function="guide for 26S rRNA methylation at U1043"
48+
FT /note="C/D box snoRNA"
49+
FT /ncRNA_class="snoRNA"
50+
XX
51+
SQ Sequence 92 BP; 22 A; 28 C; 22 G; 20 T; 0 other;
52+
gcggcgatga ctgccctcga gtgcctaact cgtcttacct attttcagag tcccctcaag 60
53+
gggcgcgctg ctgaaacaaa tacacaatga gc 92
54+
//
55+
ID AB046489.1:221..306:tRNA; SV 1; linear; genomic DNA; STD; VRT; 86 BP.
56+
XX
57+
PA AB046489.1
58+
XX
59+
DT 20-NOV-2003 (Rel. 77, Created)
60+
DT 29-DEC-2010 (Rel. 107, Last updated, Version 2)
61+
XX
62+
DE Eurypharynx pelecanoides (pelican eel) tRNA
63+
XX
64+
KW .
65+
XX
66+
OS Eurypharynx pelecanoides (pelican eel)
67+
OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
68+
OC Actinopterygii; Neopterygii; Teleostei; Anguilliformes; Eurypharyngidae;
69+
OC Eurypharynx.
70+
OG Mitochondrion
71+
XX
72+
RN [1]
73+
RA Inoue J., Miya M., Tsukamoto K., Nishida M.;
74+
RT ;
75+
RL Submitted (23-JUL-2000) to the INSDC.
76+
RL Contact:Jun Inoue Atmosphere and Ocean Research Institute, The University
77+
RL of Tokyo; 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8564, Japan
78+
XX
79+
RN [2]
80+
RX DOI; 10.1093/molbev/msg206.
81+
RX PUBMED; 12949142.
82+
RA Inoue J., Miya M., Tsukamoto K., Nishida M.;
83+
RT "Evolution of the deep-sea gulper eel mitochondrial genomes: large-scale
84+
RT gene rearrangements originated within the eels";
85+
RL Mol. Biol. Evol. 20(11):1917-1924(2003).
86+
XX
87+
DR MD5; 84f7c99e40be8ba95932912d86d9ce43.
88+
XX
89+
FH Key Location/Qualifiers
90+
FH
91+
FT source 1..86
92+
FT /organism="Eurypharynx pelecanoides"
93+
FT /organelle="mitochondrion"
94+
FT /isolate="D"
95+
FT /mol_type="genomic DNA"
96+
FT /db_xref="taxon:55117"
97+
FT tRNA complement(AB046489.1:221..306)
98+
FT /function="tRNA-Pro"
99+
XX
100+
SQ Sequence 86 BP; 16 A; 16 C; 25 G; 29 T; 0 other;
101+
aagttagaga gcgatttaat ttatgacccc ggccttgggg ccttggggtc ttggggatgg 60
102+
gagtttaatt tcccccctta attttg 86
103+
//
104+
ID CP003783.1:1548698..1548818:ncRNA; SV 1; linear; genomic DNA; STD; PRO; 121 BP.
105+
XX
106+
PA CP003783.1
107+
XX
108+
PR Project:PRJNA172244;
109+
XX
110+
DT 29-AUG-2012 (Rel. 113, Created)
111+
DT 15-MAY-2014 (Rel. 120, Last updated, Version 6)
112+
XX
113+
DE Bacillus subtilis QB928 sporulation-specific regulatory RNA
114+
XX
115+
KW .
116+
XX
117+
OS Bacillus subtilis QB928
118+
OC Bacteria; Firmicutes; Bacilli; Bacillales; Bacillaceae; Bacillus.
119+
XX
120+
RN [1]
121+
RX PUBMED; 23105055.
122+
RA Yu C.S., Yim K.Y., Tsui S.K., Chan T.F.;
123+
RT "Complete Genome Sequence of Bacillus subtilis Strain QB928, a Strain
124+
RT Widely Used in B. subtilis Genetic Studies";
125+
RL J. Bacteriol. 194(22):6308-6309(2012).
126+
XX
127+
RN [2]
128+
RA Yu C.-S., Yim K.-Y., Mat W.-K., Tsui S.K., Wong J.T., Chan T.-F.;
129+
RT ;
130+
RL Submitted (19-AUG-2012) to the INSDC.
131+
RL School of Life Sciences, The Chinese University of Hong Kong, G12, Run Run
132+
RL Shaw Science Building, The Chinese University of Hong Kong, Shatin, N.T.,
133+
RL Hong Kong, China
134+
XX
135+
DR MD5; 4d454fc72801b39b5f4d10cbc60aca39.
136+
DR BioSample; SAMN02603911.
137+
XX
138+
CC Bacillus subtilis str. QB928 is available from Bacillus Genetic
139+
CC Stock Center (BGSC).
140+
XX
141+
FH Key Location/Qualifiers
142+
FH
143+
FT source 1..121
144+
FT /organism="Bacillus subtilis QB928"
145+
FT /strain="QB928"
146+
FT /mol_type="genomic DNA"
147+
FT /db_xref="taxon:1220533"
148+
FT ncRNA CP003783.1:1548698..1548818
149+
FT /gene="csfG"
150+
FT /locus_tag="B657_miscRNA23"
151+
FT /product="sporulation-specific regulatory RNA"
152+
FT /function="1.8 : Sporulation"
153+
FT /ncRNA_class="other"
154+
XX
155+
SQ Sequence 121 BP; 39 A; 23 C; 35 G; 24 T; 0 other;
156+
ataaaaaaat ccccgcaggc atctgcgggg tccttctatt ccttaatatg ttaaggagaa 60
157+
ggcaaaggga gaggagaaac cggaggaaga acttatgggg aaacgtaagt cttctccgcg 120
158+
g 121
159+
//
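
`data/ena/function.embl` holds three records in one file, each closed by a `//` terminator line. As a rough sketch (hypothetical helper names, not the parser used in this pull request), a multi-record file can be split on those terminators and the identifier read from each `ID` line. The sample text is abbreviated to the `ID` lines of the three records above.

```python
import re

# Everything between "ID" and the first semicolon is treated as the identifier.
ID_LINE = re.compile(r"^ID\s+([^;]+);")


def split_records(text):
    """Split EMBL flat-file text into records, each a list of lines.

    Lines after the last '//' terminator are discarded as an incomplete record.
    """
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "//":
            if current:
                records.append(current)
            current = []
        else:
            current.append(line)
    return records


def record_id(record_lines):
    """Return the identifier from the record's ID line, or None."""
    for line in record_lines:
        match = ID_LINE.match(line)
        if match:
            return match.group(1).strip()
    return None


sample = """\
ID   EU410654.1:1..92:ncRNA; SV 1; linear; genomic DNA; STD; PLN; 92 BP.
XX
//
ID   AB046489.1:221..306:tRNA; SV 1; linear; genomic DNA; STD; VRT; 86 BP.
XX
//
ID   CP003783.1:1548698..1548818:ncRNA; SV 1; linear; genomic DNA; STD; PRO; 121 BP.
XX
//
"""
ids = [record_id(record) for record in split_records(sample)]
print(ids)
```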

data/ena/gene_synonym.embl

Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
1+
ID CP000948.1:2011661..2011909:misc_RNA; SV 1; linear; genomic DNA; STD; PRO; 249 BP.
2+
XX
3+
PA CP000948.1
4+
XX
5+
PR Project:PRJNA20079;
6+
XX
7+
DT 26-MAR-2008 (Rel. 95, Created)
8+
DT 16-MAY-2014 (Rel. 120, Last updated, Version 8)
9+
XX
10+
DE Escherichia coli str. K-12 substr. DH10B small RNA
11+
XX
12+
KW .
13+
XX
14+
OS Escherichia coli str. K-12 substr. DH10B
15+
OC Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacterales;
16+
OC Enterobacteriaceae; Escherichia.
17+
XX
18+
RN [1]
19+
RX DOI; 10.1128/JB.01695-07.
20+
RX PUBMED; 18245285.
21+
RA Durfee T., Nelson R., Baldwin S., Plunkett G.III., Burland V., Mau B.,
22+
RA Petrosino J.F., Qin X., Muzny D.M., Ayele M., Gibbs R.A., Csorgo B.,
23+
RA Posfai G., Weinstock G.M., Blattner F.R.;
24+
RT "The complete genome sequence of Escherichia coli DH10B: insights into the
25+
RT biology of a laboratory workhorse";
26+
RL J. Bacteriol. 190(7):2597-2606(2008).
27+
XX
28+
RN [2]
29+
RA Plunkett G.III.;
30+
RT ;
31+
RL Submitted (20-FEB-2008) to the INSDC.
32+
RL Department of Genetics and Biotechnology, University of Wisconsin, 425G
33+
RL Henry Mall, Madison, WI 53706, USA
34+
XX
35+
DR MD5; 9971e731d14a9b452613a844c1ca14f2.
36+
DR BioSample; SAMN02604262.
37+
XX
38+
CC DH10B and DH10B-T1R are available from Invitrogen Corporation
39+
CC (http://www.invitrogen.com).
40+
XX
41+
FH Key Location/Qualifiers
42+
FH
43+
FT source 1..249
44+
FT /organism="Escherichia coli str. K-12 substr. DH10B"
45+
FT /strain="K-12"
46+
FT /sub_strain="DH10B"
47+
FT /mol_type="genomic DNA"
48+
FT /db_xref="taxon:316385"
49+
FT misc_RNA CP000948.1:2011661..2011909
50+
FT /gene="ryeA"
51+
FT /gene_synonym="IS091"
52+
FT /gene_synonym="sraC"
53+
FT /gene_synonym="tpke79"
54+
FT /locus_tag="ECDH10B_1978"
55+
FT /product="small RNA"
56+
XX
57+
SQ Sequence 249 BP; 73 A; 48 C; 66 G; 62 T; 0 other;
58+
aaagtcagcg aaggaaatgc ttctggcttt taacagataa aaagagaccg aacacgattc 60
59+
ctgtattcgg tccagggaaa tggctcttgg gagagagccg tgcgctaaaa gttggcatta 120
60+
atgcaggctt agttgccttg ccctttaaga atagatgacg acgccaggtt ttccagtttg 180
61+
cgtgcaaaat ggtcaataaa aagcgtggtg gtcatcagct gaaatgttaa aaaccgcccg 240
62+
ttctggtga 249
63+
//
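
The `misc_RNA` feature above repeats the `/gene_synonym` qualifier three times, so a parser has to collect qualifier values into lists rather than overwrite them. A minimal sketch under that assumption follows; the helper name is hypothetical, and it only handles qualifiers that fit on one line (a wrapped value such as the `/experiment` qualifier in the HG975378.1 record would need continuation handling).

```python
import re
from collections import defaultdict

# Single-line qualifiers of the form: FT   /name="value"
QUALIFIER = re.compile(r'^FT\s+/(\w+)="([^"]*)"\s*$')


def collect_qualifiers(lines):
    """Group single-line feature qualifiers by name, keeping repeats in order."""
    found = defaultdict(list)
    for line in lines:
        match = QUALIFIER.match(line)
        if match:
            found[match.group(1)].append(match.group(2))
    return dict(found)


feature_lines = [
    'FT   misc_RNA        CP000948.1:2011661..2011909',
    'FT                   /gene="ryeA"',
    'FT                   /gene_synonym="IS091"',
    'FT                   /gene_synonym="sraC"',
    'FT                   /gene_synonym="tpke79"',
    'FT                   /locus_tag="ECDH10B_1978"',
    'FT                   /product="small RNA"',
]
qualifiers = collect_qualifiers(feature_lines)
print(qualifiers["gene_synonym"])
```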
