
Commit 09d3e4d

update docs
1 parent 9aebeed commit 09d3e4d

3 files changed: +49 -48 lines changed


introduction/index.html

+15 -15
@@ -1482,15 +1482,15 @@ <h1>Introduction</h1>
  class="gdoc-markdown__link"
  href="https://bioinf.shenwei.me/LexicMap/introduction/#searching"
  >fast and memory-efficient</a></strong>.</li>
- <li>LexicMap is easy to <a
+ <li>LexicMap is <strong>easy to <a
  class="gdoc-markdown__link"
  href="http://bioinf.shenwei.me/LexicMap/installation/"
- >install</a>,
+ >install</a></strong>,
  we provide <a
  class="gdoc-markdown__link"
  href="https://github.com/shenwei356/LexicMap/releases/"
  >binary files</a> with no dependencies for Linux, Windows, MacOS (x86 and arm CPUs).</li>
- <li>LexicMap is easy to use (<a
+ <li>LexicMap is <strong>easy to use</strong> (<a
  class="gdoc-markdown__link"
  href="http://bioinf.shenwei.me/LexicMap/tutorials/index/"
  >tutorials</a> and <a
@@ -1542,7 +1542,7 @@ <h1>Introduction</h1>
  <li><strong>We added the support of suffix matching of seeds, making seeds much more tolerant to mutations</strong>. Any 31-bp seed with a common ≥15 bp prefix or suffix can be matched, which means <strong>seeds are immune to any single SNP</strong>.</li>
  </ol>
  </li>
- <li>A multi-level index enables fast and low-memory variable-length seed matching and chaining.</li>
+ <li>A hierarchical index enables fast and low-memory variable-length seed matching and chaining.</li>
  <li>A pseudo alignment algorithm is used to find similar sequence regions from chaining results for alignment.</li>
  <li>A <a
  class="gdoc-markdown__link"
@@ -1761,9 +1761,9 @@ <h1>Introduction</h1>
  <tr>
  <td style="text-align:left">GTDB complete</td>
  <td style="text-align:right">402,538</td>
- <td style="text-align:right">578 GB</td>
+ <td style="text-align:right">443 GB</td>
  <td style="text-align:left">LexicMap</td>
- <td style="text-align:right">906 GB</td>
+ <td style="text-align:right">973 GB</td>
  <td style="text-align:right">10 h 36 m</td>
  <td style="text-align:right">63.3 GB</td>
  </tr>
@@ -1772,16 +1772,16 @@ <h1>Introduction</h1>
  <td style="text-align:right"></td>
  <td style="text-align:right"></td>
  <td style="text-align:left">Blastn</td>
- <td style="text-align:right">360 GB</td>
+ <td style="text-align:right">387 GB</td>
  <td style="text-align:right">3 h 11 m</td>
  <td style="text-align:right">718 MB</td>
  </tr>
  <tr>
  <td style="text-align:left">AllTheBacteria HQ</td>
  <td style="text-align:right">1,858,610</td>
- <td style="text-align:right">3.1 TB</td>
+ <td style="text-align:right">2.5 TB</td>
  <td style="text-align:left">LexicMap</td>
- <td style="text-align:right">3.88 TB</td>
+ <td style="text-align:right">4.26 TB</td>
  <td style="text-align:right">48 h 08 m</td>
  <td style="text-align:right">88.6 GB</td>
  </tr>
@@ -1790,7 +1790,7 @@ <h1>Introduction</h1>
  <td style="text-align:right"></td>
  <td style="text-align:right"></td>
  <td style="text-align:left">Blastn</td>
- <td style="text-align:right">1.76 TB</td>
+ <td style="text-align:right">1.93 TB</td>
  <td style="text-align:right">14 h 03 m</td>
  <td style="text-align:right">2.9 GB</td>
  </tr>
@@ -1806,9 +1806,9 @@ <h1>Introduction</h1>
  <tr>
  <td style="text-align:left">Genbank+RefSeq</td>
  <td style="text-align:right">2,340,672</td>
- <td style="text-align:right">3.5 TB</td>
+ <td style="text-align:right">2.7 TB</td>
  <td style="text-align:left">LexicMap</td>
- <td style="text-align:right">4.94 TB</td>
+ <td style="text-align:right">5.43 TB</td>
  <td style="text-align:right">54 h 33 m</td>
  <td style="text-align:right">178.3 GB</td>
  </tr>
@@ -1817,7 +1817,7 @@ <h1>Introduction</h1>
  <td style="text-align:right"></td>
  <td style="text-align:right"></td>
  <td style="text-align:left">Blastn</td>
- <td style="text-align:right">2.15 TB</td>
+ <td style="text-align:right">2.37 TB</td>
  <td style="text-align:right">14 h 04 m</td>
  <td style="text-align:right">4.3 GB</td>
  </tr>
@@ -1914,8 +1914,8 @@ <h1>Introduction</h1>
  <td style="text-align:left">LexicMap</td>
  <td style="text-align:right">3,867,003</td>
  <td style="text-align:right">2,228,339</td>
- <td style="text-align:right">1,165 s</td>
- <td style="text-align:right">20.2 GB</td>
+ <td style="text-align:right">1,254 s</td>
+ <td style="text-align:right">21.4 GB</td>
  </tr>
  <tr>
  <td style="text-align:left"></td>

search/en.data.min.json

+1 -1
Large diffs are not rendered by default.

tutorials/index/index.html

+33 -32
@@ -59,7 +59,7 @@
  "url" : "https://bioinf.shenwei.me/LexicMap/tutorials/index/",
  "headline": "Step 1. Building a database",
  "description": "Table of contents Table of contents TL;DR Input Hardware requirements Algorithm Parameters Steps Output File structure Index size Explore the index TL;DR Prepare input files: Sequences of each reference genome should be saved in separate FASTA\/Q files, with identifiers in the file names. E.g., GCF_000006945.2.fna.gz While if you save a few small (viral) complete genomes (one sequence per genome) in each file, it’s feasible as sequence IDs in search result can help to distinguish targe genomes.",
- "wordCount" : "2840",
+ "wordCount" : "2851",
  "inLanguage": "en",
  "isFamilyFriendly": "true",
  "mainEntityOfPage": {
@@ -2045,12 +2045,12 @@ <h1>Step 1. Building a database</h1>
  </label>
  <div class="gdoc-markdown--nested gdoc-tabs__content">
  <pre><code># 15 genomes
- demo.lmi: 69.89 MB
- 56.65 MB seeds
- 12.93 MB genomes
- 312.53 KB masks.bin
- 375.00 B genomes.map.bin
- 323.00 B info.toml
+ demo.lmi: 73.30 MB (73,297,328)
+ 59.41 MB seeds
+ 13.57 MB genomes
+ 320.03 kB masks.bin
+ 375 B genomes.map.bin
+ 323 B info.toml
  </code></pre>

  </div>
@@ -2066,12 +2066,12 @@ <h1>Step 1. Building a database</h1>
  </label>
  <div class="gdoc-markdown--nested gdoc-tabs__content">
  <pre><code># 85,205 genomes
- gtdb_repr.lmi: 212.48 GB
- 145.69 GB seeds
- 66.78 GB genomes
- 2.03 MB genomes.map.bin
- 312.53 KB masks.bin
- 329.00 B info.toml
+ gtdb_repr.lmi: 228.15 GB (228,149,871,198)
+ 156.44 GB seeds
+ 71.71 GB genomes
+ 2.13 MB genomes.map.bin
+ 320.03 kB masks.bin
+ 329 B info.toml
  </code></pre>

  </div>
@@ -2087,12 +2087,12 @@ <h1>Step 1. Building a database</h1>
  </label>
  <div class="gdoc-markdown--nested gdoc-tabs__content">
  <pre><code># 402,538 genomes
- gtdb_complete.lmi: 906.04 GB
- 543.06 GB seeds
- 362.98 GB genomes
- 9.60 MB genomes.map.bin
- 312.53 KB masks.bin
- 330.00 B info.toml
+ gtdb_complete.lmi: 972.85 GB (972,854,821,322)
+ 583.10 GB seeds
+ 389.74 GB genomes
+ 10.06 MB genomes.map.bin
+ 320.03 kB masks.bin
+ 330 B info.toml
  </code></pre>

  </div>
@@ -2108,12 +2108,13 @@ <h1>Step 1. Building a database</h1>
  </label>
  <div class="gdoc-markdown--nested gdoc-tabs__content">
  <pre><code># 2,340,672 genomes
- genbank_refseq.lmi: 4.94 TB
- 2.77 TB seeds
- 2.17 TB genomes
- 55.81 MB genomes.map.bin
- 312.53 KB masks.bin
- 332.00 B info.toml
+ genbank_refseq.lmi: 5.43 TB (5,428,824,803,581)
+ 3.04 TB seeds
+ 2.38 TB genomes
+ 821.17 MB kmers-m12345.tsv
+ 58.52 MB genomes.map.bin
+ 320.03 kB masks.bin
+ 332 B info.toml
  </code></pre>

  </div>
@@ -2129,12 +2130,12 @@ <h1>Step 1. Building a database</h1>
  </label>
  <div class="gdoc-markdown--nested gdoc-tabs__content">
  <pre><code># 1,858,610 genomes
- atb_hq.lmi: 3.88 TB
- 2.11 TB seeds
- 1.77 TB genomes
- 39.22 MB genomes.map.bin
- 312.53 KB masks.bin
- 332.00 B info.toml
+ atb_hq.lmi: 4.26 TB (4,261,437,129,065)
+ 2.32 TB seeds
+ 1.94 TB genomes
+ 41.12 MB genomes.map.bin
+ 320.03 kB masks.bin
+ 332 B info.toml
  </code></pre>

  </div>
@@ -2144,7 +2145,7 @@ <h1>Step 1. Building a database</h1>
  <li>Directory/file sizes are counted with <a
  class="gdoc-markdown__link"
  href="https://github.com/shenwei356/dirsize"
- >https://github.com/shenwei356/dirsize</a>. (base: 1024)</li>
+ >https://github.com/shenwei356/dirsize</a> v1.2.1 (<code>dirsize -k</code>, base: 1000).</li>
  <li>Index building parameters: <code>-k 31 -m 40000</code>. Genome batch size: <code>-b 5000</code> for GTDB datasets, <code>-b 25000</code> for others.</li>
  </ul>
  <div class="flex align-center gdoc-page__anchorwrap">
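
Note on the size changes: the last hunk switches the reported unit base from 1024 to 1000 (`dirsize -k`), and the new listings include exact byte counts. The sketch below suggests that the change of base alone reproduces several of the old and new totals from the same byte count. `format_size` is a hypothetical helper written for this note, not dirsize's actual code, and it reuses the same unit labels for both bases for simplicity.

```python
def format_size(num_bytes: int, base: int = 1000) -> str:
    """Format a byte count with two decimals, dividing by `base` per unit step.

    Hypothetical helper for illustration only; not taken from dirsize.
    """
    units = ["B", "kB", "MB", "GB", "TB", "PB"]
    value = float(num_bytes)
    for unit in units:
        # Stop at the first unit where the value fits, or at the largest unit.
        if value < base or unit == units[-1]:
            return f"{value:.2f} {unit}"
        value /= base


# Byte count reported for gtdb_complete.lmi in the diff above.
gtdb_complete = 972_854_821_322
print(format_size(gtdb_complete, base=1000))  # 972.85 GB (new value)
print(format_size(gtdb_complete, base=1024))  # 906.04 GB (old value)
```

The same check on the genbank_refseq.lmi byte count (5,428,824,803,581) gives 5.43 TB at base 1000 and 4.94 TB at base 1024. Not every old figure need match exactly, since file contents may also have changed between builds (for example, the added kmers-m12345.tsv).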
