Commit 375bbf5

update doc
1 parent 0b80e1e commit 375bbf5

4 files changed, +34 -4 lines changed

search/en.data.min.json

+1 -1
Large diffs are not rendered by default.

tutorials/index/index.html

+2 -1
@@ -65,7 +65,7 @@
 "url" : "https://bioinf.shenwei.me/LexicMap/tutorials/index/",
 "headline": "Building an index",
 "description": "Table of contents Table of contents TL;DR Input Hardware requirements Algorithm Parameters Steps Output File structure Index size Explore the index TL;DR Prepare input files: Sequences of each reference genome should be saved in separate FASTA\/Q files, with identifiers in the file names. E.g., GCF_000006945.2.fna.gz Run: From a directory with multiple genome files:\nlexicmap index -I genomes\/ -O db.lmi From a file list with one file per line:\nlexicmap index -X files.",
-"wordCount" : "2744",
+"wordCount" : "2763",
 "inLanguage": "en",
 "isFamilyFriendly": "true",
 "mainEntityOfPage": {
@@ -1545,6 +1545,7 @@ <h1>Building an index</h1>
 >GCA_000765055.1</a> has &gt;150 Mb.
 The flag <code>-g/--max-genome</code> (default 15 Mb) is used to skip these input files, and the file list would be written to a file
 via the flag <code>-G/--big-genomes</code>.</li>
+<li><strong>Minimum sequence length</strong>. A flag <code>-l/--min-seq-len</code> can filter out sequences shorter than the threshold (default is the <code>k</code> value).</li>
 </ul>
 </li>
 <li><strong>At most 17,179,869,184 (2<sup>34</sup>) genomes are supported</strong>. For more genomes, just build multiple indexes.</li>
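
The new `-l/--min-seq-len` bullet describes a simple length filter. As a rough illustration only (this is not LexicMap's code), the rule can be sketched in Python. The function name, the `(name, sequence)` record shape, and `k = 31` are assumptions made for the example; the fallback to `k` for values `<= 0` follows the flag's help text in usage/index/index.html.

```python
def filter_short_sequences(records, min_seq_len=-1, k=31):
    """Keep (name, sequence) pairs whose sequence meets the length threshold.

    Hypothetical helper: values <= 0 fall back to k, mirroring the help
    text "The value would be k for values <= 0 (default -1)".
    """
    threshold = min_seq_len if min_seq_len > 0 else k
    # Sequences shorter than the threshold are filtered out.
    return [(name, seq) for name, seq in records if len(seq) >= threshold]


if __name__ == "__main__":
    demo = [("short", "ACGT"), ("exactly_k", "A" * 31), ("long", "A" * 40)]
    for name, seq in filter_short_sequences(demo):
        print(name, len(seq))
```

With the default threshold, the 4-bp record is dropped and the two records of length 31 and 40 are kept.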

usage/index/index.html

+5 -1
@@ -59,7 +59,7 @@
 "url" : "https://bioinf.shenwei.me/LexicMap/usage/index/",
 "headline": "index",
 "description": "$ lexicmap index -h Generate an index from FASTA\/Q sequences Input: *1. Sequences of each reference genome should be saved in separate FASTA\/Q files, with reference identifiers in the file names. 2. Input plain or gzip\/xz\/zstd\/bzip2 compressed FASTA\/Q files can be given via positional arguments or the flag -X\/--infile-list with a list of input files. Flag -S\/--skip-file-check is optional for skipping file checking if you trust the file list. 3. Input can also be a directory containing sequence files via the flag -I\/--in-dir, with multiple-level sub-directories allowed.",
-"wordCount" : "1278",
+"wordCount" : "1324",
 "inLanguage": "en",
 "isFamilyFriendly": "true",
 "mainEntityOfPage": {
@@ -1436,6 +1436,7 @@ <h1>index</h1>
 </span></span><span class="line"><span class="cl"> 5. Maximum genome size: 268,435,456.
 </span></span><span class="line"><span class="cl"> More precisely: $total_bases + ($num_contigs - 1) * 1000 &lt;= 268,435,456, as we concatenate contigs with
 </span></span><span class="line"><span class="cl"> 1000-bp intervals of N’s to reduce the sequence scale to index.
+</span></span><span class="line"><span class="cl"> 6. A flag -l/--min-seq-len can filter out sequences shorter than the threshold (default is the k value).
 </span></span><span class="line"><span class="cl">
 </span></span><span class="line"><span class="cl"> Attention:
 </span></span><span class="line"><span class="cl"> *1) ► You can rename the sequence files for convenience, e.g., GCF_000017205.1.fa.gz, because the genome
@@ -1539,6 +1540,9 @@ <h1>index</h1>
 </span></span><span class="line"><span class="cl"> assemblies from Genbank) will be skipped. Need to be smaller than the
 </span></span><span class="line"><span class="cl"> maximum supported genome size: 268435456 (default 15000000)
 </span></span><span class="line"><span class="cl"> --max-open-files int ► Maximum opened files, used in merging indexes. (default 512)
+</span></span><span class="line"><span class="cl"> -l, --min-seq-len int ► Minimum sequence length to index. The value would be k for values
+</span></span><span class="line"><span class="cl"> &lt;= 0 (default -1)
+</span></span><span class="line"><span class="cl"> --no-desert-filling ► Disable sketching desert filling (only for debug).
 </span></span><span class="line"><span class="cl"> -O, --out-dir string ► Output LexicMap index directory.
 </span></span><span class="line"><span class="cl"> --partitions int ► Number of partitions for indexing seeds (k-mer-value data) files.
 </span></span><span class="line"><span class="cl"> (default 512)
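
The help text in this file states the genome size limit as `$total_bases + ($num_contigs - 1) * 1000 <= 268,435,456`. A minimal Python sketch of that arithmetic may make the limit easier to check for a given assembly. The function and constant names are hypothetical and chosen for the example; only the 1000-bp interval and the 268,435,456 limit come from the help text.

```python
MAX_GENOME_SIZE = 268_435_456  # limit quoted in the help text (equals 2**28)
CONTIG_INTERVAL = 1000         # bases of N inserted between adjacent contigs


def concatenated_length(contig_lengths):
    """Length after joining contigs with 1000-bp intervals of N's."""
    if not contig_lengths:
        return 0
    return sum(contig_lengths) + (len(contig_lengths) - 1) * CONTIG_INTERVAL


def fits_index(contig_lengths):
    """True if the concatenated genome is within the stated maximum size."""
    return concatenated_length(contig_lengths) <= MAX_GENOME_SIZE


if __name__ == "__main__":
    lengths = [1000, 2000, 3000]
    print(concatenated_length(lengths), fits_index(lengths))
```

Three contigs contribute two intervals, so the padding added is 2000 bases on top of the 6000 sequence bases.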

usage/utils/kmers/index.html

+26 -1
@@ -59,7 +59,7 @@
 "url" : "https://bioinf.shenwei.me/LexicMap/usage/utils/kmers/",
 "headline": "kmers",
 "description": "$ lexicmap utils kmers -h View k-mers captured by the masks Attention: 1. Mask index (column mask) is 1-based. 2. Prefix means the length of shared prefix between a k-mer and the mask. 3. K-mer positions (column pos) are 1-based. For reference genomes with multiple sequences, the sequences were concatenated to a single sequence with intervals of N\u0027s. 4. Reversed means if the k-mer is reversed for suffix matching. Usage: lexicmap utils kmers [flags] -d \u003cindex path\u003e [-m \u003cmask index\u003e] [-o out.",
-"wordCount" : "1003",
+"wordCount" : "1197",
 "inLanguage": "en",
 "isFamilyFriendly": "true",
 "mainEntityOfPage": {
@@ -1443,6 +1443,7 @@ <h1>kmers</h1>
 </span></span><span class="line"><span class="cl"> -h, --help help for kmers
 </span></span><span class="line"><span class="cl"> -d, --index string ► Index directory created by &#34;lexicmap index&#34;.
 </span></span><span class="line"><span class="cl"> -m, --mask int ► View k-mers captured by Xth mask. (0 for all) (default 1)
+</span></span><span class="line"><span class="cl"> -f, --only-forward ► Only output forward k-mers.
 </span></span><span class="line"><span class="cl"> -o, --out-file string ► Out file, supports and recommends a &#34;.gz&#34; suffix (&#34;-&#34; for stdout).
 </span></span><span class="line"><span class="cl"> (default &#34;-&#34;)
 </span></span><span class="line"><span class="cl">
@@ -1489,6 +1490,30 @@ <h1>kmers</h1>
 1 AAAAAAAACCATATTATGTCCGATCCTCACA 4 1 GCF_000392875.1 1060650 + yes
 1 AAAAAAAACCCTTCGTCAAGCATTATGGAAT 4 1 GCF_000392875.1 1139573 - yes
 </code></pre>
+<p>Only forward k-mers.</p>
+<pre><code> $ lexicmap utils kmers --quiet -d demo.lmi/ -f | head -n 20 | csvtk pretty -t
+mask kmer prefix number ref pos strand reversed
+---- ------------------------------- ------ ------ --------------- ------- ------ --------
+1 AAAACACCAAAAGCCTCTCCGATAACACCAG 9 1 GCF_002949675.1 2046311 + no
+1 AAAACACCAAAGTTAAAGTGCCGTTTAGCGT 9 1 GCF_003697165.2 1085073 + no
+1 AAAACACCAATTAGTGATTGTGTTTCCTCAA 9 1 GCF_000392875.1 2785764 - no
+1 AAAACACCACAGTGAAAGACAACATTTAATA 9 1 GCF_000392875.1 1132052 - no
+1 AAAACACCACCACAAATGCATAAGAAAACTT 9 1 GCF_003697165.2 2862670 + no
+1 AAAACACCACTCAATCCTTTAAATAAAAACA 9 1 GCF_002949675.1 2467828 - no
+1 AAAACACCACTTTACGGGCGTTTTGTGCAAT 9 1 GCF_003697165.2 4241904 - no
+1 AAAACACCAGCACGTTCAGCACCGCCACCAG 9 1 GCF_000017205.1 4399207 - no
+1 AAAACACCAGCGAACGGAAGAACATCGCGAT 9 1 GCF_003697165.2 248663 + no
+1 AAAACACCAGGCCGGAGCAGAAGGTTATTCT 9 1 GCF_003697165.2 4139632 + no
+1 AAAACACCATAAACGATTGTTGGAATACCCG 10 1 GCF_009759685.1 268158 + no
+1 AAAACACCATCATACACTAAATCAGTAAGTT 10 4 GCF_002949675.1 496925 + no
+1 AAAACACCATCATACACTAAATCAGTAAGTT 10 4 GCF_002949675.1 2254974 + no
+1 AAAACACCATCATACACTAAATCAGTAAGTT 10 4 GCF_002949675.1 2495183 + no
+1 AAAACACCATCATACACTAAATCAGTAAGTT 10 4 GCF_002949675.1 4009312 + no
+1 AAAACACCATGAACGCCAACGCCGCCGAGCT 11 1 GCF_000742135.1 2707622 + no
+1 AAAACACCATGAGCAAACTCCAGCATATCGG 11 1 GCF_000017205.1 2490011 - no
+1 AAAACACCATGCAAAAAACTTCTTTTAGAAA 11 1 GCF_000006945.2 1324151 - no
+1 AAAACACCATGCAGCATGTCATAGCGCTGGA 11 1 GCF_003697165.2 422685 + no
+</code></pre>
 </li>
 <li>
 <p>Specify the mask.</p>
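
In the example output added for `-f/--only-forward`, every row has `no` in the `reversed` column. Assuming that this column is what distinguishes forward k-mers (an inference from the example, not something the help text states), an equivalent filter can be applied to the full tab-separated output after the fact. The sketch below is illustrative Python with a hypothetical function name; the two sample rows are copied from the outputs shown in this file.

```python
import csv
import io


def forward_kmers(tsv_text):
    """Return rows whose 'reversed' column is 'no' (assumed to mean forward)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row for row in reader if row["reversed"] == "no"]


# Header plus two rows taken from the example outputs in the diff above.
sample = "\n".join([
    "mask\tkmer\tprefix\tnumber\tref\tpos\tstrand\treversed",
    "1\tAAAAAAAACCATATTATGTCCGATCCTCACA\t4\t1\tGCF_000392875.1\t1060650\t+\tyes",
    "1\tAAAACACCAAAAGCCTCTCCGATAACACCAG\t9\t1\tGCF_002949675.1\t2046311\t+\tno",
])

if __name__ == "__main__":
    for row in forward_kmers(sample):
        print(row["kmer"], row["ref"], row["pos"])
```

Using the built-in flag avoids writing the reversed k-mers at all, so a post-hoc filter like this is mainly useful for output that was already generated without `-f`.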
