|
59 | 59 | "url" : "https://bioinf.shenwei.me/LexicMap/faqs/",
|
60 | 60 | "headline": "FAQs",
|
61 | 61 | "description": "Table of contents Table of contents Does LexicMap support short reads? Does LexicMap support fungi genomes? How’s the hardware requirement? Can I extract the matched sequences? How can I extract the upstream and downstream flanking sequences of matched regions? Why isn’t the pident 100% when aligning with a sequence from the reference genomes? Why is LexicMap slow for batch searching? Does LexicMap support short reads? LexicMap is mainly designed for sequence alignment with a small number of queries (gene\/plasmid\/virus\/phage sequences) longer than 200 bp by default.",
|
62 |
| - "wordCount" : "725", |
| 62 | + "wordCount" : "731", |
63 | 63 | "inLanguage": "en",
|
64 | 64 | "isFamilyFriendly": "true",
|
65 | 65 | "mainEntityOfPage": {
|
@@ -1794,16 +1794,15 @@ <h1>FAQs</h1>
|
1794 | 1794 | </a>
|
1795 | 1795 | </div>
|
1796 | 1796 | <p>LexicMap is mainly designed for sequence alignment with a small number of queries against a database with a huge number (up to 17 million) of genomes.
|
1797 |
| -There are some ways to improve the search speed.</p> |
| 1797 | +There are some ways to improve the search speed of <code>lexicmap search</code> (see the example command after the list below).</p> |
1798 | 1798 | <ul>
|
1799 |
| -<li><code>lexicmap search</code> has a flag <code>-n/--top-n-genomes</code> to keep top N genome matches for a query (0 for all) in chaining phase. For queries with a large number of genome hits, a resonable value such as 1000 would reduce the computation time.</li> |
1800 |
| -<li><code>lexicmap search</code> has a flag <code>-w/--load-whole-seeds</code> to load the whole seed data into memory for |
1801 |
| -faster search. |
1802 |
| -<ul> |
1803 |
| -<li>For example, for ~85,000 GTDB representative genomes, the memory would be ~260 GB with default parameters.</li> |
1804 |
| -</ul> |
1805 |
| -</li> |
1806 |
| -<li><code>lexicmap search</code> also has a flag <code>--pseudo-align</code> to only perform pseudo alignment, which is slightly faster and uses less memory. |
| 1799 | +<li>Increasing the value of <code>--max-open-files</code> (default 512). You might need to <a |
| 1800 | + class="gdoc-markdown__link" |
| 1801 | + href="https://stackoverflow.com/questions/34588/how-do-i-change-the-number-of-open-files-limit-in-linux" |
| 1802 | +>change the open files limit</a>.</li> |
| 1803 | +<li>Setting <code>-n/--top-n-genomes</code> to keep the top N genome matches for a query (0 for all) in the chaining phase. For queries with a large number of genome hits, a reasonable value such as 1000 would reduce the computation time.</li> |
| 1804 | +<li>Setting <code>-w/--load-whole-seeds</code> to load the whole seed data into memory for faster search. For example, for ~85,000 GTDB representative genomes, the memory would be ~260 GB with default parameters.</li> |
| 1805 | +<li>Setting <code>--pseudo-align</code> to only perform pseudo alignment, which is slightly faster and uses less memory. |
1807 | 1806 | It can be used in searching with long and divergent query sequences like nanopore long-reads.</li>
|
1808 | 1807 | </ul>
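<p>A minimal sketch of such a command, combining the flags listed above. The index directory <code>db.lmi</code>, query file <code>query.fasta</code>, output path <code>results.tsv</code>, and the concrete limit values are illustrative placeholders, not values from the documentation:</p>
<pre><code># raise the per-process open-files limit first if needed
ulimit -n 4096

# keep only the top 1000 genome matches per query, load all seed data into memory,
# and allow more open files during the search
lexicmap search -d db.lmi query.fasta -o results.tsv \
    --top-n-genomes 1000 --load-whole-seeds --max-open-files 4096
</code></pre>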
|
1809 | 1808 | <p>
|
|