|
59 | 59 | "url" : "https://bioinf.shenwei.me/LexicMap/faqs/",
|
60 | 60 | "headline": "FAQs",
|
61 | 61 | "description": "Table of contents Table of contents Does LexicMap support short reads? Does LexicMap support fungi genomes? How’s the hardware requirement? Can I extract the matched sequences? How can I extract the upstream and downstream flanking sequences of matched regions? Why isn’t the pident 100% when aligning with a sequence from the reference genomes? Why is LexicMap slow for batch searching? Does LexicMap support short reads? LexicMap is mainly designed for sequence alignment with a small number of queries (gene\/plasmid\/virus\/phage sequences) longer than 200 bp by default.",
|
62 |
| - "wordCount" : "725", |
| 62 | + "wordCount" : "731", |
63 | 63 | "inLanguage": "en",
|
64 | 64 | "isFamilyFriendly": "true",
|
65 | 65 | "mainEntityOfPage": {
|
@@ -1794,16 +1794,15 @@ <h1>FAQs</h1>
|
1794 | 1794 | </a>
|
1795 | 1795 | </div>
|
1796 | 1796 | <p>LexicMap is mainly designed for sequence alignment with a small number of queries against a database with a huge number (up to 17 million) of genomes.
|
1797 |
| -There are some ways to improve the search speed.</p> |
| 1797 | +There are some ways to improve the search speed of <code>lexicmap search</code> (see the example command after the list below).</p> |
1798 | 1798 | <ul>
|
1799 |
| -<li><code>lexicmap search</code> has a flag <code>-n/--top-n-genomes</code> to keep top N genome matches for a query (0 for all) in chaining phase. For queries with a large number of genome hits, a resonable value such as 1000 would reduce the computation time.</li> |
1800 |
| -<li><code>lexicmap search</code> has a flag <code>-w/--load-whole-seeds</code> to load the whole seed data into memory for |
1801 |
| -faster search. |
1802 |
| -<ul> |
1803 |
| -<li>For example, for ~85,000 GTDB representative genomes, the memory would be ~260 GB with default parameters.</li> |
1804 |
| -</ul> |
1805 |
| -</li> |
1806 |
| -<li><code>lexicmap search</code> also has a flag <code>--pseudo-align</code> to only perform pseudo alignment, which is slightly faster and uses less memory. |
| 1799 | +<li>Increasing the value of <code>--max-open-files</code> (default 512). You might need to <a |
| 1800 | + class="gdoc-markdown__link" |
| 1801 | + href="https://stackoverflow.com/questions/34588/how-do-i-change-the-number-of-open-files-limit-in-linux" |
| 1802 | +>change the open files limit</a>.</li> |
| 1803 | +<li>Setting <code>-n/--top-n-genomes</code> to keep the top N genome matches for a query (0 for all) in the chaining phase. For queries with a large number of genome hits, a reasonable value such as 1000 would reduce the computation time.</li> |
| 1804 | +<li>Setting <code>-w/--load-whole-seeds</code> to load the whole seed data into memory for faster search. For example, for ~85,000 GTDB representative genomes, the memory would be ~260 GB with default parameters.</li> |
| 1805 | +<li>Setting <code>--pseudo-align</code> to only perform pseudo alignment, which is slightly faster and uses less memory. |
1807 | 1806 | It can be used in searching with long and divergent query sequences like nanopore long-reads.</li>
|
1808 | 1807 | </ul>
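<p>A minimal sketch of such a command, combining the flags listed above. The index directory <code>db.lmi</code>, query file <code>query.fasta</code>, output path <code>results.tsv</code>, and the concrete limit values are illustrative placeholders, not values from the documentation:</p>
<pre><code># raise the per-process open-files limit first if needed
ulimit -n 4096

# keep only the top 1000 genome matches per query, load all seed data into memory,
# and allow more open files during the search
lexicmap search -d db.lmi query.fasta -o results.tsv \
    --top-n-genomes 1000 --load-whole-seeds --max-open-files 4096
</code></pre>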
|
1809 | 1808 | <p>
|
|