PacificBiosciences
diff --git a/‎README.md
+4-405 b/‎README.md
+4-405
diff --git a/‎docs/CNAME
+1 b/‎docs/CNAME
+1
diff --git a/‎docs/_config.yml
+28 b/‎docs/_config.yml
+28
diff --git a/‎docs/_sass/color_schemes/custom.scss
+5 b/‎docs/_sass/color_schemes/custom.scss
+5
diff --git a/‎docs/changelog.md
+66 b/‎docs/changelog.md
+66
diff --git a/‎docs/faq/accuracy-vs-passes.md
+53 b/‎docs/faq/accuracy-vs-passes.md
+53
diff --git a/‎docs/faq/bam-output.md
+55 b/‎docs/faq/bam-output.md
+55
diff --git a/‎docs/faq/bioconda-binary.md
+25 b/‎docs/faq/bioconda-binary.md
+25
diff --git a/‎docs/faq/chemistry.md
+56 b/‎docs/faq/chemistry.md
+56
diff --git a/‎docs/faq/index.md
+8 b/‎docs/faq/index.md
+8
diff --git a/‎docs/faq/kinetics.md
+25 b/‎docs/faq/kinetics.md
+25
diff --git a/‎docs/faq/licenses.md
+12 b/‎docs/faq/licenses.md
+12
diff --git a/‎docs/faq/low-complexity.md
+22 b/‎docs/faq/low-complexity.md
+22
@@ -0,0 +1 @@
+ccs.how
@@ -0,0 +1,28 @@
+remote_theme: armintoepfer/just-the-docs
+
+# Aux links for the upper right navigation
+aux_links:
+  "File an issue":
+    - "https://github.com/PacificBiosciences/pbbioconda/issues/new?template=bug_report.md"
+
+# Makes Aux links open in a new tab. Default is false
+aux_links_new_tab: true
+
+color_scheme: custom
+
+# Footer content
+# appears at the bottom of every page's main content
+footer_content: "THIS WEBSITE AND CONTENT AND ALL SITE-RELATED SERVICES, INCLUDING ANY DATA, ARE PROVIDED \"AS IS,\" WITH ALL FAULTS, WITH NO REPRESENTATIONS OR WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, ANY WARRANTIES OF MERCHANTABILITY, SATISFACTORY QUALITY, NON-INFRINGEMENT OR FITNESS FOR A PARTICULAR PURPOSE. YOU ASSUME TOTAL RESPONSIBILITY AND RISK FOR YOUR USE OF THIS SITE, ALL SITE-RELATED SERVICES, AND ANY THIRD PARTY WEBSITES OR APPLICATIONS. NO ORAL OR WRITTEN INFORMATION OR ADVICE SHALL CREATE A WARRANTY OF ANY KIND. ANY REFERENCES TO SPECIFIC PRODUCTS OR SERVICES ON THE WEBSITES DO NOT CONSTITUTE OR IMPLY A RECOMMENDATION OR ENDORSEMENT BY PACIFIC BIOSCIENCES."
+
+# Footer last edited timestamp
+last_edit_timestamp: true # show or hide edit time - page must have `last_modified_date` defined in the frontmatter
+last_edit_time_format: "%b %e %Y at %I:%M %p" # uses ruby's time format: https://ruby-doc.org/stdlib-2.7.0/libdoc/time/rdoc/Time.html
+
+# Footer "Edit this page on GitHub" link text
+gh_edit_link: false # show or hide edit this page link
+
+
+title: "CCS Docs"
+tagline: "Generate Highly Accurate Single-Molecule Consensus Reads (HiFi Reads)"
+
+search_enabled: false
@@ -0,0 +1,5 @@
+$link-color: $blue-000;
+$content-width: 900px;
+$nav-width: 224px;
+$nav-width-md: 200px;
+$sidebar-color: $grey-lt-000;
@@ -0,0 +1,66 @@
+---
+layout: default
+title: Changelog
+nav_order: 99
+---
+
+# Version changelog
+
+**5.0.0**
+   * SMRT Link v10.0 release
+   * Add `--hifi-kinetics` to average kinetic information for polished reads
+   * Add `--all-kinetics` to add kinetic information for all ZMWs, except for unpolished draft consensus
+   * Add `--subread-fallback`; combined with `--all`, uses a subread instead of a draft as representative consensus
+   * Use sDUST to identify tandem repeats
+   * Output HiFi yield (>= Q20) and Unique Molecular Yield as INFO log
+   * Set `--top-passes 60` default
+   * Abort if chemistry information is missing in BAM header
+   * Add non-blocking temporary file writing
+   * Add `--input-buffer` to smooth IO fluctuations
+   * Add `--all` to generate one representative read per ZMW
+   * Reuse prefix of output file for report files to avoid unintentional clobbering
+   * Add `zmw_metrics.json`, metrics about each ZMW; file name can be set with `--metrics-json`
+   * Add JSON output of ccs_reports via `--report-json`
+   * Add `--suppress-reports` to suppress generating default report and metric files
+
+4.2.0
+   * SMRT Link v9.0 release
+   * Speed improvements
+   * Minor yield improvements, by requiring a percentage of subreads mapping back to draft instead of `--min-passes`
+   * Add effective coverage `ec` tag
+   * Lowering `--min-passes` no longer reduces yield
+   * Add `--batch-size` to better saturate machine with high core counts
+   * Simplify log output
+   * Fix bug in predicted accuracy calculation
+   * Improved `ccs_report.txt` summary
+
+4.1.0
+   * Minor speed improvements
+   * Fix `--by-strand` logic, see more [here](https://ccs.how/faq/mode-by-strand)
+   * Allow vanilla `.xml` output without specifying dataset type
+   * Compute wall start/end for each output read (future basecaller functionality)
+
+4.0.0
+   * SMRT Link v8.0 release
+   * Speed improvements
+   * Removed support for legacy Python Genomic Consensus, please use [gcpp](https://github.com/PacificBiosciences/gcpp)
+   * New command-line interface
+   * New report file
+
+3.4.1
+   * SMRT Link v7.0 release
+   * Log used chemistry model to INFO level
+
+3.4.0
+   * Fixes to unpolished mode for IsoSeq
+   * Improve runtime when `--minPredictedAccuracy` has been increased
+
+3.3.0
+   * Add a windowing approach to reduce computational complexity from quadratic to linear
+   * Improve multi-threading framework to increase throughput
+   * Enhance XML output, propagate `CollectionMetadata`
+   * Includes latest chemistry parameters
+
+3.1.0
+   * Add `--maxPoaCoverage` to decrease runtime for unpolished output, special parameter for IsoSeq workflow
+   * Chemistry parameters for SMRT Link v6.0
@@ -0,0 +1,53 @@
+---
+layout: default
+parent: FAQ
+title: Accuracy vs. passes
+---
+
+## What impacts the number and quality of HiFi reads that are generated?
+The longer the polymerase read gets, the more passes of the SMRTbell
+are produced and consequently more evidence is accumulated per molecule.
+This increase in evidence translates into higher consensus accuracy, as
+depicted in the following plot:
+
+<p align="center"><img width="600px" src="../img/ccs-acc.png"/></p>
+
+## How is number of passes computed?
+Each read is annotated with a `np` tag that contains the number of
+full-length subreads used for polishing. Full-length subreads are flanked by
+adapters and thus cover the full insert.
+Since the first version of _ccs_, number of passes has only accounted for
+full-length subreads. In version v3.3.0 windowing has been added, which
+takes the minimum number of full-length subreads across all windows.
+Starting with version v4.0.0, minimum has been replaced with mode to get a
+better representation across all windows. Only subreads that pass the subread
+length filter (please see next FAQ about filters) and were not dropped during
+polishing are counted.
+
+Similarly, the tag `ec` reports effective coverage, the average subread coverage
+across all windows. This metric includes all subreads, independent of being
+full- or partial-length subreads, that pass length filters and did not fail
+during polishing. In most cases `ec` will be roughly `np + 1`.
+
+## Why do I get more yield if I increase `--min-passes`?
+For versions newer than 3.0.0 and older than 4.2.0, we required that after
+draft generation, at least `--min-passes` subreads map back to the draft.
+Imagine the following scenario: a ZMW with 10 subreads generates a draft to which
+only a single subread aligns. This draft is of low quality and does not
+represent the ZMW, yet if you ask for `--min-passes 1`, this low-quality draft
+is used. Starting with version 4.2.0, we additionally require that more than
+50% of the subreads align to the draft, which avoids this problem.
+This fixes the majority of discrepancies for fewer than three passes.
+
+Why do we have this problem at all? Shouldn't the draft stage be robust enough?
+Robustness comes with inherent speed trade-offs. We have a cascade of different draft
+generators, from very fast and unstable to slow and robust. If a ZMW fails
+to generate a draft with a fast generator, it falls back multiple times until it
+reaches the slower and more robust generator. This approach is still much faster
+than always relying on the robust generator.
+
+## Is there an upper limit on number of passes used?
+By default, _ccs_ uses at most the top 60 full-length passes after sorting
+by median length.
+Beyond this threshold, it has been shown that quality does not improve.
+You can change this limit with `--top-passes`, where `0` means unlimited.
@@ -0,0 +1,55 @@
+---
+layout: default
+parent: FAQ
+title: BAM output
+---
+
+## What BAM tags are generated?
+
+|  Tag  | Type  | Description |
+| :---: | :---: | ----------- |
+| `ec`  | `f`   | [Effective coverage](/faq/accuracy-vs-passes#how-is-number-of-passes-computed)|
+| `fi`  | `B,C` | [Forward IPD (codec V1)](/faq/kinetics)|
+| `fn`  | `i`   | [Forward number of complete passes (zero or more)](/faq/kinetics)|
+| `fp`  | `B,C` | [Forward PulseWidth (codec V1)](/faq/kinetics)|
+| `np`  | `i`   | [Number of full-length subreads](/faq/accuracy-vs-passes#how-is-number-of-passes-computed)|
+| `ri`  | `B,C` | [Reverse IPD (codec V1)](/faq/kinetics)|
+| `rn`  | `i`   | [Reverse number of complete passes (zero or more)](/faq/kinetics)|
+| `rp`  | `B,C` | [Reverse PulseWidth (codec V1)](/faq/kinetics)|
+| `rq`  | `f`   | [Predicted average read accuracy](/how-does-ccs-work#9-qv-calculation)|
+| `sn`  | `B,f` | Signal-to-noise ratios for each nucleotide|
+| `zm`  | `i`   | ZMW hole number |
+| `RG`  | `Z`   | Read group |
+
+
+## How does the output BAM file size scale with yield?
+For each base, the output BAM file size scales as follows:
+ - 0.5 byte/base for the actual base (4-bit encoding)
+ - 1 byte/base for the QV
+ - 1 byte/base for the forward PW
+ - 1 byte/base for the forward IPD
+ - 1 byte/base for the reverse PW
+ - 1 byte/base for the reverse IPD
+
+For a normal _ccs_ run without kinetics, the upper bound is 1.5 bytes/base.
+If _ccs_ is run **with** kinetics, the upper bound is 5.5 bytes/base.
+
+Per-read meta information adds a fixed amount of 32 bytes per read:
+ - `ec`, `rq`: float, each 4 bytes
+ - `sn`: float array, 4x4 bytes
+ - `np`, `zm`: int32_t, each 4 bytes
+ - `RG`: string of length 8, 8x1 bytes
+
+The actual output BAM that _ccs_ generates is compressed. Compression is
+data-dependent and because of that, upper bounds can't be provided.
+For a 19 kb insert library and 30 h movie time, the _ccs_ BAM files scale on
+average with:
+
+| BAM name             |        Options                             | Bytes/<br>Base | Bytes/<br>HiFiBase | Example<br>(GBytes) | Example<br>(GBytes) |
+| -------------------- | ------------------------------------------ | :------------: | :----------------: | :-----------------: | :-----------------: |
+| hifi.bam             |                                            | 0.7            | 0.7                | 100                 | 63                  |
+| hifi.hifikin.bam     | `--hifi-kinetics`                          | 3.7            | 3.7                | 528                 | 336                 |
+| reads.bam            | `--all`                                    | 0.55           | 1.1                | 157                 | 100                 |
+| reads.hifikin.bam    | `--all --hifi-kinetics`                    | 2.3            | 4.5                | 642                 | 409                 |
+| reads.allkin.bam     | `--all --all-kinetics`                     | 2.9            | 5.7                | 814                 | 518                 |
+| reads.allkin.sub.bam | `--all --all-kinetics --subread-fallback`  | 3.0            | 5.8                | 828                 | 527                 |
@@ -0,0 +1,25 @@
+---
+layout: default
+parent: FAQ
+title: Bioconda binary
+---
+
+## The binary does not work on my Linux system!
+Contrary to official SMRT Link releases, the `ccs` binary distributed via bioconda
+is tuned for performance while sacrificing backward compatibility.
+We are aware of the following errors and limitations. If yours is not listed, please
+file an issue on our [official pbbioconda page](https://github.com/PacificBiosciences/pbbioconda).
+
+**`Illegal instruction`** Your CPU is not supported.
+A modern (post-2008) CPU with support for
+[SSE4.1 instructions](https://en.wikipedia.org/wiki/SSE4#SSE4.1) is required.
+SMRT Link also has this requirement.
+
+**`FATAL: kernel too old`** Your OS, or rather your kernel version, is not supported.
+Since CCS v4.2, we also ship a second binary via bioconda, `ccs-alt`, which does
+not bundle a newer `glibc`. Please use this alternative binary.
+
+For CCS v5.0, we offer two binaries in bioconda:
+
+ * `ccs` statically links `glibc` v2.32 and `mimalloc` v1.3.0.
+ * `ccs-alt` was built by dynamically linking `glibc` v2.12, but statically links `mimalloc` v1.3.0.
@@ -0,0 +1,56 @@
+---
+layout: default
+parent: FAQ
+title: Chemistry
+---
+
+## Help! I am getting "Unsupported ..."!
+If you encounter the error `Unsupported chemistries found: (...)` or
+`unsupported sequencing chemistry combination`, your _ccs_ binaries do not
+support the used sequencing chemistry kit, from here on referred to as "chemistry".
+This may be because we removed support for an older chemistry or your binary predates
+release of the used chemistry.
+This is unlikely to happen with _ccs_ from SMRT Link installations, as SMRT Link
+is able to automatically update and install new chemistries.
+Thus, the easiest solution is to always use _ccs_ from the SMRT Link version that
+shipped with the release of the sequencing chemistry kit.
+
+**Old chemistries:**
+With _ccs_ 4.0.0, we have removed support for the last RSII chemistry `P6-C4`.
+The only option is to downgrade _ccs_ with `conda install pbccs==3.4`.
+
+**New chemistries:**
+It might happen that your _ccs_ version predates the sequencing chemistry kit.
+To fix this, install the latest version of _ccs_ with `conda update --all`.
+If you are an early access user, follow the [monkey patch tutorial](/faq/chemistry#monkey-patch-ccs-to-support-additional-sequencing-chemistry-kits).
+
+## Monkey patch _ccs_ to support additional sequencing chemistry kits
+Please create a directory that is used to inject new chemistry information
+into _ccs_:
+
+```sh
+mkdir -p /path/to/persistent/dir/
+cd /path/to/persistent/dir/
+export SMRT_CHEMISTRY_BUNDLE_DIR="${PWD}"
+mkdir -p arrow
+```
+
+Follow the step-by-step instructions below to fix the error you are observing,
+then proceed to use _ccs_ as you normally would. Additional chemistry
+information is automatically loaded from the directory set in the
+`SMRT_CHEMISTRY_BUNDLE_DIR` environment variable.
+
+### Error: "unsupported sequencing chemistry combination"
+Please download the latest out-of-band `chemistry.xml`:
+
+```sh
+wget https://raw.githubusercontent.com/PacificBiosciences/pbcore/develop/pbcore/chemistry/resources/mapping.xml -O "${SMRT_CHEMISTRY_BUNDLE_DIR}"/chemistry.xml
+```
+
+### Error: "Unsupported chemistries found: (...)"
+Please get the latest consensus model `.json` from PacBio and
+copy it to:
+
+```sh
+cp /some/download/dir/model.json "${SMRT_CHEMISTRY_BUNDLE_DIR}"/arrow/
+```
@@ -0,0 +1,8 @@
+---
+layout: default
+title: FAQ
+nav_order: 4
+has_children: true
+---
+
+# FAQ
@@ -0,0 +1,25 @@
+---
+layout: default
+parent: FAQ
+title: Kinetics
+---
+
+## Is it possible to use HiFi reads to call base modifications?
+Base modifications can be inferred from per-base pulse width (PW) and
+inter-pulse duration (IPD) kinetics.
+Running _ccs_ with `--hifi-kinetics` generates averaged kinetic information
+for polished reads, independently for both strands of the insert.
+Forward is defined with respect to the orientation represented in ``SEQ`` and
+is considered to be the native orientation. As with other PacBio-specific
+tags, aligners will not re-orient these fields.
+
+Minor cases exist where a certain orientation may get filtered out entirely
+from a ZMW, preventing valid values from being passed for that record. In
+these cases, empty lists will be passed for the respective record/orientation
+and number of passes will be set to zero.
+
+To facilitate the use of HiFi reads with base modification workflows,
+we have added an executable in pbbam called `ccs-kinetics-bystrandify` which
+creates a pseudo `--by-strand` BAM with corresponding `pw` and `ip` tags
+that imitates a normal, unaligned subreads BAM. You can install pbbam from
+Bioconda by calling `conda install pbbam`.
@@ -0,0 +1,12 @@
+---
+layout: default
+parent: FAQ
+title: Licenses
+---
+
+# Licenses
+PacBio® tool _ccs_, distributed via Bioconda, is licensed under
+[BSD-3-Clause-Clear](https://spdx.org/licenses/BSD-3-Clause-Clear.html)
+and statically links GNU C Library v2.32 licensed under [LGPL](https://spdx.org/licenses/LGPL-2.1-only.html).
+Per LGPL 2.1 subsection 6c, you are entitled to request the complete
+machine-readable work that uses glibc in object code.
@@ -0,0 +1,22 @@
+---
+layout: default
+parent: FAQ
+title: Low complexity
+---
+
+## Does CCS dislike low-complexity regions?
+Low complexity comes in many shapes and forms.
+Highly enriched tandem repeats, like hundreds of copies of `AGGGGT`,
+are a particular challenge for _ccs_.
+Prior to _ccs_ v5.0, inserts with many copies of a small repeat likely did not
+generate a consensus sequence.
+Since _ccs_ v5.0, every ZMW is tested for whether it contains a tandem repeat
+of minimum length `--min-tandem-repeat-length 1000`.
+For this, we use [symmetric DUST](https://doi.org/10.1089/cmb.2006.13.1028)
+and in particular this [sdust](https://github.com/lh3/sdust) implementation,
+but slightly modified.
+If a ZMW is flagged as a tandem repeat, internally `--disable-heuristics`
+is activated for only this ZMW, and various filters that are known to exclude
+low-complexity sequences are disabled.
+This recovers most of the low-complexity consensus sequences, without impacting
+run time performance.