Comparing changes

base repository: OpenGene/gencore
base: v0.9.0
head repository: OpenGene/gencore
compare: master

Commits on Dec 4, 2018

  1. 129eacd
  2. 400b2af

Commits on Dec 5, 2018

  1. fix a possible mem leak

    sfchen committed Dec 5, 2018
    15e7c9d
  2. 3ea81e8
  3. restore O0 to O3 optimization

    sfchen committed Dec 5, 2018
    e97fb25
  4. cc5483f
  5. revise Cluster::addRead

    sfchen committed Dec 5, 2018
    122e64f
  6. c7cd590
  7. optimize createCluster

    sfchen committed Dec 5, 2018
    6ceeb9a

Commits on Dec 18, 2018

  1. Update README.md

    sfchen authored Dec 18, 2018
    fcf7010

Commits on Dec 20, 2018

  1. Update README.md

    sfchen authored Dec 20, 2018
    150e26e
  2. Update README.md

    sfchen authored Dec 20, 2018
    76b80ce
  3. Update README.md

    sfchen authored Dec 20, 2018
    70d94b9
  4. Update README.md

    sfchen authored Dec 20, 2018
    86af0cf

Commits on Apr 22, 2019

  1. dbf065b

Commits on Jun 23, 2019

  1. 70e7e85

Commits on Jun 25, 2019

  1. revise duplication plot

    sfchen committed Jun 25, 2019
    e522b47
  2. report coverage in JSON

    sfchen committed Jun 25, 2019
    98e3dc5
  3. report HTML

    sfchen committed Jun 25, 2019
    cb32f12

Commits on Jun 26, 2019

  1. d544631

Commits on Jun 27, 2019

  1. add BED coverage report

    sfchen committed Jun 27, 2019
    78b87c2
  2. report bed coverage in HTML

    sfchen committed Jun 27, 2019
    6129779
  3. improve HTML reporting

    sfchen committed Jun 27, 2019
    bcce033

Commits on Jun 28, 2019

  1. update reporting and README

    sfchen committed Jun 28, 2019
    1d30443
  2. update README

    sfchen committed Jun 28, 2019
    d47357d
  3. fix README

    sfchen committed Jun 28, 2019
    62330b6

Commits on Jul 6, 2019

  1. output sorted BAM

    sfchen committed Jul 6, 2019
    77baf86

Commits on Jul 8, 2019

  1. finish ordered output

    sfchen committed Jul 8, 2019
    4768652
  2. 40eac96
  3. add a warning for SE data

    sfchen committed Jul 8, 2019
    83cec73
  4. Update README.md

    sfchen authored Jul 8, 2019
    361f25e

Commits on Sep 5, 2019

  1. 399ffb7
  2. 5fdb321
  3. 853305c
  4. c9839e3

Commits on Dec 30, 2019

  1. Update README.md

    sfchen authored Dec 30, 2019
    2a113dd
  2. Update README.md

    sfchen authored Dec 30, 2019
    6207845

Commits on Jan 21, 2020

  1. Update README.md

    sfchen authored Jan 21, 2020
    658fb51

Commits on Mar 16, 2020

  1. better support for UMI prefix

    sfchen committed Mar 16, 2020
    c2e9e0c

Commits on Mar 18, 2020

  1. 088ef5b

Commits on Aug 11, 2020

  1. add umi_diff_threshold

    sfchen committed Aug 11, 2020
    c5ec0f1

Commits on Aug 31, 2020

  1. bam records comperator fixed to satisfy the strick weak orderings, that is if b1 < b2, then b2 > b1, vice versa

    wulj2 committed Aug 31, 2020
    8598d4b

Commits on Sep 27, 2020

  1. Merge pull request #26 from wulj2/master

    bam records comperator fixed to satisfy the strick weak orderings, th…
    sfchen authored Sep 27, 2020
    eb0d5b6

Commits on Oct 9, 2021

  1. support duplex merging

    sfchen committed Oct 9, 2021
    551454d
  2. e6d962e
  3. 89b2bd5
  4. 3ad6896
  5. 51f9f23
  6. update README

    sfchen committed Oct 9, 2021
    d51a6e4

Commits on Oct 11, 2021

  1. update README

    sfchen committed Oct 11, 2021
    ea1a6e1
Showing with 2,503 additions and 770 deletions.
  1. +112 −34 README.md
  2. +68 −14 src/bamutil.cpp
  3. +7 −6 src/bamutil.h
  4. +169 −0 src/bed.cpp
  5. +65 −0 src/bed.h
  6. +150 −535 src/cluster.cpp
  7. +4 −5 src/cluster.h
  8. +1 −1 src/common.h
  9. +65 −3 src/fastareader.cpp
  10. +12 −8 src/fastareader.h
  11. +225 −62 src/gencore.cpp
  12. +51 −8 src/gencore.h
  13. +609 −0 src/group.cpp
  14. +50 −0 src/group.h
  15. +485 −0 src/htmlreporter.cpp
  16. +48 −0 src/htmlreporter.h
  17. +2 −1 src/jsonreporter.cpp
  18. +1 −0 src/jsonreporter.h
  19. +19 −4 src/main.cpp
  20. +30 −11 src/options.cpp
  21. +15 −8 src/options.h
  22. +92 −42 src/pair.cpp
  23. +8 −0 src/pair.h
  24. +6 −6 src/reference.cpp
  25. +2 −2 src/reference.h
  26. +174 −11 src/stats.cpp
  27. +33 −9 src/stats.h
146 changes: 112 additions & 34 deletions README.md
@@ -1,28 +1,63 @@
A tool to GENerate COnsensus REads.
[![install with conda](
https://anaconda.org/bioconda/gencore/badges/version.svg)](https://anaconda.org/bioconda/gencore)
# gencore
An efficient tool to remove sequencing duplications and eliminate sequencing errors by generating consensus reads.
* [What's gencore](#whats-gencore)
* [A quick example](#a-quick-example)
* [Download, compile and install](#get-gencore)
* [Why to use gencore](#why-to-use-gencore)
* [Understand the output](#understand-the-output)
* [How it works](#how-it-works)
* [Command examples](#command-examples)
* [UMI format](#umi-format)
* [All options](#all-options)
* [Read/cite gencore paper](#citation)

# what's gencore?
`gencore` is a tool to generate consensus reads from paired-end data. It groups the reads derived from the same original DNA template, merges them and generates a consensus read, which contains far fewer errors than the original reads.
`gencore` is a tool for fast and powerful deduplication of paired-end next-generation sequencing (NGS) data. It is much faster and uses much less memory than Picard and other tools. It generates very informative reports in both HTML and JSON formats. It's based on an algorithm for `generating consensus reads`, and that's why it's named `gencore`.

This tool groups the reads of the same origin by their mapping positions and unique molecular identifiers (UMI). It can run with or without UMI. If your FASTQ data has UMI integrated, you can use [fastp](https://github.com/OpenGene/fastp) to shift the UMI to read query names, and use `gencore` to generate consensus reads.
Basically, `gencore` groups the reads derived from the same original DNA template and merges them into a consensus read, which contains far fewer errors than the original reads.

`gencore` supports data with unique molecular identifiers (UMI). If your FASTQ data has UMI integrated, you can use [fastp](https://github.com/OpenGene/fastp) to shift the UMI to read query names, and use `gencore` to generate consensus reads.

This tool can eliminate the errors introduced by library preparation and sequencing processes, and consequently reduce the false positives for downstream variant calling. This tool can also be used to remove duplicated reads. Since it generates consensus reads from duplicated reads, it outputs much cleaner data than conventional duplicate removers. ***Due to these advantages, it is especially useful for processing ultra-deep sequencing data for cancer samples.***

`gencore` accepts a sorted BAM/SAM with its corresponding reference fasta as input, and outputs an unsorted BAM/SAM.

# a quick example
# take a quick glance at the informative report
* Sample HTML report: http://opengene.org/gencore/gencore.html
* Sample JSON report: http://opengene.org/gencore/gencore.json

# try gencore to generate above reports
* BAM file for testing: http://opengene.org/gencore/input.sorted.bam
* BED file for testing: http://opengene.org/gencore/test.bed
* Reference genome file: [ftp://ftp.ncbi.nlm.nih.gov/sra/reports/Assembly/GRCh37-HG19_Broad_variant/Homo_sapiens_assembly19.fasta](ftp://ftp.ncbi.nlm.nih.gov/sra/reports/Assembly/GRCh37-HG19_Broad_variant/Homo_sapiens_assembly19.fasta)
* Command for testing:
```shell
gencore -i input.sorted.bam -o output.bam -r Homo_sapiens_assembly19.fasta -b test.bed --coverage_sampling=50000
```
* After the processing is finished, check `gencore.html` and `gencore.json` in the working directory. The option `--coverage_sampling=50000` overrides the default setting (`coverage_sampling=10000`) and lowers the coverage sampling rate, which produces smaller report files.

# quick examples
The simplest way
```shell
gencore -i input.sorted.bam -o output.bam -r hg19.fasta
```
With a BED file to specify the capturing regions
```shell
gencore -i input.sorted.bam -o output.bam -r hg19.fasta -b test.bed
```
Only output fragments with >= 2 supporting reads (useful for aggressive denoising)
```shell
gencore -i input.sorted.bam -o output.bam -r hg19.fasta -b test.bed -s 2
```
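
The quick examples above differ only in which optional flags are present, so a small wrapper can assemble them. The following is a minimal Python sketch, not part of gencore: the helper name `build_gencore_command` is hypothetical, and it uses only the `-i`, `-o`, `-r`, `-b` and `-s` flags shown in this README. Running the resulting command assumes a `gencore` binary is available on `PATH`.

```python
import shlex


def build_gencore_command(in_bam, out_bam, ref_fasta, bed=None, supporting_reads=None):
    """Assemble a gencore argument list (hypothetical helper, not part of gencore)."""
    cmd = ["gencore", "-i", in_bam, "-o", out_bam, "-r", ref_fasta]
    if bed is not None:
        # -b restricts coverage reporting to the capturing regions in a BED file
        cmd += ["-b", bed]
    if supporting_reads is not None:
        # -s keeps only fragments merged from at least this many reads/pairs
        cmd += ["-s", str(supporting_reads)]
    return cmd


if __name__ == "__main__":
    cmd = build_gencore_command(
        "input.sorted.bam", "output.bam", "hg19.fasta",
        bed="test.bed", supporting_reads=2,
    )
    # Print a shell-quoted version of the command for inspection.
    print(shlex.join(cmd))
    # To execute it (assumes gencore is installed):
    # import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell-quoting problems when file names contain spaces.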

# get gencore
## install with Bioconda
[![install with conda](
https://anaconda.org/bioconda/gencore/badges/version.svg)](https://anaconda.org/bioconda/gencore)
```shell
conda install -c bioconda gencore
```
## download binary
This binary is only for Linux systems: http://opengene.org/gencore/gencore
```shell
@@ -48,14 +83,39 @@ sudo make install
As described above, gencore can eliminate the errors introduced by library preparation and sequencing processes, and consequently it can greatly reduce the false positives for downstream variant calling. Here is an example.

## original BAM
![image](http://www.opengene.org/gencore/original.png)  
![image](http://www.opengene.org/gencore/original.png)

***This is an image showing a pileup of the original BAM. A lot of sequencing errors can be observed.***


## gencore processed BAM
![image](http://www.opengene.org/gencore/gencore.png)  
![image](http://www.opengene.org/gencore/processed.png)

***This is the image showing the result of gencore processed BAM. It becomes much cleaner. Cheers!***

# QC result reported by gencore
gencore also performs some quality control while deduplicating and generating consensus reads. It reports the mapping rate, duplication rate, mismatch rate and some other statistics. In particular, gencore reports the coverage statistics of the input BAM file at genome scale, and in capturing regions (if a BED file is specified).

gencore reports the results both in HTML format and JSON format for manual checking and downstream analysis. See the examples of the interactive [HTML](http://opengene.org/gencore/gencore.html) report and the [JSON](http://opengene.org/gencore/gencore.json) report.

## coverage statistics in genome scale
![image](http://www.opengene.org/gencore/coverage-genome.jpeg)

## coverage statistics in capturing regions
![image](http://www.opengene.org/gencore/coverage-bed.jpeg)

# understand the output
gencore outputs the following files:
1. the processed BAM. In this BAM, each consensus read will have a tag `FR`, which means `forward read number of this consensus read`. If the read is a duplex consensus read, it will also have a tag `RR`, which means `reverse read number of this consensus read`. Downstream tools can read the `FR` and `RR` tags for further processing or variant calling. In the following example, the first read is a single-stranded consensus sequence (it only has an `FR` tag), and the second read is a duplex consensus sequence (it has both `FR` and `RR` tags):
```
A00250:28:H2HC3DSX2:1:1117:3242:5321:UMI_GCT_CTA 161 chr12 25377992 60 143M = 25378431 582
GCAATAATTTTTGTCAGAAAAATGCATTAAATGAATAACAGAATTTCTGTTGGCTTTCTGGGTATTGTCTTTCTTTAATGAGACCTTTCTCCAGAAATAAACACATCCTCAAAAAAATTCTGCCAAAGTAAAATTCTTCAAAT FFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NM:i:1 MD:Z:34G108 AS:i:138 XS:i:21 RG:Z:cfdna FR:i:2
A00250:28:H2HC3DSX2:1:2316:10547:25989:UMI_AAC_AGA 161 chr12 25377993 60 143M = 25378462 612
CAATAATTTTTGTCAGAAAAATGCATTAAATGAATAACAGAATTTCTGTTGGCTTTCTGGGTATTGTCTTTCTTTAATGAGACCTTTCTCCAGAAATAAACACATCCTCAAAAAAATTCTGCCAAAGTAAAATTCTTCAAATA FFFFF:FFFFFFFFFFFFFFFFFFFFF:FF:FFFFFFFFFF,FFFFFFFFFFFF,:FFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFF:FFF,!FF:F:F:F,FFF,F:FFFF,,:F,FFFF:FF:,:FF:F,:, NM:i:1 MD:Z:33G67A41 AS:i:133 XS:i:21 RG:Z:cfdna FR:i:1 RR:i:5
```
2. the JSON report. A JSON file that contains detailed statistics.
3. the HTML report. An HTML file that visualizes the information in the JSON report.
4. the plain text output.
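
As an illustration of how a downstream tool might use the `FR` and `RR` tags described in item 1, here is a minimal Python sketch. It is not part of gencore: the function name `consensus_support` and the labels it returns are hypothetical, and it only assumes that the tags appear as `FR:i:<n>` and `RR:i:<n>` fields in a SAM text record, as in the example records above.

```python
import re

# Matches whitespace-delimited FR:i:<n> or RR:i:<n> fields in a SAM text line.
_TAG = re.compile(r"(?:^|\s)(FR|RR):i:(-?\d+)(?=\s|$)")


def consensus_support(sam_line):
    """Return (forward_reads, reverse_reads, kind) for one SAM record.

    reverse_reads is None when the record has no RR tag.
    """
    tags = {name: int(value) for name, value in _TAG.findall(sam_line)}
    forward = tags.get("FR")
    reverse = tags.get("RR")
    if forward is None:
        kind = "untagged"
    elif reverse is None:
        kind = "single-stranded consensus"
    else:
        kind = "duplex consensus"
    return forward, reverse, kind


if __name__ == "__main__":
    record = "read1\t161\tchr12\t25377993\t60\t4M\t=\t25378462\t612\tACGT\tFFFF\tRG:Z:cfdna\tFR:i:1\tRR:i:5"
    print(consensus_support(record))
```

For real BAM input, a library such as pysam or `samtools view` would be needed to obtain the text records first; that step is left out here.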

# how it works
important steps:
@@ -80,14 +140,18 @@ important steps:

| in overlapped region? | matched with its pair? | condition? | score for this base |
| - | - | - | - |
| NO | N/A | NO | 6 |
| YES | YES | this_qual + pair_qual >= 2 * MODERATE_QUAL | 8 |
| YES | YES | this_qual + pair_qual < 2 * MODERATE_QUAL | 7 |
| YES | NO | this_qual >= HIGH_QUAL, pair_qual <= LOW_QUAL | 5 |
| YES | NO | this_qual >= HIGH_QUAL, pair_qual >= HIGH_QUAL | 4 |
| YES | NO | LOW_QUAL < this_qual < HIGH_QUAL, LOW_QUAL < pair_qual < HIGH_QUAL | 3 |
| YES | NO | this_qual <= LOW_QUAL, pair_qual <= LOW_QUAL | 2 |
| YES | NO | this_qual <= LOW_QUAL, pair_qual >= HIGH_QUAL | 1 |
| NO | N/A | HIGH_QUAL <= this_qual | 8 |
| NO | N/A | MODERATE_QUAL <= this_qual < HIGH_QUAL | 6 |
| NO | N/A | LOW_QUAL <= this_qual < MODERATE_QUAL | 4 |
| NO | N/A | this_qual < LOW_QUAL | 2 |
| YES | YES | 2 * HIGH_QUAL <= this_qual + pair_qual | 12 |
| YES | YES | 2 * MODERATE_QUAL <= this_qual + pair_qual < 2 * HIGH_QUAL | 10 |
| YES | YES | 2 * LOW_QUAL <= this_qual + pair_qual < 2 * MODERATE_QUAL | 8 |
| YES | YES | this_qual + pair_qual < 2 * LOW_QUAL | 6 |
| YES | NO | HIGH_QUAL <= this_qual - pair_qual | 5 |
| YES | NO | MODERATE_QUAL <= this_qual - pair_qual < HIGH_QUAL | 3 |
| YES | NO | LOW_QUAL <= this_qual - pair_qual < MODERATE_QUAL | 1 |
| YES | NO | this_qual - pair_qual < LOW_QUAL | 0 |

In this table:
* `this_qual` is the quality of this base
@@ -96,6 +160,8 @@ In this table:
* `MODERATE_QUAL` is the quality threshold that can be specified by `--moderate_qual`
* `LOW_QUAL` is the quality threshold that can be specified by `--low_qual`

In the overlapped region, if a base and its pair are mismatched, its quality score will be adjusted to: `max(0, this_qual - pair_qual)`
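
The rows of the table with scores 0 to 12 (the last twelve rows, which this hunk adds), together with the mismatch rule, can be sketched in Python as follows. This is an illustration of the table only, not gencore's C++ implementation: the function names `base_score` and `adjusted_quality` are hypothetical, and the default thresholds are the `--high_qual`, `--moderate_qual` and `--low_qual` defaults listed under all options.

```python
def base_score(this_qual, pair_qual=None, matched=None, high=30, moderate=20, low=15):
    """Score one base following the updated table (hypothetical helper).

    pair_qual=None means the base is outside the overlapped region.
    matched says whether the base agrees with its pair in the overlap.
    """
    if pair_qual is None:
        # Not in the overlapped region: score by this base's quality alone.
        if this_qual >= high:
            return 8
        if this_qual >= moderate:
            return 6
        if this_qual >= low:
            return 4
        return 2
    if matched:
        # Overlapped and matched: score by the summed quality of both bases.
        total = this_qual + pair_qual
        if total >= 2 * high:
            return 12
        if total >= 2 * moderate:
            return 10
        if total >= 2 * low:
            return 8
        return 6
    # Overlapped but mismatched: score by how far this base outranks its pair.
    diff = this_qual - pair_qual
    if diff >= high:
        return 5
    if diff >= moderate:
        return 3
    if diff >= low:
        return 1
    return 0


def adjusted_quality(this_qual, pair_qual):
    """Quality assigned to a base that mismatches its pair in the overlap."""
    return max(0, this_qual - pair_qual)
```

Note that with the recommended `--score_threshold=9`, only overlapped and matched bases with a combined quality of at least `2 * MODERATE_QUAL` reach the threshold in this reading of the table.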

# command examples
If you want to get very clean data, you can keep only the clusters with 2 or more supporting reads (recommended for ultra-deep sequencing with higher dup-rate):
```
@@ -107,39 +173,51 @@ gencore -i in.bam -o out.bam -r hg19.fa -s 1
```
(Recommended) If you want to keep all the DNA fragments, and for each output read you want to discard all the low quality unoverlapped mutations to obtain relatively clean data (recommended for dup-rate < 50%):
```
gencore -i in.bam -o out.bam -r hg19.fa -s 1 --score_threshold=8
gencore -i in.bam -o out.bam -r hg19.fa -s 1 --score_threshold=9
```
If you want to obtain fewer but ultra clean data, you can both increase the `supporting_reads` and the `ratio_threshold`:
If you want to obtain fewer but ultra clean data, and your data has UMI, you can enable the `duplex_only` option, and increase the `supporting_reads` and the `ratio_threshold`:
```
gencore -i in.bam -o out.bam -r hg19.fa -s 3 --ratio_threshold=0.9
gencore -i in.bam -o out.bam -r hg19.fa --duplex_only -s 3 --ratio_threshold=0.9
```
Please note that only UMI-integrated paired-end data can be used to generate duplex consensus sequences.
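
To make the role of `ratio_threshold` in the commands above more concrete, here is a minimal Python sketch of the check described in the options help: find the major base at one position of a cluster and test whether its ratio falls below the threshold. The function names are hypothetical, the tie-breaking rule is an arbitrary choice made for this sketch, and what gencore does after comparing with the reference is not modelled here.

```python
from collections import Counter


def major_base(bases):
    """Return (base, ratio) for the most frequent base at one position."""
    if not bases:
        raise ValueError("at least one base is required")
    counts = Counter(bases)
    # Ties are broken alphabetically; this is an assumption of the sketch.
    base, count = max(counts.items(), key=lambda item: (item[1], item[0]))
    return base, count / len(bases)


def needs_reference_check(bases, ratio_threshold=0.8):
    """True when the major base's ratio is below ratio_threshold."""
    _, ratio = major_base(bases)
    return ratio < ratio_threshold


if __name__ == "__main__":
    print(major_base("AAAAT"))
    print(needs_reference_check("AAATT"))
```

Raising `--ratio_threshold` to 0.9, as in the last command, means a position needs stronger agreement among the supporting reads before the major base is accepted without consulting the reference.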

# UMI format
`gencore` supports calling consensus reads with or without UMI. Although UMI is not required, it is strongly recommended. If your FASTQ data has UMI integrated, you can use [fastp](https://github.com/OpenGene/fastp) to shift the UMI to read query names. 

The UMI should be at the tail of query names. It can have a prefix like `UMI`, followed by an underscore. If the UMI has a prefix, it should be specified by `--umi_prefix` or `-u`. It can also have two parts, which are connected by an underscore.  
The UMI should be at the tail of query names. It can have a prefix like `UMI`, followed by an underscore. If the UMI has a prefix, it should be specified by `--umi_prefix` or `-u`. If the UMI prefix is `umi` or `UMI`, it can be automatically detected. The UMI can also have two parts, which are connected by an underscore.  

## UMI examples
* Read query name = `"NB551106:8:H5Y57BGX2:1:13304:3538:1404:UMI_GAGCATAC"`, prefix = `"UMI"`, umi = `"GAGCATAC"`
* Read query name = `"NB551106:8:H5Y57BGX2:1:13304:3538:1404:UMI_GAGC_ATAC"`, prefix = `"UMI"`, umi = `"GAGC_ATAC"`
* Read query name = `"NB551106:8:H5Y57BGX2:1:13304:3538:1404:umi_GAGC_ATAC"`, prefix = `"umi"`, umi = `"GAGC_ATAC"`
* Read query name = `"NB551106:8:H5Y57BGX2:1:13304:3538:1404:GAGCATAC"`, prefix = `""`, umi = `"GAGCATAC"`
* Read query name = `"NB551106:8:H5Y57BGX2:1:13304:3538:1404:GAGC_ATAC"`, prefix = `""`, umi = `"GAGC_ATAC"`
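
The examples above can be reproduced with a short Python sketch. It is an illustration, not gencore's own parser: the function name `extract_umi` is hypothetical, and it assumes the UMI is the final colon-separated field of the query name, which holds for all five examples. The `"auto"` default mirrors the `--umi_prefix` default shown in the options.

```python
def extract_umi(query_name, prefix="auto"):
    """Return (prefix, umi) parsed from the tail of a read query name."""
    # Assumption: the UMI (with optional prefix) is the last ':'-separated field.
    tail = query_name.rsplit(":", 1)[-1]
    if prefix == "auto":
        # Auto-detect the two prefixes the README says are recognized.
        for candidate in ("UMI", "umi"):
            if tail.startswith(candidate + "_"):
                return candidate, tail[len(candidate) + 1:]
        return "", tail
    if prefix:
        marker = prefix + "_"
        if not tail.startswith(marker):
            raise ValueError(f"query name does not end with a {marker!r} UMI field")
        return prefix, tail[len(marker):]
    # Empty prefix: the whole tail is the UMI.
    return "", tail


if __name__ == "__main__":
    name = "NB551106:8:H5Y57BGX2:1:13304:3538:1404:UMI_GAGC_ATAC"
    print(extract_umi(name))
```

A two-part UMI keeps its internal underscore, so only the first underscore after a recognized prefix is treated as the separator.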

# all options
```
options:
-i, --in input sorted bam/sam file. STDIN will be read from if it's not specified (string [=-])
-o, --out output bam/sam file. STDOUT will be written to if it's not specified (string [=-])
-r, --ref reference fasta file name (should be an uncompressed .fa/.fasta file) (string)
-u, --umi_prefix the prefix for UMI, if it has. None by default. Check the README for the defails of UMI formats. (string [=])
-s, --supporting_reads only output consensus reads/pairs that merged by >= <supporting_reads> reads/pairs. The valud should be 1~10, and the default value is 2. (int [=2])
-a, --ratio_threshold if the ratio of the major base in a cluster is less than <ratio_threshold>, it will be further compared to the reference. The valud should be 0.5~1.0, and the default value is 0.8 (double [=0.8])
-c, --score_threshold if the score of the major base in a cluster is less than <score_threshold>, it will be further compared to the reference. The valud should be 1~20, and the default value is 6 (int [=6])
--high_qual the threshold for a quality score to be considered as high quality. Default 30 means Q30. (int [=30])
--moderate_qual the threshold for a quality score to be considered as moderate quality. Default 20 means Q20. (int [=20])
--low_qual the threshold for a quality score to be considered as low quality. Default 15 means Q15. (int [=15])
-j, --json the json format report file name (string [=gencore.json])
--debug output some debug information to STDERR.
--quit_after_contig stop when <quit_after_contig> contigs are processed. Only used for fast debugging. Default 0 means no limitation. (int [=0])
-?, --help print this message
-i, --in input sorted bam/sam file. STDIN will be read from if it's not specified (string [=-])
-o, --out output bam/sam file. STDOUT will be written to if it's not specified (string [=-])
-r, --ref reference fasta file name (should be an uncompressed .fa/.fasta file) (string)
-b, --bed bed file to specify the capturing region, none by default (string [=])
-x, --duplex_only only output duplex consensus sequences, which means single stranded consensus sequences will be discarded.
--no_duplex don't merge single stranded consensus sequences to duplex consensus sequences.
-u, --umi_prefix the prefix for UMI, if it has. None by default. Check the README for the defails of UMI formats. (string [=auto])
-s, --supporting_reads only output consensus reads/pairs that merged by >= <supporting_reads> reads/pairs. The valud should be 1~10, and the default value is 1. (int [=1])
-a, --ratio_threshold if the ratio of the major base in a cluster is less than <ratio_threshold>, it will be further compared to the reference. The valud should be 0.5~1.0, and the default value is 0.8 (double [=0.8])
-c, --score_threshold if the score of the major base in a cluster is less than <score_threshold>, it will be further compared to the reference. The valud should be 1~20, and the default value is 6 (int [=6])
-d, --umi_diff_threshold if two reads with identical mapping position have UMI difference <= <umi_diff_threshold>, then they will be merged to generate a consensus read. Default value is 1. (int [=1])
-D, --duplex_diff_threshold if the forward consensus and reverse consensus sequences have <= <duplex_diff_threshold> mismatches, then they will be merged to generate a duplex consensus sequence, otherwise will be discarded. Default value is 2. (int [=2])
--high_qual the threshold for a quality score to be considered as high quality. Default 30 means Q30. (int [=30])
--moderate_qual the threshold for a quality score to be considered as moderate quality. Default 20 means Q20. (int [=20])
--low_qual the threshold for a quality score to be considered as low quality. Default 15 means Q15. (int [=15])
--coverage_sampling the sampling rate for genome scale coverage statistics. Default 10000 means 1/10000. (int [=10000])
-j, --json the json format report file name (string [=gencore.json])
-h, --html the html format report file name (string [=gencore.html])
--debug output some debug information to STDERR.
--quit_after_contig stop when <quit_after_contig> contigs are processed. Only used for fast debugging. Default 0 means no limitation. (int [=0])
-?, --help print this message
```
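
The `-d, --umi_diff_threshold` option can be pictured with the following Python sketch. It assumes that "UMI difference" means the number of mismatching characters between two UMIs (a Hamming-style distance), which the help text does not state explicitly; the function names and the handling of UMIs of unequal length are likewise assumptions of this sketch rather than gencore's behavior.

```python
def umi_difference(umi_a, umi_b):
    """Count mismatching positions between two UMIs.

    Assumption: any length difference is counted as extra mismatches.
    """
    mismatches = sum(a != b for a, b in zip(umi_a, umi_b))
    return mismatches + abs(len(umi_a) - len(umi_b))


def can_merge(umi_a, umi_b, umi_diff_threshold=1):
    """True when two reads at the same mapping position may be merged by UMI."""
    return umi_difference(umi_a, umi_b) <= umi_diff_threshold


if __name__ == "__main__":
    print(umi_difference("GAGC_ATAC", "GAGC_ATAT"))
    print(can_merge("GAGC_ATAC", "GTGC_ATAT"))
```

With the default threshold of 1, a single sequencing error inside a UMI does not split reads of the same fragment into separate clusters under this interpretation.
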
# citation
The gencore paper has been published in BMC Bioinformatics: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-3280-9. If you used gencore in your research work, please cite it as:

Chen, S., Zhou, Y., Chen, Y. et al. Gencore: an efficient tool to generate consensus reads for error suppressing and duplicate removing of NGS data. BMC Bioinformatics 20, 606 (2019) doi:10.1186/s12859-019-3280-9