diff --git a/README.md b/README.md
index 889612d..c4c3cd9 100644
--- a/README.md
+++ b/README.md
@@ -2,14 +2,15 @@
Paraphase
-# Paraphase: HiFi-based caller for highly homologous genes
+HiFi-based caller for highly similar paralogous genes
Many medically relevant genes fall into 'dark' regions where variant calling is limited due to high sequence homology with paralogs or pseudogenes. Paraphase is a Python tool that takes HiFi aligned BAMs as input (whole-genome or enrichment), phases haplotypes for genes of the same family, determines copy numbers and makes phased variant calls.

-Paraphase takes all reads from a gene family, realigns to just the gene of interest and then phases them into haplotypes. This solves the problem of alignment difficulty due to sequence homology and allows us to examine all copies of genes in a gene family and call copy number changes and other variants.
+Paraphase takes all reads from a gene family, realigns them to one representative gene of the family and then phases them into haplotypes. This bypasses the error-prone process of aligning reads to multiple similar regions and allows us to examine all copies of genes in a gene family. Because it is centered on the gene family, Paraphase also performs well when an individual's copy number differs from the reference, as is often the case in segmental duplications.
+Furthermore, this approach streamlines sequence comparisons between genes of the same family, making it straightforward to conduct analyses such as identifying non-allelic gene conversions.
-Paraphase supports 161 segmental duplication [regions](docs/regions.md) in GRCh38. Among these, there are 11 medically relevant regions that are also supported in GRCh37/hg19:
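+Paraphase supports 160 segmental duplication [regions](docs/regions.md) in GRCh38. The medically relevant regions among them are listed below; 11 of these are also supported in GRCh37/hg19, while GBA, CYP11B1/CYP11B2 and the CFH gene cluster are currently supported in GRCh38 only: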
- SMN1/SMN2 (spinal muscular atrophy)
- RCCX module
- CYP21A2 (21-Hydroxylase-Deficient Congenital Adrenal Hyperplasia)
@@ -24,6 +25,9 @@ Paraphase supports 161 segmental duplication [regions](docs/regions.md) in GRCh3
- CFC1 (heterotaxy syndrome)
- OPN1LW/OPN1MW (color vision deficiencies)
- HBA1/HBA2 (Alpha-Thalassemia)
+- GBA (Gaucher disease and Parkinson's disease)
+- CYP11B1/CYP11B2 (Glucocorticoid-remediable aldosteronism)
+- CFH/CFHR1/CFHR2/CFHR3/CFHR4 (large deletions/duplications associated with atypical hemolytic uremic syndrome and age-related macular degeneration)
Please check out our [paper](https://www.cell.com/ajhg/fulltext/S0002-9297(23)00001-0) on its application to the gene SMN1 for more details about Paraphase.
Chen X, Harting J, Farrow E, et al. Comprehensive SMN1 and SMN2 profiling for spinal muscular atrophy analysis using long-read PacBio HiFi sequencing. The American Journal of Human Genetics. 2023;0(0). doi:10.1016/j.ajhg.2023.01.001
@@ -78,32 +82,28 @@ Please note that the input BAM should be one that's aligned to the ENTIRE refere
Optional parameters:
- `-g`: Region(s) to analyze, separated by comma. All supported [regions](docs/regions.md) will be analyzed if not specified. Please use region name, i.e. first column in the doc.
- `-t`: Number of threads.
+- `-p`: Prefix of output files when the input is a single sample, i.e. when used with `-b`. If not provided, the prefix is extracted from the name of the input BAM.
- `--genome`: Genome reference build. Default is `38`. If `37` or `19` is specified, Paraphase will run the analysis for GRCh37 or hg19, respectively (note that only 11 medically relevant [regions](docs/regions.md) are supported now for GRCh37/hg19).
-- `gene1only`: If specified, variants calls will be made against the main gene only for SMN1, PMS2, STRC, NCF1 and IKBKG, see [below](#interpreting-the-output).
+- `--gene1only`: If specified, variant calls will be made against the main gene only for SMN1, PMS2, STRC, NCF1 and IKBKG. See more information [here](docs/vcf.md).
- `--novcf`: If specified, no VCF files will be produced.
- `--samtools`: path to samtools. If the paths to samtools or minimap2 are not already in the PATH environment variable, they can be provided through the `--samtools` and `--minimap2` parameters.
- `--minimap2`: path to minimap2
## Interpreting the output
-Paraphase produces a few output files in the directory specified by `-o`, with the sample ID as the prefix.
+Paraphase produces a few output files in the directory specified by `-o`, named with the specified (`-p`) or default prefix.
-1. `.vcf` in `sampleID_vcfs` folder. A VCF file is written for each haplotype per gene family. There is also a `_variants.vcf` file containing merged variants from all haplotypes for each gene family. Note that this is not a diploid vcf as there are usually more than two copies of genes in a gene family in a sample.
+1. `.vcf` files in the `${prefix}_paraphase_vcfs` folder. A VCF file is written for each region (gene family). More details on the VCF can be found [here](docs/vcf.md).
-As genes of the same family can be highly similar to each other in sequence and not easy to differentiate (at the sequence level or even at the functional level), variant calls are made against one selected "main" gene from the gene family (e.g. the functional gene is selected when the family has a gene and a pseudogene). In this way, all copies of the gene family can be evaluated for pathogenic variants and one can calculate the copy number of the functional genes in the family and hence infer the disease/carrier status.
+2. `.paraphase.bam`: This BAM file can be loaded into IGV for visualization of haplotypes (group reads by `HP` tag and color alignments by `YC` tag). All haplotypes are aligned against the main gene of interest. Tutorials and examples are provided for medically relevant genes (see below).
-Exceptions are SMN1 (paralog SMN2), PMS2 (pseudogene PMS2CL), STRC (pseudogene STRCP1), NCF1 (pseudogenes NCF1B and NCF1C) and IKBKG (pseudogene IKBKGP1), where gene differentiation is possible. In these families, haplotypes are assigned to each gene in the family, i.e. gene or paralog/pseudogene, and variants are called against the gene (or paralog/pseudogene) for the gene (or paralog/pseudogene) haplotypes, respectively. Variants calls can be made against the main gene only for these five families if `--gene1only` is specified.
-
-2. `_realigned_tagged.bam`: This BAM file can be loaded into IGV for visualization of haplotypes (group reads by `HP` tag and color alignments by `YC` tag). All haplotypes are aligned against the main gene of interest. Tutorials/Examples are provided for medically relevant genes (See below).
-
-3. `.json`: Output file summarizing haplotypes and variant calls for each gene family in each sample. In brief, a few generally used fields are explained below.
+3. `.paraphase.json`: Output file summarizing haplotypes and variant calls for each gene family in each sample. A few commonly used fields are explained below.
- `final_haplotypes`: phased haplotypes for all gene copies in a gene family
- `total_cn`: total copy number of the family (sum of gene and paralog/pseudogene)
- `two_copy_haplotypes`: haplotypes that are present in two copies based on depth. This happens when (in a small number of cases) two haplotypes are identical and we infer that there exist two of them instead of one by checking the read depth.
- `haplotype_details`: lists information about each haplotype
- `boundary`: the boundary of the region that is resolved on the haplotype. This is useful when a haplotype is only partially phased.
- `alleles_final`: haplotypes phased into alleles. This is possible when the segmental duplication is in tandem.
-- `region_depth`: median depth of the gene family (include all copies of gene and paralog/pseudogene)
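As a rough illustration, the fields above can be read with Python's standard `json` module. This is a minimal sketch under assumptions, not part of Paraphase: it assumes the top level of `.paraphase.json` maps region names (e.g. `smn1`, `CFHclust`) to dictionaries holding these fields, and `summarize_results` / `summarize_file` are hypothetical helper names.

```python
from __future__ import annotations

import json
import sys
from typing import Any


def summarize_results(data: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Collect a few README-documented fields for each region.

    Assumption (hypothetical layout): top-level keys are region names and each
    value is a dict that may contain `total_cn`, `final_haplotypes` and
    `two_copy_haplotypes`.
    """
    summary: dict[str, dict[str, Any]] = {}
    for region, result in data.items():
        # Skip any top-level entry that is not a per-region dictionary.
        if not isinstance(result, dict):
            continue
        summary[region] = {
            "total_cn": result.get("total_cn"),
            # len() works whether haplotypes are stored as a list or a dict.
            "n_haplotypes": len(result.get("final_haplotypes") or []),
            "two_copy_haplotypes": list(result.get("two_copy_haplotypes") or []),
        }
    return summary


def summarize_file(path: str) -> dict[str, dict[str, Any]]:
    """Load a `.paraphase.json` file and summarize it."""
    with open(path) as fh:
        return summarize_results(json.load(fh))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarize.py sample.paraphase.json
    for region, fields in summarize_file(sys.argv[1]).items():
        print(region, fields)
```

Keeping the parsing (`summarize_results`) separate from file loading makes it easy to check the assumed layout against a real output file before relying on it.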
Tutorials/Examples are provided for interpreting the `json` output and visualizing haplotypes for medically relevant genes listed below:
- [SMN1/SMN2](docs/SMN1_SMN2.md)
@@ -116,3 +116,6 @@ Tutorials/Examples are provided for interpreting the `json` output and visualizi
- [F8](docs/F8.md)
- [NEB](docs/NEB.md)
- [NCF1](docs/NCF1.md)
+- [GBA](docs/GBA.md)
+- [CFH gene cluster](docs/CFH.md)
+
diff --git a/docs/CFH.md b/docs/CFH.md
new file mode 100644
index 0000000..0ad8363
--- /dev/null
+++ b/docs/CFH.md
@@ -0,0 +1,25 @@
+# CFH gene cluster
+
+The CFH gene cluster is a ~250kb genomic region containing the genes CFH, CFHR1, CFHR2, CFHR3 and CFHR4. The region contains two pairs of homology regions, where unequal crossing-over can lead to large deletions or duplications. These SVs are associated with diseases such as atypical hemolytic uremic syndrome and age-related macular degeneration, and some of them are quite common in the population.
+
+Paraphase resolves gene copies in two homology regions (named `CFH` and `CFHR3` in the config) and summarizes the results under `CFHclust` in the `json` file. To analyze this region specifically, use `-g CFH,CFHR3` in the command. Note that only SVs/fusions are called in this region. No VCF is produced, because the sequence similarity is low enough that standard variant callers should call small variants accurately.
+
+The `CFH` region contains the end of the CFH gene and the intergenic region between CFH and CFHR3. The `CFHR3` region spans from part of CFHR3 to part of CFHR1. In the genome, these homology regions are ordered as follows (see also the examples below):
+
+`CFH`, followed by `CFHR3`, followed by `CFH(paralog)`, and followed by `CFHR3(paralog)`.
+
+## Fields in the `json` file
+
+- `fusions_called`: fusions created by deletion or duplication of the region between two breakpoints. Reports the SV type (deletion or duplication) and the breakpoint coordinates.
+
+## Visualizing haplotypes
+
+To visualize phased haplotypes, load the output BAM file in IGV, group reads by the `HP` tag and color alignments by the `YC` tag.
+
+Reads in gray are either unassigned or consistent with more than one possible haplotype. When two haplotypes are identical over a region, there can be more than one haplotype consistent with a read, and the read is randomly assigned to a haplotype and colored in gray.
+
+
+
+- The top panel shows a sample with no CNV. The left side shows the `CFH` region analyzed by Paraphase and the right side shows the `CFHR3` region. Paraphase resolves four copies for each region. In each region, two of the four copies are shorter and carry more mismatches; these are the paralogs, which can no longer align beyond the end of the homology region.
+- The middle panel shows a sample with a deletion (`CFH_hap1`) in the `CFH` region (left), where the 5' end is longer and the 3' end is shorter. The red arrow marks the deletion breakpoint; the other side of the breakpoint can be found in the `fusions_called` field under `CFHclust` in the `json`. Because one copy of the `CFHR3` region falls within the deletion, only three copies are found in that region as well. This SV is a deletion of CFHR3+CFHR1.
+- The bottom panel shows a sample with a different deletion (`CFHR3_hap2`) in the `CFHR3` region (right), where the 5' end is longer and the 3' end is shorter. The red arrow marks the deletion breakpoint; the other side of the breakpoint can be found in the `fusions_called` field under `CFHclust` in the `json`. Because one copy of the `CFH` region (the paralogous copy) falls within the deletion, only three copies are found in that region as well. This SV is a deletion of CFHR1+CFHR4 (CFHR4 is the gene downstream of CFHR1).
diff --git a/docs/GBA.md b/docs/GBA.md
new file mode 100644
index 0000000..57f97e9
--- /dev/null
+++ b/docs/GBA.md
@@ -0,0 +1,19 @@
+# GBA
+
+Pathogenic variants in GBA cause Gaucher disease and increase the risk of Parkinson’s disease. GBA shares sequence homology with its pseudogene GBAP1, particularly in its last three exons, where gene conversion can introduce pseudogene-like variants into GBA. Unequal crossing-over can also produce fusion genes between GBA and GBAP1.
+
+## Fields in the `json` file
+
+- `fusions_called`: fusions created by deletion or duplication of the region between two breakpoints. Reports the SV type (deletion or duplication) and the breakpoints.
+
+## Visualizing haplotypes
+
+To visualize phased haplotypes, load the output BAM file in IGV, group reads by the `HP` tag and color alignments by the `YC` tag.
+
+Reads in gray are either unassigned or consistent with more than one possible haplotype. When two haplotypes are identical over a region, there can be more than one haplotype consistent with a read, and the read is randomly assigned to a haplotype and colored in gray.
+
+
+
+- The top panel shows a sample with two copies of GBA and two copies of GBAP1. The GBAP1 copies are shorter because they can no longer align beyond the end of the homology region.
+- The middle panel shows a sample with a deletion (`hap3`), where the 5' end is consistent with the shorter GBAP1 and the 3' end is consistent with the longer GBA. The red arrow marks the deletion breakpoint. The other side of the breakpoint can be found in the `fusions_called` field in the `json`.
+- The bottom panel shows a sample with a duplication (`hap2`), where the 5' end is consistent with the longer GBA and the 3' end is consistent with the shorter GBAP1. The red arrow marks the duplication breakpoint. The other side of the breakpoint can be found in the `fusions_called` field in the `json`.
diff --git a/docs/figures/CFH.png b/docs/figures/CFH.png
new file mode 100755
index 0000000..5ad5d80
Binary files /dev/null and b/docs/figures/CFH.png differ
diff --git a/docs/figures/GBA.png b/docs/figures/GBA.png
new file mode 100755
index 0000000..49c1901
Binary files /dev/null and b/docs/figures/GBA.png differ
diff --git a/docs/regions.md b/docs/regions.md
index 5c904c1..e273478 100755
--- a/docs/regions.md
+++ b/docs/regions.md
@@ -1,164 +1,162 @@
-| Region name | Genes encoded | Supported in GRCh37/hg19 |
-| :---------- | :--------------------------------------------: | :----------------------: |
-| smn1 | SMN1,SMN2 | x |
-| pms2 | PMS2 | x |
-| rccx | CYP21A2,C4A,C4B,TNXB | x |
-| strc | STRC | x |
-| cfc1 | CFC1,CFC1B | x |
-| ikbkg | IKBKG | x |
-| ncf1 | NCF1 | x |
-| neb | NEB | x |
-| f8 | F8A1,F8A2,F8A3,H2AB1,H2AB2,H2AB3 | x |
-| opn1lw | OPN1LW,OPN1MW,OPN1MW2,OPN1MW3,TEX28 | x |
-| hba | HBA1,HBA2 | x |
-| SSX2 | SSX2,SSX2B | |
-| SSX4 | SSX4,SSX4B | |
-| CR1 | CR1 | |
-| CENPVL2 | CENPVL1,CENPVL2 | |
-| DMRTC1 | DMRTC1,DMRTC1B,FAM236A,FAM236B,FAM236C,FAM236D | |
-| XAGE1A | XAGE1A,XAGE1B | |
-| TRIM49D1 | TRIM49D1,TRIM49D2 | |
-| CXorf51A | CXorf51A,CXorf51B | |
-| DDT | DDT,DDTL | |
-| H3C14 | H3C14,H3C15,H2AC18,H2AC19,H4C14,H4C15 | |
-| MBD3L2 | MBD3L2,MBD3L2B | |
-| DEFA1 | DEFA1,DEFA1B,DEFA3 | |
-| CCZ1 | CCZ1 | |
-| CSAG2 | CSAG2,CSAG3,MAGEA2,MAGEA2B,MAGEA3,MAGEA6 | |
-| OR4M2 | OR4M2,OR4N4,OR4M2B,OR4N4C | |
-| RIMBP3B | RIMBP3B,RIMBP3,RIMBP3C | |
-| AMY2A | AMY2A | |
-| AGAP9 | AGAP9 | |
-| ANXA8 | ANXA8,ANXA8L1 | |
-| C21orf140 | C21orf140 | |
-| CDY1 | CDY1,CDY1B | |
-| CDY2A | CDY2A,CDY2B | |
-| HSFY1 | HSFY1,HSFY2 | |
-| PRY | PRY,PRY2 | |
-| BPY2 | BPY2,BPY2B,BPY2C | |
-| CHRNA7 | CHRNA7 | |
-| CNTNAP3C | CNTNAP3,CNTNAP3C | |
-| CTAGE6 | CTAGE6,CTAGE15 | |
-| CTAGE8 | CTAGE8,CTAGE9,OR2A4,OR2A7,ENPP3 | |
-| CXorf49 | CXorf49,CXorf49B | |
-| DEFB109B | DEFB109B,USP17L1,USP17L3,USP17L4,USP17L8 | |
-| PRR23D1 | PRR23D1,PRR23D2 | |
-| DEFB107A | DEFB107A | |
-| DEFB105A | DEFB105A,DEFB105B | |
-| DEFB106A | DEFB106A,DEFB106B | |
-| DEFB104A | DEFB104A,DEFB104B | |
-| SPAG11A | SPAG11A,SPAG11B | |
-| DEFB103A | DEFB103A,DEFB103B | |
-| DEFB4A | DEFB4A,DEFB4B | |
-| DEFB130A | DEFB130A,DEFB130B | |
-| ZNF705D | ZNF705D | |
-| FAM86B1 | FAM86B1,FAM86B2 | |
-| DHX40 | DHX40 | |
-| EIF3C | EIF3C,EIF3CL | |
-| ETDA | ETDA,ETDB | |
-| FAM156A | FAM156A,FAM156B | |
-| FAM246A | FAM246A,FAM246B | |
-| FCGR2C | FCGR2C | |
-| FRG2 | FRG2,FRG2B | |
-| FRMPD2 | FRMPD2 | |
-| TRAPPC10 | TRAPPC10 | |
-| PWP2 | PWP2 | |
-| GATD3 | GATD3 | |
-| GOLGA6L1 | GOLGA6L1,GOLGA6L26 | |
-| GOLGA6L24 | GOLGA6L24,GOLGA6L25 | |
-| GOLGA8A | GOLGA8A,GOLGA8B | |
-| GOLGA8F | GOLGA8F,GOLGA8G | |
-| GOLGA8K | GOLGA8K,GOLGA8T | |
-| GPAT2 | GPAT2 | |
-| HSF2BP | HSF2BP,H2BC12L | |
-| HSFX1 | HSFX1,HSFX2 | |
-| IQCK | IQCK | |
-| LGALS9B | LGALS9B,LGALS9C | |
-| LIMS3 | LIMS3,LIMS4 | |
-| MAGEA9 | MAGEA9,MAGEA9B | |
-| MAGEH1 | MAGEH1 | |
-| MALL | MALL | |
-| MRPL23 | MRPL23 | |
-| NOTCH2 | NOTCH2,NOTCH2NLR | |
-| NPIPA1 | NPIPA1 | |
-| NPIPA2 | NPIPA2,NPIPA3 | |
-| NPY4R | NPY4R,NPY4R2 | |
-| NUDT4B | NUDT4B | |
-| NXF2 | NXF2,NXF2B | |
-| OCLN | OCLN | |
-| OR1D5 | OR1D5 | |
-| OR2A1 | OR2A1,OR2A42 | |
-| OR2T5 | OR2T5,OR2T29 | |
-| OTOA | OTOA | |
-| PABPC1L2A | PABPC1L2A,PABPC1L2B | |
-| PDPK1 | PDPK1 | |
-| POTEE | POTEE,POTEF | |
-| POTEI | POTEI,POTEJ | |
-| PPIAL4C | PPIAL4C,PPIAL4H | |
-| PPIP5K1 | PPIP5K1 | |
-| PRAMEF10 | PRAMEF10,PRAMEF33 | |
-| PRAMEF7 | PRAMEF7,PRAMEF8 | |
-| PRAMEF5 | PRAMEF5,PRAMEF6 | |
-| PRAMEF25 | PRAMEF25,PRAMEF26,HNRNPCL3,HNRNPCL4 | |
-| PRAMEF13 | PRAMEF13 | |
-| PRAMEF18 | PRAMEF18 | |
-| POTED | POTED | |
-| PRODH | PRODH | |
-| DGCR6 | DGCR6 | |
-| PTPN20 | PTPN20 | |
-| PWWP4 | PWWP4 | |
-| RBPMS | RBPMS | |
-| RGPD1 | RGPD1,RGPD2 | |
-| PLGLB1 | PLGLB1,PLGLB2 | |
-| RGPD3 | RGPD3,RGPD4 | |
-| RHOXF2 | RHOXF2,RHOXF2B | |
-| RMND5A | RMND5A | |
-| RSPH10B | RSPH10B,RSPH10B2 | |
-| SERF1A | SERF1A,SERF1B | |
-| SIK1 | SIK1 | |
-| SMIM11 | SMIM11 | |
-| SMIM34 | SMIM34 | |
-| SYT15 | SYT15,SYT15B | |
-| TCAF1 | TCAF1 | |
-| TCP11X1 | TCP11X1,TCP11X2 | |
-| THOC3 | THOC3 | |
-| TMEM191B | TMEM191B,TMEM191C | |
-| TMLHE | TMLHE | |
-| TRIM49 | TRIM49,TRIM49C | |
-| TRIM64 | TRIM64,TRIM64B | |
-| TRIM73 | TRIM73,TRIM74 | |
-| U2AF1 | U2AF1 | |
-| UPK3BL1 | UPK3BL1,UPK3BL2 | |
-| POLR2J2 | POLR2J2,POLR2J3 | |
-| ZNF595 | ZNF595 | |
-| FOXD4L4 | FOXD4L4,FOXD4L5 | |
-| FOXD4L3 | FOXD4L3,FOXD4L6 | |
-| GTF2IRD2B | GTF2IRD2B,GTF2IRD2 | |
-| NOTCH2NLA | NOTCH2NLA,NOTCH2NLB | |
-| ICOSLG | ICOSLG | |
-| MAGED4 | MAGED4,MAGED4B | |
-| CBS | CBS | |
-| CRYAA | CRYAA | |
-| KCNE1 | KCNE1 | |
-| ARHGEF5 | ARHGEF5,ARHGEF35 | |
-| CASTOR2 | CASTOR2 | |
-| CLEC18C | CLEC18A,CLEC18B,CLEC18C | |
-| GTF2H2 | GTF2H2,GTF2H2C | |
-| GTF2I | GTF2I | |
-| HIC2 | HIC2 | |
-| NBPF4 | NBPF4,NBPF6 | |
-| PPIAL4D | PPIAL4D,PPIAL4E,PPIAL4F | |
-| RGPD5 | RGPD5,RGPD6,RGPD8 | |
-| FAM72A | FAM72A | |
-| GOLGA8N | GOLGA8N,GOLGA8O,GOLGA8Q,GOLGA8R | |
-| POTEB | POTEB,POTEB2,POTE3 | |
-| SPATA31A1 | SPATA31A1,SPATA31A3,SPATA31A5,SPATA31A7 | |
-| NUTM2A | NUTM2A,NUTM2B,NUTM2D,NUTM2E | |
-| ANKRD20A1 | ANKRD20A1 | |
-| GRAPL | GRAPL,GRAP | |
-| ARL17A | ARL17A,ARL17B | |
-| NSF | NSF | |
-| AMY1A | AMY1A,AMY1B,AMY1C | |
-| CTAG1A | CTAG1A,CTAG1B | |
-| BOLA2 | BOLA2,BOLA2B,SLX1A,SLX1B,SULT1A3,SULT1A4 | |
-| | | |
\ No newline at end of file
+| Region name | Genes encoded | Supported in GRCh37/hg19 |
+| :------------------ | :--------------------------------------------- | :----------------------: |
+| smn1 | SMN1,SMN2 | x |
+| pms2 | PMS2 | x |
+| rccx | CYP21A2,C4A,C4B,TNXB | x |
+| strc | STRC | x |
+| cfc1 | CFC1,CFC1B | x |
+| ikbkg | IKBKG | x |
+| ncf1 | NCF1 | x |
+| neb | NEB | x |
+| f8 | F8A1,F8A2,F8A3,H2AB1,H2AB2,H2AB3 | x |
+| opn1lw | OPN1LW,OPN1MW,OPN1MW2,OPN1MW3,TEX28 | x |
+| hba | HBA1,HBA2 | x |
+| SSX2 | SSX2,SSX2B | |
+| SSX4 | SSX4,SSX4B | |
+| CENPVL2 | CENPVL1,CENPVL2 | |
+| DMRTC1 | DMRTC1,DMRTC1B,FAM236A,FAM236B,FAM236C,FAM236D | |
+| XAGE1A | XAGE1A,XAGE1B | |
+| TRIM49D1 | TRIM49D1,TRIM49D2 | |
+| CXorf51A | CXorf51A,CXorf51B | |
+| DDT | DDT,DDTL | |
+| H3C14 | H3C14,H3C15,H2AC18,H2AC19,H4C14,H4C15 | |
+| MBD3L2 | MBD3L2,MBD3L2B,MBD3L3,MBD3L4,MBD3L5 | |
+| DEFA1 | DEFA1,DEFA1B,DEFA3 | |
+| CCZ1 | CCZ1 | |
+| CSAG2 | CSAG2,CSAG3,MAGEA2,MAGEA2B,MAGEA3,MAGEA6 | |
+| OR4M2 | OR4M2,OR4N4,OR4M2B,OR4N4C | |
+| RIMBP3B | RIMBP3B,RIMBP3,RIMBP3C | |
+| AMY2A | AMY2A | |
+| AGAP9 | AGAP9 | |
+| ANXA8 | ANXA8,ANXA8L1 | |
+| C21orf140 | C21orf140 | |
+| CDY1 | CDY1,CDY1B | |
+| CDY2A | CDY2A,CDY2B | |
+| HSFY1 | HSFY1,HSFY2 | |
+| PRY | PRY,PRY2 | |
+| BPY2 | BPY2,BPY2B,BPY2C | |
+| CHRNA7 | CHRNA7 | |
+| CNTNAP3 | CNTNAP3,CNTNAP3C | |
+| CTAGE6 | CTAGE6,CTAGE15 | |
+| CTAGE8 | CTAGE8,CTAGE9,OR2A4,OR2A7,ENPP3 | |
+| CXorf49 | CXorf49,CXorf49B | |
+| DEFB109B | DEFB109B,USP17L1,USP17L3,USP17L4,USP17L8 | |
+| PRR23D1 | PRR23D1,PRR23D2 | |
+| DEFB107A | DEFB107A | |
+| DEFB105A | DEFB105A,DEFB105B | |
+| DEFB106A | DEFB106A,DEFB106B | |
+| DEFB104A | DEFB104A,DEFB104B | |
+| SPAG11A | SPAG11A,SPAG11B | |
+| DEFB103A | DEFB103A,DEFB103B | |
+| DEFB4A | DEFB4A,DEFB4B | |
+| DEFB130A | DEFB130A,DEFB130B | |
+| ZNF705D | ZNF705D | |
+| FAM86B1 | FAM86B1,FAM86B2 | |
+| DHX40 | DHX40 | |
+| EIF3C | EIF3C,EIF3CL | |
+| ETDA | ETDA,ETDB | |
+| FAM156A | FAM156A,FAM156B | |
+| FAM246A | FAM246A,FAM246B | |
+| FCGR2C | FCGR2C,FCGR2B | |
+| FRG2 | FRG2,FRG2B | |
+| FRMPD2 | FRMPD2 | |
+| TRAPPC10 | TRAPPC10 | |
+| PWP2 | PWP2 | |
+| GATD3 | GATD3 | |
+| GOLGA6L24 | GOLGA6L24,GOLGA6L25 | |
+| GOLGA8A | GOLGA8A,GOLGA8B | |
+| GOLGA8F | GOLGA8F,GOLGA8G | |
+| GOLGA8K | GOLGA8K,GOLGA8T | |
+| GPAT2 | GPAT2 | |
+| HSF2BP | HSF2BP,H2BC12L | |
+| HSFX1 | HSFX1,HSFX2 | |
+| IQCK | IQCK | |
+| LGALS9B | LGALS9B,LGALS9C | |
+| LIMS3 | LIMS3,LIMS4 | |
+| MAGEA9 | MAGEA9,MAGEA9B | |
+| MAGEH1 | MAGEH1 | |
+| MALL | MALL | |
+| MRPL23 | MRPL23 | |
+| NOTCH2 | NOTCH2,NOTCH2NLR | |
+| NPIPA2 | NPIPA2,NPIPA3 | |
+| NPY4R | NPY4R,NPY4R2 | |
+| NUDT4B | NUDT4B | |
+| NXF2 | NXF2,NXF2B | |
+| OCLN | OCLN | |
+| OR1D5 | OR1D5 | |
+| OR2A1 | OR2A1,OR2A42 | |
+| OR2T5 | OR2T5,OR2T29 | |
+| OTOA | OTOA | |
+| PABPC1L2A | PABPC1L2A,PABPC1L2B | |
+| PDPK1 | PDPK1 | |
+| POTEE | POTEE,POTEF | |
+| POTEI | POTEI,POTEJ | |
+| PPIAL4C | PPIAL4C,PPIAL4H | |
+| PPIP5K1 | PPIP5K1 | |
+| PRAMEF10 | PRAMEF10,PRAMEF33 | |
+| PRAMEF7 | PRAMEF7,PRAMEF8 | |
+| PRAMEF25 | PRAMEF25,PRAMEF26,HNRNPCL3,HNRNPCL4 | |
+| PRAMEF13 | PRAMEF13 | |
+| PRAMEF18 | PRAMEF18 | |
+| POTED | POTED | |
+| PRODH | PRODH | |
+| DGCR6 | DGCR6 | |
+| PTPN20 | PTPN20 | |
+| PWWP4 | PWWP4 | |
+| RBPMS | RBPMS | |
+| RGPD1 | RGPD1,RGPD2 | |
+| PLGLB1 | PLGLB1,PLGLB2 | |
+| RGPD3 | RGPD3,RGPD4 | |
+| RHOXF2 | RHOXF2,RHOXF2B | |
+| RMND5A | RMND5A | |
+| RSPH10B | RSPH10B,RSPH10B2 | |
+| SERF1A | SERF1A,SERF1B | |
+| SIK1 | SIK1 | |
+| SMIM11 | SMIM11 | |
+| SMIM34 | SMIM34 | |
+| SYT15 | SYT15,SYT15B | |
+| TCAF1 | TCAF1 | |
+| TCP11X1 | TCP11X1,TCP11X2 | |
+| THOC3 | THOC3 | |
+| TMEM191B | TMEM191B,TMEM191C | |
+| TMLHE | TMLHE | |
+| TRIM49 | TRIM49,TRIM49C | |
+| TRIM64 | TRIM64,TRIM64B | |
+| TRIM73 | TRIM73,TRIM74 | |
+| U2AF1 | U2AF1 | |
+| UPK3BL1 | UPK3BL1,UPK3BL2 | |
+| POLR2J2 | POLR2J2,POLR2J3 | |
+| ZNF595 | ZNF595 | |
+| FOXD4L4 | FOXD4L4,FOXD4L5 | |
+| FOXD4L3 | FOXD4L3,FOXD4L6 | |
+| GTF2IRD2B | GTF2IRD2B,GTF2IRD2 | |
+| NOTCH2NLA | NOTCH2NLA,NOTCH2NLB | |
+| ICOSLG | ICOSLG | |
+| MAGED4 | MAGED4,MAGED4B | |
+| CBS | CBS | |
+| CRYAA | CRYAA | |
+| KCNE1 | KCNE1 | |
+| ARHGEF5 | ARHGEF5,ARHGEF35 | |
+| CASTOR2 | CASTOR2 | |
+| CLEC18C | CLEC18A,CLEC18B,CLEC18C | |
+| GTF2H2 | GTF2H2,GTF2H2C | |
+| GTF2I | GTF2I | |
+| HIC2 | HIC2 | |
+| NBPF4 | NBPF4,NBPF6 | |
+| PPIAL4D | PPIAL4D,PPIAL4E,PPIAL4F | |
+| RGPD5 | RGPD5,RGPD6,RGPD8 | |
+| FAM72A | FAM72A | |
+| GOLGA8N | GOLGA8N,GOLGA8O,GOLGA8Q,GOLGA8R | |
+| POTEB | POTEB,POTEB2,POTEB3 | |
+| NUTM2A | NUTM2A,NUTM2B,NUTM2D,NUTM2E | |
+| ANKRD20A1 | ANKRD20A1 | |
+| GRAPL | GRAPL,GRAP | |
+| ARL17A | ARL17A,ARL17B | |
+| NSF | NSF | |
+| AMY1A | AMY1A,AMY1B,AMY1C | |
+| CTAG1A | CTAG1A,CTAG1B | |
+| BOLA2 | BOLA2,BOLA2B,SLX1A,SLX1B,SULT1A3,SULT1A4 | |
+| CYP2D6 | CYP2D6 | |
+| GBA | GBA1 | |
+| CYP11B1 | CYP11B1,CYP11B2 | |
+| CFHclust(CFH,CFHR3) | CFH,CFHR1,CFHR2,CFHR3,CFHR4 | |
diff --git a/docs/regions.txt b/docs/regions.txt
deleted file mode 100644
index d8f9792..0000000
--- a/docs/regions.txt
+++ /dev/null
@@ -1,162 +0,0 @@
-Region_name Genes_encoded Supported_in_GRCh37/hg19
-smn1 SMN1,SMN2 x
-pms2 PMS2 x
-rccx CYP21A2,C4A,C4B,TNXB x
-strc STRC x
-cfc1 CFC1,CFC1B x
-ikbkg IKBKG x
-ncf1 NCF1 x
-neb NEB x
-f8 F8A1,F8A2,F8A3,H2AB1,H2AB2,H2AB3 x
-opn1lw OPN1LW,OPN1MW,OPN1MW2,OPN1MW3,TEX28 x
-hba HBA1,HBA2 x
-SSX2 SSX2,SSX2B
-SSX4 SSX4,SSX4B
-CR1 CR1
-CENPVL2 CENPVL1,CENPVL2
-DMRTC1 DMRTC1,DMRTC1B,FAM236A,FAM236B,FAM236C,FAM236D
-XAGE1A XAGE1A,XAGE1B
-TRIM49D1 TRIM49D1,TRIM49D2
-CXorf51A CXorf51A,CXorf51B
-DDT DDT,DDTL
-H3C14 H3C14,H3C15,H2AC18,H2AC19,H4C14,H4C15
-MBD3L2 MBD3L2,MBD3L2B
-DEFA1 DEFA1,DEFA1B,DEFA3
-CCZ1 CCZ1
-CSAG2 CSAG2,CSAG3,MAGEA2,MAGEA2B,MAGEA3,MAGEA6
-OR4M2 OR4M2,OR4N4,OR4M2B,OR4N4C
-RIMBP3B RIMBP3B,RIMBP3,RIMBP3C
-AMY2A AMY2A
-AGAP9 AGAP9
-ANXA8 ANXA8,ANXA8L1
-C21orf140 C21orf140
-CDY1 CDY1,CDY1B
-CDY2A CDY2A,CDY2B
-HSFY1 HSFY1,HSFY2
-PRY PRY,PRY2
-BPY2 BPY2,BPY2B,BPY2C
-CHRNA7 CHRNA7
-CNTNAP3C CNTNAP3,CNTNAP3C
-CTAGE6 CTAGE6,CTAGE15
-CTAGE8 CTAGE8,CTAGE9,OR2A4,OR2A7,ENPP3
-CXorf49 CXorf49,CXorf49B
-DEFB109B DEFB109B,USP17L1,USP17L3,USP17L4,USP17L8
-PRR23D1 PRR23D1,PRR23D2
-DEFB107A DEFB107A
-DEFB105A DEFB105A,DEFB105B
-DEFB106A DEFB106A,DEFB106B
-DEFB104A DEFB104A,DEFB104B
-SPAG11A SPAG11A,SPAG11B
-DEFB103A DEFB103A,DEFB103B
-DEFB4A DEFB4A,DEFB4B
-DEFB130A DEFB130A,DEFB130B
-ZNF705D ZNF705D
-FAM86B1 FAM86B1,FAM86B2
-DHX40 DHX40
-EIF3C EIF3C,EIF3CL
-ETDA ETDA,ETDB
-FAM156A FAM156A,FAM156B
-FAM246A FAM246A,FAM246B
-FCGR2C FCGR2C
-FRG2 FRG2,FRG2B
-FRMPD2 FRMPD2
-TRAPPC10 TRAPPC10
-PWP2 PWP2
-GATD3 GATD3
-GOLGA6L1 GOLGA6L1,GOLGA6L26
-GOLGA6L24 GOLGA6L24,GOLGA6L25
-GOLGA8A GOLGA8A,GOLGA8B
-GOLGA8F GOLGA8F,GOLGA8G
-GOLGA8K GOLGA8K,GOLGA8T
-GPAT2 GPAT2
-HSF2BP HSF2BP,H2BC12L
-HSFX1 HSFX1,HSFX2
-IQCK IQCK
-LGALS9B LGALS9B,LGALS9C
-LIMS3 LIMS3,LIMS4
-MAGEA9 MAGEA9,MAGEA9B
-MAGEH1 MAGEH1
-MALL MALL
-MRPL23 MRPL23
-NOTCH2 NOTCH2,NOTCH2NLR
-NPIPA1 NPIPA1
-NPIPA2 NPIPA2,NPIPA3
-NPY4R NPY4R,NPY4R2
-NUDT4B NUDT4B
-NXF2 NXF2,NXF2B
-OCLN OCLN
-OR1D5 OR1D5
-OR2A1 OR2A1,OR2A42
-OR2T5 OR2T5,OR2T29
-OTOA OTOA
-PABPC1L2A PABPC1L2A,PABPC1L2B
-PDPK1 PDPK1
-POTEE POTEE,POTEF
-POTEI POTEI,POTEJ
-PPIAL4C PPIAL4C,PPIAL4H
-PPIP5K1 PPIP5K1
-PRAMEF10 PRAMEF10,PRAMEF33
-PRAMEF7 PRAMEF7,PRAMEF8
-PRAMEF5 PRAMEF5,PRAMEF6
-PRAMEF25 PRAMEF25,PRAMEF26,HNRNPCL3,HNRNPCL4
-PRAMEF13 PRAMEF13
-PRAMEF18 PRAMEF18
-POTED POTED
-PRODH PRODH
-DGCR6 DGCR6
-PTPN20 PTPN20
-PWWP4 PWWP4
-RBPMS RBPMS
-RGPD1 RGPD1,RGPD2
-PLGLB1 PLGLB1,PLGLB2
-RGPD3 RGPD3,RGPD4
-RHOXF2 RHOXF2,RHOXF2B
-RMND5A RMND5A
-RSPH10B RSPH10B,RSPH10B2
-SERF1A SERF1A,SERF1B
-SIK1 SIK1
-SMIM11 SMIM11
-SMIM34 SMIM34
-SYT15 SYT15,SYT15B
-TCAF1 TCAF1
-TCP11X1 TCP11X1,TCP11X2
-THOC3 THOC3
-TMEM191B TMEM191B,TMEM191C
-TMLHE TMLHE
-TRIM49 TRIM49,TRIM49C
-TRIM64 TRIM64,TRIM64B
-TRIM73 TRIM73,TRIM74
-U2AF1 U2AF1
-UPK3BL1 UPK3BL1,UPK3BL2
-POLR2J2 POLR2J2,POLR2J3
-ZNF595 ZNF595
-FOXD4L4 FOXD4L4,FOXD4L5
-FOXD4L3 FOXD4L3,FOXD4L6
-GTF2IRD2B GTF2IRD2B,GTF2IRD2
-NOTCH2NLA NOTCH2NLA,NOTCH2NLB
-ICOSLG ICOSLG
-MAGED4 MAGED4,MAGED4B
-CBS CBS
-CRYAA CRYAA
-KCNE1 KCNE1
-ARHGEF5 ARHGEF5,ARHGEF35
-CASTOR2 CASTOR2
-CLEC18C CLEC18A,CLEC18B,CLEC18C
-GTF2H2 GTF2H2,GTF2H2C
-GTF2I GTF2I
-HIC2 HIC2
-NBPF4 NBPF4,NBPF6
-PPIAL4D PPIAL4D,PPIAL4E,PPIAL4F
-RGPD5 RGPD5,RGPD6,RGPD8
-FAM72A FAM72A
-GOLGA8N GOLGA8N,GOLGA8O,GOLGA8Q,GOLGA8R
-POTEB POTEB,POTEB2,POTE3
-SPATA31A1 SPATA31A1,SPATA31A3,SPATA31A5,SPATA31A7
-NUTM2A NUTM2A,NUTM2B,NUTM2D,NUTM2E
-ANKRD20A1 ANKRD20A1
-GRAPL GRAPL,GRAP
-ARL17A ARL17A,ARL17B
-NSF NSF
-AMY1A AMY1A,AMY1B,AMY1C
-CTAG1A CTAG1A,CTAG1B
-BOLA2 BOLA2,BOLA2B,SLX1A,SLX1B,SULT1A3,SULT1A4
diff --git a/docs/vcf.md b/docs/vcf.md
new file mode 100644
index 0000000..121b8df
--- /dev/null
+++ b/docs/vcf.md
@@ -0,0 +1,29 @@
+# Paraphase VCF
+
+Paraphase produces a VCF file for each region per sample.
+
+As genes of the same family can be highly similar to each other in sequence and not easy to differentiate (at the sequence level or even at the functional level), variant calls are made against one selected "main" gene from the gene family (e.g. the functional gene is selected when the family has a gene and a pseudogene). In this way, all copies of the gene family can be evaluated for pathogenic variants and one can calculate the copy number of the functional genes in the family and hence infer the disease/carrier status.
+
+Given prior knowledge of how to differentiate paralogs, Paraphase can assign haplotypes to specific genes in the family and label them accordingly. This is currently done for SMN1 (paralog SMN2), PMS2 (pseudogene PMS2CL), STRC (pseudogene STRCP1), NCF1 (pseudogenes NCF1B and NCF1C) and IKBKG (pseudogene IKBKGP1). In these families, each haplotype is assigned to either the gene or the paralog/pseudogene, and its variants are called against the gene or the paralog/pseudogene, respectively. If desired, you can ask Paraphase to call all variants against just the main gene for these five families with the `--gene1only` option.
+
+## VCF format
+
+We have repurposed the sample columns to report the haplotypes (gene copies) found in a region (gene family). Each column represents one haplotype, and the haplotype names are consistent with those reported in the `json` file.
+
+In the INFO field, `HPBOUND` reports the boundaries of the haplotypes, i.e. phase blocks, as `start-end` coordinate pairs, one pair per haplotype. In the case of complete phasing, these numbers are the start and end of the region that Paraphase is designed to phase; when Paraphase can only phase part of the region, they are the start and end of the phase block. A coordinate may be prefixed or suffixed with the word `truncated`, meaning that the haplotype is clipped right before the start or right after the end. This marks the end of the homology, and truncated haplotypes are often those from the paralog or the pseudogene. `HPBOUND` is useful when annotating variants in Paraphase VCFs: the boundaries and truncation status can be compared against gene/transcript coordinates to determine whether we have full information for the complete gene.
+
+When Paraphase is able to phase haplotypes into two chromosomes, this information is reported under `ALLELE` in the INFO field. Haplotypes on the same chromosome are grouped together, separated by `+`, and the two chromosomes are separated by `,`.
+
+
+## Example
+
+| #CHROM | POS | ID | REF | ALT | QUAL | FILTER | INFO | FORMAT | rccx_hap1 | rccx_hap2 | rccx_hap3 | rccx_hap4 | rccx_hap5 |
+| :------| :------- | :- | :-- | :-- | :--- | :----- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | :--------| :---------| :---------| :---------| :---------| :---------|
+| chr6 | 32013369 | . | T | C | . | PASS | HPBOUND=32013300-32046127truncated,32013300-32046200,32013300-32046200,32013300-32046127truncated,32013300-32046127truncated;ALLELE=rccx_hap2+rccx_hap4,rccx_hap1+rccx_hap3+rccx_hap5 | GT:DP:AD | 1:19:0,19 | 1:24:0,24 | 1:23:0,23 | 0:21:21,0 | 0:23:23,0 |
+
+The row above is one line from the VCF of the bottom sample in the figure below. Based on the `HPBOUND` and `ALLELE` INFO fields, we can infer the following:
+
+There are five copies of the RCCX repeat, phased into two chromosomes: `rccx_hap2+rccx_hap4` and `rccx_hap1+rccx_hap3+rccx_hap5`. `hap2` and `hap3` are phased all the way to the region end at `32046200`, while the other three copies are clipped slightly earlier at `32046127`, which is the end of the homology region between RCCX and the paralogous copy. These three copies carry only a truncated version of TNXB.
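The inference above can be scripted. The following is a minimal Python sketch, not part of Paraphase, that assumes `HPBOUND` lists one `start-end` pair per haplotype in the same order as the sample columns (as in the example row), with `truncated` prefixed to a clipped start or appended to a clipped end, and that `ALLELE` uses `+` and `,` as described above. The names `PhaseBlock`, `parse_info`, `parse_hpbound` and `parse_allele` are hypothetical.

```python
from __future__ import annotations

from typing import NamedTuple


class PhaseBlock(NamedTuple):
    start: int
    end: int
    truncated_start: bool  # "truncated" prefixed to the start coordinate
    truncated_end: bool    # "truncated" appended to the end coordinate


def parse_info(info: str) -> dict[str, str]:
    """Split a VCF INFO string into a key -> value dict (flag keys map to '')."""
    fields: dict[str, str] = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def parse_hpbound(value: str, haplotypes: list[str]) -> dict[str, PhaseBlock]:
    """Pair each HPBOUND entry with a haplotype column, assuming the same order."""
    bounds = value.split(",")
    if len(bounds) != len(haplotypes):
        raise ValueError("HPBOUND entries do not match the haplotype columns")
    blocks: dict[str, PhaseBlock] = {}
    for name, bound in zip(haplotypes, bounds):
        start, end = bound.split("-")
        blocks[name] = PhaseBlock(
            start=int(start.removeprefix("truncated")),
            end=int(end.removesuffix("truncated")),
            truncated_start=start.startswith("truncated"),
            truncated_end=end.endswith("truncated"),
        )
    return blocks


def parse_allele(value: str) -> list[list[str]]:
    """Split ALLELE into chromosomes (',') and the haplotypes on each ('+')."""
    return [chrom.split("+") for chrom in value.split(",")]


if __name__ == "__main__":
    # INFO field copied from the example row above.
    info = (
        "HPBOUND=32013300-32046127truncated,32013300-32046200,32013300-32046200,"
        "32013300-32046127truncated,32013300-32046127truncated;"
        "ALLELE=rccx_hap2+rccx_hap4,rccx_hap1+rccx_hap3+rccx_hap5"
    )
    haps = [f"rccx_hap{i}" for i in range(1, 6)]
    fields = parse_info(info)
    blocks = parse_hpbound(fields["HPBOUND"], haps)
    print([h for h, b in blocks.items() if b.truncated_end])
    # ['rccx_hap1', 'rccx_hap4', 'rccx_hap5']
    print(parse_allele(fields["ALLELE"]))
    # [['rccx_hap2', 'rccx_hap4'], ['rccx_hap1', 'rccx_hap3', 'rccx_hap5']]
```

`str.removeprefix`/`removesuffix` require Python 3.9 or later. Comparing `PhaseBlock.start`/`end` against a transcript's coordinates is one way to check whether a haplotype covers the complete gene.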
+
+
+
diff --git a/paraphase/__init__.py b/paraphase/__init__.py
index 528787c..f5f41e5 100755
--- a/paraphase/__init__.py
+++ b/paraphase/__init__.py
@@ -1 +1 @@
-__version__ = "3.0.0"
+__version__ = "3.1.0"
diff --git a/paraphase/data/19/config.yaml b/paraphase/data/19/config.yaml
index a2b3561..dd6a9f7 100644
--- a/paraphase/data/19/config.yaml
+++ b/paraphase/data/19/config.yaml
@@ -30,13 +30,14 @@ pms2:
genes: PMS2
check_nm: 0.1
realign_region: chr7:6004631-6049631
- extract_regions: chr7:6004631-6028693 chr7:6775260-6799427
- gene2_region: chr7:6775260-6799427
- right_boundary: 6027131
+ extract_regions: chr7:6004631-6033041 chr7:6773174-6799427
+ gene2_region: chr7:6773174-6799427
+ right_boundary: 6033081
pivot_site: 6026200
- noisy_region: [[6020511, 6020611], [6019294, 6019300], [6015581, 6015711]]
+ noisy_region: [[6020511, 6020611], [6019294, 6019300], [6015581, 6015711], [6028131, 6031981], [6032411, 6033031]]
is_reverse: True
add_sites: ["6026200_G_A"]
+ clip_3p_positions: [6033041]
rccx:
genes: CYP21A2,C4A,C4B,TNXB
is_tandem: True
@@ -54,7 +55,7 @@ rccx:
deletion1_name: "31985208_del_6367"
deletion2_name: "32011495_del_120"
left_boundary: 31981077
- right_boundary: 32013777
+ right_boundary: 32013977
# 120bp sv
del2_3p_pos1: 32011477
del2_3p_pos2: 32011500
@@ -86,8 +87,8 @@ strc:
del1_5p_pos2: 43895146
cfc1:
genes: CFC1,CFC1B
- realign_region: chr2:131349573-131367573
- extract_regions: chr2:131268326-131291314 chr2:131344573-131367573
+ realign_region: chr2:131349573-131362573
+ extract_regions: chr2:131268326-131291314 chr2:131344573-131362573
pivot_site: 131350634
is_reverse: True
add_sites: ["131350634_A_G"]
@@ -162,8 +163,6 @@ hba:
extract_regions: chr16:218655-227540
clip_5p_positions: [221988]
clip_3p_positions: [223728]
- gene_start: 221988
- gene_end: 223728
is_tandem: True
depth_region: [[209999, 239999]]
noisy_region: [[222416, 222417]]
diff --git a/paraphase/data/38/config.yaml b/paraphase/data/38/config.yaml
index f3d1dbe..c170da0 100644
--- a/paraphase/data/38/config.yaml
+++ b/paraphase/data/38/config.yaml
@@ -30,13 +30,14 @@ pms2:
genes: PMS2
check_nm: 0.1
realign_region: chr7:5965000-6010000
- extract_regions: chr7:5965000-5989062 chr7:6735629-6759796
- gene2_region: chr7:6735629-6759796
- right_boundary: 5987500
+ extract_regions: chr7:5965000-5993410 chr7:6733543-6759796
+ gene2_region: chr7:6733543-6759796
+ right_boundary: 5993450
pivot_site: 5986569
- noisy_region: [[5980880, 5980980], [5979663, 5979669], [5975950, 5976080]]
+ noisy_region: [[5980880, 5980980], [5979663, 5979669], [5975950, 5976080], [5988500, 5992350], [5992780, 5993400]]
is_reverse: True
add_sites: ["5986569_G_A"]
+ clip_3p_positions: [5993410]
rccx:
genes: CYP21A2,C4A,C4B,TNXB
is_tandem: True
@@ -54,7 +55,7 @@ rccx:
deletion1_name: "32017431_del_6367"
deletion2_name: "32043718_del_120"
left_boundary: 32013300
- right_boundary: 32046000
+ right_boundary: 32046200
# 120bp sv
del2_3p_pos1: 32043700
del2_3p_pos2: 32043723
@@ -86,8 +87,8 @@ strc:
del1_5p_pos2: 43602948
cfc1:
genes: CFC1,CFC1B
- realign_region: chr2:130592000-130610000
- extract_regions: chr2:130510753-130533741 chr2:130587000-130610000
+ realign_region: chr2:130592000-130605000
+ extract_regions: chr2:130510753-130533741 chr2:130587000-130605000
pivot_site: 130593061
is_reverse: True
add_sites: ["130593061_A_G"]
@@ -162,8 +163,6 @@ hba:
extract_regions: chr16:168656-177541
clip_5p_positions: [171989]
clip_3p_positions: [173729]
- gene_start: 171989
- gene_end: 173729
is_tandem: True
depth_region: [[160000, 190000]]
noisy_region: [[172417, 172418]]
@@ -172,27 +171,25 @@ SSX2:
genes: SSX2,SSX2B
is_tandem: True
is_reverse: True
+ is_palindrome: True
realign_region: chrX:52696000-52729000
extract_regions: chrX:52696000-52762432
SSX4:
genes: SSX4,SSX4B
is_tandem: True
is_reverse: True
+ is_palindrome: True
realign_region: chrX:48380000-48396314
extract_regions: chrX:48380000-48417000
left_boundary: 48380500
-CR1:
- genes: CR1
- realign_region: chr1:207527100-207545700
- extract_regions: chr1:207529087-207566209
- is_tandem: True
- check_nm: 0.02
+ noisy_region: [[48381096, 48381555]]
CENPVL2:
genes: CENPVL1,CENPVL2
realign_region: chrX:51675000-51692963
extract_regions: chrX:51675000-51692963 chrX:51700364-51718343
is_tandem: True
is_reverse: True
+ is_palindrome: True
noisy_region: [[51691779, 51691984]]
DMRTC1:
genes: DMRTC1,DMRTC1B,FAM236A,FAM236B,FAM236C,FAM236D
@@ -203,6 +200,7 @@ DMRTC1:
noisy_region: [[72943757, 72943880]]
is_tandem: True
is_reverse: True
+ is_palindrome: True
XAGE1A:
genes: XAGE1A,XAGE1B
realign_region: chrX:52490000-52502000
@@ -216,7 +214,6 @@ TRIM49D1:
left_boundary: 89909500
right_boundary: 89922500
clip_3p_positions: [89920623]
- gene_end: 89920623
is_tandem: True
is_reverse: True
CXorf51A:
@@ -225,6 +222,8 @@ CXorf51A:
extract_regions: chrX:146812425-146822425 chrX:146802090-146812090
is_tandem: True
is_reverse: True
+ is_palindrome: True
+ noisy_region: [[146816220, 146816278]]
DDT:
genes: DDT,DDTL
realign_region: chr22:23971000-23982905
@@ -234,20 +233,22 @@ DDT:
is_reverse: True
H3C14:
genes: H3C14,H3C15,H2AC18,H2AC19,H4C14,H4C15
- realign_region: chr1:149828816-149845484
- extract_regions: chr1:149828816-149843536 chr1:149850276-149865000
- left_boundary: 149828816
- right_boundary: 149843536
+ realign_region: chr1:149832000-149845484
+ extract_regions: chr1:149832000-149843536 chr1:149850276-149865000
+ clip_3p_positions: [149843518]
+ noisy_region: [[149837998, 149838086]]
is_tandem: True
is_reverse: True
MBD3L2:
- genes: MBD3L2,MBD3L2B
- realign_region: chr19:7037528-7060528
- extract_regions: chr19:7037528-7051624 chr19:7019148-7033227
+ genes: MBD3L2,MBD3L2B,MBD3L3,MBD3L4,MBD3L5
+ realign_region: chr19:7037528-7051735
+ extract_regions: chr19:7037528-7062000 chr19:7019148-7033227
left_boundary: 7037528
right_boundary: 7051623
is_reverse: True
is_tandem: True
+ clip_3p_positions: [7043226]
+ noisy_region: [[7042002, 7042106]]
DEFA1:
genes: DEFA1,DEFA1B,DEFA3
realign_region: chr8:6974238-6991000
@@ -255,9 +256,8 @@ DEFA1:
left_boundary: 6974300
is_tandem: True
check_nm: 0.1
- noisy_region: [[6984000, 6985910]]
+ noisy_region: [[6975823, 6975828], [6984000, 6985910]]
clip_3p_positions: [6985908]
- keep_truncated: True
CCZ1:
genes: CCZ1
realign_region: chr7:5898000-5930000
@@ -268,6 +268,7 @@ CSAG2:
realign_region: chrX:152690000-152720620
extract_regions: chrX:152690000-152720620 chrX:152747848-152778471
is_reverse: True
+ is_palindrome: True
OR4M2:
genes: OR4M2,OR4N4,OR4M2B,OR4N4C
realign_region: chr15:22071068-22105192
@@ -336,32 +337,26 @@ PRY:
realign_region: chrY:22489897-22515137
extract_regions: chrY:22489898-22515137 chrY:22071256-22096507 chrY:23680940-23702786 chrY:25967288-25989136
left_boundary: 22493307
- right_boundary:
- expect_cn2: True
BPY2:
genes: BPY2,BPY2B,BPY2C
realign_region: chrY:22983763-23005965
extract_regions: chrY:22983764-23005965 chrY:24617505-24639707 chrY:25030401-25052603
left_boundary: 22983765
right_boundary: 23005964
- expect_cn2: True
CHRNA7:
genes: CHRNA7
- realign_region: chr15:32030015-32173518
- extract_regions: chr15:32030015-32173518 chr15:30360306-30375956
- left_boundary: 32030015
- right_boundary: 32173518
+ realign_region: chr15:32100015-32173518
+ extract_regions: chr15:32100015-32173518 chr15:30360306-30375956
is_reverse: True
clip_5p_positions: [32153206]
-CNTNAP3C:
+ noisy_region: [[32114013, 32114163]]
+CNTNAP3:
genes: CNTNAP3,CNTNAP3C
- realign_region: chr9:61330217-61453530
+ realign_region: chr9:39162500-39288814
extract_regions: chr9:61330217-61453530 chr9:39165433-39288814
- left_boundary: 61330217
- right_boundary: 61453530
is_reverse: True
expect_cn2: True
- noisy_region: [[61409346, 61409399], [61367691, 61367733]]
+ noisy_region: [[39209580, 39209791], [39251285, 39251389], [39253010, 39253051], [39257549, 39257559]]
CTAGE6:
genes: CTAGE6,CTAGE15
realign_region: chr7:143746392-143762392
@@ -382,6 +377,7 @@ CXorf49:
left_boundary: 71706330
right_boundary: 71726327
is_reverse: True
+ is_palindrome: True
DEFB109B:
genes: DEFB109B,USP17L1,USP17L3,USP17L4,USP17L8
@@ -398,6 +394,7 @@ PRR23D1:
left_boundary: 7531040
right_boundary: 7546000
is_reverse: True
+ noisy_region: [[7538337, 7538352]]
DEFB107A:
genes: DEFB107A
@@ -413,6 +410,7 @@ DEFB105A:
left_boundary: 7813694
right_boundary: 7832441
is_reverse: True
+ noisy_region: [[7826969,7827750]]
DEFB106A:
genes: DEFB106A,DEFB106B
realign_region: chr8:7817096-7837096
@@ -420,6 +418,7 @@ DEFB106A:
left_boundary: 7817098
right_boundary: 7837095
is_reverse: True
+ noisy_region: [[7826969,7827750]]
DEFB104A:
genes: DEFB104A,DEFB104B
realign_region: chr8:7828839-7848839
@@ -463,8 +462,9 @@ ZNF705D:
extract_regions: chr8:12088839-12116016 chr8:12335087-12362253 chr8:7930859-7955249
left_boundary: 12088840
right_boundary: 12116015
- check_nm: 0.2
+ check_nm: 0.1
noisy_region: [[12091626, 12101750]]
+ clip_5p_positions: [12091626]
FAM86B1:
genes: FAM86B1,FAM86B2
realign_region: chr8:12180000-12200000
@@ -494,7 +494,6 @@ ETDA:
right_boundary: 135262822
is_reverse: True
clip_3p_positions: [135256413]
- gene_end: 135256413
FAM156A:
genes: FAM156A,FAM156B
realign_region: chrX:52942886-52962886
@@ -502,6 +501,7 @@ FAM156A:
left_boundary: 52942888
right_boundary: 52962885
is_reverse: True
+ is_palindrome: True
FAM246A:
genes: FAM246A,FAM246B
realign_region: chr22:21350950-21370950
@@ -511,7 +511,7 @@ FAM246A:
is_reverse: True
FCGR2C:
- genes: FCGR2C
+ genes: FCGR2C,FCGR2B
realign_region: chr1:161580571-161602000
extract_regions: chr1:161580572-161594218 chr1:161662376-161676049
left_boundary: 161580573
@@ -558,13 +558,6 @@ GATD3:
right_boundary: 44147500
is_reverse: True
expect_cn2: True
-
-GOLGA6L1:
- genes: GOLGA6L1,GOLGA6L26
- realign_region: chr15:23121944-23141944
- extract_regions: chr15:23123715-23141944 chr15:23321180-23339370
- left_boundary: 23123716
- right_boundary: 23141943
GOLGA6L24:
genes: GOLGA6L24,GOLGA6L25
realign_region: chr15:28342967-28362967
@@ -578,6 +571,7 @@ GOLGA8A:
extract_regions: chr15:34378799-34431301 chr15:34525011-34577508
left_boundary: 34380000
right_boundary: 34420000
+ noisy_region: [[34417325, 34417360]]
GOLGA8F:
genes: GOLGA8F,GOLGA8G
realign_region: chr15:28375342-28395342
@@ -587,10 +581,8 @@ GOLGA8F:
is_reverse: True
GOLGA8K:
genes: GOLGA8K,GOLGA8T
- realign_region: chr15:32386445-32406445
+ realign_region: chr15:32389000-32406445
extract_regions: chr15:32386446-32406445 chr15:30131900-30151905
- left_boundary: 32386447
- right_boundary: 32406444
is_reverse: True
GPAT2:
genes: GPAT2
@@ -623,6 +615,7 @@ IQCK:
left_boundary: 19760000
right_boundary: 19830000
expect_cn2: True
+ noisy_region: [[19786695, 19786930], [19792759, 19792771]]
LGALS9B:
genes: LGALS9B,LGALS9C
realign_region: chr17:20448467-20468467
@@ -652,6 +645,7 @@ MAGEH1:
right_boundary: 55462845
clip_5p_positions: [55453508]
is_reverse: True
+ is_palindrome: True
MALL:
genes: MALL
realign_region: chr2:110083370-110118500
@@ -673,14 +667,8 @@ NOTCH2:
extract_regions: chr1:120008822-120070162 chr1:120723414-120784768
left_boundary: 120008823
right_boundary: 120069700
- noisy_region: [[120024344, 120024352]]
- is_reverse: True
-NPIPA1:
- genes: NPIPA1
- realign_region: chr16:14934749-14954749
- extract_regions: chr16:14934750-14952002 chr16:16333237-16350465
- left_boundary: 14934751
- right_boundary: 14952001
+ noisy_region: [[120024344, 120024352], [120019785, 120019790]]
+ is_reverse: True
NPIPA2:
genes: NPIPA2,NPIPA3
realign_region: chr16:14742033-14765962
@@ -707,6 +695,7 @@ NXF2:
left_boundary: 102300000
right_boundary: 102327221
is_reverse: True
+ is_palindrome: True
OCLN:
genes: OCLN
realign_region: chr5:69492290-69558604
@@ -751,7 +740,7 @@ PABPC1L2A:
right_boundary: 73086957
clip_5p_positions: [73077480]
is_reverse: True
- gene_start: 73077480
+ is_palindrome: True
PDPK1:
genes: PDPK1
realign_region: chr16:2537521-2603688
@@ -760,6 +749,8 @@ PDPK1:
right_boundary: 2599224
clip_3p_positions: [2584851]
is_reverse: True
+ use_r2k: True
+ noisy_region: [[2551657, 2552272]]
POTEE:
genes: POTEE,POTEF
realign_region: chr2:131209036-131265778
@@ -803,13 +794,7 @@ PRAMEF7:
left_boundary: 12908682
right_boundary: 12928679
is_reverse: True
-PRAMEF5:
- genes: PRAMEF5,PRAMEF6
- realign_region: chr1:13248816-13268816
- extract_regions: chr1:13248817-13268816 chr1:12932973-12952975
- left_boundary: 13248818
- right_boundary: 13268815
- is_reverse: True
+ noisy_region: [[12915463, 12915676]]
PRAMEF25:
genes: PRAMEF25,PRAMEF26,HNRNPCL3,HNRNPCL4
realign_region: chr1:13058000-13084329
@@ -817,6 +802,8 @@ PRAMEF25:
left_boundary: 13058000
right_boundary: 13084329
is_reverse: True
+ check_nm: 0.02
+ clip_5p_positions: [13075113]
PRAMEF13:
genes: PRAMEF13
realign_region: chr1:13191500-13208798
@@ -824,13 +811,15 @@ PRAMEF13:
left_boundary: 13191500
right_boundary: 13208797
is_reverse: True
+ noisy_region: [[13203300, 13203576]]
PRAMEF18:
genes: PRAMEF18
- realign_region: chr1:13220000-13234424
+ realign_region: chr1:13220900-13234424
extract_regions: chr1:13220000-13234424 chr1:13004385-13014535
left_boundary: 13220900
right_boundary: 13234424
is_reverse: True
+ clip_5p_positions: [13224057]
POTED:
genes: POTED
@@ -900,6 +889,7 @@ RGPD3:
left_boundary: 106402908
right_boundary: 106466000
is_reverse: True
+ noisy_region: [[106425157, 106425170]]
RHOXF2:
genes: RHOXF2,RHOXF2B
realign_region: chrX:120152121-120172121
@@ -995,6 +985,7 @@ TMLHE:
left_boundary: 155488513
right_boundary: 155510000
is_reverse: True
+ is_palindrome: True
TRIM49:
genes: TRIM49,TRIM49C
realign_region: chr11:89793115-89813115
@@ -1026,10 +1017,9 @@ U2AF1:
UPK3BL1:
genes: UPK3BL1,UPK3BL2
- realign_region: chr7:102629908-102649908
+ realign_region: chr7:102629908-102646500
extract_regions: chr7:102629909-102649908 chr7:102530825-102550815
- left_boundary: 102629910
- right_boundary: 102649907
+ noisy_region: [[102639951, 102640023]]
POLR2J2:
genes: POLR2J2,POLR2J3
realign_region: chr7:102658854-102678854
@@ -1047,12 +1037,11 @@ ZNF595:
FOXD4L4:
genes: FOXD4L4,FOXD4L5
- realign_region: chr9:65727771-65747771
- extract_regions: chr9:65727772-65747771 chr9:65273748-65293756 chr9:68293705-68301609
+ realign_region: chr9:65727771-65740000
+ extract_regions: chr9:65727772-65740000 chr9:65281523-65293756 chr9:68293705-68301609
left_boundary: 65727773
right_boundary: 65738784
is_reverse: True
- check_nm: 0.2
noisy_region: [[65735600, 65735970]]
FOXD4L3:
genes: FOXD4L3,FOXD4L6
@@ -1061,6 +1050,7 @@ FOXD4L3:
left_boundary: 68293977
right_boundary: 68313974
is_reverse: True
+ check_nm: 0.012
GTF2IRD2B:
genes: GTF2IRD2B,GTF2IRD2
@@ -1086,11 +1076,10 @@ ICOSLG:
expect_cn2: True
MAGED4:
genes: MAGED4,MAGED4B
- realign_region: chrX:52178578-52198578
+ realign_region: chrX:52182000-52198578
extract_regions: chrX:52178579-52198578 chrX:52055517-52075516
- left_boundary: 52178580
- right_boundary: 52198577
is_reverse: True
+ is_palindrome: True
CBS:
genes: CBS
realign_region: chr21:43052691-43076335
@@ -1127,12 +1116,14 @@ CASTOR2:
left_boundary: 74965884
right_boundary: 75032028
clip_3p_positions: [74984489]
+ noisy_region: [[74976525, 74976574]]
CLEC18C:
genes: CLEC18A,CLEC18B,CLEC18C
- realign_region: chr16:70170341-70190341
+ realign_region: chr16:70170341-70187341
extract_regions: chr16:70170342-70190341 chr16:69947429-69967427 chr16:74405201-74425232
- left_boundary: 70170343
- right_boundary: 70190340
+ clip_5p_positions: [70180660]
+ noisy_region: [[70171285, 70171320], [70174152, 70174163]]
+ use_supplementary: True
GTF2H2:
genes: GTF2H2,GTF2H2C
realign_region: chr5:71034847-71068176
@@ -1160,6 +1151,7 @@ NBPF4:
left_boundary: 108223000
right_boundary: 108244575
clip_5p_positions: [108236178]
+ noisy_region: [[108231294, 108231299]]
PPIAL4D:
genes: PPIAL4D,PPIAL4E,PPIAL4F
realign_region: chr1:145231794-145251794
@@ -1180,34 +1172,30 @@ FAM72A:
right_boundary: 206204508
GOLGA8N:
genes: GOLGA8N,GOLGA8O,GOLGA8Q,GOLGA8R
- realign_region: chr15:32590346-32610346
+ realign_region: chr15:32590346-32607237
extract_regions: chr15:32590347-32607507 chr15:30400208-30417365 chr15:30548871-30566038 chr15:32441569-32458746
left_boundary: 32590348
right_boundary: 32606000
+ check_nm: 0.02
POTEB:
- genes: POTEB,POTEB2,POTE3
+ genes: POTEB,POTEB2,POTEB3
realign_region: chr15:21845829-21878235
extract_regions: chr15:21845830-21878235 chr15:20834873-20867269 chr15:21408465-21439843 chr15_KI270727v1_random:135661-167032
left_boundary: 21845831
right_boundary: 21877188
-SPATA31A1:
- genes: SPATA31A1,SPATA31A3,SPATA31A5,SPATA31A7
- realign_region: chr9:39348815-39368815
- extract_regions: chr9:39348816-39368815 chr9:60907506-60927532 chr9:61184485-61203156 chr9:66979422-66998101
- left_boundary: 39350176
- right_boundary: 39368814
NUTM2A:
genes: NUTM2A,NUTM2B,NUTM2D,NUTM2E
realign_region: chr10:87220213-87240213
extract_regions: chr10:87220214-87240213 chr10:79697989-79718012 chr10:87354483-87369657 chr10:79838172-79853341
- left_boundary: 87222272
- right_boundary: 87237439
+ clip_5p_positions: [87222271]
+ clip_3p_positions: [87236514, 87237440]
ANKRD20A1:
genes: ANKRD20A1
realign_region: chr9:67858431-67902894
extract_regions: chr9:67858432-67902894 chr9:40223065-40266892 chr9:64368680-64413189 chr9:66109347-66153849
left_boundary: 67859093
right_boundary: 67902893
+ noisy_region: [[67859646, 67859662]]
GRAPL:
genes: GRAPL,GRAP
realign_region: chr17:19127089-19159687
@@ -1222,12 +1210,14 @@ ARL17A:
extract_regions: chr17:46552257-46580191 chr17:46334404-46362266
left_boundary: 46552258
right_boundary: 46580190
+ clip_3p_positions: [46565080]
NSF:
genes: NSF
realign_region: chr17:46590169-46757964
extract_regions: chr17:46590170-46707123 chr17:46372249-46489410
left_boundary: 46590171
right_boundary: 46707122
+ noisy_region: [[46609984, 46610171]]
AMY1A:
genes: AMY1A,AMY1B,AMY1C
@@ -1247,6 +1237,7 @@ CTAG1A:
right_boundary: 154591300
clip_3p_positions: [154591326]
is_reverse: True
+ is_palindrome: True
add_sites: ["154591380_A_G"]
BOLA2:
genes: BOLA2,BOLA2B,SLX1A,SLX1B,SULT1A3,SULT1A4
@@ -1256,4 +1247,61 @@ BOLA2:
right_boundary: 29470000
clip_5p_positions: [29449191]
add_sites: ["29449010_T_G"]
-
+
+CYP2D6:
+ genes: CYP2D6
+ realign_region: chr22:42122800-42132500
+ extract_regions: chr22:42123196-42145723
+ gene2_region: chr22:42135344-42145873
+ use_r2k: True
+ is_tandem: True
+ check_nm: 0.07
+ noisy_region: [[42125955, 42126005], [42132023, 42132051]]
+ clip_5p_positions: [42123192]
+ clip_3p_positions: [42132193]
+ call_fusion: 5p
+GBA:
+ genes: GBA1
+ realign_region: chr1:155230000-155242500
+ extract_regions: chr1:155210828-155219269 chr1:155230976-155241500
+ gene2_region: chr1:155210380-155219657
+ check_nm: 0.2
+ use_r2k: True
+ noisy_region: [[155236583, 155237172], [155238947, 155239582], [155240160, 155240489], [155241400, 155241900]]
+ use_supplementary: True
+ clip_5p_positions: [155230976]
+ clip_3p_positions: [155242229]
+ call_fusion: 3p
+CYP11B1:
+ genes: CYP11B1,CYP11B2
+ realign_region: chr8:142872000-142880500
+ extract_regions: chr8:142872000-142880500 chr8:142910559-142917843
+ gene2_region: chr8:142910760-142917977
+ check_nm: 0.2
+ use_r2k: True
+ noisy_region: [[142874825, 142874950]]
+ clip_5p_positions: [142873164]
+ clip_3p_positions: [142879950]
+ call_fusion: 5p
+CFH:
+ genes: CFH,CFHR1
+ realign_region: chr1:196740000-196772000
+ extract_regions: chr1:196740000-196772000 chr1:196827188-196855916
+ gene2_region: chr1:196827188-196855916
+ check_nm: 0.1
+ use_r2k: True
+ clip_5p_positions: [196742575]
+ clip_3p_positions: [196771224]
+ noisy_region: [[196748050, 196748125], [196768556, 196768896]]
+ call_fusion: 5p
+CFHR3:
+ genes: CFHR3,CFHR4,CFHR1,CFHR2
+ realign_region: chr1:196786000-196830000
+ extract_regions: chr1:196786000-196830000 chr1:196911478-196948804
+ gene2_region: chr1:196911478-196948804
+ check_nm: 0.1
+ use_r2k: True
+ clip_5p_positions: [196786955]
+ clip_3p_positions: [196827189]
+ noisy_region: [[196809513, 196812537], [196817000, 196817020], [196826466, 196826515]]
+ call_fusion: 5p
diff --git a/paraphase/data/38/fusion_genes.json b/paraphase/data/38/fusion_genes.json
new file mode 100644
index 0000000..d67880c
--- /dev/null
+++ b/paraphase/data/38/fusion_genes.json
@@ -0,0 +1,1760 @@
+{
+ "CFH": [
+ "196742643_A_C",
+ "196742650_G_A",
+ "196742668_T_C",
+ "196742669_C_A",
+ "196742671_A_G",
+ "196742681_G_A",
+ "196742683_C_G",
+ "196742686_A_G",
+ "196742689_G_A",
+ "196742692_A_G",
+ "196742702_A_T",
+ "196742716_A_C",
+ "196742734_T_G",
+ "196742735_T_C",
+ "196742743_C_T",
+ "196742745_A_T",
+ "196742784_T_C",
+ "196742785_A_C",
+ "196742822_A_G",
+ "196742855_G_A",
+ "196742875_G_A",
+ "196742884_A_T",
+ "196742885_A_T",
+ "196742886_A_T",
+ "196742887_A_T",
+ "196742909_G_A",
+ "196742933_C_T",
+ "196742938_T_A",
+ "196742939_T_C",
+ "196742972_T_A",
+ "196742986_G_A",
+ "196743006_A_T",
+ "196743258_C_A",
+ "196743269_T_G",
+ "196743392_C_T",
+ "196743447_T_C",
+ "196743456_C_T",
+ "196743799_A_G",
+ "196743809_C_T",
+ "196743810_C_G",
+ "196743867_T_C",
+ "196744097_A_T",
+ "196744105_A_G",
+ "196744107_A_G",
+ "196744142_C_T",
+ "196744168_T_G",
+ "196744312_T_A",
+ "196744323_A_C",
+ "196744350_T_C",
+ "196744358_G_A",
+ "196744373_C_T",
+ "196744379_T_C",
+ "196744449_C_T",
+ "196744474_C_G",
+ "196744485_T_C",
+ "196744497_C_T",
+ "196744579_T_A",
+ "196744598_C_G",
+ "196744623_C_T",
+ "196744633_G_A",
+ "196744652_C_G",
+ "196744654_A_C",
+ "196744670_A_G",
+ "196744679_T_A",
+ "196744694_C_T",
+ "196744709_G_A",
+ "196744746_C_G",
+ "196744765_G_T",
+ "196744776_C_A",
+ "196744780_T_C",
+ "196744781_T_G",
+ "196744802_A_G",
+ "196744894_C_T",
+ "196744901_A_T",
+ "196744923_G_A",
+ "196744928_G_T",
+ "196744930_T_C",
+ "196744934_C_T",
+ "196745003_C_T",
+ "196745031_C_T",
+ "196745033_G_C",
+ "196745059_G_A",
+ "196745202_A_G",
+ "196745209_G_T",
+ "196745236_A_C",
+ "196745290_G_T",
+ "196745347_T_C",
+ "196745458_G_A",
+ "196745486_T_C",
+ "196745517_C_T",
+ "196745563_C_T",
+ "196745578_G_A",
+ "196745628_C_A",
+ "196745668_G_A",
+ "196745698_C_T",
+ "196745731_C_A",
+ "196745745_C_T",
+ "196745746_G_A",
+ "196745787_C_T",
+ "196745824_A_G",
+ "196746056_C_T",
+ "196746057_A_G",
+ "196746063_T_G",
+ "196746075_C_A",
+ "196746110_T_C",
+ "196746122_T_A",
+ "196746129_G_T",
+ "196746131_C_T",
+ "196746429_T_C",
+ "196746434_A_G",
+ "196746507_C_A",
+ "196746518_C_T",
+ "196746532_T_C",
+ "196746552_T_C",
+ "196746577_A_T",
+ "196746695_A_G",
+ "196746706_A_G",
+ "196746845_C_T",
+ "196746919_C_A",
+ "196746920_G_A",
+ "196746948_C_T",
+ "196746959_C_T",
+ "196747189_C_T",
+ "196747207_T_C",
+ "196747327_G_A",
+ "196747368_C_T",
+ "196747383_T_A",
+ "196747493_C_T",
+ "196747548_C_T",
+ "196747592_C_T",
+ "196747623_C_T",
+ "196747634_A_T",
+ "196747638_T_C",
+ "196747644_G_A",
+ "196747713_C_T",
+ "196747744_T_A",
+ "196747772_G_A",
+ "196747787_A_T",
+ "196747830_T_A",
+ "196747844_G_A",
+ "196747873_G_A",
+ "196747886_A_T",
+ "196747929_C_T",
+ "196748049_A_T",
+ "196748063_G_A",
+ "196748065_G_A",
+ "196748067_G_A",
+ "196748069_G_A",
+ "196748071_G_A",
+ "196748136_C_G",
+ "196748138_A_C",
+ "196748683_T_A",
+ "196748694_T_C",
+ "196748697_G_A",
+ "196748706_C_T",
+ "196748737_A_G",
+ "196749020_C_T",
+ "196749027_C_T",
+ "196749048_C_T",
+ "196749055_C_T",
+ "196749059_T_A",
+ "196749214_T_C",
+ "196749265_G_A",
+ "196749337_A_C",
+ "196749426_A_G",
+ "196749433_A_G",
+ "196749543_G_T",
+ "196749619_C_T",
+ "196749767_T_A",
+ "196749943_A_G",
+ "196750133_G_A",
+ "196750174_G_A",
+ "196750181_A_G",
+ "196750185_T_C",
+ "196750211_C_T",
+ "196750326_A_G",
+ "196750338_T_C",
+ "196750339_G_A",
+ "196750372_C_T",
+ "196750373_A_C",
+ "196750384_T_C",
+ "196750435_T_G",
+ "196750470_C_T",
+ "196750488_T_G",
+ "196750538_T_C",
+ "196750550_G_C",
+ "196750564_G_A",
+ "196750571_T_C",
+ "196750588_T_G",
+ "196750589_A_G",
+ "196750619_T_C",
+ "196750799_G_A",
+ "196750885_G_A",
+ "196750934_G_A",
+ "196750959_A_G",
+ "196751221_T_A",
+ "196751233_G_A",
+ "196752104_T_C",
+ "196753075_C_T",
+ "196753257_T_C",
+ "196757557_G_T",
+ "196760029_A_G",
+ "196760068_T_C",
+ "196760172_T_G",
+ "196760178_G_A",
+ "196760320_T_A",
+ "196760324_G_T",
+ "196760848_G_A",
+ "196761010_C_T",
+ "196761133_A_G",
+ "196761496_G_A",
+ "196761502_A_T",
+ "196761523_T_A",
+ "196761587_C_T",
+ "196762181_C_T",
+ "196762194_A_T",
+ "196762199_C_T",
+ "196762308_T_C",
+ "196763475_T_C",
+ "196763548_T_C",
+ "196765131_G_A",
+ "196765214_A_T",
+ "196765248_C_T",
+ "196765777_C_T",
+ "196765871_T_C",
+ "196766078_T_C",
+ "196766480_T_A",
+ "196766481_G_A",
+ "196766522_C_T",
+ "196766748_T_C",
+ "196766759_A_G",
+ "196766843_G_A",
+ "196766874_C_T",
+ "196766894_A_G",
+ "196766942_G_A",
+ "196767010_C_T",
+ "196767050_G_A",
+ "196767256_C_T",
+ "196767283_G_A",
+ "196767303_G_A",
+ "196767877_G_A",
+ "196767896_T_C",
+ "196767966_C_T",
+ "196767996_A_G",
+ "196768009_T_G",
+ "196768011_A_G",
+ "196768095_A_G",
+ "196768106_A_G",
+ "196768107_C_T",
+ "196768109_T_A",
+ "196768128_T_C",
+ "196768161_A_C",
+ "196768204_G_C",
+ "196768217_C_A",
+ "196768232_A_C",
+ "196768239_C_T",
+ "196768281_T_A",
+ "196768285_A_G",
+ "196768286_G_T",
+ "196768292_A_G",
+ "196768308_C_G",
+ "196768323_G_C",
+ "196768338_C_G",
+ "196768368_G_A",
+ "196768395_C_T",
+ "196768417_C_T",
+ "196768446_T_A",
+ "196768457_C_T",
+ "196768458_A_G",
+ "196768461_A_G",
+ "196768472_A_C",
+ "196768475_A_C",
+ "196768489_C_G",
+ "196768518_G_A",
+ "196768521_A_G",
+ "196768548_G_T",
+ "196768571_A_T",
+ "196768903_T_G",
+ "196768917_A_G",
+ "196768928_G_C",
+ "196768932_G_C",
+ "196768944_G_A",
+ "196768947_C_T",
+ "196768959_G_A",
+ "196768961_A_G",
+ "196769009_T_G",
+ "196769013_C_A",
+ "196769064_T_C",
+ "196769068_T_C",
+ "196769069_A_C",
+ "196769080_G_A",
+ "196769102_C_T",
+ "196769107_A_C",
+ "196769112_A_G",
+ "196769128_T_C",
+ "196769184_G_A",
+ "196769185_T_G",
+ "196769196_G_T",
+ "196769216_A_G",
+ "196769256_G_A",
+ "196769269_C_T",
+ "196769294_T_C",
+ "196769316_A_G",
+ "196769320_T_C",
+ "196769327_C_T",
+ "196769428_A_T",
+ "196769443_T_C",
+ "196769477_A_C",
+ "196769490_A_G",
+ "196769494_C_T",
+ "196769504_G_A",
+ "196769510_G_A",
+ "196769546_C_T",
+ "196769631_G_C",
+ "196769657_A_T",
+ "196769723_G_A",
+ "196769734_C_T",
+ "196769768_T_C",
+ "196769770_A_G",
+ "196769777_C_T",
+ "196769791_A_T",
+ "196769792_A_T",
+ "196769839_G_A",
+ "196769840_C_T",
+ "196769856_C_T",
+ "196769889_C_T",
+ "196769988_T_C",
+ "196769992_C_T",
+ "196770016_T_C",
+ "196770028_C_A",
+ "196770062_C_T",
+ "196770072_C_T",
+ "196770197_G_A",
+ "196770203_A_G",
+ "196770213_A_G",
+ "196770225_A_G",
+ "196770226_G_A",
+ "196770240_C_T",
+ "196770270_C_T",
+ "196770271_A_G",
+ "196770276_C_G",
+ "196770290_G_C",
+ "196770293_C_T",
+ "196770340_G_A",
+ "196770351_C_T",
+ "196770353_T_C",
+ "196770380_G_A",
+ "196770385_G_T",
+ "196770389_T_G",
+ "196770437_T_C",
+ "196770470_T_C",
+ "196770535_A_C",
+ "196770567_T_G",
+ "196770584_T_C",
+ "196770613_T_C",
+ "196770614_A_T",
+ "196770659_C_T",
+ "196770741_A_G",
+ "196770747_G_T",
+ "196770750_C_A",
+ "196770752_C_T",
+ "196770756_A_G",
+ "196770785_T_C",
+ "196770800_C_T",
+ "196770816_C_T",
+ "196770817_G_A",
+ "196770846_T_A",
+ "196770918_G_C",
+ "196770954_C_G",
+ "196770975_T_C",
+ "196770982_G_A",
+ "196771089_C_A",
+ "196771159_C_A",
+ "196771163_G_A",
+ "196771183_A_T",
+ "196771188_T_C",
+ "196771214_G_T"
+ ],
+ "CFHR3": [
+ "196787004_C_G",
+ "196787016_C_T",
+ "196787027_T_C",
+ "196787036_A_G",
+ "196787046_A_T",
+ "196787055_A_G",
+ "196787072_T_C",
+ "196787084_A_C",
+ "196787106_G_A",
+ "196787107_C_T",
+ "196787120_G_A",
+ "196787121_A_T",
+ "196787132_C_T",
+ "196787139_T_C",
+ "196787153_C_T",
+ "196787157_A_C",
+ "196787160_A_T",
+ "196787163_A_G",
+ "196787166_G_A",
+ "196787197_C_A",
+ "196787230_A_G",
+ "196787286_G_A",
+ "196787304_G_A",
+ "196787328_T_C",
+ "196787465_G_A",
+ "196787537_G_A",
+ "196787539_G_A",
+ "196787562_G_A",
+ "196787743_A_G",
+ "196787748_T_C",
+ "196787805_G_T",
+ "196787846_T_G",
+ "196787919_T_A",
+ "196787932_G_T",
+ "196787941_C_G",
+ "196787958_C_T",
+ "196787965_T_C",
+ "196788005_C_T",
+ "196788171_C_G",
+ "196788180_T_A",
+ "196788198_T_C",
+ "196788199_G_T",
+ "196788349_A_T",
+ "196788488_A_C",
+ "196788672_G_A",
+ "196788675_T_A",
+ "196788685_A_G",
+ "196788698_A_T",
+ "196788708_T_C",
+ "196788714_C_T",
+ "196788763_G_T",
+ "196788771_G_A",
+ "196788832_G_A",
+ "196788948_A_G",
+ "196788977_A_C",
+ "196788981_T_C",
+ "196789036_G_T",
+ "196789050_A_T",
+ "196789126_G_T",
+ "196789144_T_C",
+ "196789145_G_T",
+ "196789163_A_G",
+ "196789205_T_A",
+ "196789233_T_C",
+ "196789275_A_T",
+ "196789290_T_C",
+ "196789293_A_G",
+ "196789342_T_C",
+ "196789358_T_C",
+ "196789364_G_C",
+ "196789403_A_G",
+ "196789486_G_A",
+ "196789604_A_T",
+ "196789735_A_G",
+ "196790505_C_T",
+ "196790508_T_C",
+ "196790552_C_T",
+ "196790555_A_C",
+ "196790608_C_T",
+ "196790641_T_C",
+ "196790671_T_C",
+ "196790716_C_T",
+ "196790882_A_T",
+ "196790955_T_G",
+ "196790979_C_A",
+ "196790988_T_C",
+ "196791005_A_G",
+ "196791016_C_T",
+ "196791020_A_C",
+ "196791027_C_A",
+ "196791043_T_G",
+ "196791051_C_T",
+ "196791163_T_C",
+ "196791267_C_T",
+ "196791332_G_A",
+ "196791366_T_C",
+ "196791409_C_T",
+ "196791425_A_C",
+ "196791652_A_G",
+ "196791815_T_C",
+ "196791828_C_A",
+ "196791831_A_G",
+ "196791924_T_C",
+ "196791936_T_C",
+ "196792069_T_C",
+ "196792138_C_T",
+ "196792578_T_C",
+ "196792634_C_T",
+ "196792668_T_C",
+ "196792737_T_G",
+ "196792756_C_T",
+ "196792760_T_C",
+ "196792773_A_G",
+ "196792810_C_T",
+ "196792818_C_T",
+ "196792857_G_T",
+ "196792866_T_C",
+ "196792951_G_A",
+ "196793044_A_G",
+ "196793052_A_G",
+ "196793190_G_A",
+ "196793278_A_G",
+ "196793279_G_C",
+ "196793282_A_G",
+ "196793361_A_C",
+ "196793374_G_A",
+ "196793383_G_T",
+ "196793444_A_G",
+ "196793478_C_A",
+ "196793486_G_C",
+ "196793670_A_G",
+ "196793684_A_G",
+ "196793737_T_C",
+ "196793753_T_C",
+ "196793784_C_T",
+ "196793786_G_A",
+ "196793811_T_C",
+ "196793847_T_C",
+ "196793870_C_A",
+ "196793884_A_G",
+ "196793902_G_A",
+ "196793931_T_C",
+ "196794037_T_A",
+ "196794045_G_A",
+ "196794060_T_C",
+ "196794112_T_C",
+ "196794163_T_C",
+ "196794214_T_C",
+ "196794246_A_T",
+ "196794261_A_G",
+ "196794273_T_G",
+ "196794274_T_C",
+ "196794280_C_T",
+ "196794320_C_G",
+ "196794326_G_C",
+ "196794339_T_C",
+ "196794340_G_A",
+ "196794412_A_G",
+ "196794865_G_A",
+ "196794885_G_A",
+ "196794918_G_T",
+ "196794922_G_A",
+ "196795134_T_C",
+ "196795155_A_C",
+ "196795199_G_C",
+ "196795200_G_A",
+ "196795219_C_T",
+ "196795238_A_G",
+ "196795269_A_G",
+ "196795365_T_C",
+ "196795660_G_A",
+ "196795728_C_T",
+ "196795745_G_A",
+ "196795768_T_G",
+ "196795806_T_C",
+ "196795818_C_T",
+ "196795832_G_A",
+ "196796708_T_C",
+ "196796712_C_T",
+ "196796865_A_C",
+ "196796962_A_G",
+ "196796989_A_T",
+ "196797024_C_T",
+ "196797027_A_T",
+ "196797043_C_T",
+ "196797044_A_G",
+ "196797054_T_G",
+ "196797066_G_A",
+ "196797116_C_T",
+ "196797117_C_A",
+ "196797239_C_A",
+ "196797254_T_A",
+ "196797322_C_A",
+ "196797323_A_G",
+ "196797500_G_T",
+ "196797580_C_T",
+ "196797604_G_A",
+ "196797609_A_C",
+ "196797624_G_A",
+ "196797694_T_A",
+ "196797913_T_C",
+ "196797973_G_C",
+ "196797976_G_A",
+ "196798002_A_T",
+ "196798077_G_A",
+ "196798081_C_T",
+ "196798114_T_G",
+ "196798127_T_A",
+ "196798138_T_C",
+ "196798185_C_A",
+ "196798205_G_C",
+ "196798283_C_T",
+ "196798284_C_A",
+ "196798287_T_C",
+ "196798342_G_A",
+ "196798622_G_A",
+ "196798644_T_C",
+ "196798661_A_G",
+ "196798737_T_G",
+ "196798771_A_C",
+ "196798977_T_A",
+ "196798984_A_T",
+ "196799048_C_T",
+ "196799054_G_A",
+ "196799066_T_C",
+ "196799356_T_G",
+ "196799357_C_T",
+ "196799371_A_G",
+ "196799382_C_T",
+ "196799392_T_C",
+ "196799454_T_C",
+ "196799461_C_T",
+ "196799470_C_T",
+ "196799499_A_G",
+ "196799534_T_C",
+ "196799543_G_C",
+ "196799547_G_A",
+ "196799609_C_A",
+ "196799646_C_T",
+ "196799668_G_A",
+ "196799697_A_G",
+ "196799855_G_A",
+ "196799899_T_C",
+ "196799902_G_T",
+ "196800265_C_T",
+ "196800272_G_A",
+ "196800441_T_A",
+ "196800488_T_G",
+ "196800548_C_T",
+ "196800601_C_T",
+ "196800714_C_G",
+ "196800734_C_T",
+ "196800758_G_C",
+ "196801118_A_T",
+ "196801393_A_T",
+ "196801395_T_C",
+ "196801537_T_C",
+ "196801550_T_C",
+ "196801559_A_C",
+ "196801803_C_G",
+ "196801806_G_A",
+ "196801837_C_A",
+ "196801839_G_C",
+ "196801855_G_A",
+ "196801856_G_T",
+ "196801887_T_C",
+ "196801982_G_A",
+ "196802017_A_G",
+ "196802019_G_A",
+ "196802039_C_T",
+ "196802151_A_G",
+ "196802261_T_G",
+ "196802268_A_T",
+ "196802527_A_G",
+ "196802652_C_T",
+ "196802668_T_C",
+ "196802828_A_G",
+ "196803040_T_C",
+ "196803044_C_G",
+ "196803203_T_A",
+ "196803212_C_A",
+ "196803322_T_C",
+ "196803323_A_C",
+ "196803370_T_C",
+ "196803417_C_A",
+ "196803454_C_T",
+ "196803512_T_C",
+ "196803788_C_T",
+ "196803981_T_G",
+ "196804513_T_C",
+ "196804654_C_A",
+ "196804705_C_T",
+ "196804760_T_C",
+ "196804811_T_C",
+ "196806211_C_T",
+ "196807040_C_T",
+ "196807070_C_A",
+ "196807233_G_A",
+ "196807258_C_T",
+ "196807344_C_G",
+ "196807368_G_A",
+ "196807372_T_A",
+ "196807446_G_A",
+ "196807460_C_A",
+ "196807532_G_C",
+ "196807684_C_G",
+ "196807731_A_G",
+ "196807757_G_A",
+ "196807772_G_A",
+ "196807787_G_T",
+ "196807812_A_T",
+ "196807855_C_A",
+ "196808022_C_T",
+ "196808146_G_A",
+ "196808196_T_C",
+ "196808606_A_T",
+ "196808631_G_A",
+ "196808686_C_T",
+ "196808698_C_T",
+ "196808723_A_G",
+ "196808762_A_G",
+ "196808767_T_C",
+ "196808783_C_T",
+ "196808792_G_A",
+ "196808804_G_C",
+ "196808811_G_A",
+ "196808826_A_G",
+ "196808871_A_G",
+ "196808883_G_T",
+ "196808890_A_G",
+ "196808896_A_G",
+ "196808915_A_C",
+ "196808918_A_G",
+ "196808924_A_G",
+ "196808929_G_A",
+ "196808954_G_T",
+ "196808973_G_T",
+ "196808974_A_T",
+ "196808977_C_T",
+ "196808996_A_G",
+ "196809008_C_T",
+ "196809009_A_G",
+ "196809015_C_T",
+ "196809019_C_T",
+ "196809025_C_T",
+ "196809026_C_G",
+ "196809027_A_G",
+ "196809093_G_A",
+ "196809099_A_G",
+ "196809129_A_G",
+ "196809134_C_A",
+ "196809135_A_G",
+ "196809142_G_C",
+ "196809154_C_T",
+ "196809192_G_A",
+ "196809202_T_C",
+ "196809203_G_A",
+ "196809236_C_G",
+ "196809254_G_C",
+ "196809258_C_G",
+ "196809295_C_T",
+ "196809346_C_T",
+ "196809377_A_C",
+ "196809403_A_T",
+ "196809420_T_C",
+ "196809451_A_C",
+ "196809485_A_C",
+ "196809512_A_C",
+ "196812600_G_A",
+ "196812638_G_T",
+ "196812656_G_A",
+ "196813342_A_G",
+ "196813817_G_T",
+ "196815503_C_T",
+ "196815994_C_T",
+ "196816058_A_G",
+ "196816203_G_A",
+ "196816237_C_T",
+ "196816363_G_A",
+ "196816568_A_G",
+ "196816577_G_A",
+ "196816614_G_C",
+ "196816855_A_G",
+ "196816990_A_G",
+ "196817033_G_A",
+ "196817060_C_T",
+ "196817090_T_A",
+ "196817126_A_T",
+ "196817332_T_C",
+ "196817333_G_A",
+ "196817343_G_A",
+ "196817352_C_T",
+ "196817365_G_A",
+ "196817391_A_G",
+ "196817413_G_A",
+ "196817459_C_T",
+ "196817466_A_T",
+ "196817492_G_A",
+ "196817506_G_C",
+ "196817600_T_C",
+ "196817641_G_A",
+ "196817736_A_G",
+ "196817766_C_T",
+ "196817898_C_T",
+ "196817960_A_G",
+ "196817991_A_G",
+ "196818007_C_T",
+ "196818152_T_A",
+ "196818167_A_G",
+ "196818182_T_A",
+ "196818232_T_G",
+ "196818363_T_G",
+ "196818439_A_T",
+ "196818456_G_A",
+ "196818487_T_C",
+ "196818516_C_T",
+ "196818599_A_G",
+ "196818637_C_T",
+ "196818648_C_T",
+ "196818693_T_C",
+ "196818747_A_G",
+ "196818757_T_C",
+ "196818767_A_T",
+ "196818773_A_C",
+ "196818816_A_G",
+ "196819171_C_T",
+ "196819227_C_T",
+ "196820632_A_T",
+ "196821197_T_C",
+ "196822014_A_C",
+ "196822091_C_T",
+ "196822105_A_G",
+ "196822374_A_G",
+ "196822560_C_T",
+ "196822570_G_A",
+ "196822584_C_T",
+ "196822613_C_T",
+ "196822635_G_A",
+ "196822642_C_G",
+ "196822695_A_G",
+ "196822713_C_T",
+ "196822734_T_C",
+ "196822757_T_C",
+ "196822760_A_G",
+ "196822836_A_G",
+ "196822853_C_T",
+ "196822911_G_A",
+ "196822935_T_C",
+ "196822952_G_A",
+ "196822959_A_T",
+ "196822966_C_T",
+ "196823027_T_C",
+ "196823030_A_G",
+ "196823072_A_T",
+ "196823079_G_A",
+ "196823149_G_A",
+ "196823156_C_T",
+ "196823169_A_G",
+ "196823173_A_G",
+ "196823207_A_G",
+ "196823305_A_G",
+ "196823308_C_A",
+ "196823309_T_A",
+ "196823379_G_A",
+ "196823431_T_C",
+ "196823433_G_A",
+ "196823483_C_T",
+ "196823533_A_G",
+ "196823539_G_C",
+ "196823549_G_A",
+ "196823596_T_C",
+ "196823618_C_G",
+ "196823652_T_C",
+ "196823688_G_A",
+ "196823705_A_C",
+ "196823749_A_C",
+ "196823783_C_T",
+ "196823849_G_A",
+ "196823863_A_G",
+ "196823933_C_T",
+ "196823948_C_T",
+ "196823952_A_T",
+ "196823959_G_A",
+ "196823970_A_T",
+ "196823997_T_A",
+ "196824011_A_G",
+ "196824061_A_T",
+ "196824063_A_T",
+ "196824066_T_C",
+ "196824075_A_C",
+ "196824080_G_A",
+ "196824092_T_A",
+ "196824107_G_C",
+ "196824119_C_T",
+ "196824145_G_T",
+ "196824160_C_T",
+ "196824182_C_T",
+ "196824183_T_G",
+ "196824189_G_C",
+ "196824197_G_C",
+ "196824204_G_A",
+ "196824218_C_T",
+ "196824229_C_T",
+ "196824241_T_A",
+ "196824279_G_A",
+ "196824306_C_G",
+ "196824308_A_T",
+ "196824340_C_A",
+ "196824384_G_A",
+ "196824396_C_T",
+ "196824402_C_T",
+ "196824450_G_C",
+ "196824455_A_G",
+ "196824481_T_C",
+ "196824498_C_T",
+ "196824503_T_C",
+ "196824504_C_T",
+ "196824537_C_T",
+ "196824552_T_C",
+ "196824555_G_A",
+ "196824586_G_T",
+ "196824618_G_A",
+ "196824621_A_C",
+ "196824642_C_T",
+ "196824672_A_G",
+ "196824723_C_G",
+ "196824726_A_T",
+ "196824734_G_C",
+ "196824800_T_A",
+ "196824801_C_T",
+ "196824919_C_T",
+ "196824922_A_C",
+ "196824964_A_G",
+ "196824991_A_C",
+ "196825002_C_A",
+ "196825003_A_C",
+ "196825006_T_C",
+ "196825018_A_T",
+ "196825054_C_T",
+ "196825055_A_G",
+ "196825060_C_T",
+ "196825068_G_A",
+ "196825120_A_G",
+ "196825126_T_C",
+ "196825127_T_C",
+ "196825142_T_A",
+ "196825164_C_G",
+ "196825177_T_C",
+ "196825209_A_C",
+ "196825222_A_T",
+ "196825261_A_G",
+ "196825299_A_G",
+ "196825307_C_T",
+ "196825338_T_G",
+ "196825347_A_G",
+ "196825416_A_G",
+ "196825437_A_G",
+ "196825462_A_G",
+ "196825480_C_T",
+ "196825481_A_G",
+ "196825484_T_C",
+ "196825550_G_A",
+ "196825613_A_C",
+ "196825635_A_G",
+ "196825710_G_A",
+ "196825715_T_C",
+ "196825745_G_A",
+ "196825746_G_C",
+ "196825747_T_A",
+ "196825763_G_A",
+ "196825764_A_G",
+ "196825774_A_G",
+ "196825777_A_G",
+ "196825804_C_T",
+ "196825805_C_T",
+ "196825819_C_G",
+ "196825827_A_G",
+ "196825863_T_C",
+ "196825864_G_A",
+ "196825870_T_C",
+ "196825877_G_T",
+ "196825898_T_C",
+ "196825916_C_G",
+ "196825917_T_A",
+ "196825961_C_G",
+ "196825977_A_T",
+ "196826026_C_A",
+ "196826035_A_T",
+ "196826043_G_A",
+ "196826044_A_C",
+ "196826102_C_A",
+ "196826119_T_C",
+ "196826120_T_C",
+ "196826177_A_C",
+ "196826180_T_C",
+ "196826194_C_G",
+ "196826272_A_T",
+ "196826276_G_T",
+ "196826301_C_T",
+ "196826317_C_T",
+ "196826336_C_T",
+ "196826339_G_C",
+ "196826342_A_C",
+ "196826364_A_G",
+ "196826375_A_G",
+ "196826412_T_A",
+ "196826415_G_C",
+ "196826432_G_A",
+ "196826520_A_G",
+ "196826544_A_G",
+ "196826546_A_G",
+ "196826591_T_C",
+ "196826598_G_A",
+ "196826656_G_A",
+ "196826663_A_T",
+ "196826666_T_C",
+ "196826676_C_A",
+ "196826686_G_A",
+ "196826697_G_A",
+ "196826729_A_C",
+ "196826730_G_T",
+ "196826746_G_T",
+ "196826747_G_A",
+ "196826804_A_C",
+ "196826805_C_T",
+ "196826813_C_T",
+ "196826824_C_T",
+ "196826905_G_A",
+ "196826983_C_T",
+ "196827005_G_A",
+ "196827012_A_G",
+ "196827018_C_T",
+ "196827029_T_G",
+ "196827052_T_G",
+ "196827056_G_A",
+ "196827105_T_C",
+ "196827125_T_C",
+ "196827143_C_T",
+ "196827154_T_G",
+ "196827155_G_C",
+ "196827161_C_T"
+ ],
+ "GBA": [
+ "155231032_G_A",
+ "155231051_A_G",
+ "155231061_G_A",
+ "155231064_G_A",
+ "155231089_C_T",
+ "155231095_G_A",
+ "155231096_A_C",
+ "155231110_T_G",
+ "155231111_C_T",
+ "155231127_T_C",
+ "155231128_T_C",
+ "155231135_G_A",
+ "155231146_C_T",
+ "155231159_T_C",
+ "155231181_C_T",
+ "155231188_A_T",
+ "155231224_G_A",
+ "155231230_T_C",
+ "155231274_G_A",
+ "155231298_T_C",
+ "155231347_T_C",
+ "155231353_G_T",
+ "155231410_C_A",
+ "155231412_G_A",
+ "155231415_C_T",
+ "155231463_T_G",
+ "155231496_C_G",
+ "155231522_A_G",
+ "155231557_T_C",
+ "155231692_G_A",
+ "155231693_C_T",
+ "155231701_T_C",
+ "155231834_G_A",
+ "155231848_A_C",
+ "155231857_T_C",
+ "155231870_A_G",
+ "155231923_T_A",
+ "155231936_A_G",
+ "155231937_G_A",
+ "155231941_A_G",
+ "155231942_G_A",
+ "155231989_G_A",
+ "155232231_G_A",
+ "155232340_C_T",
+ "155232440_T_C",
+ "155232447_A_G",
+ "155232451_A_G",
+ "155232501_C_G",
+ "155232502_C_G",
+ "155232504_A_G",
+ "155232509_T_C",
+ "155232524_A_T",
+ "155232530_C_G",
+ "155232532_A_T",
+ "155232540_G_C",
+ "155232549_C_G",
+ "155232550_G_A",
+ "155232724_C_T",
+ "155232784_A_G",
+ "155232834_A_C",
+ "155232892_G_T",
+ "155232893_A_T",
+ "155232916_G_C",
+ "155232919_A_G",
+ "155232927_G_A",
+ "155232935_A_G",
+ "155232982_G_A",
+ "155233046_A_G",
+ "155233268_C_G",
+ "155233287_G_A",
+ "155233514_A_G",
+ "155233517_C_T",
+ "155233521_T_A",
+ "155233531_G_A",
+ "155233541_G_T",
+ "155233543_G_A",
+ "155233549_C_T",
+ "155233639_G_A",
+ "155234903_C_T",
+ "155235203_C_G",
+ "155235217_C_G",
+ "155235252_A_G",
+ "155235379_A_G",
+ "155235412_G_A",
+ "155235727_C_G",
+ "155235918_A_G",
+ "155236097_G_C",
+ "155236102_T_C",
+ "155236129_T_C",
+ "155236145_G_C",
+ "155236175_G_A",
+ "155236190_A_G",
+ "155236195_G_C",
+ "155236199_G_T",
+ "155236200_A_G",
+ "155236213_G_C",
+ "155236245_C_T",
+ "155236320_G_A",
+ "155236366_C_T",
+ "155236379_C_T",
+ "155236431_G_A",
+ "155236443_T_C",
+ "155236483_G_A",
+ "155236487_C_G",
+ "155236489_G_C",
+ "155236491_T_C",
+ "155236500_T_C",
+ "155236504_A_C",
+ "155236532_G_C",
+ "155236549_G_A",
+ "155236552_G_A",
+ "155236557_A_G",
+ "155236558_C_T",
+ "155236575_G_C",
+ "155237176_A_G",
+ "155237187_C_T",
+ "155237188_G_C",
+ "155237189_A_C",
+ "155237259_G_A",
+ "155237265_G_C",
+ "155237284_G_A",
+ "155237412_T_C",
+ "155237419_G_A",
+ "155237607_G_A",
+ "155237982_T_C",
+ "155237989_G_A",
+ "155237990_C_T",
+ "155237996_G_T",
+ "155237997_C_T",
+ "155238088_T_C",
+ "155238092_T_G",
+ "155238141_A_T",
+ "155238174_C_T",
+ "155238192_A_G",
+ "155238206_A_C",
+ "155238214_A_C",
+ "155238754_G_A",
+ "155238882_A_G",
+ "155238929_A_T",
+ "155238930_G_T",
+ "155239597_T_C",
+ "155239615_C_T",
+ "155239633_G_C",
+ "155239650_T_C",
+ "155239683_A_G",
+ "155239687_G_C",
+ "155239689_A_T",
+ "155239700_T_C",
+ "155239750_G_A",
+ "155239796_A_G",
+ "155239802_C_G",
+ "155239831_A_G",
+ "155239870_G_A",
+ "155239893_C_G",
+ "155239897_T_C",
+ "155239898_G_A",
+ "155239908_G_T",
+ "155239914_C_T",
+ "155239917_C_T",
+ "155239918_A_G",
+ "155239934_G_A",
+ "155239935_T_A",
+ "155239936_C_T",
+ "155239946_G_T",
+ "155239948_G_C",
+ "155239950_A_G",
+ "155239955_C_T",
+ "155239956_A_G",
+ "155239970_T_C",
+ "155239974_A_T",
+ "155240031_C_G",
+ "155240048_C_T",
+ "155240074_G_C",
+ "155240093_T_C",
+ "155240114_T_C",
+ "155240115_G_A",
+ "155240117_A_G",
+ "155240120_C_T",
+ "155240121_A_G",
+ "155240129_T_C",
+ "155240137_T_G",
+ "155240141_C_T",
+ "155240143_A_T",
+ "155240156_C_G",
+ "155240629_C_T",
+ "155240699_T_C",
+ "155240701_A_G",
+ "155240752_C_T",
+ "155240779_T_C",
+ "155240815_C_T",
+ "155241114_C_A",
+ "155241126_T_C",
+ "155241134_A_G",
+ "155241142_C_G",
+ "155241212_T_A",
+ "155241233_G_C",
+ "155241234_C_T",
+ "155241249_C_T",
+ "155241288_C_A",
+ "155241316_C_T",
+ "155241317_A_G",
+ "155241351_T_C",
+ "155241352_C_G",
+ "155241354_T_G",
+ "155241355_C_T",
+ "155241370_T_C",
+ "155241371_G_A",
+ "155241382_C_T",
+ "155241392_C_T",
+ "155241395_G_A",
+ "155241397_A_G",
+ "155241521_A_T",
+ "155241924_A_G",
+ "155241930_G_T",
+ "155241945_C_A",
+ "155241968_C_A",
+ "155241971_A_G",
+ "155241979_C_G",
+ "155241982_C_T",
+ "155241991_C_T",
+ "155241992_A_G",
+ "155241993_A_T",
+ "155241995_G_A",
+ "155242002_C_T",
+ "155242015_G_T",
+ "155242072_T_A",
+ "155242094_C_T",
+ "155242102_T_C",
+ "155242103_G_A",
+ "155242107_G_A",
+ "155242164_G_A",
+ "155242176_A_G",
+ "155242178_C_A",
+ "155242185_T_G",
+ "155242190_A_G",
+ "155242202_C_T",
+ "155242209_C_G"
+ ],
+ "CYP2D6": [
+ "42123196_G_A",
+ "42123210_A_G",
+ "42123213_C_T",
+ "42123231_G_A",
+ "42125946_A_G",
+ "42126101_G_A",
+ "42126633_C_G",
+ "42126635_T_G",
+ "42126636_G_A",
+ "42126658_A_G",
+ "42126660_T_C",
+ "42126663_G_C",
+ "42126667_C_G",
+ "42127634_C_A",
+ "42127696_T_C",
+ "42127707_A_G",
+ "42127718_G_A",
+ "42127721_C_T",
+ "42127734_T_G",
+ "42127740_T_C",
+ "42127743_A_C",
+ "42127753_T_G",
+ "42127755_G_A",
+ "42127774_A_T",
+ "42127778_T_C",
+ "42127779_T_C",
+ "42127782_T_G",
+ "42127791_G_A",
+ "42127792_C_G",
+ "42127811_G_A",
+ "42127813_C_T",
+ "42127821_C_T",
+ "42127824_T_G",
+ "42127825_T_G",
+ "42127826_T_C",
+ "42127832_A_C",
+ "42127853_G_A",
+ "42127855_A_G",
+ "42127916_G_A",
+ "42127917_A_G",
+ "42127973_T_C",
+ "42128001_C_G",
+ "42128012_C_T",
+ "42128181_A_T",
+ "42128185_C_T",
+ "42128321_A_G",
+ "42128406_A_C",
+ "42128433_T_A",
+ "42128542_T_C",
+ "42128543_G_A",
+ "42128555_A_G",
+ "42128706_T_C",
+ "42128711_C_G",
+ "42129036_A_C",
+ "42129037_G_T",
+ "42129042_T_C",
+ "42129193_T_C",
+ "42129208_C_G",
+ "42129213_G_T",
+ "42129225_A_C",
+ "42129228_A_G",
+ "42129237_A_C",
+ "42129241_A_G",
+ "42129260_T_C",
+ "42129269_T_A",
+ "42129287_G_A",
+ "42129290_T_C",
+ "42129292_T_G",
+ "42129303_T_G",
+ "42129322_T_C",
+ "42129323_G_T",
+ "42129326_T_C",
+ "42129331_C_A",
+ "42129343_G_C",
+ "42129355_C_T",
+ "42129358_T_G",
+ "42129388_G_A",
+ "42129394_G_C",
+ "42129401_G_A",
+ "42129422_C_T",
+ "42129423_G_C",
+ "42129426_A_G",
+ "42129439_T_C",
+ "42129440_T_C",
+ "42129441_A_G",
+ "42129447_C_T",
+ "42129451_T_C",
+ "42129508_A_G",
+ "42129513_T_C",
+ "42129514_T_C",
+ "42129520_C_T",
+ "42129524_C_G",
+ "42129526_A_G",
+ "42129532_C_G",
+ "42129539_G_A",
+ "42129564_C_A",
+ "42129575_C_A",
+ "42129578_T_G",
+ "42129579_G_A",
+ "42129583_A_G",
+ "42129592_T_C",
+ "42129593_A_C",
+ "42129608_T_C",
+ "42129615_G_A",
+ "42129642_T_C",
+ "42129648_C_G",
+ "42129655_G_A",
+ "42129662_C_G",
+ "42129665_T_G",
+ "42129668_C_G",
+ "42129670_A_T",
+ "42129674_C_T",
+ "42129677_A_G",
+ "42129687_T_C",
+ "42129688_C_T",
+ "42129689_A_G",
+ "42129690_C_T",
+ "42129700_T_G",
+ "42129713_T_G",
+ "42129731_T_C",
+ "42129757_A_G",
+ "42129765_T_C",
+ "42129771_T_A",
+ "42129779_A_G",
+ "42130469_C_T",
+ "42130475_T_C",
+ "42130538_C_A",
+ "42130690_T_C",
+ "42130715_C_T",
+ "42130792_A_G",
+ "42130793_C_G",
+ "42130809_A_C",
+ "42131016_T_C",
+ "42131023_C_G",
+ "42131058_C_G",
+ "42131059_C_T",
+ "42131063_G_A",
+ "42131066_G_A",
+ "42131067_G_A",
+ "42131111_T_C",
+ "42131112_G_C",
+ "42131118_T_C",
+ "42131119_G_A",
+ "42131122_A_C",
+ "42131125_C_G",
+ "42131145_T_C",
+ "42131528_T_C",
+ "42131610_G_C",
+ "42131631_A_T",
+ "42131657_C_T",
+ "42131687_T_C",
+ "42131696_T_C",
+ "42131739_G_T",
+ "42131766_A_G",
+ "42131775_C_A",
+ "42131795_C_A",
+ "42131797_C_T",
+ "42131809_T_C",
+ "42131844_G_A",
+ "42131848_G_A",
+ "42131865_G_A",
+ "42131868_C_T",
+ "42131923_A_G",
+ "42131960_G_A",
+ "42131965_G_A",
+ "42132017_C_G",
+ "42132022_T_C",
+ "42132060_A_T",
+ "42132064_C_G",
+ "42132072_T_C",
+ "42132111_A_G",
+ "42132125_A_G",
+ "42132132_T_C",
+ "42132139_T_C",
+ "42132143_C_G",
+ "42132151_A_G",
+ "42132172_T_C"
+ ],
+ "CYP11B1": [
+ "142873172_T_A",
+ "142873180_C_A",
+ "142873184_C_A",
+ "142873185_A_C",
+ "142873196_G_T",
+ "142873197_C_A",
+ "142873199_A_C",
+ "142873204_G_T",
+ "142873230_T_C",
+ "142873249_C_T",
+ "142873268_T_A",
+ "142873281_G_A",
+ "142873292_A_C",
+ "142873293_T_A",
+ "142873305_C_T",
+ "142873306_A_G",
+ "142873307_G_A",
+ "142873309_C_G",
+ "142873318_T_C",
+ "142873321_A_G",
+ "142873322_C_T",
+ "142873328_A_C",
+ "142873329_A_C",
+ "142873343_C_A",
+ "142873437_C_T",
+ "142873441_A_G",
+ "142873506_A_G",
+ "142873525_G_A",
+ "142873645_C_T",
+ "142873658_A_T",
+ "142873681_C_T",
+ "142873683_A_G",
+ "142873686_C_G",
+ "142873689_G_T",
+ "142873697_G_A",
+ "142873714_T_C",
+ "142873739_C_T",
+ "142873760_T_C",
+ "142873805_C_T",
+ "142873829_T_G",
+ "142873844_C_T",
+ "142873973_G_A",
+ "142873977_G_A",
+ "142873992_G_C",
+ "142873993_A_G",
+ "142873999_A_G",
+ "142874041_C_T",
+ "142874062_A_T",
+ "142874065_C_T",
+ "142874084_A_G",
+ "142874098_G_A",
+ "142874100_G_T",
+ "142874104_A_G",
+ "142874108_C_T",
+ "142874116_A_G",
+ "142874119_G_C",
+ "142874134_A_G",
+ "142874171_T_G",
+ "142874174_C_T",
+ "142874191_A_C",
+ "142874197_G_A",
+ "142874198_C_T",
+ "142874211_A_G",
+ "142874215_T_C",
+ "142874216_G_A",
+ "142874222_A_G",
+ "142874231_C_G",
+ "142874249_T_A",
+ "142874252_G_A",
+ "142874258_T_C",
+ "142874260_A_G",
+ "142874304_G_C",
+ "142874305_G_A",
+ "142874333_G_A",
+ "142874366_G_T",
+ "142874367_A_G",
+ "142874369_G_A",
+ "142874370_T_A",
+ "142874373_T_C",
+ "142874379_G_A",
+ "142874382_G_C",
+ "142874391_G_A",
+ "142874407_A_G",
+ "142874411_T_C",
+ "142874412_G_A",
+ "142874454_G_A",
+ "142874470_T_A",
+ "142874474_G_A",
+ "142874478_T_C",
+ "142874497_A_G",
+ "142874516_T_C",
+ "142874522_C_A",
+ "142874524_C_T",
+ "142874533_C_G",
+ "142874536_G_C",
+ "142874540_T_G",
+ "142874546_C_T",
+ "142874548_T_C",
+ "142874552_A_C",
+ "142874553_G_C",
+ "142874555_C_T",
+ "142874557_C_G",
+ "142874560_A_C",
+ "142874561_A_C",
+ "142874564_T_G",
+ "142874569_G_A",
+ "142874570_A_C",
+ "142874571_G_A",
+ "142874589_A_G",
+ "142874601_C_G",
+ "142874603_T_C",
+ "142874611_G_A",
+ "142874613_T_C",
+ "142874614_T_C",
+ "142874619_C_T",
+ "142874639_A_G",
+ "142874641_C_A",
+ "142874653_A_C",
+ "142874655_T_G",
+ "142874656_C_A",
+ "142874661_A_T",
+ "142874670_A_C",
+ "142874685_C_T",
+ "142874689_C_T",
+ "142874707_G_C",
+ "142874711_T_C",
+ "142874715_C_T",
+ "142874719_T_C",
+ "142874727_G_T",
+ "142874728_A_T",
+ "142874730_C_A",
+ "142874731_C_A",
+ "142874732_G_T",
+ "142874739_C_A",
+ "142874757_G_C",
+ "142874758_G_C",
+ "142874762_G_C",
+ "142874770_C_G",
+ "142874778_A_G",
+ "142874779_C_A",
+ "142874788_A_T",
+ "142874791_A_T",
+ "142874797_A_G",
+ "142874799_A_C",
+ "142874811_G_A",
+ "142874814_C_A",
+ "142874954_C_T",
+ "142874957_A_G",
+ "142875002_A_G",
+ "142875040_A_G",
+ "142875089_G_C",
+ "142875115_G_C",
+ "142875116_G_A",
+ "142875128_A_C",
+ "142875140_C_A",
+ "142875143_G_C",
+ "142875144_C_T",
+ "142875146_C_T",
+ "142875173_G_A",
+ "142875290_G_A",
+ "142875818_C_T",
+ "142875830_T_C",
+ "142875874_A_G",
+ "142875876_C_T",
+ "142875881_G_A",
+ "142875885_G_A",
+ "142875888_A_C",
+ "142875891_C_T",
+ "142875892_A_G",
+ "142875893_G_T",
+ "142875960_G_C",
+ "142875970_C_G",
+ "142875977_G_C",
+ "142875981_A_G",
+ "142876003_A_G",
+ "142876009_C_G",
+ "142876013_T_C",
+ "142876015_C_T",
+ "142876017_A_G",
+ "142876021_C_A",
+ "142876029_A_G",
+ "142876044_G_C",
+ "142876046_C_T",
+ "142876049_C_T",
+ "142876052_C_G",
+ "142876053_A_G",
+ "142876060_A_T",
+ "142876099_A_C",
+ "142876109_A_C",
+ "142876113_C_G",
+ "142876134_C_T",
+ "142876135_A_G",
+ "142876151_A_G",
+ "142876160_C_T",
+ "142876169_A_G",
+ "142876232_C_T",
+ "142876289_A_T",
+ "142876293_G_A",
+ "142876295_C_T",
+ "142876307_A_C",
+ "142876333_T_C",
+ "142876334_G_T",
+ "142876340_C_G",
+ "142876995_A_G",
+ "142877003_T_C",
+ "142877010_A_G",
+ "142877012_G_A",
+ "142877159_A_G",
+ "142877162_G_C",
+ "142877177_T_A",
+ "142877183_A_G",
+ "142877758_T_C",
+ "142877916_T_A",
+ "142877920_A_G",
+ "142877960_A_G",
+ "142877966_A_G",
+ "142877970_A_G",
+ "142877979_G_A",
+ "142878156_T_A",
+ "142878174_G_A",
+ "142878180_T_C",
+ "142878192_T_C",
+ "142878199_T_C",
+ "142878221_C_T",
+ "142878234_G_A",
+ "142878238_G_A",
+ "142878268_T_A",
+ "142878269_T_C",
+ "142878312_T_C",
+ "142878314_C_T",
+ "142878319_G_A",
+ "142878339_G_A",
+ "142878356_G_A",
+ "142878388_T_C",
+ "142878401_T_C",
+ "142878452_G_C",
+ "142878749_T_G",
+ "142878806_A_G",
+ "142878869_G_T",
+ "142878876_A_G",
+ "142878886_G_A",
+ "142878888_A_G",
+ "142878905_T_C",
+ "142878965_G_A",
+ "142879033_G_A",
+ "142879092_C_A",
+ "142879101_T_C",
+ "142879102_G_A",
+ "142879139_G_A",
+ "142879168_C_G",
+ "142879171_C_G",
+ "142879183_C_T",
+ "142879203_C_G",
+ "142879218_T_C",
+ "142879222_A_G",
+ "142879223_C_T",
+ "142879236_A_G",
+ "142879284_C_T",
+ "142879285_T_G",
+ "142879320_T_C",
+ "142879326_A_G",
+ "142879329_G_C",
+ "142879332_C_T",
+ "142879340_C_T",
+ "142879342_T_C",
+ "142879348_C_T",
+ "142879354_C_T",
+ "142879378_C_T",
+ "142879387_A_G",
+ "142879391_G_T",
+ "142879400_C_T",
+ "142879419_G_A",
+ "142879433_C_T",
+ "142879455_T_C",
+ "142879508_C_T",
+ "142879528_A_G",
+ "142879537_C_A",
+ "142879592_T_C",
+ "142879610_T_C",
+ "142879612_C_T",
+ "142879613_T_C",
+ "142879627_C_G",
+ "142879683_C_T",
+ "142879703_G_C",
+ "142879712_T_C",
+ "142879718_G_A",
+ "142879722_A_G",
+ "142879727_G_A",
+ "142879736_C_A",
+ "142879749_T_C",
+ "142879776_A_G",
+ "142879783_T_C",
+ "142879841_T_C",
+ "142879842_G_C",
+ "142879843_C_A",
+ "142879858_G_C",
+ "142879861_T_A",
+ "142879864_T_A",
+ "142879865_G_A",
+ "142879872_C_T",
+ "142879873_T_C",
+ "142879875_C_G",
+ "142879878_G_A",
+ "142879880_G_C",
+ "142879881_G_A",
+ "142879882_G_T",
+ "142879905_C_T",
+ "142879911_T_C",
+ "142879922_G_C",
+ "142879923_A_T",
+ "142879925_C_T",
+ "142879926_G_C",
+ "142879929_G_A",
+ "142879932_A_G",
+ "142879933_G_A",
+ "142879940_G_A"
+ ]
+}
\ No newline at end of file
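The JSON data file above maps each gene name (for example `GBA`, `CYP2D6`, `CYP11B1`) to a list of site strings such as `"155231032_G_A"`. Judging from the entries, each string appears to encode `position_REF_ALT`. That layout is an assumption; this diff does not document it. The sketch below parses the strings under that assumption. `SiteKey`, `parse_site_key` and `sites_by_gene` are hypothetical helpers, not Paraphase functions.

```python
from typing import Dict, List, NamedTuple


class SiteKey(NamedTuple):
    """One entry such as "155231032_G_A" (assumed layout: position_REF_ALT)."""

    pos: int
    ref: str
    alt: str


def parse_site_key(key: str) -> SiteKey:
    # Split from the right so that only the last two underscores act as separators.
    pos, ref, alt = key.rsplit("_", 2)
    return SiteKey(int(pos), ref, alt)


def sites_by_gene(data: Dict[str, List[str]]) -> Dict[str, List[SiteKey]]:
    # Parse every site string for every gene key in the loaded JSON object.
    return {gene: [parse_site_key(k) for k in keys] for gene, keys in data.items()}


example = {"GBA": ["155231032_G_A", "155231051_A_G"], "CYP2D6": ["42123196_G_A"]}
parsed = sites_by_gene(example)
print(parsed["GBA"][0])  # SiteKey(pos=155231032, ref='G', alt='A')
```

In practice, `example` would come from `json.load` on the data file. The file's path is not visible in this part of the diff.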
diff --git a/paraphase/data/genes.yaml b/paraphase/data/genes.yaml
index 9a996b6..ed69701 100644
--- a/paraphase/data/genes.yaml
+++ b/paraphase/data/genes.yaml
@@ -1,5 +1,4 @@
genes_to_call: []
no_genome_depth_genes: ["pms2", "neb", "cfc1", "ikbkg", "opn1lw", "rccx"]
-no_vcf_genes: []
-check_sex_genes: ["opn1lw", "f8", "ikbkg"]
+no_vcf_genes: ["CFH", "CFHR3"]
two_reference_regions_genes: ["smn1", "pms2", "strc", "ikbkg", "ncf1"]
diff --git a/paraphase/genes/__init__.py b/paraphase/genes/__init__.py
index 19e3676..3cec856 100755
--- a/paraphase/genes/__init__.py
+++ b/paraphase/genes/__init__.py
@@ -9,3 +9,4 @@
from .f8_phaser import F8Phaser
from .opn1lw_phaser import Opn1lwPhaser
from .hba_phaser import HbaPhaser
+from .cfhclust import CfhClust
diff --git a/paraphase/genes/cfc1_phaser.py b/paraphase/genes/cfc1_phaser.py
index 4e5a900..a2fa926 100644
--- a/paraphase/genes/cfc1_phaser.py
+++ b/paraphase/genes/cfc1_phaser.py
@@ -35,7 +35,7 @@ def call(self):
tmp = {}
for i, hap in enumerate(ass_haps):
- tmp.setdefault(hap, f"hap{i+1}")
+ tmp.setdefault(hap, f"{self.gene}_hap{i+1}")
ass_haps = tmp
haplotypes = None
@@ -63,6 +63,10 @@ def call(self):
two_cp_haps.append(ass_haps[cp2_hap])
total_cn = len(ass_haps) + len(two_cp_haps)
+
+ if self.het_sites == []:
+ total_cn = 4
+
if total_cn < 4:
total_cn = None
@@ -87,4 +91,5 @@ def call(self):
self.mdepth,
self.region_avg_depth._asdict(),
self.sample_sex,
+ None,
)
diff --git a/paraphase/genes/cfhclust.py b/paraphase/genes/cfhclust.py
new file mode 100644
index 0000000..76e4bf2
--- /dev/null
+++ b/paraphase/genes/cfhclust.py
@@ -0,0 +1,61 @@
+# paraphase
+# Author: Xiao Chen
+
+
+from ..phaser import Phaser
+
+
+class CfhClust(Phaser):
+ def __init__(
+ self,
+ sample_id,
+ outdir,
+ cfh,
+ cfhr3,
+ ):
+ Phaser.__init__(self, sample_id, outdir)
+ self.cfh = cfh
+ self.cfhr3 = cfhr3
+
+ def call(self):
+ haps = {}
+ haps.update(self.cfh["final_haplotypes"])
+ haps.update(self.cfhr3["final_haplotypes"])
+ fusions = {}
+ fusions.update(self.cfh["fusions_called"])
+ fusions.update(self.cfhr3["fusions_called"])
+ total_cn = None
+ if self.cfh["total_cn"] is not None and self.cfhr3["total_cn"] is not None:
+ total_cn = min(self.cfh["total_cn"], self.cfhr3["total_cn"])
+ if (
+ total_cn is not None and fusions != {}
+ and len(self.cfh["final_haplotypes"]) >= 2
+ and len(self.cfhr3["final_haplotypes"]) >= 2
+ ):
+ total_cn = min(
+ total_cn,
+ len(self.cfh["final_haplotypes"]),
+ len(self.cfhr3["final_haplotypes"]),
+ )
+
+ return self.GeneCall(
+ total_cn,
+ None,
+ haps,
+ None,
+ None,
+ None,
+ None,
+ None,
+ None,
+ None,
+ None,
+ None,
+ None,
+ None,
+ None,
+ None,
+ None,
+ None,
+ fusions,
+ )
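`CfhClust.call()` merges the CFH and CFHR3 results into one cluster-level copy number. It takes the smaller of the two per-gene calls. When a fusion is reported and both genes resolved at least two haplotypes, it also caps the call by each gene's haplotype count. The standalone sketch below restates that rule. It makes the `None` case explicit, because `min()` cannot compare `None` with an integer. `merge_cluster_cn` is a hypothetical helper, and the haplotype and fusion dictionaries in the usage are placeholders.

```python
from typing import Dict, Optional


def merge_cluster_cn(
    cfh_cn: Optional[int],
    cfhr3_cn: Optional[int],
    cfh_haps: Dict[str, str],
    cfhr3_haps: Dict[str, str],
    fusions: Dict[str, object],
) -> Optional[int]:
    """Combine per-gene copy numbers into one cluster-level call (sketch)."""
    # No cluster call unless both genes produced a copy number.
    if cfh_cn is None or cfhr3_cn is None:
        return None
    total_cn = min(cfh_cn, cfhr3_cn)
    # With a fusion and at least two haplotypes per gene, cap the call
    # by the number of haplotypes resolved for each gene.
    if fusions and len(cfh_haps) >= 2 and len(cfhr3_haps) >= 2:
        total_cn = min(total_cn, len(cfh_haps), len(cfhr3_haps))
    return total_cn


two_haps = {"hapA": "h1", "hapB": "h2"}
print(merge_cluster_cn(4, 3, two_haps, two_haps, {"hapA": "fusion"}))  # 2
print(merge_cluster_cn(4, 3, two_haps, two_haps, {}))  # 3
```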
diff --git a/paraphase/genes/f8_phaser.py b/paraphase/genes/f8_phaser.py
index 3d79a4e..c1a185a 100644
--- a/paraphase/genes/f8_phaser.py
+++ b/paraphase/genes/f8_phaser.py
@@ -131,19 +131,19 @@ def call(self):
for i, hap in enumerate(ass_haps):
if len(hap) < 3:
unknown_count += 1
- hap_name = f"unknown_hap{unknown_count}"
+ hap_name = f"{self.gene}_unknown_hap{unknown_count}"
elif hap[-2:] == "00":
h1_count += 1
- hap_name = f"int22h1_hap{h1_count}"
+ hap_name = f"{self.gene}_int22h1_hap{h1_count}"
elif hap[-1] == "0" and hap[-2] != "x":
h3_count += 1
- hap_name = f"int22h3_hap{h3_count}"
+ hap_name = f"{self.gene}_int22h3_hap{h3_count}"
elif "x" not in hap[-2:] and "0" not in hap[-2:]:
h2_count += 1
- hap_name = f"int22h2_hap{h2_count}"
+ hap_name = f"{self.gene}_int22h2_hap{h2_count}"
else:
unknown_count += 1
- hap_name = f"unknown_hap{unknown_count}"
+ hap_name = f"{self.gene}_unknown_hap{unknown_count}"
tmp.setdefault(hap, hap_name)
ass_haps = tmp
@@ -192,6 +192,12 @@ def call(self):
elif links == "region1-region3" and "int22h3" in hap:
sv_hap.setdefault(hap, "inversion")
+ if sv_hap == {} and self.sample_sex is not None:
+ if self.sample_sex == "female" and total_cn < 6:
+ total_cn = None
+ if self.sample_sex == "male" and total_cn < 3:
+ total_cn = None
+
self.close_handle()
return self.GeneCall(
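The new F8 check discards implausibly low copy numbers when no SV haplotype was found. The cutoffs are below 6 for female samples and below 3 for male samples. The sketch below restates that rule as a pure function, with the thresholds copied from the diff. It also passes `None` through, which the diff does not handle explicitly. `filter_f8_cn` and `MIN_CN_BY_SEX` are hypothetical names.

```python
from typing import Optional

# Minimum accepted F8-region copy numbers when no SV haplotype is found,
# taken from the thresholds in the diff.
MIN_CN_BY_SEX = {"female": 6, "male": 3}


def filter_f8_cn(
    total_cn: Optional[int], sample_sex: Optional[str], has_sv: bool
) -> Optional[int]:
    """Return total_cn, or None if it falls below the sex-specific minimum (sketch)."""
    # The filter only applies when no SV was called and sex and CN are both known.
    if has_sv or sample_sex is None or total_cn is None:
        return total_cn
    min_cn = MIN_CN_BY_SEX.get(sample_sex)
    if min_cn is not None and total_cn < min_cn:
        return None
    return total_cn
```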
diff --git a/paraphase/genes/ncf1_phaser.py b/paraphase/genes/ncf1_phaser.py
index 2aaef62..1770847 100644
--- a/paraphase/genes/ncf1_phaser.py
+++ b/paraphase/genes/ncf1_phaser.py
@@ -83,13 +83,13 @@ def call(self):
hap_var = [var_reads.get(a) for a in hap_reads]
if hap_var.count("alt") > (len(hap_var) - hap_var.count(None)) * 0.7:
counter_pseudo += 1
- hap_rename.setdefault(hap, f"pseudo_hap{counter_pseudo}")
+ hap_rename.setdefault(hap, f"ncf1_pseudo_hap{counter_pseudo}")
else:
counter_gene += 1
hap_rename.setdefault(hap, f"ncf1_hap{counter_gene}")
else:
counter_pseudo += 1
- hap_rename.setdefault(hap, f"pseudo_hap{counter_pseudo}")
+ hap_rename.setdefault(hap, f"ncf1_pseudo_hap{counter_pseudo}")
tmp = {}
for hap, hap_name in ass_haps.items():
diff --git a/paraphase/genes/neb_phaser.py b/paraphase/genes/neb_phaser.py
index 62ac273..b3732dc 100644
--- a/paraphase/genes/neb_phaser.py
+++ b/paraphase/genes/neb_phaser.py
@@ -47,7 +47,7 @@ def call(self):
tmp = {}
for i, hap in enumerate(ass_haps):
- tmp.setdefault(hap, f"hap{i+1}")
+ tmp.setdefault(hap, f"{self.gene}_hap{i+1}")
ass_haps = tmp
haplotypes = None
diff --git a/paraphase/genes/opn1lw_phaser.py b/paraphase/genes/opn1lw_phaser.py
index 654d0de..87a292d 100644
--- a/paraphase/genes/opn1lw_phaser.py
+++ b/paraphase/genes/opn1lw_phaser.py
@@ -43,7 +43,16 @@ class Opn1lwPhaser(Phaser):
fields,
defaults=(None,) * len(fields),
)
- pathogenic_haps = ["LIAVA", "LVAVA", "LIAVS", "MIAVA", "MVVVA", "MVAVA", "LIAIA"]
+ pathogenic_haps = [
+ "LIAVA",
+ "LVAVA",
+ "LIAVS",
+ "MIAVA",
+ "MVVVA",
+ "MVAVA",
+ "LIAIA",
+ "LIVVA",
+ ]
def __init__(
self, sample_id, outdir, genome_depth=None, genome_bam=None, sample_sex=None
@@ -243,8 +252,7 @@ def call(self):
# annotate exon3
hap_annotated = self.call_exon3(var)
gene_annotated = renamed_hap.split("_")[0]
- if hap_annotated in self.pathogenic_haps:
- gene_annotated += "_" + hap_annotated
+ gene_annotated += "_" + hap_annotated
annotated_haps.setdefault(renamed_hap, gene_annotated)
tmp = {}
@@ -376,8 +384,7 @@ def call(self):
hap_vars = haplotypes[each_hap]["variants"]
hap_annotated = self.call_exon3(hap_vars)
gene_annotated = each_hap.split("_")[0]
- if hap_annotated in self.pathogenic_haps:
- gene_annotated += "_" + hap_annotated
+ gene_annotated += "_" + hap_annotated
else:
gene_annotated = None
each_allele_annotated.append(gene_annotated)
@@ -400,8 +407,7 @@ def call(self):
total_cn = baseline_cn
counter_lw = baseline_cn
gene_annotated = "opn1lw"
- if hap_annotated in self.pathogenic_haps:
- gene_annotated += "_" + hap_annotated
+ gene_annotated += "_" + hap_annotated
annotated_alleles.append([gene_annotated])
if self.sample_sex == "female":
annotated_alleles.append([gene_annotated])
diff --git a/paraphase/genes/pms2_phaser.py b/paraphase/genes/pms2_phaser.py
index fd5842b..c63d6cf 100755
--- a/paraphase/genes/pms2_phaser.py
+++ b/paraphase/genes/pms2_phaser.py
@@ -35,7 +35,9 @@ def call(self):
self.het_sites = sorted(list(self.candidate_pos))
self.remove_noisy_sites()
# for distinguishing pms2 from pms2cl
- raw_read_haps = self.get_haplotypes_from_reads(add_sites=self.add_sites)
+ raw_read_haps = self.get_haplotypes_from_reads(
+ check_clip=True, add_sites=self.add_sites
+ )
het_sites = self.het_sites
known_del = {}
@@ -66,20 +68,16 @@ def call(self):
counter_gene = 0
counter_pseudo = 0
counter_unknown = 0
- pivot_index, index_found = self.get_pivot_site_index()
- if index_found is False:
- return self.GeneCall()
for hap in ass_haps:
- start_seq = hap[pivot_index:]
- if start_seq.count("2") >= 15:
+ if hap.endswith("x"):
+ counter_unknown += 1
+ hap_name = f"pms2_unknown_hap{counter_unknown}"
+ elif hap.endswith("0"):
counter_pseudo += 1
hap_name = f"pms2clhap{counter_pseudo}"
- elif start_seq.count("2") <= 5:
+ else:
counter_gene += 1
hap_name = f"pms2hap{counter_gene}"
- else:
- counter_unknown += 1
- hap_name = f"pms2_unknown_hap{counter_gene}"
tmp.setdefault(hap, hap_name)
ass_haps = tmp
@@ -133,4 +131,5 @@ def call(self):
self.mdepth,
self.region_avg_depth._asdict(),
self.sample_sex,
+ None,
)
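The PMS2 change replaces the pivot-site count with a rule based on the last character of each haplotype string. A trailing `x` is named unknown, a trailing `0` is named PMS2CL, and anything else is named PMS2. The sketch below restates that rule as a pure function, with a separate counter for each class. What each character encodes at the final site is not stated in this diff. `name_pms2_haplotypes` is a hypothetical helper, and the haplotype strings in the usage are made up for illustration.

```python
from typing import Dict, Iterable


def name_pms2_haplotypes(haps: Iterable[str]) -> Dict[str, str]:
    """Name haplotypes by the last character of each haplotype string (sketch)."""
    counters = {"gene": 0, "pseudo": 0, "unknown": 0}
    names: Dict[str, str] = {}
    for hap in haps:
        if hap.endswith("x"):
            counters["unknown"] += 1
            name = f"pms2_unknown_hap{counters['unknown']}"
        elif hap.endswith("0"):
            counters["pseudo"] += 1
            name = f"pms2clhap{counters['pseudo']}"
        else:
            counters["gene"] += 1
            name = f"pms2hap{counters['gene']}"
        # Keep the first name assigned to a given haplotype string.
        names.setdefault(hap, name)
    return names


print(name_pms2_haplotypes(["1221", "2110", "12x"]))
# {'1221': 'pms2hap1', '2110': 'pms2clhap1', '12x': 'pms2_unknown_hap1'}
```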
diff --git a/paraphase/genes/rccx_phaser.py b/paraphase/genes/rccx_phaser.py
index f93ec31..f48b19d 100644
--- a/paraphase/genes/rccx_phaser.py
+++ b/paraphase/genes/rccx_phaser.py
@@ -439,7 +439,7 @@ def call(self):
tmp = {}
for i, hap in enumerate(ass_haps):
- hap_name = f"hap{i+1}"
+ hap_name = f"{self.gene}_hap{i+1}"
tmp.setdefault(hap, hap_name)
final_haps = tmp
# get haps that extend into tnxb
diff --git a/paraphase/paraphase.py b/paraphase/paraphase.py
index 5943ea0..c156dc1 100755
--- a/paraphase/paraphase.py
+++ b/paraphase/paraphase.py
@@ -78,15 +78,14 @@ def process_gene(
gdepth,
bam,
sample_sex,
- novcf,
prog_cmd,
- gene1only=False,
+ sample_id,
+ args,
):
"""Workflow for each region"""
phaser_calls = {}
for gene in gene_list:
try:
- sample_id = bam.split("/")[-1].split(".")[0]
if gene == "smn1":
phaser = genes.Smn1Phaser(
sample_id, tmpdir, gdepth, bam, sample_sex
@@ -126,7 +125,7 @@ def process_gene(
elif gene == "hba":
phaser = genes.HbaPhaser(sample_id, tmpdir, gdepth, bam, sample_sex)
else:
- phaser = Phaser(sample_id, tmpdir, gdepth)
+ phaser = Phaser(sample_id, tmpdir, gdepth, sample_sex=sample_sex)
config = configs[gene]
logging.info(
@@ -135,7 +134,7 @@ def process_gene(
logging.info(
f"Realigning reads for {gene} for sample {sample_id} at {datetime.datetime.now()}..."
)
- bam_realigner = BamRealigner(bam, tmpdir, config, prog_cmd)
+ bam_realigner = BamRealigner(bam, tmpdir, config, prog_cmd, sample_id)
bam_realigner.write_realign_bam()
logging.info(
@@ -160,11 +159,14 @@ def process_gene(
bam_realigner.write_realign_bam(gene2=True)
bam_tagger.write_bam(random_assign=True, gene2=True)
- if novcf is False and gene not in self.no_vcf_genes:
+ if args.novcf is False and gene not in self.no_vcf_genes:
logging.info(
f"Generating VCFs for {gene} for sample {sample_id} at {datetime.datetime.now()}..."
)
- if gene in self.two_reference_regions_genes and gene1only is False:
+ if (
+ gene in self.two_reference_regions_genes
+ and args.gene1only is False
+ ):
vcf_generater = TwoGeneVcfGenerater(
sample_id,
outdir,
@@ -183,7 +185,8 @@ def process_gene(
vcf_generater.set_parameter(
config, tmpdir=tmpdir, prog_cmd=prog_cmd
)
- vcf_generater.run_without_realign()
+ vcf_generater.run()
+
except Exception:
logging.error(
f"Error running {gene} for sample {sample_id}...See error message below"
@@ -199,16 +202,23 @@ def process_sample(
configs,
tmpdir,
prog_cmd,
+ args,
num_threads=1,
dcov={},
- novcf=False,
genome_build="38",
- gene1only=False,
):
"""Main workflow"""
for bam in bamlist:
try:
- sample_id = bam.split("/")[-1].split(".")[0]
+ if args.prefix is not None and args.bam is not None:
+ sample_id = args.prefix
+ else:
+ sample_id_from_header = self.get_sample_id_from_header(bam)
+ if sample_id_from_header is not None:
+ sample_id = sample_id_from_header
+ else:
+ sample_id = bam.split("/")[-1].split(".")[0]
+
logging.info(
f"Processing sample {sample_id} at {datetime.datetime.now()}..."
)
@@ -241,7 +251,9 @@ def process_sample(
f"For sample {sample_id}, due to low or highly variable genome coverage, genome coverage is not used for depth correction."
)
gdepth = None
- if set(query_genes).intersection(set(self.check_sex_genes)) != set():
+
+ # call sample sex
+ if True in ["X" in configs[a]["nchr"] for a in configs]:
logging.info(f"Checking sample sex at {datetime.datetime.now()}...")
depth = GenomeDepth(
bam,
@@ -263,9 +275,9 @@ def process_sample(
gdepth,
bam,
sample_sex,
- novcf,
prog_cmd,
- gene1only,
+ sample_id,
+ args,
)
else:
process_gene_partial = partial(
@@ -276,9 +288,9 @@ def process_sample(
gdepth=gdepth,
bam=bam,
sample_sex=sample_sex,
- novcf=novcf,
prog_cmd=prog_cmd,
- gene1only=gene1only,
+ sample_id=sample_id,
+ args=args,
)
gene_groups = [
query_genes[i::num_threads] for i in range(num_threads)
@@ -289,6 +301,19 @@ def process_sample(
pool.join()
for phaser_call_set in phaser_calls:
sample_out.update(phaser_call_set)
+
+ # merge cfh cluster result
+ if "CFH" in sample_out and "CFHR3" in sample_out:
+ cfh_cluster_caller = genes.CfhClust(
+ sample_id,
+ tmpdir,
+ sample_out["CFH"],
+ sample_out["CFHR3"],
+ )
+ sample_out.setdefault(
+ "CFHclust", cfh_cluster_caller.call()._asdict()
+ )
+
sample_out = dict(sorted(sample_out.items()))
logging.info(
@@ -299,7 +324,7 @@ def process_sample(
logging.info(
f"Writing to json for sample {sample_id} at {datetime.datetime.now()}..."
)
- out_json = os.path.join(outdir, sample_id + ".json")
+ out_json = os.path.join(outdir, sample_id + ".paraphase.json")
with open(out_json, "w") as json_output:
json.dump(sample_out, json_output, indent=4)
except Exception:
@@ -308,6 +333,23 @@ def process_sample(
)
traceback.print_exc()
+ @staticmethod
+ def get_sample_id_from_header(bam):
+ """Get the sample ID from the SM field of @RG lines in the BAM header; None unless exactly one unique SM"""
+ bam_handle = pysam.AlignmentFile(bam, "rb")
+ header = bam_handle.header
+ header = header.to_dict()
+ sample_ids = []
+ rg_lines = header.get("RG")
+ if rg_lines is not None:
+ sample_ids = [a.get("SM") for a in rg_lines if "SM" in a]
+ bam_handle.close()
+ sample_ids = [a for a in sample_ids if a is not None]
+ if len(set(sample_ids)) == 1:
+ return sample_ids[0]
+ else:
+ return None
+
def merge_bams(self, query_genes, outdir, tmpdir, sample_id):
"""Merge realigned tagged bams for each gene into one bam"""
bams = []
@@ -331,7 +373,7 @@ def merge_bams(self, query_genes, outdir, tmpdir, sample_id):
with open(bam_list_file, "w") as fout:
for bam in bams:
fout.write(bam + "\n")
- merged_bam = os.path.join(outdir, f"{sample_id}_realigned_tagged.bam")
+ merged_bam = os.path.join(outdir, f"{sample_id}.paraphase.bam")
tmp_bam = os.path.join(tmpdir, f"{sample_id}_merged.bam")
pysam.merge("-f", "-o", tmp_bam, "-b", bam_list_file)
pysam.sort("-o", merged_bam, tmp_bam)
@@ -382,6 +424,12 @@ def update_config(self, gene_list, ref_dir, genome, genome_build):
data_path, genome_build_dir, gene, old_data_file
)
data_paths[data_entry] = new_data_file
+ # add fusion gene definition file
+ if configs[gene].get("call_fusion") is not None:
+ data_paths.setdefault(
+ "fusion_json",
+ os.path.join(data_path, genome_build_dir, "fusion_genes.json"),
+ )
# add reference fasta
ref_file = os.path.join(ref_dir, f"{gene}_ref.fa")
data_paths.setdefault("reference", ref_file)
@@ -538,12 +586,20 @@ def load_parameters(self):
help="Output directory",
required=True,
)
+ parser.add_argument(
+ "-p",
+ "--prefix",
+ help="Prefix of output files for a single sample. Used with -b.\n"
+ + "If not provided, the prefix is taken from the SM field of the @RG lines in the BAM header, or else from the input BAM file name.\n",
+ required=False,
+ )
parser.add_argument(
"-g",
"--gene",
help="Optionally specify which gene(s) to run (separated by comma).\n"
+ "Will run all genes if not specified.\n"
- + "The full set of accepted genes are defined in the config file.\n",
+ + "The full set of accepted regions is defined in the config file.\n"
+ + "Alternatively, you can define genes to call by modifying paraphase/data/genes.yaml\n",
required=False,
)
parser.add_argument(
@@ -556,7 +612,8 @@ def load_parameters(self):
parser.add_argument(
"-d",
"--depth",
- help="Optional path to a file listing average depth for each sample",
+ help=argparse.SUPPRESS,
+ # help="Optional path to a file listing average depth for each sample",
required=False,
)
parser.add_argument(
@@ -652,11 +709,10 @@ def run(self):
configs,
tmpdir,
prog_cmd,
+ args,
num_threads,
dcov,
- args.novcf,
genome_build_dir,
- args.gene1only,
)
else:
logging.warning(f"{args.bam} bam or bai file doesn't exist")
@@ -676,10 +732,9 @@ def run(self):
configs=configs,
tmpdir=tmpdir,
prog_cmd=prog_cmd,
+ args=args,
dcov=dcov,
- novcf=args.novcf,
genome_build=genome_build_dir,
- gene1only=args.gene1only,
)
bam_groups = [bamlist[i::num_threads] for i in range(num_threads)]
pool = mp.Pool(num_threads)
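The sample-naming order introduced in this file (the `-p` prefix when a single BAM is given with `-b`, then a unique `SM` value from the BAM header's `@RG` lines, then the BAM file name up to the first dot) can be sketched without pysam by working on a header dictionary. This is a minimal sketch, not the code in the diff: `sample_id_from_header_dict`, `resolve_sample_id` and the `single_bam` flag are hypothetical names, and the header is assumed to have the shape returned by pysam's `AlignmentHeader.to_dict()`.

```python
import os
from typing import Optional


def sample_id_from_header_dict(header: dict) -> Optional[str]:
    """Return the single unique SM value across @RG lines, else None.

    `header` is assumed to have the shape of pysam's AlignmentHeader.to_dict(),
    e.g. {"RG": [{"ID": "rg1", "SM": "HG002"}]}.
    """
    sample_ids = {rg["SM"] for rg in header.get("RG", []) if rg.get("SM") is not None}
    if len(sample_ids) == 1:
        return next(iter(sample_ids))
    # No @RG SM, or several different samples: caller must fall back.
    return None


def resolve_sample_id(
    bam_path: str,
    header: dict,
    prefix: Optional[str] = None,
    single_bam: bool = False,
) -> str:
    """Choose the output prefix: -p (single-BAM mode only), then @RG SM, then file name."""
    if prefix is not None and single_bam:
        return prefix
    from_header = sample_id_from_header_dict(header)
    if from_header is not None:
        return from_header
    # Same fallback as the diff: file name up to the first dot.
    return os.path.basename(bam_path).split(".")[0]
```

Keeping the header parsing separate from file I/O makes the naming rule easy to unit-test; in the diff itself the header comes from `pysam.AlignmentFile(bam, "rb").header.to_dict()`.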
diff --git a/paraphase/phaser.py b/paraphase/phaser.py
index b4a1a42..e6adf8c 100755
--- a/paraphase/phaser.py
+++ b/paraphase/phaser.py
@@ -10,6 +10,7 @@
from itertools import product
import re
import logging
+import json
from scipy.stats import poisson
from collections import namedtuple
from .haplotype_assembler import VariantGraph
@@ -38,6 +39,7 @@ class Phaser:
"genome_depth",
"region_depth",
"sample_sex",
+ "fusions_called",
]
GeneCall = namedtuple(
"GeneCall",
@@ -97,6 +99,13 @@ def set_parameter(self, config):
if self.gene_end is None:
self.gene_end = self.right_boundary
+ self.call_fusion = None
+ if "call_fusion" in config:
+ self.call_fusion = config["call_fusion"]
+ fusion_json = config["data"].get("fusion_json")
+ self.fusion_gene_def_variants = []
+ with open(fusion_json) as f:
+ self.fusion_gene_def_variants = json.load(f).get(self.gene, [])
self.use_supplementary = False
if "use_supplementary" in config or "is_tandem" in config:
self.use_supplementary = True
@@ -106,6 +115,9 @@ def set_parameter(self, config):
self.is_reverse = False
if "is_reverse" in config:
self.is_reverse = config["is_reverse"]
+ self.is_palindrome = False
+ if "is_palindrome" in config:
+ self.is_palindrome = config["is_palindrome"]
self.expect_cn2 = False
if "expect_cn2" in config:
self.expect_cn2 = True
@@ -992,7 +1004,7 @@ def output_variants_in_haplotypes(self, haps, reads, nonunique, known_del={}):
for hap, hap_name in haps.items():
# find boundary for confident variant calling
hap_bound_start, hap_bound_end = self.get_hap_variant_ranges(hap)
- is_truncated = False
+ is_truncated = None
# het sites
for i in range(len(hap)):
if hap[i] == "2":
@@ -1046,7 +1058,7 @@ def output_variants_in_haplotypes(self, haps, reads, nonunique, known_del={}):
hap_bound_start = max(hap_bound_start, clip_position)
haplotype_variants[hap_name].append(f"{clip_position}_clip_5p")
if clip_position > self.gene_start:
- is_truncated = True
+ is_truncated = ["5p"]
if hap.endswith("0") and self.clip_3p_positions != []:
for first_pos_before_clip in reversed(range(len(hap))):
if hap[first_pos_before_clip] != "0":
@@ -1065,7 +1077,10 @@ def output_variants_in_haplotypes(self, haps, reads, nonunique, known_del={}):
hap_bound_end = min(hap_bound_end, clip_position)
haplotype_variants[hap_name].append(f"{clip_position}_clip_3p")
if clip_position < self.gene_end:
- is_truncated = True
+ if is_truncated is None:
+ is_truncated = ["3p"]
+ else:
+ is_truncated.append("3p")
haplotype_variants[hap_name] += filtered_homo_sites
@@ -1679,6 +1694,152 @@ def add_homo_sites(self, min_no_var_region_size=10000, max_homo_var_to_add=10):
self.remove_noisy_sites()
return homo_sites_to_add
+ def find_fusion(self, ass_haps):
+ """Call fusion based on haplotypes"""
+ # update two-copy haplotypes
+ two_cp_haps = self.update_two_cp_in_fusion_cases(ass_haps)
+ fusions_called = {}
+ for hap, hap_name in ass_haps.items():
+ if hap.endswith("x") is False and hap.startswith("x") is False:
+ if (hap.endswith("0") is False and hap.startswith("0") is True) or (
+ hap.endswith("0") is True and hap.startswith("0") is False
+ ):
+ new_hap, all_sites = self.new_hap_for_breakpoint(hap)
+ fusion_breakpoint_index = self.get_fusion_breakpoint_index(
+ hap, new_hap
+ )
+ if fusion_breakpoint_index is not None:
+ bp1 = int(all_sites[fusion_breakpoint_index].split("_")[0])
+ bp2 = self.get_range_in_other_gene(bp1, search_range=1000)
+ bp3 = int(all_sites[fusion_breakpoint_index - 1].split("_")[0])
+ bp4 = self.get_range_in_other_gene(bp3, search_range=1000)
+ if bp1 < bp2:
+ fusion_breakpoint = (
+ (bp3, bp1),
+ (bp4, bp2),
+ )
+ else:
+ fusion_breakpoint = (
+ (bp4, bp2),
+ (bp3, bp1),
+ )
+ fusions_called.setdefault(hap_name, {})
+ fusion_type = self.get_fusion_type(hap)
+ fusions_called[hap_name].setdefault("type", fusion_type)
+ fusions_called[hap_name].setdefault("sequence", new_hap)
+ fusions_called[hap_name].setdefault(
+ "breakpoint", fusion_breakpoint
+ )
+ return two_cp_haps, fusions_called
+
+ def get_fusion_type(self, hap):
+ """Fusion type: deletion or duplication"""
+ fusion_type = None
+ if self.call_fusion == "5p":
+ if hap.endswith("0") is False and hap.startswith("0") is True:
+ fusion_type = "duplication"
+ elif hap.endswith("0") is True and hap.startswith("0") is False:
+ fusion_type = "deletion"
+ elif self.call_fusion == "3p":
+ if hap.endswith("0") is False and hap.startswith("0") is True:
+ fusion_type = "deletion"
+ elif hap.endswith("0") is True and hap.startswith("0") is False:
+ fusion_type = "duplication"
+ return fusion_type
+
+ @staticmethod
+ def update_two_cp_in_fusion_cases(ass_haps):
+ """Update two-copy haplotypes based on the presence of gene/paralogs"""
+ two_cp_haps = []
+ if True not in [a.startswith("x") or a.endswith("x") for a in ass_haps]:
+ gene1s = [
+ a
+ for a in ass_haps
+ if a.endswith("0") is False and a.startswith("0") is False
+ ]
+ gene2s = [
+ a
+ for a in ass_haps
+ if a.endswith("0") is True and a.startswith("0") is True
+ ]
+ fusions = [
+ a
+ for a in ass_haps
+ if (a.endswith("0") is False and a.startswith("0") is True)
+ or (a.endswith("0") is True and a.startswith("0") is False)
+ ]
+ if fusions == [] and len(ass_haps) < 4:
+ if len(gene1s) == 1 and ass_haps[gene1s[0]] not in two_cp_haps:
+ two_cp_haps.append(ass_haps[gene1s[0]])
+ if len(gene2s) == 1 and ass_haps[gene2s[0]] not in two_cp_haps:
+ two_cp_haps.append(ass_haps[gene2s[0]])
+ return two_cp_haps
+
+ def new_hap_for_breakpoint(self, hap):
+ """
+ Get the haplotype sequence used for breakpoint identification, based on the
+ PSVs in self.fusion_gene_def_variants if defined, else on all sites between clip positions.
+ """
+ new_hap = ""
+ if self.fusion_gene_def_variants != []:
+ all_sites = self.fusion_gene_def_variants
+ for var_site in all_sites:
+ base = "1"
+ if var_site in self.homo_sites:
+ base = "2"
+ elif var_site in self.het_sites:
+ base = hap[self.het_sites.index(var_site)]
+ new_hap += base
+ else:
+ all_sites = sorted(
+ self.homo_sites + self.het_sites, key=lambda x: int(x.split("_")[0])
+ )
+ if self.clip_5p_positions != []:
+ all_sites = [
+ a
+ for a in all_sites
+ if int(a.split("_")[0]) > max(self.clip_5p_positions)
+ ]
+ if self.clip_3p_positions != []:
+ all_sites = [
+ a
+ for a in all_sites
+ if int(a.split("_")[0]) < min(self.clip_3p_positions)
+ ]
+ for var_site in all_sites:
+ if var_site in self.homo_sites:
+ new_hap += "2"
+ elif var_site in self.het_sites:
+ new_hap += hap[self.het_sites.index(var_site)]
+ return new_hap, all_sites
+
+ @staticmethod
+ def get_fusion_breakpoint_index(hap, new_hap):
+ """Infer where the haplotype switches between gene1 and gene2 sequence; None if no interior switch"""
+ # 2s to 1s
+ if hap.startswith("0") is True and hap.endswith("0") is False:
+ counts = []
+ for i, _ in enumerate(new_hap):
+ counts.append(
+ new_hap[:i].count("2") + new_hap[i:].count("1"),
+ )
+ bp_index = counts.index(max(counts))
+ if bp_index == 0 or bp_index == len(counts) - 1:
+ return None
+ return bp_index
+ # 1s to 2s
+ if hap.startswith("0") is False and hap.endswith("0") is True:
+ counts = []
+ for i, _ in enumerate(new_hap):
+ counts.append(
+ new_hap[:i].count("1") + new_hap[i:].count("2"),
+ )
+ bp_index = counts.index(max(counts))
+ if bp_index == 0 or bp_index == len(counts) - 1:
+ return None
+ return bp_index
+ return None
+
def call(self):
"""Main function to phase haplotypes and call copy numbers"""
if self.check_coverage_before_analysis() is False:
@@ -1763,9 +1924,8 @@ def call(self):
) = self.phase_haps(raw_read_haps)
tmp = {}
- mod_gene_name = ",".join(self.gene.split("-"))
for i, hap in enumerate(ass_haps):
- tmp.setdefault(hap, f"{mod_gene_name}_hap{i+1}")
+ tmp.setdefault(hap, f"{self.gene}_hap{i+1}")
ass_haps = tmp
haplotypes = None
@@ -1778,7 +1938,9 @@ def call(self):
)
two_cp_haps = []
- if len(ass_haps) == 3:
+ if (
+ len(ass_haps) == 3 and self.expect_cn2 is False and self.gene != "BPY2"
+ ) or (self.gene == "BPY2" and len(ass_haps) < 3):
two_cp_haps = self.compare_depth(haplotypes, stringent=True)
if two_cp_haps == [] and read_counts is not None:
# check if one haplotype has more reads than others
@@ -1792,19 +1954,39 @@ def call(self):
if probs[0] < 0.05 and others_max >= 10:
two_cp_haps.append(ass_haps[cp2_hap])
+ # call fusion
+ fusions_called = None
+ if self.call_fusion is not None:
+ two_cp_haps, fusions_called = self.find_fusion(ass_haps)
+
total_cn = len(ass_haps) + len(two_cp_haps)
# fully homozygous
- if self.het_sites == [] or total_cn == 1:
+ if self.het_sites == []:
total_cn = 2
# two pairs of identical copies
- if two_cp_haps == [] and total_cn == 2 and self.expect_cn2 is False:
+ if (
+ two_cp_haps == []
+ and total_cn == 2
+ and self.expect_cn2 is False
+ and self.gene != "BPY2"
+ ):
if self.mdepth is not None:
prob = self.depth_prob(int(self.region_avg_depth.median), self.mdepth)
if prob[0] < 0.75:
total_cn = 4
+ # correct CN for palindrome genes
+ # if self.sample_sex is not None:
+ # if self.is_palindrome:
+ # if self.sample_sex == "female" and total_cn < 4:
+ # total_cn = None
+ # elif self.sample_sex == "male" and total_cn < 2:
+ # total_cn = None
+ if total_cn is not None and total_cn == 1:
+ total_cn = None
+
# phase
alleles = []
hap_links = {}
@@ -1842,6 +2024,7 @@ def call(self):
self.mdepth,
self.region_avg_depth._asdict(),
self.sample_sex,
+ fusions_called,
)
def close_handle(self):
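`find_fusion`, `get_fusion_type` and the two-copy update used in fusion cases all classify a haplotype string by whether its first and last characters are `"0"`, and skip haplotypes with an `"x"` at either end. The sketch below restates those string tests as plain functions. `classify_hap`, `fusion_type` and `two_copy_haps` are hypothetical names, and the `gene1`/`gene2`/`fusion` labels describe only the tests in the diff, not a biological interpretation.

```python
from typing import Dict, List, Optional


def classify_hap(hap: str) -> Optional[str]:
    """Label a haplotype string by its first/last characters, as the diff does.

    None: an 'x' at either end (skipped by fusion calling)
    'gene1': neither end is '0'
    'gene2': both ends are '0'
    'fusion': exactly one end is '0'
    """
    if hap.startswith("x") or hap.endswith("x"):
        return None
    starts0, ends0 = hap.startswith("0"), hap.endswith("0")
    if starts0 and ends0:
        return "gene2"
    if not starts0 and not ends0:
        return "gene1"
    return "fusion"


def fusion_type(hap: str, call_fusion: str) -> Optional[str]:
    """'deletion' or 'duplication' for a fusion haplotype; call_fusion is '5p' or '3p'."""
    if classify_hap(hap) != "fusion":
        return None
    starts0 = hap.startswith("0")
    if call_fusion == "5p":
        return "duplication" if starts0 else "deletion"
    if call_fusion == "3p":
        return "deletion" if starts0 else "duplication"
    return None


def two_copy_haps(ass_haps: Dict[str, str]) -> List[str]:
    """Names of haplotypes to count twice when no fusion is present.

    `ass_haps` maps haplotype string -> haplotype name, as in Phaser.call.
    """
    labels = {hap: classify_hap(hap) for hap in ass_haps}
    # Any 'x'-edged haplotype, any fusion, or 4+ haplotypes: no two-copy call.
    if None in labels.values():
        return []
    if "fusion" in labels.values() or len(ass_haps) >= 4:
        return []
    result = []
    for label in ("gene1", "gene2"):
        matches = [hap for hap, lab in labels.items() if lab == label]
        if len(matches) == 1:
            result.append(ass_haps[matches[0]])
    return result
```

The two-copy rule only fires when there is no fusion haplotype and fewer than four haplotypes; a lone gene1-like or gene2-like haplotype is then counted as two copies.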
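`get_fusion_breakpoint_index` scores every split point `i` of the per-site string built by `new_hap_for_breakpoint`, counting `"2"`s before and `"1"`s after the split (or the reverse, depending on which end of the haplotype is `"0"`), and takes the first maximum. A best split at either end is rejected because it implies no switch inside the gene. A standalone sketch of that scoring follows; `switch_index` and `fusion_breakpoint_index` are hypothetical names, and unlike the diff the sketch returns `None` for an empty site string instead of raising on `max([])`.

```python
from typing import Optional


def switch_index(seq: str, before: str, after: str) -> Optional[int]:
    """Index i maximising seq[:i].count(before) + seq[i:].count(after).

    Ties resolve to the first maximum; a best split at index 0 or at the
    last index means no interior switch, reported as None.
    """
    if not seq:
        return None
    scores = [seq[:i].count(before) + seq[i:].count(after) for i in range(len(seq))]
    best = scores.index(max(scores))
    if best == 0 or best == len(scores) - 1:
        return None
    return best


def fusion_breakpoint_index(hap: str, site_bases: str) -> Optional[int]:
    """Pick the switch direction from the haplotype ends, as in the diff."""
    if hap.startswith("0") and not hap.endswith("0"):
        # Leading '0': look for a run of '2's followed by '1's.
        return switch_index(site_bases, "2", "1")
    if hap.endswith("0") and not hap.startswith("0"):
        # Trailing '0': look for a run of '1's followed by '2's.
        return switch_index(site_bases, "1", "2")
    return None
```

The returned index points at the first site after the switch; the diff then reads `all_sites[index]` and `all_sites[index - 1]` to bracket the breakpoint.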
diff --git a/paraphase/prepare_bam_and_vcf.py b/paraphase/prepare_bam_and_vcf.py
index 3309b55..a0be37f 100755
--- a/paraphase/prepare_bam_and_vcf.py
+++ b/paraphase/prepare_bam_and_vcf.py
@@ -25,10 +25,11 @@ class BamRealigner:
deletion = r"\d+D"
insertion = r"\d+I"
- def __init__(self, bam, outdir, config, prog_cmd):
+ def __init__(self, bam, outdir, config, prog_cmd, sample_id):
self.bam = bam
self.outdir = outdir
self.prog_cmd = prog_cmd
+ self.sample_id = sample_id
self.gene = config["gene"]
self.ref = config["data"]["reference"]
self.nchr_old = config["nchr_old"]
@@ -44,7 +45,6 @@ def __init__(self, bam, outdir, config, prog_cmd):
if "use_r2k" in config:
self.use_r2k = "-r2k"
self._bamh = pysam.AlignmentFile(bam, "rb")
- self.sample_id = bam.split("/")[-1].split(".")[0]
self.realign_bam = os.path.join(
self.outdir, self.sample_id + f"_{self.gene}_realigned_old.bam"
)
@@ -342,6 +342,7 @@ class VcfGenerater:
"""
search_range = 200
+ min_base_quality_for_variant_calling = 25
def __init__(self, sample_id, outdir, call_sum):
self.sample_id = sample_id
@@ -365,14 +366,9 @@ def set_parameter(self, config, tmpdir=None, prog_cmd=None):
self.left_boundary = int(self.nchr_old.split("_")[1])
if self.right_boundary is None:
self.right_boundary = int(self.nchr_old.split("_")[2])
- self.samtools = config["tools"]["samtools"]
- self.minimap2 = config["tools"]["minimap2"]
self.use_supplementary = False
if "use_supplementary" in config or "is_tandem" in config:
self.use_supplementary = True
- self.keep_truncated = False
- if "keep_truncated" in config:
- self.keep_truncated = True
self.prog_cmd = prog_cmd
self.tmpdir = tmpdir
@@ -381,8 +377,7 @@ def set_parameter(self, config, tmpdir=None, prog_cmd=None):
self.bam = os.path.join(
tmpdir, self.sample_id + f"_{self.gene}_realigned_tagged.bam"
)
- self.vcf_dir = os.path.join(self.outdir, f"{self.sample_id}_vcfs")
- os.makedirs(self.vcf_dir, exist_ok=True)
+ self.vcf_dir = os.path.join(self.outdir, f"{self.sample_id}_paraphase_vcfs")
def get_range_in_other_gene(self, pos):
"""
@@ -400,63 +395,106 @@ def write_header(self, fout):
"""Write VCF header"""
fout.write("##fileformat=VCFv4.2\n")
fout.write('##FILTER=\n')
- fout.write('##FILTER=\n')
+ # fout.write('##FILTER=\n')
fout.write(
- '##FILTER=\n'
- )
- fout.write(
- '##INFO=\n'
+ '##INFO=\n'
)
+ alleles = self.call_sum.get("alleles_final")
+ if alleles is not None and alleles != []:
+ fout.write(
+ '##INFO=\n'
+ )
if self.gene in ["ikbkg", "f8"]:
fout.write(
'##INFO=\n'
)
fout.write(
- '##INFO=\n'
+ '##INFO=\n'
)
fout.write(
'##INFO=\n'
)
fout.write('##ALT=\n')
fout.write('##ALT=\n')
- fout.write('##FORMAT=\n')
- fout.write('##FORMAT=\n')
fout.write(
- '##FORMAT=\n'
+ '##FORMAT=\n'
+ )
+ fout.write(
+ '##FORMAT=\n'
+ )
+ fout.write(
+ '##FORMAT=\n'
)
fout.write(f"##contig=\n")
fout.write(f"##paraphase_version={paraphase.__version__}\n")
fout.write(f"##paraphase_command=paraphase {self.prog_cmd}\n")
- header = [
- "#CHROM",
- "POS",
- "ID",
- "REF",
- "ALT",
- "QUAL",
- "FILTER",
- "INFO",
- "FORMAT",
- "default",
- ]
- fout.write("\t".join(header) + "\n")
+
+ @staticmethod
+ def modify_hapbound(bound1, bound2, truncated):
+ """Format haplotype boundaries for the HPBOUND INFO field, marking truncated ends"""
+ hap_bound = f"{bound1}-{bound2}"
+ if truncated is not None:
+ if truncated == ["5p"]:
+ hap_bound = f"{bound1}truncated-{bound2}"
+ elif truncated == ["3p"]:
+ hap_bound = f"{bound1}-{bound2}truncated"
+ elif truncated == ["5p", "3p"]:
+ hap_bound = f"{bound1}truncated-{bound2}truncated"
+ return hap_bound
+
+ @staticmethod
+ def convert_alt_record(ref, alt):
+ """Convert variant ALT allele to the pileup format"""
+ if len(alt) > 1:
+ ins_len = len(alt) - 1
+ return f"{ref}+{ins_len}{alt[1:]}"
+ if len(ref) > 1:
+ del_len = len(ref) - 1
+ del_seq_n = "N" * del_len
+ return f"{ref[0]}-{del_len}{del_seq_n}"
+ return alt
def merge_vcf(self, vars_list):
"""
Merge vcfs from multiple haplotypes.
"""
- merged_vcf = os.path.join(
- self.vcf_dir, self.sample_id + f"_{self.gene}_variants.vcf"
- )
+ os.makedirs(self.vcf_dir, exist_ok=True)
+ merged_vcf = os.path.join(self.vcf_dir, self.sample_id + f"_{self.gene}.vcf")
with open(merged_vcf, "w") as fout:
self.write_header(fout)
- for variants_info, haps_ids in vars_list:
+ assert len(vars_list) <= 2
+ haps_ids = []
+ haps_ids1 = []
+ haps_ids2 = []
+ haps_bounds = []
+ for list_counter, (variants_info, haps_info) in enumerate(vars_list):
+ for hap_name, bound1, bound2, truncated in haps_info:
+ haps_ids.append(hap_name)
+ hap_bound = self.modify_hapbound(bound1, bound2, truncated)
+ haps_bounds.append(hap_bound)
+ if list_counter == 0:
+ haps_ids1.append(hap_name)
+ else:
+ haps_ids2.append(hap_name)
+ header = [
+ "#CHROM",
+ "POS",
+ "ID",
+ "REF",
+ "ALT",
+ "QUAL",
+ "FILTER",
+ "INFO",
+ "FORMAT",
+ ] + haps_ids
+ fout.write("\t".join(header) + "\n")
+
+ for list_counter, (variants_info, haps_info) in enumerate(vars_list):
variants_info = dict(sorted(variants_info.items()))
for pos in variants_info:
call_info = variants_info[pos]
# unique variants at this site
variant_observed = set([a[0] for a in call_info if a is not None])
- var_num = len(variant_observed)
for variant in variant_observed:
_, ref, alt = variant.split("_")
merge_gt = []
@@ -468,25 +506,77 @@ def merge_vcf(self, vars_list):
merge_ad.append(".")
merge_dp.append(".")
else:
- var_name, dp, ad, var_filter, gt = each_call
+ var_name, dp, ad, var_filter, gt, counter = each_call
+ if counter is None:
+ if ref != alt:
+ this_ad = ",".join([str(a) for a in ad])
+ else:
+ this_ad = ",".join([str(a) for a in [ad[0], 0]])
+ else:
+ if ref != alt:
+ alt_converted = self.convert_alt_record(
+ ref, alt
+ )
+ this_ad = ",".join(
+ [
+ str(a)
+ for a in [
+ ad[0],
+ counter[alt_converted],
+ ]
+ ]
+ )
+ else:
+ this_ad = ",".join([str(a) for a in [ad[0], 0]])
if var_filter != []:
gt = "."
merge_dp.append(str(dp))
if gt == "0":
merge_gt.append(gt)
- merge_ad.append(str(dp - ad))
+ merge_ad.append(this_ad)
elif var_name == variant:
merge_gt.append(gt)
- merge_ad.append(str(ad))
+ merge_ad.append(this_ad)
else:
merge_gt.append(".")
- merge_ad.append(".")
+ merge_ad.append(this_ad)
+ if list_counter == 0 and haps_ids != haps_ids1:
+ for _ in range(len(haps_ids2)):
+ merge_gt.append(".")
+ merge_ad.append(".")
+ merge_dp.append(".")
+ elif list_counter > 0:
+ for _ in range(len(haps_ids1)):
+ merge_gt.insert(0, ".")
+ merge_ad.insert(0, ".")
+ merge_dp.insert(0, ".")
final_qual = "."
- if ref == alt:
- alt = "."
- if (var_num == 1 and ("1" in merge_gt or "." in merge_gt)) or (
- var_num > 1 and alt != "."
+ if (
+ alt != ref
+ and alt not in [".", "*"]
+ and "1" in merge_gt # or "." in merge_gt
):
+ if "1" in merge_gt:
+ variant_filter = "PASS"
+ # else:
+ # variant_filter = "LowQual"
+ info_field = "HPBOUND=" + ",".join(haps_bounds)
+ alleles = self.call_sum.get("alleles_final")
+ if alleles is not None and alleles != []:
+ alleles_rename = []
+ for allele in alleles:
+ allele_rename = []
+ for a in allele:
+ if a is not None:
+ allele_rename.append(a)
+ else:
+ allele_rename.append("unknown")
+ alleles_rename.append(allele_rename)
+ info_field += (
+ ";"
+ + "ALLELE="
+ + ",".join(["+".join(a) for a in alleles_rename])
+ )
if alt.isdigit() is False:
merged_entry = [
self.nchr,
@@ -495,20 +585,23 @@ def merge_vcf(self, vars_list):
ref,
alt,
final_qual,
- "PASS",
- "HapIDs=" + ",".join(haps_ids),
+ variant_filter,
+ info_field,
"GT:DP:AD",
- "/".join(merge_gt)
- + ":"
- + ",".join(merge_dp)
- + ":"
- + ",".join(merge_ad),
+ ] + [
+ ":".join([merge_gt[j], merge_dp[j], merge_ad[j]])
+ for j in range(len(haps_ids))
]
+
else:
nstart, var_type, nend = variant.split("_")
nstart = int(nstart)
nend = int(nend)
var_size = nend - nstart
+ info_field = (
+ f"SVTYPE={var_type};END={nend};SVLEN={var_size};"
+ + info_field
+ )
merged_entry = [
self.nchr,
str(pos),
@@ -516,16 +609,12 @@ def merge_vcf(self, vars_list):
"N",
f"<{var_type}>",
final_qual,
- "PASS",
- f"SVTYPE={var_type};END={nend};SVLEN={var_size};"
- + "HapIDs="
- + ",".join(haps_ids),
+ variant_filter,
+ info_field,
"GT:DP:AD",
- "/".join(merge_gt)
- + ":"
- + ",".join(merge_dp)
- + ":"
- + ",".join(merge_ad),
+ ] + [
+ ":".join([merge_gt[j], merge_dp[j], merge_ad[j]])
+ for j in range(len(haps_ids))
]
fout.write("\t".join(merged_entry) + "\n")
@@ -574,7 +663,9 @@ def get_var(all_bases, ref_seq):
gt = "."
qual = "."
ad = len([a for a in all_bases if a != ref_seq])
+ ad_ref = len([a for a in all_bases if a == ref_seq])
var_seq = ref_seq
+ counter = None
if all_bases != []:
counter = Counter(all_bases)
most_common_base = counter.most_common(2)
@@ -588,7 +679,7 @@ def get_var(all_bases, ref_seq):
else:
gt = "1"
# qual = VcfGenerater.get_var_call_qual(dp, ad, gt, is_snp)
- return [var_seq, dp, ad, gt, qual]
+ return [var_seq, dp, (ad_ref, ad), gt, qual, counter]
def get_hap_bound(self, hap_name):
"""Get haplotype boundaries"""
@@ -622,7 +713,6 @@ def pileup_to_variant(
refh,
offset,
hap_bound,
- vcf_out,
min_depth=4,
min_qual=25,
variants_to_add={},
@@ -631,28 +721,13 @@ def pileup_to_variant(
Filter pileups and make variant calls.
"""
variants = []
- # for now, report F8 SVs in the haplotype vcfs. Ideally they should be in a diploid vcf.
if self.gene == "f8":
for pos in variants_to_add:
var_name = variants_to_add[pos]
nstart, var_type, nend = var_name.split("_")
nstart = int(nstart)
nend = int(nend)
- var_size = nend - nstart
- variants.append([nstart, var_name, ".", ".", [], "1"])
- vcf_out_line = [
- self.nchr,
- str(nstart),
- ".",
- "N",
- f"<{var_type}>",
- ".",
- "PASS",
- f"SVTYPE={var_type};END={nend};SVLEN={var_size}",
- "GT:DP:AD",
- f"1:.:.",
- ]
- vcf_out.write("\t".join(vcf_out_line) + "\n")
+ variants.append([nstart, var_name, ".", ".", [], "1", None])
ref_name = refh.references[0]
del_pos = []
@@ -662,21 +737,7 @@ def pileup_to_variant(
nstart, var_type, nend = var_name.split("_")
nstart = int(nstart)
nend = int(nend)
- var_size = nend - nstart
- variants.append([nstart, var_name, ".", ".", [], "1"])
- vcf_out_line = [
- self.nchr,
- str(nstart),
- ".",
- "N",
- f"<{var_type}>",
- ".",
- "PASS",
- f"SVTYPE={var_type};END={nend};SVLEN={var_size}",
- "GT:DP:AD",
- f"1:.:.",
- ]
- vcf_out.write("\t".join(vcf_out_line) + "\n")
+ variants.append([nstart, var_name, ".", ".", [], "1", None])
del_pos = [nstart, nend]
all_bases = pileups_raw[pos]
@@ -688,63 +749,40 @@ def pileup_to_variant(
refh_pos = pos
ref_seq = refh.fetch(ref_name, refh_pos - 1, refh_pos)
alt_all_reads = self.get_var(all_bases, ref_seq)
+ var_seq, dp, ad, gt, qual, counter = alt_all_reads
if (
hap_bound == []
or (None not in hap_bound and hap_bound[0] < true_pos < hap_bound[1])
) and (del_pos == [] or true_pos < del_pos[0] or true_pos > del_pos[1]):
# use only unique reads for positions at the edge
+ # or no-call when using all reads
if (
hap_bound == []
or true_pos < hap_bound[2]
or true_pos > hap_bound[3]
+ or ad[1] < dp * 0.7
):
bases_uniq_reads = []
for i, read_base in enumerate(all_bases):
if uniq_reads is None or read_names[pos][i] in uniq_reads:
bases_uniq_reads.append(read_base)
alt_uniq_reads = self.get_var(bases_uniq_reads, ref_seq)
- # if alt_uniq_reads[1] >= min_depth:
- var_seq, dp, ad, gt, qual = alt_uniq_reads
- # else:
- # var_seq, dp, ad, gt, qual = alt_all_reads
- # gt = "."
- else:
- var_seq, dp, ad, gt, qual = alt_all_reads
+ var_seq, dp, ad, gt, qual, counter = alt_uniq_reads
+ if dp < min_depth or ad[1] < dp * 0.7:
+ var_seq, dp, ad, gt, qual, counter = alt_all_reads
ref_seq, var_seq = self.refine_indels(
ref_seq, var_seq, refh_pos, refh, ref_name
)
var = f"{true_pos}_{ref_seq}_{var_seq}"
- qual = "."
var_filter = []
if dp < min_depth:
var_filter.append("LowDP")
- # if qual != "." and qual < min_qual:
- if ad < dp * 0.7:
+ if ad[1] < dp * 0.7:
var_filter.append("LowQual")
- if var_filter == []:
- call_filter = "PASS"
- else:
- call_filter = ";".join(var_filter)
+ if var_filter != []:
gt = "."
- variants.append([true_pos, var, dp, ad, var_filter, gt])
- if var_seq == ref_seq:
- var_seq = "."
- # write all positions where gt is not confidently 0
- if gt == "1" or gt == ".":
- vcf_out_line = [
- self.nchr,
- str(true_pos),
- ".",
- ref_seq,
- var_seq,
- str(qual),
- call_filter,
- ".",
- "GT:DP:AD",
- f"{gt}:{dp}:{ad}",
- ]
- vcf_out.write("\t".join(vcf_out_line) + "\n")
+ variants.append([true_pos, var, dp, ad, var_filter, gt, counter])
return variants
def run_without_realign(
@@ -803,17 +841,10 @@ def run_without_realign(
uniq_reads.append(read_name)
variants_info = {}
two_cp_haplotypes = self.call_sum.get("two_copy_haplotypes")
- # exclude truncated copies
- haps_not_truncated = [
- a
- for a in final_haps.values()
- if self.call_sum["haplotype_details"][a]["is_truncated"] is False
- or self.keep_truncated is True
- ]
- nhap = len(haps_not_truncated) + len(
- [a for a in two_cp_haplotypes if a in haps_not_truncated]
- )
- hap_ids = []
+ nhap = len(final_haps)
+ if two_cp_haplotypes is not None:
+ nhap += len(two_cp_haplotypes)
+ hap_info = []
# gene1only, or two-gene mode but gene1 side
if gene2 is False or match_range is False:
@@ -828,18 +859,13 @@ def run_without_realign(
nchr = self.nchr_gene2
if (gene2 is False or match_range is False) and final_haps == {}:
- hap_name = "homozygous_hap1"
- hap_vcf_out = os.path.join(
- self.vcf_dir, self.sample_id + f"_{self.gene}_{hap_name}.vcf"
- )
- vcf_out = open(hap_vcf_out, "w")
- self.write_header(vcf_out)
+ hap_name = f"{self.gene}_homozygous_hap1"
pileups_raw = {}
read_names = {}
for pileupcolumn in bamh.pileup(
nchr,
truncate=True,
- min_base_quality=30,
+ min_base_quality=self.min_base_quality_for_variant_calling,
):
pos = pileupcolumn.pos + 1
this_pos_bases = [
@@ -854,29 +880,23 @@ def run_without_realign(
refh,
0 - offset,
[],
- vcf_out,
)
- vcf_out.close()
- hap_ids.append(hap_name)
- hap_ids.append(hap_name)
- for pos, var_name, dp, ad, var_filter, gt in variants_called:
+ hap_info.append([hap_name, self.left_boundary, self.right_boundary, None])
+ hap_info.append(
+ [hap_name + "_cp2", self.left_boundary, self.right_boundary, None]
+ )
+
+ for pos, var_name, dp, ad, var_filter, gt, counter in variants_called:
variants_info.setdefault(
pos,
[
- [var_name, dp, ad, var_filter, gt],
- [var_name, dp, ad, var_filter, gt],
+ [var_name, dp, ad, var_filter, gt, counter],
+ [var_name, dp, ad, var_filter, gt, counter],
],
)
i = 0
for hap_name in final_haps.values():
- if (
- self.call_sum["haplotype_details"][hap_name]["is_truncated"] is True
- and self.keep_truncated is False
- ):
- continue
- hap_ids.append(hap_name)
-
variants_to_add = {}
if hap_name in special_variants:
variant_to_add = special_variants[hap_name]
@@ -903,11 +923,22 @@ def run_without_realign(
min(n3, n4),
max(n3, n4),
]
- hap_vcf_out = os.path.join(
- self.vcf_dir, self.sample_id + f"_{self.gene}_{hap_name}.vcf"
- )
- vcf_out = open(hap_vcf_out, "w")
- self.write_header(vcf_out)
+ if gene2 is False or match_range is False:
+ this_hap_info = [
+ hap_name,
+ hap_bound[0],
+ hap_bound[1],
+ self.call_sum["haplotype_details"][hap_name]["is_truncated"],
+ ]
+ hap_info.append(this_hap_info)
+ else:
+ this_hap_info = [
+ hap_name,
+ hap_bound[0],
+ hap_bound[1],
+ None,
+ ]
+ hap_info.append(this_hap_info)
# by HP tag
pileups_raw = {}
@@ -915,7 +946,7 @@ def run_without_realign(
for pileupcolumn in bamh.pileup(
nchr,
truncate=True,
- min_base_quality=30,
+ min_base_quality=self.min_base_quality_for_variant_calling,
):
pos = pileupcolumn.pos + 1
this_pos_bases = [
@@ -956,27 +987,36 @@ def run_without_realign(
refh,
0 - offset,
hap_bound,
- vcf_out,
variants_to_add=variants_to_add,
)
- vcf_out.close()
- for pos, var_name, dp, ad, var_filter, gt in variants_called:
+ for pos, var_name, dp, ad, var_filter, gt, counter in variants_called:
variants_info.setdefault(pos, [None] * nhap)
- variants_info[pos][i] = [var_name, dp, ad, var_filter, gt]
+ variants_info[pos][i] = [var_name, dp, ad, var_filter, gt, counter]
if hap_name in two_cp_haplotypes:
- variants_info[pos][i + 1] = [var_name, dp, ad, var_filter, gt]
+ variants_info[pos][i + 1] = [
+ var_name,
+ dp,
+ ad,
+ var_filter,
+ gt,
+ counter,
+ ]
if hap_name in two_cp_haplotypes:
i += 1
- hap_ids.append(hap_name)
+ this_hap_info_cp2 = [this_hap_info[0] + "_cp2"] + this_hap_info[1:]
+ hap_info.append(this_hap_info_cp2)
i += 1
bamh.close()
refh.close()
- if gene2 is False:
- self.merge_vcf([(variants_info, hap_ids)])
- else:
- return variants_info, hap_ids
+ return variants_info, hap_info
+
+ def run(self):
+ if self.call_sum.get("final_haplotypes") is None:
+ return
+ variants_info, hap_info = self.run_without_realign()
+ self.merge_vcf([(variants_info, hap_info)])
class TwoGeneVcfGenerater(VcfGenerater):
@@ -1033,11 +1073,10 @@ def separate_two_genes(self):
gene2_haps.setdefault(hap, hap_name)
elif self.gene == "ikbkg":
for hap, hap_name in all_haps.items():
- if "dup" not in hap_name:
- if "pseudo" not in hap_name:
- gene1_haps.setdefault(hap, hap_name)
- else:
- gene2_haps.setdefault(hap, hap_name)
+ if "pseudo" not in hap_name:
+ gene1_haps.setdefault(hap, hap_name)
+ else:
+ gene2_haps.setdefault(hap, hap_name)
return gene1_haps, gene2_haps
def run(self):
@@ -1049,11 +1088,11 @@ def run(self):
if call_sum.get("final_haplotypes") is None:
return
gene1_haps, gene2_haps = self.separate_two_genes()
- vars_gene1, gene1_hap_ids = self.run_without_realign(
+ vars_gene1, gene1_hap_info = self.run_without_realign(
gene2=True,
final_haps=gene1_haps,
)
- vars_gene2, gene2_hap_ids = self.run_without_realign(
+ vars_gene2, gene2_hap_info = self.run_without_realign(
gene2=True,
final_haps=gene2_haps,
match_range=True,
@@ -1063,6 +1102,16 @@ def run(self):
and vars_gene2 != {}
and list(vars_gene1.keys())[0] < list(vars_gene2.keys())[0]
):
- self.merge_vcf([(vars_gene1, gene1_hap_ids), (vars_gene2, gene2_hap_ids)])
+ self.merge_vcf(
+ [
+ (vars_gene1, gene1_hap_info),
+ (vars_gene2, gene2_hap_info),
+ ]
+ )
else:
- self.merge_vcf([(vars_gene2, gene2_hap_ids), (vars_gene1, gene1_hap_ids)])
+ self.merge_vcf(
+ [
+ (vars_gene2, gene2_hap_info),
+ (vars_gene1, gene1_hap_info),
+ ]
+ )
diff --git a/tests/test_data/AMY2A_ref.fa b/tests/test_data/AMY2A_ref.fa
new file mode 100644
index 0000000..b14a160
--- /dev/null
+++ b/tests/test_data/AMY2A_ref.fa
@@ -0,0 +1,335 @@
+>chr1_103611603_103631603
+ATTAGTAACAGCAGGATGAAGGGCAGATTATGGAGTTCCAGATGTTTGTAGTCCAGTAGA
+TAGGAGGAGCATGTCAAATTTTTGTGATATTTTCTCAGTGAAACAGGAAGCAAGCTCATA
+AAGAGGGAGGAAGTCACGAAGGAGTAAGGAAGGAAAAGGAGATGTAAAATGAATAGATAG
+GAGTGTAGGAAAGTAAGCATACTCTTAGGAAATCTGTGGTTTATTTAAGTGGTATCAGTC
+AATGTGGTTCTGTGTTTTTCTCCAGTTGAGTTCAGCATAGGTGCTATCACTGAAAAAGTA
+CAGAATTCAGTTTAACTAAAGGTGAGATTTCTCTAGGTGAATAAAACAGTCAGTGATGAG
+ATGTTGATACATTTAAGAGAATGATTATAAAAGAGGGATGTGGAAAGTAGTCTGGGTTAA
+GTAGTGAAGTGAGAAAATAAAGAAGGTTGAGACAGTGGAAAATGGAATGATCAATTGTAG
+ATTGCAGTGTGGTTGAAGATGCAATGGACTTCTTGAGGTGGGAACTAGAAATTTGGGGGA
+TATAAATAAGAGAGGGGTGACTGAAATTGAAAATATGAGAGTATAATAGTATACAATATT
+ACTAAGTATGGTATACTAAGTATAGTAGAATAGTACTAAGTATAGTAGAATAGTACTAAG
+TATATAGTACTTAGTGAGTATAAGTTTTAGGATGTGAAGAGGGGAGTAGCGTAGAGGCAA
+GGTGATGGAAGCAGATGAGTTCAAGAAAATGAGAACCTAGGAGATTGAAACCATTATCTA
+AATGAATTCTGAAATCTCAAGAAGAACTGTGCTGAGTGCTTGTAGTCATTATTTTGTCTT
+TACTACTCTGTGAGAAGACAGCACTTCCTTTTGTGAGTAATGGAAAGTTCTTTAATATGT
+CAACTGTAAGATACTTTATTACATTTAATTATCTTTAGTAATGCATAGTAATAACTAATG
+ACCTCAGTAGGATTATTCAGTAGATATGTGAGCTAACCCCCTGTGTCAGACACCTAAGAG
+GAACAACTGATGTTTTTCAAGACGAATAGAGGGTTCTCATCCCAGGGTTTTAAGCTGCAA
+AATATCCCACCCTATATTTTCTGAAAACAAACGTTATATAGAGTGTCCACATAAAAGCTA
+ACAGGAAATATTACACACACACACACACACACACACACACACACACACAGCATTGAAAAT
+GCATCAGGAAAGATATACCCTGTACCCAACATGCACAAGGTCTAAGCAATGAAGAAAATC
+TAGGAAATCAAATAAGACCAGAACATCTATGACTAATATATAAATACTTGAGGCAGTTAT
+ATCCATTTAAAAGGTGTCAATTAGACATGAAAGAAAAAGGCCAGGTGCAGTGGCTCATGC
+TTGTAATCCCAGCACTTTGGGAGGCCAACGCAGGCGGATCAACGTGAGGTCAGGGGTTTG
+AGACCAGCTTGGCCAACAAGGTGAGACCCCGTCTCTACTAAAAATACAAAAAAAATTGCT
+GGTTATGGTTGCACGTGCCTGTGGTCCCATCTACTTAGGAGGCTGAGGTAGGAGAATCAT
+CTGAACCCAGAGGTAGAGATTGCAGTGAGCTGAGATTGCGCCATTGCACTCCAGCTTGGG
+TGATGGAGTAAGATTAAAATATATATATATACATTTTTACATAAGAAATTTTATTTTAAT
+TTTTTTGTAAATTAGAAAATAGCAATTAATATGGTATTTATATACCAGAATCTTTGAATA
+TCTAAAATTTAAGAAGATGTGAGCAATTTCATGAAACTTTTCAAAGCTTAGAAAAAGGTA
+GCTATGTAAAATAACAATGTATGTTGTTTATGGGAACATTAGATGAAGTAAGTCATGAAA
+ACATGCATAGAAATGATCATATAAACTTGCAAGAAATTGAATGTCTTGCTATCAAAGGAA
+AGGATGGAGGTGTAAGAGACTGAGGTAGAATGTGAGAAGCTCTGTATATATTGTATTTTC
+CTTAAAAATATATATTTAAACTCGCATTTCTTGTTTTTCAAATGTTCACTCTAATGAAAA
+TTCTTTTATTTTTCCTGTTACAGTTTTTATTTCTGATCTCATATATGCAGTGTACTGTTG
+TTCAGAAATATCATTGATCCTCCCTGCAAATGAAATACTCTCATCCTGTTGAATTTGAGA
+ATGTACACATGATTTGCTTTGGCTTTGAAATGTGAGCAGAAAGAATGTTTTACTTTGGGG
+AGTTCCCAATTGCTAAACCATGTGTCAAAATGAAGTTTTTGTGACATTACAAATGGACTA
+ATAAGAGTGCCTTTGTTGATCCTAGTTGAACATGAGAAACTTGGTTATTGTAAGCCACTG
+AAAAATTTTAGTTTCTTTTATTACTGCAGAATAGCCTATTCTGTCCTGATTGATACAGCC
+ACAACATTTCCTTAAGCTCTTTGATTTTCATATTGCCATTAGGATGTAGAATGCCAAGAG
+AAGGATAAATCATAGGATCAGAGAATGTTAGCAGTGAACGAGACTATTGGAATACATTTT
+TTCAATGGTTTCCAGACTTTGGGTTCTAAGTAATCAGTAATATTTCGAAAAAGTAGTAAA
+AGATATTTCAATATAAAAAATAATGACATTTAACTTTGAAAAAAAAAACCTATCATTGCT
+AGGCATAATGGGTCTAGCTTGTAATCCCAACAACTTGGGAGGCTGAGGAGGGAGGATCAC
+TTAAAGTCCAGGAGTTTGAGGCTGCAGTGAACCATGATTGCATCTCCACACTCCAGCCTG
+GGTGACATAGGGCCACCTCAACAACAACCACCACCAAAGCTATCATTAATACATTGTTCC
+TCTGTCTCTCTTCCCATCCTCCTTACCACTTAGTCTTAAATCTAATCTTATTAAACAATA
+TCTAAGGAAAAGATAAGAAAAATGATGGTTGCATCTATACTGAACATGTGTTGACTTTTT
+TCTTGTCACTATTTTCTAAACAACACAGTATCACAACTTTTGACATAGCATTTATGTTGC
+ATTAGTTATTATAAGTAATCTGAGATGATGTAAAGGTTATATGCAAATATTATATTTTAT
+ATAAAGGACTTGAGTAATTATTACTTTAAGTATCCATGGTGGTTCCTGGAACAAATTCCT
+CATGGATATGGAGGGAGGAGGAATCACTGTAATTAAACAAATCATCACTTTAAAACTTAA
+AACCTTTTGCATGTTGAATACAGTCATGCACCACATAAATGATGGTGGTCCCATTAAGTT
+TATTATGCCATATTTTTATCGGAAGTTTCCTATGTTTAAGTGTATTTAGATACACAAGTA
+CACTGTGTTACAGTTGCCTATAGTATTCAGTAAAATAACATGCTTTACAGGTTTGTAGTC
+TAGGAGAAATAGGCTAAGCCATCTAGGTTAATGAAGTCCATTCTATGACATTTGCACAGT
+GGCTTAATTACCTAAGGGCTTTCTCTTCAGAACCTACTCTTGTCATTAAGCAATGCATGA
+CTGTAGTTGGACAAAAAATAATAATGCAGGCATACCTTTGGCACTGTCCTAATTCTTACC
+AGCATCCATGATTGCTTAAATTTGCACTGGAAAGACTTCAAAAAACAGTGATATATTAGG
+ACAATATATTTCCAGTACATGACAAATGATAACAACATAAGAACTCGAAACACTCAATAA
+AAAACAAAAAATAAAGCAAAACTAGATGGGAAAAGGAAGAAGCAAATTTCAAGAGAAAAC
+TCAAAAGGCCAAAAAATGTGAAATGATGTTCAGCCTTGCTAGTAATTGGGAAAAACACAA
+ATCATTTTTTTTTTTTTTTTTTTTGAGATGGAGTCTCCCTCTGTCACCCACGCTGGAGTG
+CATTGGCGTGATCTGGGCTCACTGCCAGCTCCACCTACCACCTTCAGGCCGCCATTCCCC
+TGCCTCAGGCTCCAGAGTAGCTGGGACTACAGGGGCCCACCACCATGCCTGGCTAGTTTT
+TTGTGTTTTTAGTAGAGACAGGGTTTCACCATGTTAGCCAGGATGGTCTCCGTCTCCTGA
+CCTCGTGATCTGCCCGCCTCAGCTTCCGAAAGTGCTGGGATTACAGGAGTGAGACACGTG
+CCCGACCCAAATCATTTTTAATCTATTAGTCAAAAATTAGAGCTATGATAATATTAATTT
+TTGGTGGGAGTGTGGCAAAAGGTACTATAATATACTGCTACTAGAGAGTAAAAATTGATA
+AATTTTGTAAGGGCAATTTTGCACAATATGAAAATATAAAAATATGCTTCATGTTTACCA
+TACATTTATCTAGCATACAGAATTACCCATGTGTAAACATGTATACAGATGTTCATTATA
+ACACTTCTTATAAAAACAAAATATTTGGAAATAAATGTTCATATTAATAAGGCGCATACC
+GTATGATCTTAGTCCAGTTAGAATATTCTGTTTTATTTCAATCCTTTAAAAGACTCAACT
+TCTGACTCTATATAGACGATTAAAAAAAGAATGTGTTCTCCCTTTGTGCATTTGGTCAGG
+TAAATTAAAAAATACACCACATGCTAGCCGCACCAAACTGGAATAAGCCTTTGGAAAGAA
+GTTGTCCTTGAAGCTTGTATCTGACATTGTAGCAGGACGAGCCTCAGACAAAACCTCTCA
+GACACTGAGTTGTAGAAGGAAGGGCTTTATTCAGCTGGGAGCATCGGCCAGCTACTGTCT
+CAAAATCTGAGCTCCCGGAGTGCACAATTTCTGTCCTTTTTAAGGGCTCACAACACTAAA
+GATTTCACATGAAAGGGTCGTGATTGATTTGAGCAAGCAAGGGATACGTGACAAGGACTA
+CATTCACTGCTGGTCAGGGAGAAACAGAACAGGGCAGGGAGTTTCACAGAGTTCTTTTAT
+ACAATGTCTGGAATCTGTGAATAATATCAGCTTCTAAATCATAAGTTGATTTTTAGCTAC
+TGGGTTTAGGCCAATCAGGCCCAGGCCTGGTTTCAGGCCTGGCGCTGGGCTGCATGTCTT
+TGGTTGTACTTCCTGGTTGTTTTTACTGAATAGAAAACAATATAAAACAAGGAGAGGGTC
+TTTGTCTCCTCTCAATATCAGCACTGGATTGTAGAATGTGTTGCTGATTTTGACCTTGTA
+TTCAAGTTAACTGTTGCCCTTGGTATCTGTACATATCTTTGATTTCAGTCTTTACTACAC
+GTGGCTTGGTCACTTCATGGCTAAAAACATGCTTGTGGAAGACCAGTCTGGCTCGGTGAG
+TCTGTGCGGCCAGCAGTCTCTGATCTGTACAGGGTATTAATGTGTCAGGGCTGAGTGTTC
+TGGGATTTGTCTAGAGGCTGGTAAGGGCTTCTGGACCACTTGTTTCTGTCCTGTCAGTCT
+GTCAGGGTTGGAAAGTCCAAGCCATAGGACCCAGTTTCCTTTCTTAGCTTACGTTATCTA
+CCAGAGCACCGTGGGCTGTTACTTACCTTGAGTTGGAAGGGGTTCGCATTTATACCTGTA
+AAAGTATTCATCCTTTTAATTTATGTAAAGTTTTTTTGTATGCCATTCTGGATCTTTAAA
+GAGATGACAACAAATTTTGGTTTTCTACTGTTATGTGAGAACATTAGGCCCCAGCAACAG
+GTCACTGTTTAAGGAAAAATAAAAGTGCTGCCAGAACCTAAGAAAAACATTAATATCTAA
+AAGGTCATTTAGATGATTTCCATGAGAGACTTTTTGATGTTCTTTACCTGTTAGGATTAT
+TATTGATAATCCTTTTCAGATTATGAATAAACAGTTTGCCCTCAAGTATTTATTCATGCT
+AATATTTACTTTGTAAAATGTGCTTCTTACAGGAATATAAATAGTTTCTGGAAAGGACAC
+TGACAACTTCAAAGCAAAATGAAGTTCTTTCTGTTGCTTTTCACCATTGGGTTCTGCTGG
+GCTCAGTATTCCCCAAATACACAACAAGGACGGACATCTATTGTTCATCTGTTTGAATGG
+CGATGGGTTGATATTGCTCTTGAATGTGAGCGATATTTAGCTCCGAAGGGATTTGGAGGG
+GTTCAGGTGGGTATGATTCATAGTATCAATTGCGGAATTCACTGTGCTTGTAGGAAATAG
+TATTCTGATCTTATCTGTGAAGCTTGGGCAACATTTTACTTCACAGGTAAGTATTCTAAG
+TAAAAGAGATTTCTGAGGGAAAATCTATGTAGTATTCTTGGCAACTTTATATTTTGTTTC
+TGAGATAATCTTTCTTCACCAAGAGCCCTCCAATGTGCTGTTAATATTTTCAAGAGATAG
+CTGCCTATACCAAGATTCAAGAATCTTTTATACTATTGATTAGTTTCTAGAACATTCAAT
+GATACACAGTAAGACAGAATTTGGTACTTATGAAGACTGTTTAATTTGTAGGTCTCTCCA
+CCAAATGAAAATGTTGCAATTTACAACCCTTTCAGACCTTGGTGGGAAAGATACCAACCA
+GTTAGCTATAAATTATGCACAAGATCTGGAAATGAAGATGAATTTAGAAACATGGTGACT
+AGATGTAACAATGTTGGGGTAAGTGAATTCTAGTTTCCTTTAAAAATAACAGACAGGAAA
+ATGGTTTCTCTCTCTTCTTTCTTGCTCCTTTTCAGCAGAAAATTTTCCGTATTTTATTTT
+TTTAATTTTACTTCATAATTTAAAACTCAAAATTAACTGTTTATTTATGTTCAACTTTTG
+TGAATATTTGTGTGTGTGCTATCTACTAAAGAGGTAAGTTAAAGTTTAAATCAGAATTTG
+CTTCTAAAGCAAAACATCAAATTTTAACCCTTATAACTGTTCGTATTTCCCGGAAACAAT
+TTACTGGTTAGGAAGTATAATTCCAGTTACAATGTTTGCTATCATTTTTAGGTGACTTGT
+GTCTCCATCCGTAATTCTTGGGTTTTTCATGGTGAATAGCTAGCTTCTCTATTTAATGAG
+GAGCATAAATTGAGATTAATAGCTACCTTGTTTGTCTTCAAAAGCTTCATAGAGAGTACA
+GGCTTTCTCCTGGTGACCCACTGAAATTTCCAAAACAATAACCTTTCCACTCTCATCTGA
+GTTTTGTCTTCCCAAAGTGGGCTTTTTGCATTTCCTCCTATTTATGGTAGTTTCTGGTCT
+CTCAATTTATCATTCCTATAAATATTTGACCAAGTGTCTAGAAGGCATGTAGGTGTTTAG
+TTCACATTACTTTCCTTTCACAGTTGATTTTTGATCTTGTAGGAAAATAATTATAAGATA
+TCATGAAATATTTTGGAGTTTTATTAACATACTATAAACTTGCATCAATAATGCTTTAAA
+TTTCTACCTCTCTGTAAGTCACACTGAAGTAGAAACTTTGTTTTCTAGGTTCGTATTTAT
+GTGGATGCTGTAATTAATCATATGTGTGGTAACGCTGTGAGTGCAGGAACAAGCAGTACC
+TGTGGAAGTTACTTCAACCCTGGAAGTAGGGACTTTCCAGCAGTCCCATATTCTGGATGG
+GATTTCAATGATGGTAAATGTAAAACTGGAAGTGGAGATATCGAGAACTACAATGATGCT
+ACTCAGGTAATTTTTTTACGAGAGTGATCTGAATAAAAGAGTAATATATGCCTTTTCTTG
+TAGACATGTAGCTAATTGAATTTCATTTAAAATAGGAATTTAGATCTCTTAGGGACAGAA
+GTTAACAAGTTTGACTACTTTAAGAAACTCAAATCCATATTTAAGAACTTTCAAATATTG
+ATTTAAGATTTTTAATCAATACACATTTGCCCACTTCTAAGAAGTTCCCAATTAAAAATC
+TCATCGACTTTATTTCCTAAATTCTCTATTTTCTATTAGAAAATATTTCAAAGATACATC
+TGTAGTAGAATGTGAGCATCCCCAGTGCCCAATGCAAGGAAGTCACTATAGAATATCTCT
+TGAGGAATCATGGAATAAATGAATAATCCAATGGATTCTCAGGAGAAAAATGAGGTTTTA
+TGAATCAATCATAACATTTTTACCTCAACAGGTCAGAGATTGTCGTCTGACTGGTCTTCT
+TGATCTTGCACTGGAGAAGGATTACGTGCGTTCTAAGATTGCCGAATATATGAACCATCT
+CATTGACATTGGTGTTGCAGGGTTCAGACTTGATGCTTCCAAGCACATGTGGCCTGGAGA
+CATAAAGGCAATTTTGGACAAACTGCATAATCTAAACAGTAACTGGTTCCCTGCAGGAAG
+TAAACCTTTCATTTACCAGGAGGTACATCAATACTTATATGCCTATAAAATATCATCTTA
+TTCGTTAGAAAATTAATGGAAGATTTAATTAAAAATGCAATTTCTGTAGGATAAGGAATG
+AGACATTTACATAAAACAGTGTTCTTTAACCTCCTCTTCTTCACATACAGCATATCTAAT
+TCTTTATCACAACATGTTTTATGGAGGTACACAGAATGTAGGATACTGATAATAGTTATG
+TCTTTACTTTCTTTGGATAATGAAACAAGTTAATATTTATCAAGGAATTTCAGTCGATAC
+TAAATGTTTTATTAGTGTGAGCTCTTATTATTATCATTGATGTAGAAGACTAAAAATTAG
+GTAAGTATTTTCACAGGACAACAGGTATCTTTGACATTATGCTTCTTTCAATATTGTAGC
+CTATACTTTATCAAATAAAAGAATATAAGAATATTACCTGTTGAGATAATAGGAATAAGA
+AAACCATTTTGCACATTTCGTGTAACAAACAGGACCAGGCATGGTGGCTCGTGCATGTAA
+TCTCAGCACTTTGGGAGGCTGCGGCAGGGGGATTGCTTGAGGCCAGGAGTTTGGGACCAT
+CCTGGACGACATAGCAAGACCCTGTCTCTAAAAAACAAAACAAAACAAGACAAAAAGAAA
+TAATAAATAGCTTAATTTATTAATAAATAACAAATAGCATAAAGCTATTTTTATATAATA
+TTAACTTATTGGTTAAAATGCTTTAAAGTCCTTACACAAAATGTTATTGTTTCCTAAATT
+TCTACTAGGTAATTGATCTGGGTGGTGAGCCAATTAAAAGCAGTGACTACTTTGGTAATG
+GCCGGGTGACAGAATTCAAGTATGGTGCAAAACTCGGCACAGTTATTCGCAAGTGGAATG
+GAGAGAAGATGTCTTACTTAAAGTAAATAAATACAACTTTTCCCTGAAGTATTTCATGGA
+TCTATTAGTCATACTACCTCAGTGTGACTTATCTTCTGGAACATTCTTATTCAGACAACT
+ATCAAAGAGTCAATTGTGAATGATAAGTATTCTAGTGCCCTAAACTCTAATCAATCATCT
+TTTGTATTTAGAGTGTCTGTCACAAGGCAATATGTCTAGGAACGCTAAACATACCCTAGG
+AGTTTTCATCTAAGTACGAGATGAATATACTGGATTTGACTGATGTTTGCATATAATCTT
+TTAAAGCCAGGTTATTATTAAAATGATCCTATCATTTATGAAGTATGTACAAAGTTTCCA
+TCCTGTAGAATTTACATGTATTATATGAATTAAAAATATAAAAAATATTTATATTATAAC
+AATACAGTATTGAAGCCTTATTTTAATCTAGTTTGATTAAACTAGATTGGTCTAGTTTGA
+TTAAACTAGATTAAACTTCTTGGTCTAGGCACATGAATATTGTTGTGGGGAAGAGAATTC
+TGTATAATGTGATATGGATATTGATCCTTCTGGAGTGCCTCTAAATGATAATGTGCTGAA
+ACCTCTGAAAGGACCTTTTTTAATAACAAAAATCTTATATTTGTAATATGAATGTAAGTA
+TTCCATACATGTATATACAAATATGGACCATACATGTAGATTACACATGTGTGTGTGTTT
+GATGTGTGTGTATATATATGTGTGTGTGTGTTTGTGTGTGTATATATATATATATCTCTT
+ACAGGAAAAGCATTTAATTAGAGAAAGAATTTAATCTTCAGATGCCATGCCTTACAGAGA
+GAGATGCACAGTTAAGTTACTCTCAAACTGTTGTGAAATGATACATCAACGTATATCTTA
+TTTTTCAAAAATAGGAACTGGGGAGAAGGTTGGGGTTTCGTACCTTCTGACAGAGCGCTT
+GTCTTTGTGGATAACCATGACAATCAACGAGGACATGGGGCTGGAGGAGCCTCTATTCTT
+ACCTTCTGGGATGCTAGGTAGAAAACCAAGTTCTCTATTTTTTTAACACGTCTTTTAATG
+ATGGCAAGAATATTCTGACATCCTATGAAAATATAATTATGTAACTTCCAGGCTGTACAA
+AATGGCAGTTGGATTTATGCTTGCTCATCCTTACGGATTTACACGAGTAATGTCAAGCTA
+CCGTTGGCCAAGACAGTTTCAAAATGGAAACGTAAGTTTTGGAGTTGTTCAATATATCCT
+TTTCTGAAGAAAAAGGAAGCAATCTTATTCTAACTTAACATGACAACTATTAATTATATA
+TTTATTCAACAAATATTTAATTGATTGTAAACTGGATACAGGGCTGTGATTTTAGTAATG
+TAGGTTATATTAAAGGAGTAAAATTTATATTCTCCATTGACAAAGAGTATGCAAGCTGTT
+TCAGAGATATGACAAACATCCCCTTAGCCCGCAGGGAAACAAAAAACAAAAAACAGAAAA
+ACACTCAAAACTAAGAGCTAGACACAGGGATTAAAATATATACTTCGAATAAGTACCTAC
+CTCAGGGCTGATAGGAAGATTATACATGCCAACACTTTTAGAGAACTTAAAACATCATCT
+GCCCGTAGTGAGAACAATATAAATGTTTGTTAAATACTTTTAAAAAGTTATATGGAATAG
+AAAGAAATGAATCAATTGAGCTGAGTTAAATAGGGAAAGTATCATATAAGAGGAAGGAAA
+TGATATGTACTAAAGAATAGAAATTTAGAGAGTATTCCAAGAAAGGTAAGAATGAGAAAA
+ATATTTGGGAGTATGGTAAAGACATTAATCTGATGAGAAGTTTCAAAAAGGAACAGAGAA
+ATTACAGTGTAAAGATATTTGGAAAGCTAGTAGAAGGTTTTCTTTTAAACTAAAGGGTTC
+AGAAACAGCATCAGAGACTTCAGAACTAAAGCAGAAATTCCTCCTTCCTATGAGTCACAC
+GGATATCTAGCTAGCTTTTTTTAGATTCCTTTCAGTTTGAGAAGTCCGCACTTTGTATAG
+CAATTTATTCTATTGTTAAACAGCTTTAATATTTAGAAGGTGTACTTTTATATTGAGCCA
+ACTTCTTTTAATTTCTACTAATTGGTCTTATTTCTGATGTTAGGAGTCATAGAGTATTTT
+TATTTTTTCTATTACTATAACATTTCCACTTTGCCAGGACTGCTGCATGCTAAAAACTCT
+TAGTTTTGTTCACTTTTCACCATATGACATGATTCTAAGGTCAACACTGAAAAACTTCCT
+AAGATTCCTCTGGATTTTTTAATGAAGATTTTTTAAAGTGCCAATCAGAAAACCATAATA
+TGAAAAATGTGGTCAATTTATAAATAATGTTCAGATGTATTGTTTTGTACATTTATCTAA
+AAAGAAGCATGAATGATTCTAATATTTATTCAGCACATGTCACATTCAAGGCATTTTCAC
+ATATATTACTTAATTTTTATAGCAAAAAAACCAATATTCCCATTTTACAAATGAAGAAAC
+TGAGACACAGAGATATTAAGTGTATTGATTAAATTTTCTCAGGTACTAGTAATAGAGCCT
+ATGTTTTAATCCTGGTGTTTCTAGTACTAATGCCCTTCCCATTTCAATGACATTGCATGG
+CTTACCACGATGTTAAGAAGCTCTTGCAGGCCAGGTGCAGTGGCTCACACCTGTAATCCC
+AGCACCTAGTGAGGCGAGGCGAGAAGATCAGCTGACCTGAGGAGTTCAAGACCTATCTGG
+GCAAGCTAGCAAGACCTCGTCTTTACTGAAAATTTTTTAAAAATTAGCTGGTTGCGGTGG
+TGCACACCAACAGTCTTAGCTACTTAGGAGGCTGAGATAGGAGGATCGCTTGAGCCTGGG
+AGATCAAGGCTGCAGTGAGCTATGATCATGCCACTGTACTCCAGCCTGGGTGACAGAGCA
+AAGAAGCCTTTGCAGTTCTTTGGAATGAAAAGGAGAGGATAAAAATTTGTTACCTTGTTT
+GAAATATGCCAGAAGAAAACCAGAGGATAGAGAGATGATGAAGACCCAGTAAAGGGCTAT
+AAACATTAATGAAGGCATTGGATTCTAGATAAAGTCACTGAATGCAGAGACACAAGTAAC
+AGGATTGGTTGGGTTTGGTGTAAAGGAGAAGGAAGAGGTAAATACATGCATAGTAAAATT
+TGGCTTTTTCCCCCCTACTTAAGGATGTTAATGATTGGGTTGGGCCACCAAATAATAATG
+GAGTAATTAAAGAAGTTACTATTAATCCAGACACTACTTGTGGCAATGACTGGGTCTGTG
+AACATCGATGGCGCCAAATAAGGTGAGAATATGTATTTAGACATGTCCTCTAATAGTAAA
+CTTTCCACTGCATTTTATTTAAAACAGTTGAAGTTTAAGAATATCAACGTTTTATATGGT
+ATTGTGTTTTTAGGAACATGGTTATTTTCCGCAATGTAGTGGATGGCCAGCCTTTTACAA
+ATTGGTATGATAATGGGAGCAACCAAGTGGCTTTTGGGAGAGGAAACAGAGGATTCATTG
+TTTTCAACAATGATGACTGGTAAGTAAATATCAATTAAAAATAATATTTTGTACCAGTAT
+GTTCTTGGTTTATTCTTTTTTTTCTCTGTTCATTGACTTTTATCATATCTGAAAAATCAT
+GTAGTCAGTGGAGCAAGAAGACAATAGAGATCAAAATTGGGCAGAAGCAAAAGGATGATG
+GCTGTTACTCCTTCGTTCTTTTTTTTTCATAAGTGCTTTCTGTTGTAAGCAGAATCCTTT
+CTGTGCACCCTTGCAGTATCATATGCATATATATGATGCACATGCATATGCTCACCTACA
+CATGCCACAAAATCAATATATAAAATCACAATCAATATAAGGATTGTGAAATCATTAAAA
+AAAAGTGTCTTATAATCCTGCTTTTTTAACCATGGAGAAATGCTGCCTAGGTACTAAAAT
+ATCTTTATTTCTAACTCTTTTTCTCAATGACTGCTCTACGTAGTTTTTTGGTACACTTTC
+TTCACTTCTCTGTCTCCTTGTGACAAATAACATTTTTAAAGCACATGTATGAATAATATG
+TATCTTGTGGTTACTGTTTTGCTTCAGGAGATTTGAGTTTTATTTTTGAAACTTCTCATT
+ATTGGCCTTTCATCTGTGATTCTTATATGCTTTACCTGAAGCATAAATGATTATCTAAGA
+TATAGCTCAGAAGACCTTGGTGTAAACAGTTGAATTGTCCCTGTCCAAGACCAACTGACA
+CTCATACTTAGCTCACTCTAGTATAAATTATATTTCACTGATGAAAAGTAAATAAATACA
+TCAATATACAAGTCAAGTTGATCTCTTTCCTGCCAAAAAAGCCCATGTCTAGTTTTTTAA
+TTTCTTTCTTTTGTAGATGAAAATACCAAAATAAGTTTTTGTGAAAAAGCTTTATATTTC
+AAACTATCACTTCTTCATAGAAATGCTAGATTATTTTTATGTGCATCCCAAATTCGTTTT
+GATCTCATGGGAGAAGAAGGAGGTTAAAAAATAATACCCTTTAAATTTTTAAGAGTAATT
+GGTATTAATTTCAGTTGAGAACAAATTTGATTTTACCAAGGTAGAACTTTTATCAAAGTG
+TGACCGTTCCTGCCAATCTTCAGTGATATTCTTCAACTTTGATGTTTTGGTAATATTTTC
+ACTACTAACCAGGAAATTGCTAGGTTTTCTGTAAGGTTACTTTTGGTCCTAGAAAGCTAT
+TTCCACCTACTAGAGAGGCATATGGGTTTTCTTCTTAATAAGACTTCACTGCTTAGGTTT
+GTTTCTACAACATAAAGTTATGCTGTTTATTTGTGTTAGTCTGTATTCTTGATTTTCATT
+GTATTGAAGATCAACCTTAAATTTTATTTTACAGGTCATTTTCTTTAACTTTGCAAACTG
+GTCTTCCTGCTGGCACATACTGTGATGTCATTTCTGGAGATAAAATTAATGGCAATTGCA
+CAGGCATTAAAATTTACGTTTCTGATGATGGCAAAGCTCATTTTTCTATTAGTAACTCTG
+CTGAAGATCCATTTATTGCAATTCATGCTGAATCTAAATTGTAAAATTTAAAATTAAATG
+CATGTCCTCAAAACAATAGCCAAGTGTGTTTCTTTTCTTACATGTACAGCAGTACTTATA
+TTTCATTAATTTTTACTAAAAGCTCTAATTAGTAATTAGAGGTTCAAAAAAGTTGATAGT
+GTGAAACACAATTCCAGGTTACACAGAAGTTATTTATTTTGGCAAAAGGATGTCTCAGAA
+ATTTTAAAACTAGGCAACAACCTTTACTCATTAAGAGGGAAGACTTAGCTTTCCAAACAG
+TCTGTCTCCTGTCTACTCTTTTCTCTCCTTGGCAGTCTACCAACAAGGCAAACCAAATAT
+TTCATTATTCTTCTCTATTACATGAAAAATCTGTACAAGGGAAAGAAAACCAAATTTTAC
+CCTTACATTGGTTTCAAAACATTCCTTTTTCCATAGGCAACGTTTACATTTTTATGCCTT
+TTTATAATCTTTTATGACAAACACATTTTACTGTTTTTACACACATTTTACACACTTTGC
+ATGTAAATTTATTTTTAGTTGTCTTATTTACATGTTTTAATGGTAAATCTTAACTTTGCC
+ATTTTAGTTAAAATGGTAACTTTAATGTAAAACCTGGTAAGTTTTTTCAATTATGTACTA
+GATGCAGATAAAGTCTGACTTTTTCCATCCTAGTTAGGGGCATGGTTAATTTCATATGTC
+CCCAGGCCTTATCAAGTTGTAAAGTAGGCAGTATACAACCTTGAAACATTTAGCAAACCT
+GGTATCTAACTTATATGATTTAAACCTCCTATTTACGTGTTGATGATATTTGCATTTTCC
+AATTTGATCTTTAAAACAATTTTTATTTCTTAGAGATGAAAGTCACATAAACTAAAAGGC
+ATTGCAGTTTTAACTTTCCTCTAAAATGTTTGATTTAAGTGCATATTTTTATTTCAGGCA
+ATCAGTTATAGCTCGTTACAGACATCACACACAACACATATATCATTACACAGACAAACA
+GAAGCAGATCCAGTAGTTCTAAGATTTCTTCCTGTCCCACTTCCTAATTGAATTATTGGC
+TTCCCAGTAGAGCCCTTTAAGAGCAGGGATAGAAAAACCATGAAGTTTCTAGGGCCTAAT
+CAACTTGTATAGCTGTAAGACAAAAGCAGATTTTGAGTGGGATCTATCAGCCTCTAAATT
+CTGGGATTCCATCAGGAAAACAGAGGTTTCTCCCAAAATGGAATCCACTGTGCCTTTTCT
+GTTTTTCCCAAAAAGTTTCAGGCCACCAGACATTACCTTGGGTGCATCAAGAGTGGAAAG
+ACAGAGCGGAGGACAGTAATTCAGCCAACAGAAAAATTCTTTTCAAGAAAAACACAATCC
+AAGAAGAGAAAAACATACGGGCCTTTTAAATATACCTATAACTTGAATATCCACTTTTAA
+TCAAGCTGAGCACTCTTTCAGAAATTCCTTTGAAATCTTCTGTTACCTGACTTTAGCTAA
+ATCAAGCAGGTAATATTTCTGGCTTTTGAACTTTACTAAAAGTAACATCACAGGTGAAAA
+CAACAACTCTCAATCAGGTTATAAGTTAACTGTGAGTATTCAAGATATTTTATAAGTGGT
+GGTAAGCAGCTTTTACTAGATCTAGAAACTTTAAAGGCAATTTAGAGAAAGGAAGATTTC
+AGAAAGGAAATTAGAGTTGTTCATGGAGGAGGAAGAGATGATTAAAGGTCATACAGCTAT
+TAAATTGAAAGTATTCATCACCCAAGCTAGGATTGAACCTGGGCCACCATTGTAAAATGG
+CAATGGCTAAAAAAGTCCTGCCACAAGGTTACAGGTTATGCTCAAGACATAAAACAAGAT
+AGAGGCATGCAGCAAAGTTTGTTACTTACCAGTTTATGTGGGTGACTTGAATAGTGAGCT
+TATGGAGTCCCAGGCCTGGATTCCATTCCAACGTATTCCAACCCGTTCCAAGATGGGTTG
+TTACCCATGCAGAAAACGGGGGGAATATAAGGTGTCCCTTAGTCTCCTTTCTCCTTTTGA
+AGTGACCCAGGATGAAGCAAAAGATTACAGGGGTGTCCCTGTTCTCCTCTTTCCTCCCAT
+CTCCTTTGGGTCCCGGTAACCATCATAGGTGCTGTCCATGGATGAAAGCATGAATTGCAC
+CCATGGATGTGGAGGTGCTAGCTGGCAGGAGTAGTCATTTTTACCAGCACAATGCCTCCT
+CACACTGCTCTTCTGGGTTCCTAGGCCTCCCAGGAGATTCTACAGTAGATAAAGTTGGGT
+GAGACACTTTAATAGAGGGAGTGTTTTAACCCTATTCCTGCCTCCTCTAGTTATGGACCT
+GGAAAAGCAGTGCATTCCCAGAAAATTTTATCCATTGACATTTAAATATAAAATCCCCTT
+TCTGTTTAAATGCCAATGTGGTTGGAAGCAGAACAGGTGTCTCAAAAGAACATATAGATT
+TAATGGCTGTCCTCCTTCTGATGGAAACAGCACTAAGGCTAGAATTTGTCTCTCAAGGGT
+GGCTTCCTCCCAACTGTTGAATGCGGAGTTTTTTTCCTTAGAAATGGGGCATAGGGTCTG
+CTTGCTGTTAGAGGAACACAAAAGAGGGAGAAATCTGGGCATTAGAATTTTTTGGCGAAC
+GGCCACCAAGAATTTTTACGGAGAAAAATAGCCTATCTCATGAGGTGGTGCTGTAGGGTC
+TGAAAAGTTATGTAAAATCTGTGACTCTAAATTTTTTCCAGGAAGAAGTTAGAAAGAGAG
+GTTTGGGGTTTAACAGGCTGTCACTGTATATGCCTCCCAGCAGTAGAAAATTAACTTGTC
+TCATTAATAAACTGTTCAAATTCATTCAGCAGTGCTGAGCTTTTACATGAAGGAAAAGCA
+ACTGAAATGGAGAGGGATGAGGGTATTCACTTGGGGTGAAATATCTTCTCATAGAGTGCC
+ATGAATGACTGCTATTGCGGGACAAGAAGCACTTACTAGGTGAAGGTTTAGACTGAAATC
+TTGAAATCCCCTGGTATTTTGTGTGTCTGATTACCTTTCCAAAGGAATAAAATCAGATAT
+GTATCCATCTCAGTGAGCAGAGGGGTGACTTTGAGTAGAATGGGAGGAAGGTTTGTTCTA
+AGTAGTTTCCCGCTTGCATTTTCCCCAGTGATTTCAGGGGCCCAGTATATTTTCCTTTCA
+CACATCTGACAAGGGATTAATAACCAGAATATCTACGGAGCTCAAACAACTCTAGAGGGA
+AAAAAAATCTAATAATTGCATCAAAAAGTGAGCAAAAGATTTGAATAGACATTTCTCAAA
+AGCAGACAGACATATGGTAAACAAGCATATGAAAATGTGCTCAACATCATTGATCAGAGA
+AATACAAATAAAAACTACAATGAGATATTATCTCACCCCAGTTAAAATGGCTTTTATTTA
+AAAAACAGACAATAACAAATGCTGGTGAGGAGGTGGAGAAAAGGGAACCATCATACACTT
+TTTGTGGTTATGTAAATTAGTACAGCTGCTATGGAAAACAATTTGGAGGTTCATCAAAAA
+ACTAAAAATAGAGCTATGATGTGATCCAGGAATCCCACAGCTAGGTATATACCCAAAAGT
+AAGAAAATCAGTATATTGAAATGATAACTGCACTCCCATGTTTATTCCAGCCCTGTTCAC
+AATAGCTAAGACTTGGAAACTACCTAAATGTCCATCAACATATGAATGGATAAAGAAGAT
+GTGGTACATATATACAATAGAGTACTATTCAGCCATGGAAAAATGAGATCCTGTCATTTG
+CAACAACATAGATGGAACTGAACTGTACATTATTATGTGACGTAAAATAAACCGGCCCAG
+AAAGACAAACATCACATGTTCTCACTTATTTGTGGGAGCAGCAATTCCAAACAATGGAAC
+TCACGGTCATAGAGAGTAGAAAGATGGTTACTAGAGGCTGAGAAGGGTGGATGGGAAGCA
+AGTGTGAATGTTCAATGAGTACAAAAAACAGAAGAATGAATAAGACCTAATATTTGACAG
+CACAACAGGGTGACTATCGTCAATAACAACTTAATTGTACAATTTAAAATAACTAAAAGA
+ATATAATTAGATTGTTTGTAACATAAAGGTAAATGCTCGTGGTGATGGATAACCCATTTA
+CCCTGATGCTATTACTACACATTATCTGCCTGTATCAAAATATGACATATGCCCTATAAG
+TATATACACCTCCTAGGTACCAACAACAATTAAAAATTTAAAAAAGTGTATCCTTCCCAG
+TGAACCGAAATATTTTACATTGATCATTTTTACATTATCATAGTGTGTTGGAATAATCAT
+AGCTTTGGAAAGAAAAAAAATAAATGTAAACTTTCAAAGATGGCAGGAGATATAGATCTC
+TCTTTCTTTAGTGGATTTAAATATGCAATTTGTTATCAACATTGCACTGACAATTTTTAA
+CACAGGCCATATTCTGGAAATGATATGATTATAGGAGTTATAGGAAAAAGCATTTTTTCA
+TACTGTGATTATCTAGGCTATGAGGAAAAGAGATATTTCCTCTATGATGTCTAATTTAGC
+ATAATTATTCTTTTTAAACTATATTTAGATTTAATTAAGAGATACACAAACAAATTACTT
+TTAGTAGTAAGAAGAGTTTTTAAAATGTTTTGTGTAAGATGACTATGTAGTAAAGAAGGT
+TACAAACCTGCTTTTTAACTTAGAAAAATAATCTCACCTTCTTGCCCCATGGAAAATTGA
+AATGTTATGTGGAACCCAATGTACAAAAACTCTCAGTGAGGATTTCTGTCTTAAGTTTCA
+ACGGATTGAGGTGAGATCAGTGCCTGTTCCACCTTTTTCCTCCCTGTCCTAGGTGACTTG
+ATAACATTTATACAGCCATAAAATTTTGCTTTTAAATTCAAGTATAGAAAATAAACTTTA
+AAATGTGATATTTTTATATATTTTACCTTAATAATTTATATGGGAAATATTTTTGAAAGG
+TTAAAAAATTAAAATTTTTAAAAGAGTATAGTGAAATAATCTAATCACATGTGGAAATTG
+TGTTACTATTTCCCCAAGCATTTTTTGTACATGTAATTGAAAATGAGTTGAATGTAACTC
+ATAAGTAGTATAAAAGTGTTTAATATTTAATTTTCTGCTGCAGTAATGTGACATAACAAC
+CTTAAATTCTTGGTACTTTAAACAAGATATTTTTTTTCTCCCAAATTTCTAGGTTATTTA
+AGGGCAGTTCTGCTTCATATTGCAGTTTGCCTGGGCTTTACTCCAGGCTGTGTGTTGGAG
+TTAGGTCTAATCTATTGTATCCCTACACATCCCACTTAGAAAAGCTAACAAATTTCCTTT
+GGACTTCAGTCAGTTCGATTGTATTTGTCATTCAAAAGCCAAATCATTTATCCTAACACC
+AACGGCTTATTTATTTATTTATTTGTTTGTTTATTTATTTATGTTTTAGAAGGAGTTTTT
+CTCTTGTTGCCCAGGCTGAGGTGCAATGGCGCGATCCTGGCTCACCGCAATGTCCCCTTC
+CCGGGTCAAGGGATTGTCCTGACTCAGCGTCCTGAGTAGCTGGGATTACAGGCATGCACC
+ACCATGCCCAGTTAATTTTGTATTTTTAGTAGAGATGGGGTTTCTTCGTGTGGATCATTC
+TGTTCTAGAACTCCCGACCTCAGTTGATCCGCCAGCCTCAGGCTCCCAAAGTGCTGGGAT
+TACAGGCGTAAGCCACCGCTCCCAGCCTAACCGATTGCTTCTTTTGTTCTGTAGTCTTTT
+GCCAAAACTGTTATTCTCTGGACTCACTTTCACCTTCTCTCACACATACATGTCCCTCGT
+GTCCCTTTGTGTCTCCAGGGTCTTCTATTTCCACGTAATCTTAATTTTTCATCACATCCT
+GGACTCATGAGTAATTAATCT
diff --git a/tests/test_data/AMY2A_ref.fa.fai b/tests/test_data/AMY2A_ref.fa.fai
new file mode 100644
index 0000000..3047944
--- /dev/null
+++ b/tests/test_data/AMY2A_ref.fa.fai
@@ -0,0 +1 @@
+chr1_103611603_103631603 20001 26 60 61
diff --git a/tests/test_data/ARL17A_ref.fa b/tests/test_data/ARL17A_ref.fa
new file mode 100644
index 0000000..3e05ce7
--- /dev/null
+++ b/tests/test_data/ARL17A_ref.fa
@@ -0,0 +1,467 @@
+>chr17_46552256_46580191
+CTTCCTTTCCCTTTTCCTTTTTCCTTTCCTTTCCTTTCCTTTTTTTTCCCTTCTCAGGGC
+CTTGTTGTCACCCAGGCTGGAGAGCAATGGTGTGACCTAGCTCACTGTAACATCAAACTC
+CTGGGCTTAAGGGATCCTCCTGCCTCAGCTTCCTGAGTGGCTGGGACTACAGGCAGGCAG
+CTAATTTAAAAAATGTGTTCGTAGAGACAAGGTCTTGCTATGTTGCCCAGGCTGGTTTTC
+CTGCCACTTCAGAGGAAGGACTCAGGTTTCCTTTTTCTCCTACTTTTAAGAGTTTTTATT
+AGGAATTATCTGTTGAATGTTATCTAAAACAGTCAATAAAATGTATTAAGTGCCAGCTGC
+ATGCAAGACCCTAAGTTAGATACAGTCAGCCCTCTTCATCAGCAGGTCCACATCTTCAGA
+TTCAACTAGATCAGGCTGAATATTTGAAGAAAAAAAAAACCAATAAAAATACAAACAGAA
+AGTACAATATAACAACTGTCAACAATGTACAATATGTATACATTTTATTAGTGATGACTT
+AAATTACATGGGGCCAGGCATGGTGGCTCACACTTGTAATCCCAACACATTGGGAGGCCA
+ACCTGGGCAGCATAGTGAGACCTTGTCTTTATTAAAAATTAAAAAAAAAAATAGCCAGGT
+GTGGTAGTATGCACCTGTAGTCTCAGCTACTCAAGAGGCTGAGGTGGGCGGATCACTGGA
+GCCCAGGAGGTTGAGGCTACAGTGAGCTGTGATCGTGACACCGCACTCCATCCTGAGTAA
+CAGAGGATGACACTGCACTCCAGCGTAAGCAACAGAGGGCAATCCTGTCGCTAAGTAAAT
+AAAGTATAGGGGGGATGCGTGTTGGTTATAAGCAAATATTACACCATTATATGTAAGGGA
+TTGAGCATCCACAGATTCTGGTATGGTGTGGGGGCGGTATCCTAGAACCAATCCCCCGCA
+AGATAGCAAGGATGACTGAACTATGGAAGAATCAAAGCAGTGTTACACAGCATACAATTC
+CTGTCTTCAAAAAAGTTACCTCATCAGGTAGATGAGACTTATAATGAATAAAAGGAATCA
+ATACAGATTTGGAGACGGTGGTTGTTGTCATAGATAATCTTAATTGCGTTTTCTTCTAAA
+ACAGATATGTTGTCACCGAAGGTCATTACAAGAAGATGAAGAAGGATTCTCAAGGTAAAT
+ATTAGTCTGGTGATTTTTTTTTTCTTCTCTTTTGAGACGGAGTTTCCCTCTTGTTGCCAG
+GCTGGAGTGCAATGACACGATCTCGGCTCACGGCAACCTCCACTTCCCAGGTTCAAGCGA
+TTCTCCTGCCTCAGCCTCCCGAGTAGCTGGGATTACAGGCATGCACCACTAGTCTCGCGA
+CGTTTTAATTAGAATTTTAGAATTAGAGGAGGGCTTAGAACTCTGCCCTCATTTTTCAGT
+GAGGAAACTGCCCAAGACAGGACAAATACTTACCCTAATGCTTAGCCTGGCTCCAGTGAA
+ATTAGCTCCCCAGCCAAAGCTGAGCTGGATGGAACTAACAAGGACACACCTGCTGTCCCC
+AGCCCTTTCGGGAGGTGGGGAGGGATAGGAAGGAGAAAGGTTTTGGTGCCTATTGCTGCT
+GATGGTGGGCATCAGGCCAGGCCAGGGGCCTTCTTGGAGGCTCTGGGAAAGGGGAAGGGA
+AGGCCACCGGGTGTGAGAGAGAGGGCACTTGTCTCCTTCAAGGCTGATGGAAGGTAGGAT
+ATGTGAGTCCTTCCTCTTAAGTGGCAGGAAACAGTATTTTCTCTTTTATTTCTTTTTTTT
+TTCCCCTGGATCCTAGAACTGAGGAAACACTATTTTCTCATCTTACTGGTTTTTGGGCCC
+CTACTCTATTCCTTTTATGCAAACCTCACAGAATTTTAACCAGAAAGGCCAGGCAGGATG
+GCTCACGCCTGTAATCACAGCACTTTGGGATCACTTGAGGTTAGGAGCTCGTGACCAGCC
+TGACCAACATGGTGAAACCCCATCTCTACTAAAAATACTAAATTAGCTGGGTGTGGTGGC
+GCAGGCGTGTAATCCCAGCTACTTGGGAGGCTGAGGCAGGAGAACTGCTTGAGCCCAGGA
+GGCGCAGGTTGCAGGGAGCCAAGATAGCACCATTGCACTCCAGCCTGGGGAATGAGCCAA
+ACTGTCTCAAATAAAAAAAAAACAACAAAAAAGAATTTTAACTACGGAGACCTTAGATAG
+TAGTTTGTCCTTTTATGACAGGAAAACTGAGAAGGAGAAAGGGAGAGTGTCTTACTTACC
+CCATGGTCACACAATGCACTCTGCCTTTTCCTATTTTATTCAAATTCAAAAATAACATGT
+TGTGCTTTAAGCTGATTTCGTAGCTCACTCACTCATCAGCGACAGCCAGCCATATGAAGA
+CTGTGAGATAGAGAACTTGGCTAGCTGCACTCACGCATTTGCTTGAGGTGATCCAACTTA
+TAGAATAAAGTATCTAGAAAACGAAGAACCACTTCATCTCCCATCCCCCATCAAATGATG
+CCTGTGAGACTAAGTCCGAAGGGGACTGATATAAAATGCTTCTGCCCAGCATGGTTGTGC
+AGTTTGTTCACTGCACAAGGGCACTTGGCCAAGGGAGTGAGTGGGACTGAAAATCCTGCC
+CCTGCTCCATGCTGAGCCACATACAAAGTCCCCCCAGTATATTGTGGGGCCCTTCTGGGC
+AGACATGGAGAGCTTCTGAAAGTCCCACATGCATGGAATTATTTTCAAGACCCCGGGTAT
+GTGGTCTGTGGTGGTGGTTCTCCCTGTGATTATGGACTGAGATACTCATTTAGTCCTAAT
+AAGACCAGAGAAGTACATTGTGAGATACGTGGGAAACGGCTGCATCACTCACTGTCTTGT
+GCATGTGTCTCCCCAGGGGCATTTTCAGATTTCTGCCATGGAGGGGATGCTCTTCGCGAA
+GGGAGAGTCAGGTACATTGGAGGATTCATTGCTGTGGCCAGGGAAAGCAGAGGACATGCA
+AAATTAATATTTCCCTTTCTATCTTCTAGGATGGACTTTCCTCATTTGGACAGCCGCTCT
+GGTTTAAAGATATGTACAAACCTCTCAGTGCCACAAGAATAAATAATCATGCATGGAAGC
+TGCACAAGAAGTCATCTAATGAGGACAAGATCCTCAACAGGGACCCTGGGTAAATGATGG
+GGCCCTCACAGTTCCCATCTAAAATGAGGAGGGGGTGAGAAGCTTAATTGCTCCTTTCAA
+GAATGAAAACCTTGGCATTGTTATTCTTATCCCAGGGACAGCGAAGCCCCAACGGAGGAG
+GAGGAGAGTGAAGCCCTGCCATAGGAGGAGAACACAGCCCACCTCAGGCCTCCTGCAAAA
+ATACATAGAATAAACAACAACAGTTACTAAATGAATGAAAATTGTGATTCCGATGAAGCC
+TGCCAGAGAAACAAAGCATTTTTTAAAAGAGGAAATAAGGTGATATCTGATTAGGGCAAA
+CATGATGCAGACAAGAAATGCACCGGTTCAGAGGAGGGAAGGTCAGGCCGCCTGGGGAGA
+GTCCATGAAAAAGATGGAACGTGCCAGATGCTGTACCTGGTGCTGGGAAAGAGTTGACTA
+GGCCAGCATCCCTTTCCTCAAAGGGGGGGCTCCTAGACTGGGGGGAGGGCTGGACATCTG
+AATACATCCTGAGGAGACAGTGTGGGACAGCATGGTGGCAGTGGAACCAGCCGTGGTTCT
+GCTCTTGGTCGGCTGGAAAGGAGTAGATGTAAGGGATGGTTTAGAAGAAGGGAAGTGGAA
+GAAAAGTTTTCTGAGCTGACAAGAGGAAGGAAAGGCCGCCTAGAAGGACACTAAAAAGGC
+AAGAGAAGCCCTAAGCAGAGTGAGCACCAGACTCCACAGGTTAAGGGCTCAGTCACACAG
+GACCATCCGCATGTCAGACCCCAGGTGCAAGGCCAAGCATCACCTATGCATCTGACCAAC
+TGGCTGTAAATTGGAGGTCCCCACAACTCCCTCCTCAGGTTTGAACATTTGCTAGAACAG
+CTCATGGAACCCAGGAAAACAGTTTTCTTACTAGTGCTGATTTATTACAAAGGATATTTT
+AAAGGACACAAATGATGAAGCCAGTTGAAGAGATACACAGGGTGAGGTTTGGAAGGGTCC
+TTGTGGAGTTGGGGTGCACCACTCTCCTGGAACATGGATGTGTTCGCCAACCCGGAAGCT
+CTCCAAGTCCTGTCTTTTAAGGAGTTTTCTGGAGGCTTTATCACGTAGGCATGATTGAGC
+TCCAGCTCTACTCCCCACGCCAGAGGATGGGGAATGGGGCTGACAGCACAACGCTTCCAA
+CCATAGGTCTTTTTGGTGACCAGTCCCCAAATAAGGAGCCCACCAAGAGTCACCTCATGA
+GAACAAAGGACGCTTCTATCACCCAGAAAATTCCAAGGGATTTAGGAGCTCTGTGTCAGG
+AACCAGGTTTAAGGACCAAATGTTAGAACAAAAGATGTGCAACCATAAAAAACAGCGAGA
+TCATGTCTTTTGCAGGAACACAGATGGAGCTAGAGGCCATTATCCTCAGCAAACTAAGAC
+AGGAACAGAAAACCAAATACTGTATGTTCTTTTAAGTGGGAGCAAAATGATGAGAACTCA
+TAAACAACAGACACTGGGCCCTACCTGAGGGTGGAGGGTGGGAGGAGGGAGAGGAGCAGA
+AAAAACTATTGGGTACTAGGCTTGGTACCTGGGTGATGAAATAATCTGTACAACAAACCC
+CCATGACACAAGTTTAGCTATATAACGAACGTGCATATGTACCCCCTAACCTAAAAGAAA
+AGTTTAAAAAGGAAAAAACACCTAGGAGAAAAGAAAAATGATAAATTAACAAAGGACAAT
+GCTCTTAGCACCGCCATCATTCAGGAATTTCCAAGGGTTTTAGGAGCTTTGTGTTAGGAA
+CTGGGGGCAGAGACCAAATATATATTTCTTCTTATGTTACACTACCCCAGATAGGAAAAC
+AGAAATTACTCTAGATATTTCAAACAAAAAAGGGTTGTATATAGGCAATTAGTGCTTATC
+ACTGGAGGTGCTAGAGGTGGTGAAGGTTGTGGGGATGGGGTTGCACCACTGGCTTTCAGG
+CTACTTTACCACAGCTGATTTCCAGAGGATGGAAGAAGTCAGGAAACTTGGGAAACCGCT
+GCTGAGGTCCTTGCAGCCCCACGGTCCCCAGGCTGGTGACTGGTGGGGGAGTATGGAGTC
+CAGCTGACCACCAGAGCCTGCACACCTGCTGCTATGGGGGAGGAAAGGATGACTTCTACC
+TCCTTTCCACATTCCAAATTCCACGTGACTACATTTTATTGGCAGCACCCCGCTGGCAAG
+TGAGCCTTGATGTGTGCTTCCTGGGCTTCTGGCACCTGCACAGAAAGGGGTGAAATGAGT
+GTCACGAGCAGCCACCACTCTGCATCACACCTCCACATCCAGGTGTGCTGGAGAGCCCCA
+CTTACACTTGGAGTGTCCTGGTGCCCTACCCACTTTTGGTAGGTGTTTGGGTGTCTGAGG
+CTTTGCAAAAGAAGAAGGTGGGGAGTCTATGGTGGAGTCACATGGTGGAGACCTAGCTAA
+GTCAGAGGCCTGGAGAGGTGTCACTGGCTGGGCAGCAGGTAACACACAATCATCCTGAGC
+TGATTGGAGAAACACCTGGGATTGATTCAGAGTTTTTTCTGGAATGTTTTCACTGGAATG
+AAAGCTGAGCGGTCTGCAGGCCATATAGTATTGGAGAAAACTTAGCCCTCATTGAAAAAG
+GCTGCCAGAGAAAAGATATCCACAGGGAAATTCAGGAGTTTTGTTTGTTTTTTTTTCTTT
+TAAGGTGGAGTTTTGCTCTTGTTGCTCAGTCTGGAATGCAATGGCACGATCTCAGCTCAC
+TGCATCCTCCGCCTCCTGGGTTCAAGCAAGTCTCCTGCCTCAGCCTCCCTAGTAGCTGGG
+GTTACAGGCATGCACCACCATGCCCGGCTAATTTTTGTATTTTTAGTAGAGATGGGGTTT
+CACCATGTTGGTCAGGCTGGTCTCGAACTCCTGACCTCAGGTAATCCACCCGCCCTGGCC
+TCCCAAAGGGCTGAGATTACAGGTGCGAGCTACCGCGCCTGGCTTGTTTTGGGTTTTGGG
+GTTTTTTTTGTGTTTTGTTTGTTTGTTTGTTTTTGGAGACAGTCTCTGTCACCCAGGCTG
+GAGTGCAGCAGCGTGATCTCGGTTGACTGCAACCTCTGCCTTCCAGGTTCAAGTGATTCT
+CCTGCCTCAGCCTCCCGAATAGCTGGGATTACAGGCACCCACCACCATGCCCGGCCAATT
+TTTTTTTTTTTTTTTTTTCTAAAAACAGTTTCACCATGTTGGCCAGGCTAGTCTTGAACT
+CCTGACCTCAAGTGATCCGCTCACCTTGGCCTCCCAAAGTGCTGGGATTACAGGCATGAG
+TCACTGTGCCCCGCCAGGAAATTCAGTTTCTGAAAATACACCTGTGGATCTCTAGCCTTG
+AACACCCTTGGATGCTGCTTTAAATGACTGATCCTCGATGCCTCCCTTCTAACTCACACT
+CCCCTATATCAATCTCCCAGAAAAAGGGACCTCTTTTATTCTTTTTTTTTTTTTTTTTTT
+TTTTTCAGAGACGGGCCTCACTTTGTTGCCCAGGCTGGTTTTGAACTCCTGGCCTCAAGT
+GATCCTCCCGCCTTGGTCTCTCAACGTACTGGGATTACAGGTGGGAGTCCCCGCGCCCGG
+CAAAGCCTATATTAAACCCTTTTATGCACACTCGGCGGTACTGCAGAGAGGGCAGGGAGG
+AAGCAGAGGTGCCCTGGCATCTTCAGCTGGAGGTGAGCAGGGCGCTGAGGGTGGGAGAGG
+CCCGGCGCCTGGGGATGGGAGGCAGGACTGCACCTTCACAGGGACGCTTCCACCCTACCC
+CGGAGGTCAGGGCCTCTCGCCCAGCTCTGGCTCTGAGGTCCTGGAGGGAGGGAGATGCTG
+TTGCGACTCAGAAGATTGGGGGAGGGCCACCCCCATTCGAGAAGAGTGAAAATCCTGAGC
+CTGAAGAAGTGGAACCGGTTGGAGCCGAGGCTTTAGAGGATGGCGTTCGAAAGAGGGTCT
+GGCGCCGCCCTGTGGACCGTTCGGGCTCGCAGGGCCGAAGGCTCCGAAGACTGAGACCTG
+TGAACCATGGGGAGGCTCCATGCGGATGGGGGCCACAGCCCCCGCCGGAGCCCCCACACT
+AGCCCTGGACTTCTCCACTGGCTTACGACATGAGAGCTCAATATGCTCCTTATTTAACGC
+ACTGTTGTATCAGGTCCCTGTGGGAGCCACTGGCTCTATAGCCTAATAAAGGAGCGGGTG
+CACGCACTGGATTGGTGAGCTACCGCCACTGCAACGCGTCCTAATCAACCATCCTAAACG
+GCGGCTGGAACAAGGTTCTCGCAGGCCTGTGCTTGGGCTTGAACGCTGGTCCAGCCGCTG
+CGCTCTGTGGCTCCCTGTAGGCCTGCGGATCGGCCAGGGGGCTCCGTTCCTTTTGGGCGG
+AGGCTGAAGAAGCAGCGGCTGCACCAGAGAAGGCCCTCTGGGTGAAGGTGGGAGCGCACG
+GGGCCCGCGGAACCACCTAAGGCGACTTCAGACGTGGGCTCGGAACTGGCAGCCTTTCGT
+TTCTGCTTCATTCCAAGGCCAGAGCAAGCCACGTGGGCAAACCCAAAGCCAGGGGACAGG
+AAAGTATCCTCCACCCACAACGAAACCATGGCAAGCGGTGGATGCAGGTACGGCCAATAG
+TCTATCTATCCCGGTGAGTGAGGAGACCTGCTTTGAGGGTTGCACAACCTGGATCTGCTT
+TTACAGTGGTGTCTGTCACTATGAAGACTCCACCATGGGTCGCCATCAGGTCAGGGACCC
+TGACAAGGCAAGAACTGCATCTTCCTCTGCACACAGCTCTGCTCCCTTCCCCGCCATGCC
+TAACACCAAGCCTAGCCCTGAGGGATGACTCAGGAATATTACTGAGAGCATTTTAGGCCA
+TTCCTTCATTATCCCCATGTGACTTGTTATGAAATATAGACTGACTTCCTGAAGATCAGC
+ACATAGTGCTAAGTATTTGGCTTGTAATCTGTAGAGACTCTGCCATTTGGAGCTGGGATC
+TGTCCCCAGAGCTGTCAGACACCAAATCCCGTATCTACTGCCACCCAAAGGGACCTCCAG
+AAGAAAGGGGTTATACAGGGTCAAACACCAAGGCAGGTTAGTGAAATTTCTCTAGAGGCC
+ATTTAAAGCTGGAGTCTCACCACCTGAACTGCCCTCAGAGGAAGGCTGTCTAGGGCACAA
+ACCTAGTCAGGGGTCCACATGGACTTAAGGACAATTTTTTTTTTTTTTTTTTTTTGAGAC
+AGTCTCATTCTGTCATCAAGGCTCGAGTGCAGTGGTGTGAACTCAGCTCACTGCAAGTCT
+CAACCTCCTGGGCTCAGGTGATCCTCCCACCTCAGCCTCCCGAGTAGCGGGAACCACAGG
+CTCGTGCCATGATGCCCAATTAATTTTCTTTTAAATTTTTTGTAGAGATGAGGTCTCCCC
+GTGTTGCTCAGTCTAATCTTGAACTCCTGGACTCAAATGATCCTCCTGCCTCTGCTCCTC
+AAAGTCCTGGGACTACAGGTGTGAGCCAATGCACCTGGCCTCTTATGAATAATTTTAAAA
+ACAATGAGGTTCACCGTCAGAGCCCCTGCTGCTCTACCAAGTCCCTTGGCCCCTCTCAAC
+AGGGCAAAAGCAAGATGAGCCCCAGATGTTCTGCTTAATGACCACCTTTCCCAGGAGACT
+TTGCTCTTTAAAGGAGAACCACTTAGAGATATGAGCAACCTTAAAGAATGCCACCAGCAC
+TAGTGAATGCCAGGCACGGGCCACGTGGGTGGAGAGTATACTTTAGGGCAGTCACTCATG
+GTAAATTATTTCCACCAGCCCCCAGAAGTGACTATTCAATGTCCAACTATGTCAGGCCCA
+GGCTAATAAAAGTAGAGGCATGAGGAACCTAGGGTTGTTCTGAAGTGCCTTGATTATGGT
+ACATGTGGAAGATTTTAGAGCTTGTTGTAAAAAGTGATGACCCATGGCCCCCCAGGCTGG
+GTCTGGGATTCCCTTTGTGGATTACAAGAGGATATTATACAGCAGTTTTTAAAAATGAGG
+CAGACTGGGACAATCTATCTCCAAGATGCATAGGTGCTGTTAAGGGAACAAAGCAAGATT
+TAGTAGGGCGTGTATAGTATGCTACTGTGCGCTGTGCGTTATCTGTACAGAACTGTGAGG
+TCTGATACAGTAGCCACTAGCCACATATGGCTATTTACATATAAATTTAGGTTGGCCACA
+GTGGCTCATGCCTGTAATCCTAGCACTTTGGGAGGCCAAGTGGGAGGATAGCTTGAGGCC
+AATAGTTCAAGAACACCCTGGGCAACATAGTGAAGCCCCTTTTCTACAAAAAATTTATTT
+ATTTATTTATTTTTATTTTTTTTGAGACGGGTCTCACTCTGTCACCCAGGCTGGAGTGCA
+GTGGCGCAGTCTCAGCTTATTACAACATCTGCCTCCTGGGTTCAAGCGATTATCGTGCCT
+CAGCCTCCAAGTAGCTGGGACTACAGGCACGCACCACCATACCCAGCAAATTTTTGTATT
+TTTGGTAGAGACAAGGTTTTGCCATGTTGGCCAGGCTGGTCTTGAACTCCTGACCTCAGG
+TGATCTGCCCGCCTCAGCCTCCCAAAGTGCTGAGATTACAGGCATGAGCCACTGCGCCCA
+GGCAAAAAATTTAAAATTGTAAAAATCAGCCAAACATGGTGGCATTCATCTGTAGTCCCA
+GCAACTTAGGAGGCTGAGGTGGGAGGATTTCTTGAGCCCAGGAGGTCAAGGTTGCAGTGA
+CCTATGACTGCACCACTGCACTCCAGCCTGGACAACAGAGTGAGGCCCTGTCTCAAAAAA
+TAAATAAATAGGAGTTTGAGGCCATGTTCACACATCACTGCAGTCCAGTCTGGTAAACAG
+AGCAAGACGCTGAGTCTTTAACAAAAAAAAAAAAAATTTTAAAGTCAGCTGCTAAGACAC
+ACTAGCCACATATCAAGTGCTCAACTGCCCCGTGTGGCTAATGGCTTCCAAACTGGCAGC
+ACAGGCAACTGTTTTCATCACTGAAGTTCTGTTGGACAAGAATACCTCTGGAAGCACACA
+CGAGAAACTGGTAATGGCGGTTGCCTTCCGGGAGGGAAACTGGGAGAGTGCAGGGCTGTA
+TGCCCTTTTGTACCTCTTGAATTTCATATCATGCGTTTGTACTAGGTGTTTACAAATTAT
+TTTAAAAACATACTGGGAATTAGGAATCCCCATATGGAATCACTGCAGTGGGACACTGAT
+TCCCAAAAAATTACTTTGCAATTTGCCTAAAACAAATTAAGCTAATAGCTATCATGATTT
+CATATTTAATATTTTTTCACAGCTTAGAGTTTTTTTAGCTCCTGTTGCTATCTCTCTGCT
+CTTATGAGCTCACAGGGTGACAAAGTGAATTAGTAAGTAGCAGGAGAACATTAAAGGAAA
+AACTCCTGGGCAACAGGGCAAAACGCCGGCTCTACAGAAAATACAAAACATTAGCCAGGC
+ATGGTGGCATGCACCTATGGTCTTAGCGACTTGGGAGGCTGAGGTGGGAGGATCGCTTGA
+GCCCGGAAGGCGGTAACTCAATCCCAGCAGAATCCCAGGGAGCGAAGGTGGCTCATCCCA
+AAAGAAAAACAAGAAGGAAATTCTATTACCAGAAGACAAAGGGATGAAATGAGGGATGGA
+GAATCAAAGACTGAAGCTACTAGGGTTTGTTTTTATTAATATTTAATTTTTTCAGAGGCG
+AGGGTCTCAGTATGTTGCCCAGGCTGGCCTTGAACTCCTGGCCTCAAGCAATCCTCCTGC
+CTGAGCCTTCCGAGTTGTTGGGATTACAGATATGAGCCACTGCATCCAACTTTGGTTCTT
+GTTTGTCTGTTTTGTTTTGTTTTGGTTTGTTTTTTTGACAGAGTTTTGCTGTGCCACCCA
+GGCAGTGACTCAGCCTCGGCTCACTGCAGCCTTGACCTTCTGGCCTCGAGTGATCCTCCC
+ACCTCAGCGCCACCCCCCACTGCCCTCCAATATCTGGGACTACAGGTGCGCGTGACCGCA
+CACAGCTAATTTTTAAATTTTTTGTAGAGATAGGGTTTCACTATGTGGCTCAGGCTGGTC
+TCCAACTCCTGGACTAAGCGATCTGCCTGCCTTGGCCACCTCCCAAAGTGTGAGCCACCA
+TGCCCACCCATTGAACATTGAAGCTAGACTGGGCAAACCCTTAAGCCTAAACCAGTAACA
+GTTTTTCACAAGTTCATAGATGTTACTGTGGTTAATAACACACAAATTCATTTAAAAGCA
+TGTGTGTCCACATAGTAATTTTTGGTCCTTATTTTTATTTTTATTTTTCAGTTAATGGAT
+ATTAAAGATACAACTTTATTTTGTTTTTTTTGAGACAGGGTCTCACTCTGTCACCCAGGC
+TGGAGTGCAGTGGCATGATCAGAGCTCATTGCAACCTCCACCTCCTGGGTTCAAGAGATT
+CTCCTCCCTCAGCTTCCTGAGTAGCTGGGATTGCAGGTACATGCAACCACACCTGGCTAA
+TTTTTGTACTTTTTGTAGAGATAGGGTTTTACCATGTTGCTCAGGCTGGTTTTGAACTCC
+TGAGCTCAAGTGATCCACCTGCCTCGGCCTCCCAAACTGCTGGGATTACACAAGTGAGCC
+ACCACACCCGGCCTAAAGATATAATTTCTATCATGAGGAGGTCCAAGAACTATTCTCTTT
+TTCTTTTTTTAATGTTAGAAAGGGATTAACTGGGTATGTGCTGCAGCAAAGGGAGGGGAA
+ATTAAGCAAGAAGAGAAGGGAGCCAGGAAATAAAGGCCCCAACCCAGGAAGCAGTTAAGC
+AAAGTTCCAGGATGACCCATGTGACAAGTTTAGGGGATAACTTGAGCACATGGAGGACAG
+AACTTGGAGAGGGCACTGTGGGCCTGGGCGCCACCTGCTCCGCCAGAGCACTGGAAGAGA
+ACGAGGGCACGATAATGGCAGATGGCACTGAAAGAAAAGGAGAGAGCTTGAGGCACCCTT
+GGGGGAAGCAGCCATCATCAGAGTGTATTTTATTTTTATTTTATTATATTTTGAGATAGA
+GTCTCACTCTGTTGCCCAGGCTGGAGTGCAGTGGCATGATCTCGGCTCACTGCAACCTCC
+ACCTCCCAGGTTCAAGTGATTCTCTGCCTCAGCCTCCCAAGTAGCTGAGACTACAGGGGG
+GCACCACCACACCCGGCTAATTTTTGTATTTTTAGTAGAGATGGGGTTTCTCCATGTTGG
+CCAGGTTGGTCTTGAACTCCCGACCTCAGGTGATCCGCCCACCTTGGCCTCCCGAAGTGC
+TGGGATTACAGGCGTGAGCCACCATGCCTGACCTCACAGCACATTATTAAGCTCTGTGGT
+GAATAATATTTATATAGTCACAATTCTGTAAACACTGTTCATTTTCTACAAATTGTGGCA
+AACCCAAACCTCAAGAATGGACAGGGCTAGGGTGTAAAAGAGCTAAGTCCTTGCCAGGTT
+TACCAGGAAGGCAACAGACAGTGTCTAAAACTATGAGACAGCTGGGCGCGGTGGCTCACG
+CCTGTAATCCCAGCACTTTGGGAGGCCGAGGCGGGTGGATCACGTGAGGTCAGGAGTTTG
+AGATCAGCCTTGACAACATGGTGAAACCCCGCGTCTACTAAAAATATCAAATTAGCTGGG
+CATGGTGGCAGGTGCCTGTAATCCCAGCTATTAGGGAGGCTGAGGCAGGAGCATTGCTTG
+AACCCAGGAGGCGGAGGTTGCAGTGAGCCAAGATGGCACCATGCATAGGTTTTATGCCAA
+TACTACACCATTTTATATCAAAGCCTTGAATATCCAAGGATTTTGGTATCTATGGGAGGT
+CCTGGAACTAATCCCCCACAGATACCAAAGGATGAGTATACACCTTTTCTTACTTTCGAA
+TTTTGAACCAGACAGATACGCTGCAATTCAACAAATTAAAATAACTGAACCTACAATCAA
+GGAGAAAAATATTTATCCCTAATAACTAATGGACCATCTTACGCCGTCTATAAAAATATC
+AATTAAGAAATCACGGCCGGGCACAGTGGCTCACGTCTGTAATCCCAGCACTTTGGGAGG
+CCGAGGCGGGCAAATCACTTGAGGTCAGGAGTTTAAGACTAGCCTGGCAAACGTGGTGAA
+ACCCCATCTCTACTGAAAATATAAACAAATTAGCCAGGCATGGTGACGGGCACCTGTAAT
+CCCAGCTACTCGGGAGGCTGAGGCAGGAGAATTGCTTGACCCAGGAGGTGGAGGTTGCAG
+TGAGCCGAGATCACACCACTTCACTCCAGCCTGGGTGACAGCAAGACTCCGCCTCAAAAA
+ACAACAACAACAACAAAAAACGCGGTGATACACGCCTATAATCCCAGCACTTTGGGAGGC
+CAAGGCGGGCGGATCACAAGGTCAAGAGATCCAAGACCATCCTGGCCAACATGGTAAAAC
+CCCATCTCTACTAAAAATACAAAAATTAGCTGGGCATGGCGCGCACCTGTAGTCCCAGCT
+ACTTGGGCGGCTGAAGCAGGAGAATCGCTTGAACCTGGGAGGTGGAGGTTGCAGCGAGCC
+GAGATTGTGCCACTGCACTCCAGCCTGGCGACAAAGCGAGACTCCGTCTCAAAAAAATAA
+AAAAGAAATCAGAAACTGTTCAAATTCCAATCTATACAAGTGAAACCTACACAGTTAAAT
+GTCTTAATAGAGAGCTAAAAGCATTAAGTTCTGTCATACTCAACATACTGATGAAAAAAA
+TTTGAACTAAAAGACAGCCTGATCTTTAGAAACTGGCAGAAAAAATAGGGGCTGCATCCA
+AGACCACTCACGGCACGTCCATTCTTCCAGCAGCTCCCAACTGCTATTTTATGAAAAGTC
+CAAATTTCCCACCCCAAGGTCCTAGAGTACAGCCAGCTCCCTACCATCAAAAACAATAGC
+TACCAAAACTCAGTAAAATTAGGCTTGTTATGAAGAGAAGATTGAAGAAATAAAGGTATT
+TATTTCATCTTGGTATTTGAATGTGGAAACTCCCAGTAGGGTAAGAAAAAAGAATTCTGC
+CAGAGAAATGATGAAGTGAAACCTCAAACAATTTGGGCTTTTGAGTGCCTACTGTCATCC
+AGCCACGAAATGAAGCATTTTACATAAGTAATTTAAGCCCAGTGATGCAGGCAGTAAGTA
+TTATCTCCATTTTAAGATGAGGAAATTGAGACAGACGAGATTACCTAACTCTCTCAAGGT
+TTCACCACAGAGCTGTGGTTCAAACCCATGTGTTCTAGTTTTATTACAGTTAAGAAACTG
+CAAGAGTTTGCCAATTTGTCAAGTGCCAGTCCAAGATAAATGGACACTCTGAGTATCAAA
+AAAATACATAATCTGCAGATCAAAACACGAAGATACAAACATCTAAGAGTCACCATGGAA
+GTTGCGGGGCATCCAGTGGTTTTTCTGAAAATCCATGAAGGAAAAGAATAAAGCATTTTT
+CCAGCCTCTCTTCTACAAACTATATTTCAGGGAAACCAAATATTTGATGAGGGAATTCTT
+TGTTTAGAAGGATTTTAGCTAATAAATGCAGAGGAATAACAAGATTTAGGAGGGGGGAGA
+AATCACCATTTTGTGACTTCTAATAAAATAATGGGTCTAGGCAACAGTTTTCAATGGATG
+CTAAAACGATTAGGTGAAAAGTTGATGGAGAATTTTAATTCAGGGGAATTAGGCTGATAC
+CATCTGAAACCATTTGGCATCATTAAAAATGTGACAACCTGGTGGCTGCCAGGGAGGAAG
+GGGAGAGGGTGGGGGAAGAACAGAAGTTATAGTTTATGGGCACAAAGTTTTGGTTTTACA
+AGATGAAAATAGTTACAGAAATGAATGGTGGTGATGGTTGCAGAATATTATGACTGACTA
+TACACTTAAAAATGTTTAAGATAGAGCTGGGTGCAGTGTTGCACACCTGTAGTCCCAGTT
+ATGCAGGAGGCTGAGGTAGGAGAACCACTTGAGCCCATGAGTTCGAGACCAGTCTGAGTA
+ACATATAGAGACCTCATCTCACAAAAACATCACTACCACAAAACAAAACAAAACAAATTG
+TTAAGATGCTAAATTTTAAGTTATGTGTATTTTACCACAATAAAAAAAAATGAGGCCGGG
+TGCAGTGGCTCATGCCTGTAATCCCAGCACCTTGGGAGGCCGAGGCGGGTGGACCACAAG
+GTCAGGAGTTCGAGACTGGCCTGGTCAATATGGTGAAACCCTGTCTCTACTAAAAATACA
+AAAAAACTAGCCGGACGTGGTGGCTCACGCCTATAGTCCCAGCTACTCGGGAGGCTAAGG
+CAGGAGAATCACTTGAACCCAGGAGGCTGGGGTTGCAGTGAGCCGAGACTGCACCACTGC
+ACTCCAGCCTGGCCAATAGAGGGAGACTCCGTCTCAAAAAAAAAAAAAAAAAGAAAAGAA
+AACATAGTTACCCAGCAATTCCACTTCTAGGCATATACCCAAAGAACTCAAAGCAGGGAC
+TCAAACAGATACTTGGACATGAATCTTCATAGCAACACTATTCATAATAGACAAAAAGCA
+GAAGCAACTCAAGAGTCCATCGATAGATGAATGGATAAACAAAATGTGGTATATCCATAA
+AATAGAATATCATTCAGCCATAAAAGGAATTAAGTTCTGATACATACTACAGCATAGATG
+AACCTTGAAAACATTGTACTAAGTGAAAGAGGCCAGGCAGCTAAACTTCCAAAGACTAAT
+ATACAATTCCACTGAATAGGCAAATTCATAGACAGAGGATAGAATAAAGGCTAATGAGGG
+GTGGGGGAGAAGAGAATGGGCAGTTATTGCTAATGGATACAAAGTTTCTGTTGGGGATGA
+TGAAAAAATTGTGGAACTGGACAGTGGTGATGGTTGTAAAACACGGTGAATGTACTTAAT
+GCCACTGAATTGTACACTTAAGAATAAACTTGTAAATTTTATATTATATATACTTTGCCA
+CAATAAAAATTTTTTTAAAAATGTCTAAGTGTGACAACCGAACTTCTGTGTTAGGATAGG
+AAGAATACTTATGAAGTATTCTTGCCCCTGAAATGAACCAAAATCTAATCAAGTCTCTAG
+AATAAACAACCAGTTCACAGGAAATCCACAGATAGCCAAGCAAGTTAGATGGCACTATAA
+TAAAGTAATCAACCAAATCCAGAATGTGGAATAAGCAGTTAAAAAAAAAAAAAAAAAGAG
+GGGGGGAGGCCAGGCGAGGTGGCTCACACTATGGTCTCAGCACTTTGGGAGGCCAAGGTG
+GGCAGATGCTTGAGCCCAGGAGTTTGAGACCAGATTGGGCAACATGGTGAAAATCCATCT
+CTATAAAAAGTACAAAAAAATTTGCTAGGTGTGATGGTATGCACCTGTGGTCCCAGCTAC
+CTGGGAGGCTGAAGTCAGACAATCGCTTGACTCCAGGAGGCAGAGGTTACAGTGAGCCAA
+GATTGTGCCACTGCACGAGACCCTGTCTCCAAAAAAAAAAAGGAAAAAAAAAGGCATAGA
+GAAAGAACAACTGCTGTAGAATAAAAGACTAAAGAGGCAAAACAACAAAATGCACTGTGC
+GAACTCTGATCCAGATGCAAACAAAGCAGATGTAGAAAAGATGTAGAAAGGATACTTTTT
+TTTTCTCTTTTTTTTTTTTTTTTTTGAGATGGAGTCTCGCCCTGTCGCCCAGGCTGGGGT
+GCAATGGCGCAATCTCGGCTCACTGCAACCTCCGCCTCCTGGGTTCAAGAGAATCTCCTG
+TCTCAGCCTCCCGAGTAGCTGGGATTACAGGCATGCACCACCAAGCCTGGCTAATTTTTT
+TTGCATATTTTTAGTAGAGACAGGGTTTCACCATGTTGGCCAGGCTGGTCTCCAACTCCT
+GACCTCAGGTGATCCACCAGCCTTGGCTTCCCAAAGTGCTGGGATTACAGGTGTAAGCCA
+CCACACTCAGCCAGGATACATTTGATACAACTGGAGAAATTAGAATAAGGACTAGGTGCT
+AGATATTAAAGAATTATTATTAATTTTGGAGATGTTATAATAGCATGAAGATTTTATATA
+TATATATATGCCTTATCAGTTGGAAATGAACACTAAAGGACATGTGAATAAAAGGCATCA
+TGTCTGGGATGGGCAAAATGCTTATAACTGTTGAGGCTTGGTGATGGGTACATGGGGGTT
+CGTTATGTTATCCTCTCTGTCTTTGTATTTGTTTTAGATTTTCCACTAAAAAAGTTAAAA
+AAATAAATGCACTACTTTGAGAAAAAACATTCTAAAATCTATAGCCTGGAAAAGATAACT
+GCAAATAGAGTTAATTGCCAAAGGTAACAGCTAAAACAATGATATAAAATATAAAGGGAC
+AGTCGACTTGGGTATTTCATCTGGAAAGACTGACGAATATGTAAGTCTTTACTATCAAGT
+ACATGCAGGAGAATCTCCAAAACAAACAGACTCCAAATAAATTCTAAAATGAGAAATTTG
+TAGATTCCATATGGTTAATATTTTACATGAAAATCGGGGTGTAAAAACTATAGTCAGCCC
+TCTGTATCCACAGATTCAACCCACCTTGAACCAAAAATATTAAAAAGTAATAACGCAAGC
+CGGACACGGTGGCTCACGCCTATAATCCCAGAACTTTGGGAGGCCGAGGCGGGTGGATCA
+CCTGAGGTCGGGAGTTCGAGACCAGCCTGACCAACATGGAGAAACCCCGTCTCTACTAAA
+AATACAAAATTAGCCGGGCGTGGTGGTGCATGCCTGTAATCCCAGCTACTCAGGAGGCTG
+AGGCAGGAGAACCACTTGAACCCGGGAGGCGGAGGTTGTGGTTAGCCGAGATCGCGCCAT
+TGCACTCCAGCCTAGGCAACAAGAGTGAAACTCCGTCTCAAAAAATAAATAAATAAGTAA
+ATAAATAAAAATAATAACGCAACAGTAAAAAAAAAAAATACAAATAATACAGTCTAACTA
+TATAGCATTTACATGCTAGTAGACATAAGAAGTAATCTAGTAGTGATTTAAAGTAGACAG
+GGGGGCTGGGCACAGTGGCTCACGCCTATAAGCCCAGCACTTTGGGAGGCCAAGGTGAGC
+GAATCACCTGAGATCAGAAGTTCGAGACCAGCCTGACCAACATGGTGAAACCCCATCTCT
+ACTAAAAATACAAAAATTGGCCAGGCGTGGTGGTGCATGCCTGTAACCCCAGCTACTTGG
+GAGGCTGAGGCAGAAGAACCACTTGAACCTGGGAGGCGGAGGTTGCAGGGAGGCAAGATC
+ACACCACTGCACTCCAGCCTGGGCAACAGGGCAAGACTCTGTCTCACAAAAGAAACCTCC
+CCAGTAAGTATAAAGAGACCCTAAGAGAGGAAATGGCTGACAGTGTAAATAGAGCAGAGC
+ACCAGAAGGTATCACTTCAAGCATCCGTCTTTAGAGACATTTCACAGAAACAGTATCGAG
+GCTACAAACCGAATAATCTTTACCTTTTGTGTTCTGGAAAAAATGCTGCCACAGAGGTCT
+GATTTTGAAGTGGCTGCCAACATCCCAGACAGCGAAGGTGTTATTTTTATATTCTACTGT
+CTCCACACAGAAACCTAAATGAAACATGGGGAAAACATTTAATTATGATTTGTGCTGGTT
+GAACAATTCAAAATAATTTCAACATGCGGACAGTACTTTTAATTTACAAAGCAGCCTGCT
+AGTCAAAGATCTATAGCCTTCAAACAGAATGTGCTGGCCCCCTCGCCCATTCATCAGGAC
+AGGTGTCTCAAACCTTCTCTACTCTCCAATCTCTAGCCACGCACCTGCCCTTTCTCTCAG
+CAGGAGACTATCACAGTGTACTTCAGAGCAGAGAAGCCGCCGATGGGAATGACTTCCACT
+TCCACCATACAGTGTAAAAGCCAGTCTACATCTTTACCCATGCTTTTCTTCCTCCCTCTT
+GGAACAACAGAAGAAGGTAACTCTCTTCCACCAACTCCCTCCCTCTATGAGCCAGATCCA
+CCCTACCCAGACTGTCTGGAACTTTTTACTATTCCTATTCTCTCCCATATCTGAATATCT
+CCAATCTGTTCACCTTGGCATGCTTAAGCCTTTTACATCTTTAAAAACAACAAAACAAAC
+CTTTCCTTGATCCCCACATCTTCATCGCACTCTCTTCCTTCACATCTTCACCCTCTATTT
+GCTTTTTTTTTTTTTAAGAGACAGGGCCTTGCTGTGTTGGCCAGGCTGGAGCACAGAGGT
+ACAATCATAGCTCACTGCAGCCTCAAACTCCTGTGCTCAAGGGATCCTCCTGCTTCAGCC
+TCCCAAGCAGCTAGAACTATAAGCATGAGCCACCATACCTAGCTAAATTTAAAAAATTTT
+TTGTAGACACAGGGTCTCACTATGTTGCTCAGATTGGTCTAAAACTCCTGAGGTCAAGCA
+ACCTGTCCACTAAGCTTTCTTCGGTCCTCTCACGCAACAGCACCTTAGCTCTGCTCCACC
+ATTGGGTAGCTGGTTTTCAGTAAGGTCAGCAGTGATCACCAGGTCAATAAATCTAACACA
+GGCCTTCTCAGTTCTTCCATAATTTGATCTTCTATCTGACGCTACTGCTCACTCCCTCCT
+TGACACCCTTCCCCAGCTTCCCTTTTCCTTCTATCTCTATGGCTGTACTGTCTCAGTCTC
+CTTAGTGAACTCAGCCACGTTTAACTGATCCTGCAATGATGGGAGTCGCCCTGGCTCTCA
+CTGGCCCCATCCACTCTCCTCTTTCTGCACCCCTCCCTAGGCAATCTCACTCACTCCCAC
+GACTTCATTCCTATCTCTGTACTAACTCTGAGATTTATATCCCCCAACCTAGCCCTCTCT
+CCTGAGCTTCAGGCTCATAATCTAACAGCTTTGGCAAATGTCTTCAAGGCCCCAAAACAA
+GGTGATCAGTCAAAAATGGAATTTAAGGTCCCTACCCCCAACCTCCTCCTCTGCTAGTTA
+TCCCATCATTGACTTCGTTCCACAAGCTGATAGCCTGAGAGGTGCCCTGGAAACATCCGA
+TGCTCGCACCCCTAGAACCCTACTACCACACCTTATCAATTCAACCCCTTCAACAACTAA
+TAGTCACCCTAACCCAACAGACCATCATTTCTAGCCTCAATTCCTGAAACCATATAACTA
+ATTTCACCTCAGCCACTCTCCCTCTCTCCATCCGGTCACCATCTTCTAAACAAAGCAATA
+GCTTAAAAACACATAGTGTGGTCACGTCATTTCTCTACACATTTATGTTTGATTGGAATA
+TTTTCAAACTTAAAAAAGAAAGAATAATCGCCCCAAAAAATAAAATTAAAAATTAAAAAA
+AAAAGAAAGAATAAGCCGAGCACAGTAGCTCACACCTGTAACCCCAGCACTTTGGGAGGC
+TGAGGTGGGCAGATTGCTTTGAGCTCAGGAGTTTGAGACCAGCCTGGGCAACATGGTGAA
+ACCCTGTCTCTACAAAAAAACACAAAGATTAGCCAGGCATTGGTGCCGTGTGCCTACAGT
+CCCAGCTACTTGGGAGGCTGAGACAGGAGAACTGCTTGAAGCAAACATTGCAGTGAGCTG
+AGATCACACCACCGCACCCCAGCCTGGATGACAGAATAACATCCAAGAAAGAAAAGAAAG
+AAAAGAGGAAGAGAGAGACAGAGAGAGAGAGACAGAGATGGGAGGGGAGGGGAGGGGAGG
+AAGGAAAGGACAGAAAAGGGAGGGAGGGAGGGAGGGAGGGAGGAAGGGAGGAAGGGAGGA
+AGGGAGGAAAGGAAAAGAAGGAAAGATGTCCCCTTAGCGGCTTCCCACACCCAAATCCCA
+AACATGACTGTCAAGACCCAAGCAAGCTGGGTCTGCTCACCTTCCAAACCTCAATTCTAG
+TCATGCCTTATTTTAGTGAGAGACACTCGTTACATTGCAACAATCTAAATTCTTGCTAAA
+TAAATAAATACAATGCCATTCCAATCCTATAATGATGGTAGGGGCAGGGGCAGGAAACTT
+AGCAAAATTATCTAAAATTTAACCTGGAGAAATAAACAAGTAAGCATAACCAGGAAATCT
+CTGAAAATGAGTAATGAATTATTTTTAAGGATATGCCAAGCCCTAGTAACACTTGAATAG
+AGTTCAAATATAAATGCAACAGGTACTTGCCCAAAAAAAGACACACATACCAGTGAGAAT
+TTAGTATGTAACAAAGAAGGCATTTCAAATCAGCATGGAAAAATTATTCAATAAATGACA
+TTGAACAACTGTCTACCCCAGCACCGTCCAATACATAGTCACTAGCCATTTGTGACTATT
+TGTTTATGTTTTTGTTTTATTTTGTTTTGTTTTTGAGACAGAGTCTCGCTCTTGTCACCC
+AGGCTGGAGTGTATGGTGCGATCTCGGCTCACTGAAACCTCCGCCTCCCGGATTCAAGTG
+ATTCTGCTGCCTCAGCCTCCCAAGTAGCTGGGATTACAGGCGCCTGCCACCACGCCCATT
+TTGCAATTTTAGTAGAGACGGGGTTTCACCATGTTGGCCAGGCTGGTCTCGAACTCCTGA
+CCTCAGGTAACCCGCCTGCCTCAGCCTCCCAAAGTGCTGAGATTACAGGCGTGAGCCACC
+ATGCCCGGCCTATTTATTTATTTTTTGAGACACAGTCTTGCTCTGTAACCTAGGTTGAAG
+TGCAGTGGTGCAATATCAGCTCACTGCAACCTGTGCCTCCCGGGCTGAAGTGATTCTCCT
+GCCTTAACCTCCCGAGTAGCTGGGATTACAGGTGCATGCCACTACTAATTTTTGTATTTT
+TAGTAGATATGGGGTTTCACCACGTTGGCCACGCTGATCTCGAACTCCTGGTCTCAAGTG
+ATCTTCCCACCTCAACCTCCCAAAGTGCTGGGATTATAGGCATAAGCCACTGCGCCTGGC
+AACGTGGCTATTTAAACTTAAAGTTAAAATCAAAAATCAATCAATAAATAAAAATAAATA
+AAAAATAAACTTAAAGTTAAATACAATTAAAAGTTCAGTTCCCGGCTGGACACAGTGGCT
+CATGCCTGTAATCCCAACACTCTGGGGGGCCAAGGTGGACGGATCACCCGAGGTCAGGAG
+TTCGAGACCAGCCTGGCCAACAGGGTAAAACTCCGTCTCCACTAATAATACAAAAATTAG
+CCGGGTGTGGTGGCGTGCACCTCTAATCCCAGCTTCTCAGGATGCTGAGGCAGAAGAATG
+ACTTGAACCCAGGGGGCGGATGTTGCAGTGATCTGAGATCGCGCCACTGCACTCCAGCCT
+GGGTGACAGAGCAAGATTCCATCTCAAAAAACAAAAAAGTTCAGTTCCTCAGTTGCATTA
+GCCACATTTCAAGCACATGAACAGTCACATGGCTATTGGCTACGACACTAAACAGCACAG
+ACACAGAACATTTTAATCACTGCAGAAAGTGCCTCCTGGGCAGCACTGGCTTATCCATTT
+GAAAAATTTAACCAAATTACTACCTTACACACAACCATCTCATAACTCGAACATTCTCCT
+CGGTTTCCCACTCCTCAATCGATGCAATCTCTGCAGCTACTGCCCAAGTTGAAAGTTGAT
+CATTTGGAGACCAGGTGCGGTGGCTCACGCCTGTAATCCCAGCACTTAGCCGGGCAGATC
+ATTTGAGGTCAGGAGTTCCAGACCAGCCTGCCCAACATGGTGAAACCCTGTCTCTACCAC
+AAATACAAAAATTAGCCTTGCATGGTGGTGAACGTCTGTAATTCCACCTACTCAGGAGGC
+TGAGGCATGAGAATCGCCTGAACCCAGGAGGCGGAGGTTGTAGTGAGCCGAGATCGCGCC
+ACTGTACTTCAGCCTGGGGTGACAGAGCGAGACTCTATCTCAAAAAAAAAAAAAAAGAAA
+GTTGATCATTTGAGTCCTGTGCCTAATTCAATATTCAGAACAGAACAGTAGTAATGTTCA
+CATGCCACCTGTGGGGTGTATCCTCAGTCAGAAGTTTGGATCTAGAGGCAGTTCACAAGG
+CAAAGATTCAGTTCTGTCAAAACTCGCTTTGTAAATCTCTTAACAAGCCGGTAAAACACA
+GTGCTGGTTCACAGTAAGAGCAGTGCAGCCAGCTTGTCTATTTCTCTAGTTGGACATCAT
+CTCAAGCAGTTGGTTTGAGCCAGAGTGAAGAAAAGAAATCACACTCCATCTCTAAGCACT
+GGGAGCGTACTTGGTGTAGAGATGAGATGCTGCCACTAAGGAAGGCCAGCTGGAACAGAG
+GCTGACTGAGGAGGCACCAGCAGATTCCCAGCTCCCACCTCAGCCTTCTGCTGGGACACT
+AGCTGCGAGGAATGAGGGGCAGAACTACCCTCCCTGTTTTGCCCATTATCTTTTTCTTTT
+TTGCCAAGCAAATGGGGTTAAATTTGCTTAGCCTCCATGTGTTCACATGTCTCCCTTGTG
+AACCTAAGAATGTGCAAGCCACTAGAAAAGGATACATAACCTGGCCAGGCGCGGAGGCTC
+ATGCCTGTAATCCCAGCACTTTGGGAAACCATGGCGGGAGGATCGCTTGAGCCCAGGAGT
+TCAAGACCAGCCTAGGCAACATGGCAAAACCCCGTCTCTACTAAAAATACAAAAATTAGC
+TGGGCATGCTGGTGCACGCCTGTAGTTCCAGCTACTCAGGAGGGAGATGGAAGGATCACT
+TGAGCCCAGGAAGTTGAGACTGCAGTGAGCAGAGATGTACCACTGCACTCCAGCCTGGGT
+GACAGAGTCAAAAAAAAAAGAAAAGGATATATATAACCTGGTTTTTGGCCTCTAAAAATG
+TATGCTCATGGAGTAGACAAGATTTTTAAAAAGAATTATTCAATAACATAAGAAGCATGT
+AAAAATAAAACATATTTCTCTGGTGAAACAGAACCTAAAAAATATATATAACATACTAAA
+TTGTAAACTGGAATTGCCACAGGTCTTTCATTAACTTACCTACTGTAGGGACGGCAGGCA
+CAGTCTCCCCCAGCTTCAATTTATACAAGATGGTGGTTTTTCCAGCTGTATCCAAACTCA
+ATATAAGAATCCGCATCTTTTTTTTCCCAAGTAGACTTTTAAAGAGCTTTTCAAAAATGT
+TTCCCATTGTAATTTAATCGAATTCTGTAACATGACAAGAAAAGCAAGCCAAATAATTGT
+TCTCTGCAGAGATATAGAAGCAACACGCAGAAAATACAAGTGTCAATTAGAAATGGCTCC
+CAGGAAAAATTCAACAATTTTAGGTACACTTTGGCCAATGAGTAGTGAGTCTTGTCTTCA
+AAGGGAAACTCAACTAAACCTTTCCATGCCTTACCAGAAAATTCTATTTTCCACTGCAGC
+ATCCAAGAGGGATATGCAACATAGCCTCAGCACTGCTATGACAGTGGCAGCCTTTAGTCA
+TTTAGTACAGAAGTACAAGGCCAGGTGCGGTAGCTCACGCCTGTAATCCCAGCACTTTGG
+GAGGCCAAGGTGGGTGGATCACCTGAGGTCATGGGTTCAAGACCAGCCTGGCCAACATGG
+TGAAACCCCGTCTCTACTAAAAACACAAAAATTAGCTGGGCACGGTGGTGCACGCCTGTA
+GTACCAGCTACTCAGGATGCTAAGGCATAGGAATCACTTGAACCTGGGGAGGCAGAGGTT
+GCAGTAAGCCAGGACCACGCCACTGCACTCTAGTCTGGGCAACAGAGCAAGACTGTCTCA
+AAAAATTAATTAATTAATTAATTTTTTTTAAAAAACAAGTACAGAAGCAATGATACAACA
+AATACAACTATACCTCATTCTAAATTCTACTGAAATGAGAATTTATGAAACAGAGAAGCT
+TTTTATTTTATGCAATATAAAACAAAGTCTTTAAGACAGGTTAAACAACAATATCCCTTG
+GAAACGTAGTAGTTCCTTATGGCTAGCAGACACCTTGGAAATAACCTAAGAAATTTCAGA
+GGCAAAGAAATTCAGCGTTGGGAAGCTTCACTGCCAAGAGAGATAACTTGGCAGAGTTGA
+GGAGGCACCACAGCATCCAGTTTTCGTTTGAGTTTTTTGTTTTGTTTTGAGACAGGGTCT
+CGCTCTGTCACCCAGGCTGGAGTGCAGTGGCACAATCACGGCTCACTGCAACCTCAACCT
+CCCCATCTCAAACAATCCTCCCACCTCAGCCTCTGGAGCAGCTGCGACCACAGGCATACG
+CCACCTCGCTCAGCTAATTTTTGTATTTTTTGTGGAAATGAGGTCTCACTATATTTCCCA
+GGCTAATCTCGAACCCCTGGGCTCAAGTGATCTTCCCACCTCAGCCTCCCAAAGTGCTGG
+GATTACAGGCGTGAGCTACCAGGCCCAGCCAGTATCCAGTGTTCTATTGCTGTAACTATC
+ACCTTACTCTCAACCAGTTCTTAGAACTTTTTATTTAAGTATAATTTACATACAGATATA
+AACATACATGATAACTGTAAAGTTCAATACATTTTCATGAACCAAACACACCTATGTAAC
+CAAATCAAGAAAAGATGATTTATCAGGCTGGGCACAGTGGCTCATGCCTGTAATCCCAAC
+ACTTTGGGAGACCAAGGTGGGTGGATCACCTGAGGTCAAGAGTTCGAGACCAGCCTGGCC
+AAAATGGTGAAACCCCGTCTCTACTAAAAATACAAAAATTAGCTAGGTGTAGTGGCACAC
+ACATGCAGTCCCAGCTACTCAGAAGGCTAAGGCACAAGAATCACTTGAACCTGGGAGGTG
+GAGGTTGAAGTGAGCCAAGATTGGGCCACTGAACTCCAGCCCAGGGCAACAGAGTGAGAC
+TCCATCTCAAAAAAAAAAAAAAAAAAAAATGATTATCAGACCAATTCTCCTTGTGCCTCC
+TCCCAGTCATTACTCCATAAAAGGTAATCACTATATTGACTTCTAACAGCATAAATGTTC
+AATATTTTATTTTTCAGTAGAAATCATGGTAGGTTATATAAAGTAAGTCAATAGTAAAGT
+CAGACTAACTTACAGAAATTTACAGTCTCTTTGAGATCAGCCTACTTGAAATGTGAGCTT
+GACATTCTAAGTAAAGACTTACAAGCAGTGCTGGGTCTACACCTCTAACTAAGCTCTTGG
+AAGTCTGTAAAGATCTTCAGACCAGGCAGGGTGGCTTACCCCTGTAATTCCAGCACTTTG
+GGAGGCTGAGGCAAGAGGGTCACTTTAGTTTTTGCTTACTGAGATAACAGAAGACAAGGG
+AGGATTGCTTTAGGCCAGGAGTACAAGACCAGCCTGGGCAACATAGCGAGACCCTGCCTC
+TATTTTTAAAAATAATAAAACAAAAAAAAATTCAAAAAGAAAAAGACATACATCCTTGTT
+CCAGAGACTGTGATTTATTTTCTTTTTAAATTGCTAACCAAAGAAAGCTTTACTCCTGGG
+CAGGGGAGGAGGAAGACACAAGGAGGATAGCCAATCTCCCATTCATTCGGACTCTGGTAA
+CCACAGAGCGGAGGTAACCAAATTCCACAGGCCTCACCGCTGAAATTCAACAGTGGCCAG
+CAGTAGCTTAATATCATGGATTACGCCCAAGTACACAACATTCCTTTCTAGGATGAGAGA
+AAGCGAGCACAGAATTTATCCTGTTCACAAATGTGGCGTAAGTGACCAAAGAATGATAAA
+TGTTTTAATCATTCAAAGAAAGTCCATTATAGGTCTATAAAAGTCCAGTGCATTGCAGCA
+CTCAAAATCTTGCCGGGCGACAAAACTAGAGATTCCTTTTCTTTGTTTCGAAAAACAAAT
+CTCACAGATGAGGAGACTTTTCAGGAACATGCTGGGGGAGGGAAGGTACTTCCCGCTCAG
+ATGCCCCTTGAGGCAGCTAGTCCCAATTCCCCCTGCCAACCGCACAGACACACCACCTTG
+ACCTCCCTCTCCCTTCTCTTTCCTCTTATTGCTACTCATGTGCCCCAGGCGCCCGTGGTC
+CTGACACCCCTGCGCCGCTGGAGACCCCTCTAAGGTAAGATGGCTCCGACTACAGGGCCT
+TTTTGGCACGCTCAGTCTCCCAGCCCCAAGCGCCAAAAGTGCATTAGGGAGAAACGTCGG
+CACGACGTCAAGGGCGCGAACAAGCGTGGTGGCCCAGGTGAGCGTGCGCGGCAGCCAGGC
+ACGCCCGGCTCCCGGGGAAAGACGCCCCTTTTTTGCCCCGGCTGCCAGGCCGCTCCTTCT
+CAACTTGTGCGCCCCTGGAAGAGCAAAGAGAGGGCCCTGCGTGGAACCTATGAAGCCTCC
+TTCGGTTCCTCGCTGCTCCGGCCCTAGAGAGGGCGAAGAAGGAACAGCGGAGCCCAATCC
+CTTTCCGAAAGCCACCGCCGCCCTCCAGTTTCTGGCCCGCAGACGAAGTGGAGATCCAGC
+CAGGTCTTGAGTGCTGCCGCCCCATCCCTGCAGCCGGAGACTCACCTGTTGCCACTCAAG
+GTACCGCTACCTACAACACCGCCGACTCCGCGGCCTTTAGGATTTCAGCTCAGTTCAGCT
+AAACCACGACAGGCGTGGGGGCAGGAACAGCAACCAACCAATCACCCACCGCCTCCTGGA
+GCTTTTGCACCAATGAGCTCGAAGTTTTGTGAGTGACGACATATCTGGCCAACGCAAAGA
+AGAGTAAGGGCCTGGGAGGGAGGGAGCGCCGTAGGCGACGCCATGAAGCTCTGGCAATAC
+CATGTTCATCTTCAAATCACAGTTAAACGAATTCTGGCGAGACACGCCCACGTCCCCCAC
+CCCCGATGCCATGTGCGACCAATCAGAAAAGCAAAAGGATTGTCTATTTGCACGGCCAAT
+CAGCTGGGAAAATCGCCGAGGTTTGAGCTAACCTCGGAGCGTTCACACCAACCGGGAGGG
+GACATGTGGGCCGGGCCAAGTTAATAGTGCCATGGAAGGAAATTTACCGCGGTTGAGTTA
+AACGTAGACATTAGTTTGGGGCGGTGTTCCGCGTAGGAAATACCACACACTGACACTGAA
+ATTAGGCATAAGGAAGTTTTCCTATTCCGCCTGAGG
diff --git a/tests/test_data/ARL17A_ref.fa.fai b/tests/test_data/ARL17A_ref.fa.fai
new file mode 100644
index 0000000..4577ca0
--- /dev/null
+++ b/tests/test_data/ARL17A_ref.fa.fai
@@ -0,0 +1 @@
+chr17_46552256_46580191 27936 25 60 61
diff --git a/tests/test_data/del_f8_realigned.bam b/tests/test_data/f8/del_f8_realigned.bam
similarity index 100%
rename from tests/test_data/del_f8_realigned.bam
rename to tests/test_data/f8/del_f8_realigned.bam
diff --git a/tests/test_data/del_f8_realigned.bam.bai b/tests/test_data/f8/del_f8_realigned.bam.bai
similarity index 100%
rename from tests/test_data/del_f8_realigned.bam.bai
rename to tests/test_data/f8/del_f8_realigned.bam.bai
diff --git a/tests/test_data/f8_del_genome.bam b/tests/test_data/f8/f8_del_genome.bam
similarity index 100%
rename from tests/test_data/f8_del_genome.bam
rename to tests/test_data/f8/f8_del_genome.bam
diff --git a/tests/test_data/f8_del_genome.bam.bai b/tests/test_data/f8/f8_del_genome.bam.bai
similarity index 100%
rename from tests/test_data/f8_del_genome.bam.bai
rename to tests/test_data/f8/f8_del_genome.bam.bai
diff --git a/tests/test_data/f8_inv_genome.bam b/tests/test_data/f8/f8_inv_genome.bam
similarity index 100%
rename from tests/test_data/f8_inv_genome.bam
rename to tests/test_data/f8/f8_inv_genome.bam
diff --git a/tests/test_data/f8_inv_genome.bam.bai b/tests/test_data/f8/f8_inv_genome.bam.bai
similarity index 100%
rename from tests/test_data/f8_inv_genome.bam.bai
rename to tests/test_data/f8/f8_inv_genome.bam.bai
diff --git a/tests/test_data/inv_f8_realigned.bam b/tests/test_data/f8/inv_f8_realigned.bam
similarity index 100%
rename from tests/test_data/inv_f8_realigned.bam
rename to tests/test_data/f8/inv_f8_realigned.bam
diff --git a/tests/test_data/inv_f8_realigned.bam.bai b/tests/test_data/f8/inv_f8_realigned.bam.bai
similarity index 100%
rename from tests/test_data/inv_f8_realigned.bam.bai
rename to tests/test_data/f8/inv_f8_realigned.bam.bai
diff --git a/tests/test_data/ikbkg_ref.fa b/tests/test_data/ikbkg_ref.fa
new file mode 100644
index 0000000..ae6c441
--- /dev/null
+++ b/tests/test_data/ikbkg_ref.fa
@@ -0,0 +1,325 @@
+>chrX_154555600_154575000
+GACAGAGTTTTGCTCTGTCCCCCAGGCTGGAGTGCAATGGTGTGATCTCGGCTCACTGCA
+ACCTCCACCTCCCAGGTTCAAGTGCTTCTCCTGCCTTAGCCTCCCAAGTAGCTGGGATTA
+CAGGCGTGTGCCACCACACCGGGCTAATTTTTGTATTTTTATTAGAGACGGGGTTTCACC
+ATGTTGGCCAAGCTGGTCTGGAGCTCCTGACCTCAGGTGATCCACCCACCTTGGCCTCCC
+AAAGTGCTGGGATTACAGGCATGAGCCACAGTGCCTGGCCAACACGTACTTTTAAGTGAA
+GCTGATGTGTTTGGTGTTATTTTCTTGCAGAAAGTGAGGGGCATTAGTGTAAAGGATTTT
+GGAAGTGTTTAAAGAAACAAAAGGGAGTGTTGAGACGCCATCCACCCCTGAGAGAAGCTG
+CGTGGTATTATGGCGGGTGGGGGCACCAGGATGGGTGGCCCCACTTCTGGCCTCTGACTT
+CCTGAGCCTCAGGCCCATGTGGGCCCAGGCAGGGCCCGGCAGGCCGGGCTGCCCAGCTCC
+CCTCCACTGTCCCCTCTGCCACCAGATGCCATCCGGCAGAGCAACCAGATTCTGCGGGAG
+CGCTGCGAGGAGCTTCTGCATTTCCAAGCCAGCCAGAGGGAGGAGAAGGAGTTCCTCATG
+TGCAAGTTCCAGGAGGCCAGGAAACTGGTGGAGAGACTCGGCCTGGAGAAGCTCGATCTG
+AAGAGGCAGAAGGAGCAGGCTCTGCGGGAGGTGGAGCACCTGAAGAGATGCCAGCAGGTA
+GTCGGGGCAGGGCCAGGTTCTGAAAACCCGCGGTGACGCCAGTGTTCCACAAGGGAACCC
+GTGGTCGGGGTCCCCCAAAGCACCCTGGGGCTCAGTGCTGTGCCGGGAGGGCTCGGAACT
+CAGAAAAGCCGTCACACTCCCAGTTCCGGTTTATTACAAGGAAAGGACACAGGTTACGGT
+GAGCGAAGGCTCAGGGCGCACAGGGCGGGCTCCAGGAGAGACCAGGCGTGAGCTTCAGCG
+GCTCCTCGCCCAGGGGAGTTGTGCAGACGGCACCTGTTTCTTTCGGCAACAGTGTGGGAC
+AGCGAGCACGGAGTCACAACCGGGAAGCTCACCCCAGCCGTGGCGGCCGGGGTTTTCACG
+GGGGGTGGGCCGCGTGGGCACCGAGCGCCTGCGTGGCCAACCCTGGTCACTCGGCTGTAG
+CCACCAGAGGTCCAGCTGTGTGGCCCAAGGCTCCCCCCATAAATCGTGTCATTAGCACAG
+ACCGCCTGGTTTCAGGGTCTTTGTGTGTGGGCTTGGCTGATCGCAGGATCCTGGCGATGG
+TAGTCAGGAAGGGGCCGTGCTCCCTTTGAGGGGCAAGGTGGAGAGAAGTGCTGGAGAGGA
+GACTTGCTGGCGGGTACCTGGCACTTGCCACAGCCAGGCTCCACTCCCCTGGGGAAAGGC
+GTGGATGGTGGGCTGTGCACGCCGCTCCACTCAGGGCTTAGAGCGCCTGGCTTAAGGCGT
+TGATTTCCTGTGTGGGAAGTGGATGAGTTTTCTACAGCTGCCGTGACCAAGCACCACAGA
+CTGCGGGGCCGAAGCCACAGAAACGCGTGGCCTCCCGCTTCTGGAGGCCTGGAGGCTGAG
+CTAGCGGTGGTGTCGGCAGGGTGGGCTCCCCGCCAGGGCCGCGAGGGAGCTGCCTTCCAG
+GCCTCTCCACGGCGCCGGGGGCCGCCGGCTGCACCTCTCCAGCCTCCATCTCCGTCATCC
+TGTGGCCTTGTCCCCGCGGGCCTCTGTGCCTGTCCTCCTCTTTTGACAAGAACACCGGAG
+ATACACAAAGGTACACAAAAGCGGGCCTTTGTTCAAGCTGGCAAAAGAGATCTTCTTCAG
+AAACCCCTGCTTGCGGGGGAGAGAGCTGAGCTCCGTTCCCGCCCCAGCAGAGGCGGCCTG
+GCCTTGCGAAGGGAGAAGGAGGGAGTCGGGAGGGGGCGAGTGCAGGCTCAGGTGAAAGAT
+GACGGGGCAGCCAGCGTCCTTGCCGCGAGGCCAGCCGTGTGTGGGAGCTGCCGGTGCTTA
+CCAAGGTTGGGATGCTTCCGTCCCGTGGAGACTGGGAGACTGGGCCCCGCGCCTCCTGAG
+GTTTCCGTTTCCAAGGAGTGGCTGCGGGGCCCTCGGGAAAGCCCCTGGGTTGTGGGTGCT
+ACACAGATGTCTCAAAGGGACAGGGTAAGCCCTTTGTAGTAAATGCTGTCAGAAAGGGAG
+GTCAGGTGTTGGCCGGAACAGACAGTACATGCTCTGGGCAGCCCTGAGCGTTTCCAGACG
+GGAACTCACTCAAAAGGGGGCTGGGGCGTCCCAGGGGCGCGGCCTTAGGCTCCCAGAGGC
+CCCGCGAGGTGGTGGCCGGGTGTCTTCGGGCAGGGGTTTGAGTGCAGTGTGCCTGCCGAG
+AGGTTCTGCAGTTCCCAGTGTTTAACAAAATTCAGTGTCCACTCTTGATCTGCACAAACT
+CTCCCATCCTGGCGGCCCCGGGTGTGGACTGGGGCCTGTGTTTACTTTGCCCTATTCGTG
+TCTGGCCTCCTTTTGTCCCAAGTCTCAGAGAGACGGAGAGAGATCCCCTGCTGGGGCTGT
+AGCTGCAAGGCCACCGGGTTCAGCCCTCGAGGCCTGCTTGCCGGGGCAGTGACTAAGCCG
+TTGACAACCTCAAGGCAGCTTTGTGCTCCTTCGTCTCTTTGGGGATCTCTTTTTGCCCCA
+TCTGTGTGTCACCCTGTGGCAGAGGGTTAAGGTGGGCAGCTGGGGAGGGTTGGGTGGCCC
+TTGGGCTCATGAGGCCCTAGGGCACCCAGGTTTGGGGGTGCCGAGGGCAGGAAAAAAGGC
+CTCATGGCGCGCAGGCCTCAGCCGCTTGCGGGTTGCCCCGGGCTTGCGGATGGCAGGAGT
+GGGCCGCTGGGGAGAAAGCAGTGCTGACAGGAAGTGGCTTTTTATCCTGCAGCAGATGGC
+TGAGGACAAGGCCTCTGTGAAAGCCCAGGTGACGTCCTTGCTCGGGGAGCTGCAGGAGAG
+CCAGAGTCGCTTGGAGGCTGCCACTAAGGAATGCCAGGCTCTGGAGGGTCGGTGAGTCGG
+GGGAGCCGGCTCCGGAGACCCCTTCCAGGGTTTCCAAAAGCAATGAGGTGGGTTTAGGGG
+CCTCCAGGGTGCTCCTTGATGAGGATAGACCGGGGCAGGCTGCGTAAAGACGTCGGGGCA
+GACGTCGGGAGAGGTCTGGGCCAGGCATCCGGGACCTGGGTCCCAGCCGGCTCTCCGCAC
+TCTGTGACCCTTTGATGGAGTTTGGATTATTTCCTTAGGAGGCATTCTGGGGGCCCCGAG
+CCCACACCCACAGTGTCTAGTTCTCTGGAAGGACTTCTGGGACCGGCGCACAGTCGCCCT
+CGTGGCTGAGGTTGATGACAGGGAAAAAGGCACAGGGCAGGAGCCGCCAGGGCAGGAGCC
+GTGGGGGGAGTTGGAGAAGCCCTGTCCCAGCCTCCCGCTGCCCTGCGAGCGGCACAGTGA
+GAAGCGCCTCCCACACGGGCCGTGTTCCTGCCCAGGGATGCCCGCTAGAGACTCAGCGCC
+CTGGGTGTTTACTGGGGGCAGGTCCTGTCAGCGCCCTCTGCCAGGCAGGTACCTAAATCC
+CCGACTCCCAGCAGCAGAGCGGGTGCTCACGTCAACCACGTTCTTCCTACAAATGGCCTA
+GGTGCAGGGACCGACCTGACCACTAGAGAAAGTGTCACTGTGGCAAGGGAACTTCACCAG
+CCAAGGGCCAACCTTGCCAGCCGGTGGCCTCAGGCCTGCTGGTTACTCTCTTTTGCAAAG
+GGGTCTTGGTTCTTGTGAGTGGGACCATTGGGTCAAAGGGCAGGGAGGTTTCTGTGGTTC
+TCATTCGGTCCTGCTTCTGCCCTCCAGACAGATGGATCAGCTGCCAGGGGGGCCCCAGCC
+ATCCCAGCACAGTAGGCGGTCAAGGTGCACTTGGGGCAGCCAGCAGGGCAGAGGGGAGGG
+GAGCTTGACCCAGGCTCTGATGGGCAGAGGGAACCCGTGCAGGGTGTGGGGGCAGTATGC
+AGGCAGGCGCGGAGGGGAGAGCCAAGCAGCCAGGCCTGCCAGGCAGAGTTGGGGTGACTG
+GAGAAGGGCCGTGTCTGCCTGTGGCCAGAGGCCACCCAGGACCTGGACAGATGCACCCAC
+CATTGTCCCTGCAGTGAGGCTGTGGAAGGGCTTGGTGTGGTGGGATGAGGCCAGACCCTG
+GAAACTGGAGGTTAAGGGAGCTGTAGGGGGGCAGGTGTGGGAACTGAGCATCCGAGCAGG
+TCGTCTGGGACTCCAGCAGAGCTCTGGGCAGCAGCAGGGATGGGGCCGAGGCCCGGTCTG
+CATTGAGCTCAGTGCTTGCACGCCCAGGTGGGCAGTCTCTCATTTTTGGAACAGCAGTCT
+CTCCTGACCCCCTCCACTGAGACTGCTTTTGCTGGGGCCCCCAGCAGTCCCCCAGTGGAA
+CTCCACGGGCAGTTCCGAGGGCTCCTCTCACCTGGCCCCAGCACTGCGGGATGCAGGCGA
+CCCCATCCTTTTCTCGGACCACCCCCTTCCCCTGGCTTCCAGGTCTCCTTGCCATCTGTA
+CTTGGTCACCTGCTGGGCCCCTGCATTGAAGCAAACACGTCTTAAGCAAAGCTCCTCACC
+TGCTGCTCCCACCTGGCCCCCTGCAGTTGTCCTCGTGTCTGTTGACGGTGCCTCCACCCT
+GCCGCCTGGCATCAGCTCGCAGTCACAGGGTGTTCAGAGCCGACCCCCACCCCCCGCCCA
+CGCCCTGCGCATAGCCCCTGCCGTCCCCCCGTTCGTCCTCCCTGAGTCTGCTCTTTCCCC
+GTGCCAGGGCCCGGGCGGCCAGCGAGCAGGCGCGGCAGCTGGAGAGTGAGCGCGAGGCGC
+TGCAGCAGCAGCACAGCGTGCAGGTGGACCAGCTGCGCATGCAGGGCCAGAGCGTGGAGG
+CCGCGCTCCGCATGGAGCGCCAGGCCGCCTCGGAGGAGAAGTGAGTCAGCGGGGGCGGGG
+CCGCACCGCAGGGTCTGTGGTTCTACACTTGATCTTAGCCGAAAGGCTGAGAAGTGTCGG
+GTCCATGGTTCTTTCTGCCTTCTGAGGACTCCTTCAGATTCTGCCTGTGGCTGTGGGCCC
+ATTCTGTCCCTTAGCCTTGCTAACGGTAGAGGCGACCATGATGACACCCGGTTTGTCTTT
+GATACAGTCATGCCATCTGCTCTCCAGACCACGTTTCACTGCGTGTCCACACGTGGCCTT
+TTTTGTAGTTTTTTTTTCCTAGCCACTAGGTCATCAGGGGACTTGTCCTTTAAAACCCCT
+TCTAGGCCAGGTGCTGTGGCTCACGCCTGTAATCCCAACACTTTGGGAGGCCAAAGTGGG
+TAGATGGCTTAAGCCCGGGAGTTCCAAGACCAGCCTGGGCAACAGAAAGACAACAAAAAT
+ACCCCCAAACCCCCCCGTCTACCAGCATCCAATCTGGGACCTCAGGTTCCTGTCCTTGGC
+GTGCCTTTTCAGTCTCCTTTAATCTAGAACAGTTCCCCTGCCTTTCTGAGCTGTTTGTGA
+AGTTCACAGTTTTGAACAGTGCAGGGTAGTTCCATTGTATTATTACTATTATTTTCAAGA
+CAGGGTCTTGCTCTACCGTCCAGGCTGGAGTGCAGTGGCATAATCTCGGCTTACTGTACC
+TTCCGCCTCTTGGTCTCAAGCGATCCTCCCAGGTAGCTGGGACTATAGGCGCAGGCCAGC
+ACACCTGGCTAATTTTTGCATTTTTGGTAGAGGTGGCGTTTTCCTATGTTGCCCGGGCTG
+GTCTTGAACTCCTGAGCTCAAGCGATCCTCCTGCCTTGGCTTCTCAAAGTGTTGGGATTA
+TGGGCGTGAGCCACCGCGTCTGGCCGCGATTTTATTATAAACATTAAAAATACTAGCTTT
+TAGGAAAACGATATTAACTGCCTGGTGACCAGCCCACCAAAGCCTGCTTTAGAGTTGACG
+GCCTCAGGAGTCCTCACACAGCCTTGGAAGACCCCATTCCAGGCCTGTGATGCGAGGGAG
+GGAAGGAAGGGGGTAGAGTTGGAAGCAGGCAGCACCGTGGCTGGACTGGCATGAGGTGGT
+TTCTCCAGCAAAAGCTCCCTTTCCTCAGGAGGAAGCTGGCCCAGTTGCAGGTGGCCTATC
+ACCAGCTCTTCCAAGAATACGACAACCACATCAAGAGCAGCGTGGTGGGCAGTGAGCGGA
+AGCGAGTGAGTGCGACCACTGGGGCTCTAGGGCTGGCCTTGCCTCTTCCTCTCCCCGTGG
+CCCTGAACCTTGAGAATGGGTAGACCTGCCTTAGACTTGCCTTAGACCTGTGTCAGGCTG
+CAGCTGCGACAGCTCAGGGAAGCTGTGGGGAGATGGCAACCCCAGGATGTTGCTCTCAGG
+AGTGTCAGCAGGCCATCTTAATGGGGGGCTGGGCCAGAGCCTTGGGGTGCTCCCTCTGTG
+GGGCTGGGGACGTCTTGTCTCCATGGACATTCCCTCTTGCCAGCCATCGCCATCTGGCAC
+CTGGCTCAGCTTCCCCCAAGCCAAGGTAAGCCCGACAGCATTTCCACCCCAGTGTTGGCT
+GGGAGCCTTTTCCTAGTTTGTCCTCATCAGACCTAAGCTGGGGTGCAGTTTGCTAGTGAT
+CACATTTTAGCAGGACACCGTCAATCGTAAGTGTACCCAGAGGAGATTTATAAGGACAAA
+GCCTGAAGCCAGGTCACATGGGGAAGAGTTAGCTACAAAACTGGCCACTTAATCTCTGGA
+GGGGGGCGTTGGTGGGGTGTGTCTGTGTGTGTCTCAGGGGGCTGGAGATGCCTGCGTGGG
+AGGAGTGCACCTCTGACCAGGTGGCAGAGTGGAAGGACTGAGGGCTCTCAGCTGAGCTGT
+GCACATGGCGGGCACAGGACCGGCTGGCTGTGAGTGGGTGTGGCCTGTGGCCTGTGAAGG
+GTGGGAGGAGGGCTGTGGAGCTGGGGATTCTGGGAAGGGAATGTCGGCCCAGCTGGGAGG
+TTGTACCAGATGACCTCAGCGGCCTCTTCAGTCCTGAAAAAAACCTCAGCATCTCCTCTG
+TCGTTTTGGGCCGTGACAGGACGCAGCCATCTCCCTGTGCACGCTGAGATCCTGCAATGG
+GCCCTCAAATCAGGGGCTGGCATCACCCAGCCTGGTCAGCCAGGGCCACTCTTTCATCCT
+TCTCAGTTCTTCTCAGCCAGCCTCGCCCTGGGCTGACGAGGCTCCGTCAGCTCCCCTTGC
+CCGTCCTTAGGGAATGCAGCTGGAAGATCTCAAACAGCAGCTCCAGCAGGCCGAGGAGGC
+CCTGGTGGCCAAACAGGAGGTGATCGATAAGCTGAAGGAGGAGGCCGAGCAGCACAAGAT
+TGTGATGGAGACCGTTCCGGTGCTGAAGGCCCAGGTGAGGGCCCTCCTCTCTGACCCACC
+CTGGCACTGGGACCTGGAGAGTCTCTTTGGCGTCTTTTTTTTTTTTTTTGCTTTTGCTTT
+TTGAGATTGAGTTTTGCTCTTGTTGCCCAGGCTGGAGTGCCACTAGTGGCACGATCTTGG
+CTCACTGCAACCTCTGCCTCCCGGGTTCAAACAATTCTCTTGCCTCAGCCTCCTGAGTAG
+CTGGGATTACAGGCGCCTGCCGCCATGCCCGTCTAATTTTTGTATTTTTAGTAGAGACAG
+GGTTTCACCATGTTGGCCCAGCTGGTCTCGAACTTCTGGCCTCAGGTGATCTGCCCACCG
+CAGTCTCTCAAAGTTCTGGGATTACAGGCGTGAGCCACCGCACCCGGCCTCTTTGGCATC
+ATTTTGTAGTGGCCTTTCGTAAGCTTCTGAGCCACTTGTGCTGCTCCTTAGACCTCTCGG
+TGAGCTTGGCATTACTCGCCGACGTATCTGTTTCCTCTGCGCCGCTGGGGGCTCTGGGAG
+GACAGCAGTGGGTTCTGCTTTGTTCCTGTGGTGCCTGGCGCAGTGCCTGGTGGGTGGCTG
+GCTTGTGGCGGGCACATCCCTTTCTGTTGGATTTGCCAGGCGGATATCTACAAGGCGGAC
+TTCCAGGCTGAGAGGCAGGCCCGGGAGAAGCTGGCCGAGAAGAAGGAGCTCCTGCAGGAG
+CAGCTGGAGCAGCTGCAGAGGGAGTACAGCAAACTGAAGGCCAGCTGTCAGGAGTCGGCC
+AGGTGGGCCTCTGAGAGCGTGCCCGTGTGAGCAGTGGGTGCGACACTGGGGGGTCGCCAG
+TGGTGACCCCGCAGTGGGTGCGACACTGGGGGGTTGCCAGTGGTGACCACAGGAGACGGA
+TGGCTCCTGGTGTTCTGGGTTAGGGCTCACTGTGGTCCCTCTCCTCTCACCTGAGCTTCC
+AAGAGCTGCTTTGACACTAGTCCAGCCAAGGAGCTTTACAGAAATGCGTGGCTTGACTGG
+ACGGTTTCTGTTTCCAAAGGATCGAGGACATGAGGAAGCGGCATGTCGAGGTCTCCCAGG
+CCCCCTTGCCCCCCGCCCCTGGTGAGTGAGCGAGAACTGGGCCTGCGGGAGGAGGTGGGT
+GGGGAGGGCAGGTGCTGCGCCGCGGGAGGTCACAGTTCGACCTTCCTGTTGCTCTCTGGA
+GACTTGACGGCGGGAGCTCGTGTAGGCCACCCCATCGGTAGCCCACCCCCTTCCCCGAGG
+CTAAGGGAGGCATGCCGTGGTAGCGGCGGCTCCTGGTCTTACATGAGTGGCCTGTGAGAC
+CAGGCCTGCCATTGACAGTCCTGCCAAGTCTCCGTCCCCCTCCATCCTCCCCTTCCCTCT
+GACTCTTCTCTTTTCCCAGCCTACCTCTCCTCTCCCCTGGCCCTGCCCAGCCAGAGGAGG
+AGCCCCCCCGAGGAGCCACCTGACTTCTGCTGTCCCAAGTGCCAGTATCAGGCCCCTGAT
+ATGGACACCCTGCAGATACATGTCATGGAGTGCATTGAGTAGGGCCGGCCAGTGCAAGGC
+CACTGCCTGCCGAGGACGTGCCCGGGACCGTGCAGTCTGCGCTTTCCTCTCCCGCCTGCC
+TAGCCCAGGATGAAGGGCTGGGTGGCCACAACTGGGATGCCACCTGGAGCCCCACCCAGG
+AGCTGGCCGCGGCACCTTACGCTTCAGCTGTTGATCCGCTGGTCCCCTCTTTTGGGGTAG
+ATGCGGCCCCGATCAGGCCTGACTCGCTGCTCTTTTTGTTCCCTTCTGTCTGCTCGAACC
+ACTTGCCTCGGGCTAATCCCTCCCTCTTCCTCCACCCGGCACTGGGGAAGTCAAGAATGG
+GGCCTGGGGCTCTCAGGGAGAACTGCTTCCCCTGGCAGAGCTGGGTGGCCGCTCTTCCTC
+CCACCGGACACCGACCCGCCCGCCGCTGTGCCCTGGGAGTGCTGCCCTCTTACCATGCAC
+ACGGGTGCTCTCCTTTTGGGCTGCATGCTATTCCATTTTGCAGCCAGACCGATGTGTATT
+TAACCAGTCACTATTGATGGACATTTGGGTTGTTTCCCATCTTTTTGTTACCATAAATAA
+TGGCATAGTAAAAATCCTTGTGCATTAGTCGTGCGTATCTTTGGCATAGATTCTGAGAAG
+TGACACCACTGAGCATGGGCGATGGCGTAGATGGTACCTGAGCCCCCTTCCTCCTTGGAG
+CTTGGTTTCCCATCTCTCCCCACCCCCTATTTCCCTAGCCTTGCCAAGGAGGAGGTGGGA
+AAGCCCGTTTGGGTTTTTGTCATTCGCTAGGCCATGCAGTTCTCTGTTAAGAGTGAGCTT
+AAACATCTTTCCTGAGGCTTTAAGGACCTTTTTTAGTTCTGCTTCTGAATGGGCTGCTCA
+TATCATATATATATATGTATATGTATAGTTGTGTATATGTATGTGTGTGTGTGTGTGTGT
+GTGTGTATTTTTTTTTTTTTTTGAGACAGAGTTTTGCTCTTCTCGCCCAGACTGGAGTGC
+AGTGGCGTGATCTCAGCTCACTGCAACCTCTGCCTCCTGCGTTCAACCTATTCTCCTGCC
+TCAGCCTCCCTAGTAGCTGGGACTACAGGCGCCTGCCACCACGCTCGGCTAATTTTTGTA
+TTTTTAGTAGAGATGGGGTTTCACCATGTTGGCCAGGCTGGTCTCGAACTCCTGACCTCA
+CGTGATCCACCTGCCTTGGCCTCCCAAAGTGCTGGGATTACAGGTGTGAGCCACTGTACC
+TGGCCAATTTTTGTATTTTTAGTAGAGATGGGGGTTTCAGCACTTTGGCCAGGCTGGTCT
+CAAACTCCTGACCTCAGGTGATCTGCCTGCCTCGGCCTCCCAAAGTGCTAGGATTACAGG
+TGTGAGCCACTGCGCCCCGCTGGGCTGCTCATATCTTTTATCCGTTTTTCTACTGGGTTA
+TCTTTTGTTGCTCAGCTTTTAGAAACTCTTTGTCCATGAGCAGGATTGGGTTTGTGGCTG
+TGATAGAAATTGCAGATATCTTTTCCTGCTTTGACCTCGCTACTAGTGTTTCTTTGTTTT
+TTGTTTTGAGATAAGGTCTCTGTCACCCGGGTGGGAGTGTAGTGGCACAATCAGAGCTCA
+CTGCAGCCTTGACCTCCTGAGCTCCAGCAGTCCTCCTGCCTCGGCCTTGCGAGTAGCTGG
+GATCACAGGCGTGCGCCACCACACCCTGCTAAACTACTGGTGCTTCTCAGCCTCGAAGTT
+TTTTTTTCTTTATGTAGTCACATTTATCCATTTTTTCCCCCTTTTACTGCTTTGACTTTG
+GAGTCGTCGTGACCACCTTGTGGTCTGTGGGAGGGATGCTTCACACATGAGCACTTTTCT
+TTTTCTAACACCGAAGCGTGCTCTTGTCTTCACCAAGAGTGTTGTTGCCCAGTGTGACTG
+GTGATGGCTGTTTCATCATATTCCAGAAGTTTGTGATGCCATTTCCCACACTGGCCCCAA
+TTAGGAAGAACAGACGACTCTACTTGTTTTGCAAGATTCTGGAAATTTCCCGGGGTGTCA
+GAGCCCTGGGCATTAACCCTGCAATTGTAGCAAAGGAAACAAAGTTGGAAGTTGACTCCC
+CAGCACCCGGGCCTTCCTTCCTGCCCTCCTCCAAGCTGCCCTGGTTGGGAACGGGATATT
+GTGAATCCACCCTGAGGCTTCCTCACAGGGCTTCCTGCTATTTGACAGTGTGAGGCTGGC
+TAGAATTGGGATATGAGGCCTTCTAAAAATTAACCTGGCCTGAGGGCTTTCTCAGCATCT
+GGGGGCGGGGTGTGGGGGCGGGGAGTGACCTTTCCCCTTCTTCAAGCCAGGTGCCCATCA
+GATTCTACTTGGGAAAATGTGAAAGGCACAGACAGCTTGTCTGCTCACGGGTGCTGCACT
+TAAATCCTAATCTTGCAATTTTTCTACAAGAGCACTTTACATTTTTTTTTTGGAGGGAGG
+GTGTATGAGTAGACGGTGGCTGCTGTAACAAAGGACTGCACACTTGGGGCTTAAAACAAC
+AGAAACTCCTTCACAGTGTTGGAAGCCACAGTCTGCAATCAAGTTGTCAGCAAGGCTGGT
+TTCTTTTTTCTTTCTCTTTCTTTCTTTCTTTCTCTTTCTTTCTTTCTTTCTTTCTCTTTC
+TTTCTTTTTCTTTCTCTTTCTCTTTCTTTCTTTCTTTCTTTTTTCTTTCTTTTCTTTCTC
+CCTCTCTCTCTCCCTCTCTCTCTCCTCTCTCTCTTTCTCTCTTTCTTTCTTGTTGGATAC
+AGTCTTGCTCTGACACCCAGGCTGGAGTGCAGTGGTGCCATCTCAGTTCACTGCAACTTC
+CGCCTCCAGGGTTCAAGCAATTCTTGTGCCTCAGCCTCCCGAGTATCTGGGATTACAAGT
+GCCCGCCACTATGTCCGGCAAGGCTGGTTTCCTCTGCAGGTGCCAAGAGAATCGTTTACA
+GGTCTCTGTCTTAGCCTCTGCTGGTTGCTTATAGAGCCATCACCCTAGTCTCTGCCTCTG
+TCATCACACAGTCTCTTCCTCTTCGTCTGTGTCTTCCCTTTTGTCTCATAAAGGACACCA
+GTCATTGGATGTAGGGCCCACCTGGATAATCCAGAATGATCTCCTCATCTCGGCATTCAT
+TACATCTGCAAAGACCCTTTTCCCAAAGAAGGTCACACCCATAAGATATGCACGTATCTC
+TTCAGGGCCATCACACAACCCAGTATGGGGACATGTTCCAGTGCACACAGATGAGGGCAA
+ATGCAGCAGCAGTGCCTGGGAGCTGGAAAAACACCCCTGCTTGGGCCCCCTGGACCAACC
+GAGTCAGAATCTTGGTCTAGACCCACCAGGCATTGGTCTAGACCCACCGCCCTGCCCGGA
+GGTGGGGCTGCCTCCTCCCTCCTCCGGTAGCACAGTGTAGGGTTGACCATACACCAGACA
+CCAAGGATTTAAAACCATACCATTTGCGCAGAAGTACGGCATTCCAGGGTTTTCCTTTTC
+GGCTTTACTTAGGTTGAGCTTCCAGAACAACTGGTTGGATTACTCTTGGGAGGGAACCAG
+CCTGCCTGCCCTATTGGCTGCGGCCCCTTGATCCACAAACGAAGCCCATCCCTGCCGGAG
+CTTGTGGCGTTCTCTGGTGTTCATGCGCGCTCCCCCTCCTGCCGACCAATGCACAAGAAC
+AGTGCGGGCGACCAGTAAGCACAGTGAGAAAGCATTTCACCCTCATCAGTAACTAAAACA
+ACAGCAAGTCTGACGGCAAGCAAAAAGTGATCTGAAAATGGCAACCGATGAAGCCCTAGG
+TGAGCGCTCTGCTGGCAAGGTGCAGGACAGCGGCCCTTGCAATGCTGCCCTTTCCGAGGG
+GGCCATTTGCTGGGATGGAGCAAGCCCGTAGGCAGCGTCCTCAGCCCTGGTGCGTAACCC
+TAAGAGAGAGTCGGGCACTCAGCGATGACTCTGGCATTGGGAAGGTGCTTCACTTCGTCA
+TTTGTAGCTGCCAAAGGTCGGGGTGAGATAAGAGGCTGCGTATGTAACGATGTGGTCCGG
+CACAGCTGGCATTTTGAACATCTCCTCACACGCAACATGCCTTGTAGCATGTTGGTGTGA
+ACAGGCCAGGTGACAAACTTATGGATAAAGCGTGACCCCAATAGAGAAGTAAAAGCTCTG
+AGTGCATACCCTGCTAACATTTGGACGGACGTGAGTAGCTGGGTGGCCAGTAATGAGAAG
+TTTTCCTTACATGTTAGTACCTTTCCTAAATGCCTATCATTTGCACGCATTCCTTTGATA
+CACAGAGAATACGTCTTCCCACACAGTCGCTCGGGTGGTAACGCAGCTTGGTTTTCTTCT
+GTGCCAGTGGCAGGGAAGAGCCCGCTGTTGACACAGCCTCTCAGCAAGGCACGGGGCAGG
+GGCTGACTGTGTCTCCTGGGGCTGCCGTGACCAAGCACCACAGACTGCGGGGCCGAAGCC
+ACAGAAACGCGTGGCCTCCCGCTTCTGGAGGCCTGGAGGCTGAGCTAGCGGTGGTGTCGG
+CAGGGTGGGCTCCCCGCCAGGGCCGCGAGGGAGCTGCCTTCCAGGCCTCTCCACGGCGCC
+GGGGGCCGCCGGCTGCACCTCTCCAGCCTCCATCTCCGTCATCCTGTGGCCTTGTCCCCG
+CGGGCCTCTGTGCCTGTCCTCCTCTTTTGACAAGAACACCGGAGATACACAAAGGTACAC
+AAAAGCGGGCCTTTGTTCAAGCTGGCAAAAGAGATCTTCTTCAGAAACCCCTGCTTGCGG
+GGGAGAGAGCTGAGCTCCGTTCCCGCCCCAGCAGAGGCGGCCTGGCCTTGCGAAGGGAGA
+AGGAGGGAGTCGGGAGGGGGCGAGTGCAGGCTCAGGTGAAAGATGACGGGGCAGCCAGCG
+TCCTTGCCGCGAGGCCAGCCGTGTGTGGGAGCTGCCGGTGCTTACCAAGGTTGGGATGCT
+TCCGTCCCGTGGAGACTGGGAGACTGGGCCCCGCGCCTCCTGAGGTTTCCGTTTCCAAGG
+AGTGGCTGCGGGGCCCTCGGGAAAGCCCCTGGGTTGTGGGTGCTACACAGATGTCTCAAA
+GGGACAGGGTAAGCCCTTTGTAGTAAATGCTGTCAGAAAGGGAGGTCAGGTGTTGGCCGG
+AACAGACAGTACATGCTCTGGGCAGCCCTGAGCGTTTCCAGACGGGAACTCACTCAAAAG
+GGGGCTGGGGCGTCCCAGGGGCGCGGCCTTAGGCTCCCAGAGGCCCCGCGAGGTGGTGGC
+CGGGTGTCTTCGGGCAGGGGTTTGAGTGCAGTGTGCCTGCCGAGAGGTTCTGCAGTTCCG
+AGCACCATCATTTTCTCCTCCTCAGACCCCTTGGTTCTCCTTCCACGTCCTGGCAGCTGC
+TTCGCAGGCTCCTTGCTGGTTCCTGTGTCTCCGAGCTGACTCCCGAATTCTCTTCCTCTC
+CTTGCTCAGACACTGCCCCTTTGTGACCTCGTCCATCTTCAAGGCCTTGATGCTGATGAC
+AGATTTCTGTCTTCCAGTCCTGATCTGTTCCTTCACGAAAATGAGCCCAGTAGCCCGTCC
+AAGCCAGAAATGGACTTTTAGCCCCCACCCCCTGCCAACTCTGCCCTTTCCCTCATCCCC
+AGTTGCTTAAACCAAAACGGATTCCTCTTCCTCTCATGATCCAAACTCCTGAGTCCCTTC
+ACCTTTTGCCTACACTATCACAGTGACCTCCTTGCTGCTTTCACACTGGAGAGCGTGGGC
+TCCCTGTGATCTACTCTCCACATGGCAGCCAGTGTCATCTGGTAAACCTTTGCTGAAACC
+CTGCCATGCCCTTCAGTTGCCCTGGAAACCTGAACTCATCCTCAGCCTGGCTCGCAGAGC
+CCTCATGCCGCTGGGACGTCACGTTATGTCCCTCTCCTGCTGGCCCGCTGCACCCAGCCG
+CACCACATGCCGGCCGCACCTCACACACGCTGGCCTCTTGGCCCTTCCTCGAACACACGG
+CGCTTGTCCTTGTTGTCCCCTCATCTTTGCATGGCCGATTTCTTTTTCTCATTCAGCTCT
+AAGTTTAAACTTTCAACAGTTCTAAGCGTATCACCTTCTTCATCTTAAAGTCCTCATCCT
+AAATCACACTGCACTGTTTTAACTCCCAGCTTGGCAGTCCAAACGGCCTCATTTATTCTG
+CTTTGTTTTCTCTCTCCTTCCACCAGAGGGAGAGCAGAGGCCTCCTCCAGCTCAGTCAGG
+CACCATCCCCCGGAACAGCGGTCTCCAACCTTTTTGGCACCAGGGACCGGTTTTGTGGAA
+GACGAGTTTTCCACAGACGGGGATGGGGCGTGGGATGACGGTTCGGGGATGAAACTCTTC
+CACTTCAGATCATCAGGCATTAGTCAGATGCTCCTAAGGAGCACACAACCTAGATCCCTC
+GCACACACAGTTCACAATAGGGTTTGCGCTCCTGTGAGAGTCTGAGGCCGCTGGCTGATC
+TGACAGGAGGCAGAGCTCAGGCGGTCATGCGAGCAATGGGGAGCAGCTGTACACACAGAT
+GAAGCTTCGCTCGCGTGCCCACCACTCGCCGCCTGCTCTGTGGCTCAGTTCTTAACAGGC
+CACAGACCAGTATCGGTCCGTGGCCGGGGGGTTGGGGACCCCTGCCCTAGAACGATGCTC
+GGCACAGCTACACTCGGTGCACATTTCTTGGTGGCCTGAGCTTGTCTTGAGCTCCATCAG
+GATTTCTCTGTCAATATTTTTGTAGACCATACCCTGTCACCATAAATGGTGCTTGGTAGA
+AATGACTGTATGCATGTGATAGGGATAACACAGAAAACCCATGGTAGAGAGCATGGCGAC
+TAACTTGGAGGGACTTCTTTTTGTCTTTTTTTTTTTTTTTTTTTTTTTTTTCGTTTTTGC
+AGAGATGCAGGTATTGTCATGTTGCCTGGGCTGGTCTCAAACTCCTGGGTTCAAGCAATC
+CTCCCACCTGTGCCTCCCTAAGTGTTGGGATTACAGGCATGAGCCGCTGCCTCCCCGCAG
+CATGGGGGGACTTCTTATAAGCATTTTTATTTATTTATTTGGAATTCGGATGGATAGATC
+ATGTTGGCCTGATAATGTATGTGCAAGAGCACAGACTCTGGGGTCTCATGGACAAGCCCC
+ACTCTGCCCCTTGCTTAATGTCTCTGAGCCTCAGTTTCCTCATCTGTAACAAGGGCAGAA
+ATGCCATCCGTGTGCAGCAGAGTACCGACGAAAGGATGGATGCAATGCTCTTACGGCAGT
+GCCCCACCCAAGGCACAAGGAAAAACCACGAAAGCAATTCCAGTTTACACCACGTCCCAA
+ACTGGATTTTCCATTCGACTATAATTCCTCCCAATAAGGGCTCAAAACGTTGTCTAAGGA
+AAGTCGATTGGTTGTTGCCCAGGGCTGGAGGAGTTGGGGGAAAACGGGGAGTGACTGCTG
+ATGGAGACAGAGTTTCCTTTTGAGATGATGAGAATGTCCTGAAGCTGGTCATGCTGATGG
+TTGCACGTATCTATGAATATACTAAAAACCATCGAATTGTACACTTTAGACATGTGAATT
+GTGTAGCATGTGAATTAAACCTCAAGTCGTTATTTTAAAAATGTTGTCTAAGACACCTGG
+GCGCCTCCTGTAAGCTCCCATGAAAATGCAAATGAAGACACAGAAGAAGACCAACATTGA
+ACATTTACATCCTCTCCTAAAGCTATGTTTGACAGGTGACCTTTTTTTCATCCTTCCTTC
+CTTCCTTCCTCTTCCTTATTCCTTCCTTCCTTACTTCACTCCCTCTTCCTTATTCCTTCC
+TTCCTTCTTTCCCTCTTCCTTATTCCTTCCTTCCTTCTTTCCCTCTTCCTTATTCCTTCC
+TTCCTTCTTTCCCTCTTCCTTATTCCTTTCTTCGTTCCTTCCTCCCTTCCTTCCTTCCCT
+CCTTCCTTTTTTCCTTATTCCTTCCTTTCCTCCCTTCCTCCCTCCCTCCCTCCCTCCCTC
+CCTCCCTCCCTTCCTTCCTTCCTTCCTTCCTTCCTTCCTTACGTTTTTTAACAGTAGAGA
+CAGGGTCTCACTGTGTTGCCCAGGCTGGACTCGAACTCCTGGAAATCAAGTGATCCTCCC
+ACCTGGGCCTCCCCAAATGCTGGGATTACAGGCTTGAGAACTGGCAGGGAGAATGGGCGA
+CTTTGTGTTGTGAAGCAGAGTGACCCTACTCTGTCCTTTCCTCAGAGACCCAGGGTTGGT
+ACCAACCAGGAGACTAGCAGCTGCTCTTCTACCCGGCCCAGGTTGCTTTCTCTATAGTCA
+TTGAGAGGGAAACCCTCAAACAGAAAGTGTGTCAGCAGGGGGCAGGCAGCTGATGGATGT
+CATTGTACCCTTGGAGGAACAGAACTCAGCGGTCACAGGAAGCGGGGAGATCCAGGGATT
+CCATCGGTCGCGTTGGTTCCGAGTGTCGCAGGCCTGAGGGGAAGGGGTGAGATTTGCTTT
+CTCAGTCTAGGAGAGCCAAAGTCGACAAATGTGTGCTATAGCTGGAAATAAGTCTTCTCC
+CGCACCTAGCGGGCGGTGGTCAGGGTGGTCTGTCTGTCCATCCCCGAAAGCACTGCAAGA
+CCTGCCATCTGTTTCTGGTGATGCTGGGTGGTATTATTGAGACAGATATTTCTGCTGAAA
+ATAACTGCAAATGATAGATGAAACAGTTTGAGACTGTTCAAAACAGCTGAGAACCAACAA
+GAGAGCAAAACCTGAGAGGAAGTGGGAATCCAAGAAGAAGTACGCCACTGAAGCTGCTTA
+TTTCCCGAGGACGCTGGCCGACCTGGGCAAACTTGGGTTTTGGTTCTGGAGGCCGTCCAG
+GCAAGGGAGACAGCAGGCAAGGCCCAAAGCCTTTGTGAAGTGTGGGGAGTTGTATAGCAG
+CCCCCGCTACATTAAGGAAACCCCACCCCAGCTCTGCAGATTGCGGGGAAGCAGGAGATG
+AAAGGAAAGGGAAAACACCCCTGTGCAAAGTTGTGGCCAAGGGCCAGGTTCTGTGCAGAC
+TTGCAGTCTAGGTTTTCCTGGTGGGGCCGTTCAGTGGCCCTCGAGTCCTAAATTTGGTCT
+GAGGTGGTCCTGGGCCGACAGTACCCATATTAGTTTTCTAGGGCTGCCGTAAGAAATGGC
+CACAAACTGGGTGGCCGAAAACAACAGAAACGGATTCTCCCGTGGTTCTGGAAGCTACAA
+GTCTGGAATCCAGCTGTCAGCAGGGTTGGTTCCTACTGGAGGCTCTGTGGGAGAATCTGT
+TCCATGCCTCTCTCGTAGCTCCAGGTGGTTTCTGGCAATCCCTGGCACTCTCTGTCTCAT
+AGAGACACCGCTCCAATCTCTGCCTCCTTCTTCGTGGGGTCATCTTCCCAGTGTGTCTGC
+CTCTCTCTCTTATGAGGACACTAGTCACATTTGATTAGCGCCCACCCCAATCTAGTATGA
+CCTTAACTTGATTACATCGGCAAAGACCCTATTTCCAAATCAGGTCACATTCCCAGGCAC
+CCAGGTGTTAGGACTTGAACATACCTTTTTGGGGGACACAATTCAACCCACAACGGTTAC
+CCCAACGGATGTGTGGATTCGTGGGAGCCTGGGGTTGAGATGCAGTCTCCACAAAAGTTC
+CTGAGTGAGAGCATGTCCTTTGCAGGAGCGTGGATGGAGCTGGAGGCCATTATCCTTAGC
+AAACTGACGCAGGAACAGACAACCAAATACCACATGTTCTCACTTATAAGTGGGAGCTAA
+ATGATGAGAACACACGGATACATAGAGGGGAATGAAACACACTGGGGCCTTTTGCAGGGC
+GGAGGTTGGGAGGAGGGAGAGTATGAGGAAAAATAGCTCATGGGTGCTAGGCTTAATACC
+TAGGTGATAAAATAATCTGTACAGCAAACCCCCATGACACAAGTTTACTCATATAACAAA
+CCTGCACATGTACACCTGAGCTCTTAAAATAAAAAGAGCGATGTATTAATATAACAACTT
+TAGGAGCACTGGGGGAAAAAAGTTCTTAATGACACTGAGCAGTGCACATAAAAACTAAGC
+ACACCCAGGGCAGGAAAGAGACCATGAGACTAAGAGTTCAGATGCTGGAGGCTGGGTGCA
+GTGACTCACACCTGTAATCCCAGCACTTTGGGAGGCCGAGGTGGGCAGATCACAAGGTCA
+GGAGTTCTAGACCAGCCTGGCCAATATGGTGAAACCCCGTCTCTACTAAAAATACAAAAA
+TTAGCCGGGCGTGGTAGTGGGCGCCTGTACTCCCAGCTACTCGGGAGGCTGAGGCAGGAG
+AATCACTTGAACCTGGGAGGTGGAGGTTGCAGTGAGCCGAGATGGCACCATTGCACTCCA
+GCCTGGGTGACAGAGAGAGACTCAAAACAGATGCTGGAGCACCACACAGATGACAGAACA
+ACTGAACTTTAAAAACTTAAAAAAAAAAAAAAAGGCTTGTGAATATCCACAGGGAATAGG
+GAATATATATATCCAAAAAGTAAAAGCTGTCTAACTTTTTATTTCTTTGTTTGTTTGTTT
+ATTTATTTGAGACAGGGTCTCACTCTGTCACCCAGGCTGGAGTGTAGTGGCACGATCACA
+GCTCATTGCAGCCTCAACCTCCCGGGCTCAAGCGATCCTCCCGCCTCAGCCTCCCGAGTA
+GCCGGGACTACAGGCGCGTGC
diff --git a/tests/test_data/ikbkg_ref.fa.fai b/tests/test_data/ikbkg_ref.fa.fai
new file mode 100644
index 0000000..abbf24f
--- /dev/null
+++ b/tests/test_data/ikbkg_ref.fa.fai
@@ -0,0 +1 @@
+chrX_154555600_154575000 19401 26 60 61
diff --git a/tests/test_data/vcf/HG004.paraphase.json b/tests/test_data/vcf/HG004.paraphase.json
new file mode 100644
index 0000000..0d70961
--- /dev/null
+++ b/tests/test_data/vcf/HG004.paraphase.json
@@ -0,0 +1,994 @@
+{
+ "AMY2A": {
+ "total_cn": 4,
+ "gene_cn": null,
+ "final_haplotypes": {
+ "1111111111111111111111111111111": "AMY2A_hap1",
+ "0221211111111111122222222222222": "AMY2A_hap2",
+ "0222222222222222212222112222222": "AMY2A_hap3"
+ },
+ "two_copy_haplotypes": [
+ "AMY2A_hap1"
+ ],
+ "alleles_final": [],
+ "hap_links": {},
+ "highest_total_cn": 3,
+ "assembled_haplotypes": [
+ "1111111111111111111111111111111",
+ "0221211111111111122222222222222",
+ "0222222222222222212222112222222"
+ ],
+ "sites_for_phasing": [
+ "103619276_A_C",
+ "103619687_A_T",
+ "103619968_G_T",
+ "103620513_C_T",
+ "103620877_T_C",
+ "103623201_A_T",
+ "103623290_T_C",
+ "103623441_G_A",
+ "103623445_T_C",
+ "103623474_T_G",
+ "103623496_T_C",
+ "103623508_A_G",
+ "103623521_A_G",
+ "103623702_G_A",
+ "103623955_G_C",
+ "103624855_A_G",
+ "103624986_A_G",
+ "103625157_C_T",
+ "103628310_G_A",
+ "103628386_G_A",
+ "103629402_A_G",
+ "103629894_C_T",
+ "103630095_G_A",
+ "103630100_T_A",
+ "103630320_C_A",
+ "103630519_T_C",
+ "103630682_G_A",
+ "103630818_G_C",
+ "103630938_C_T",
+ "103630980_G_A",
+ "103631135_A_G"
+ ],
+ "unique_supporting_reads": {
+ "1111111111111111111111111111111": [
+ "m84010_220919_232145_s1/101914303/ccs",
+ "m84010_220919_232145_s1/143201723/ccs",
+ "m84010_220919_232145_s1/193335832/ccs",
+ "m84010_220919_232145_s1/150867304/ccs",
+ "m84010_220919_232145_s1/149422116/ccs",
+ "m84010_220919_232145_s1/44501758/ccs",
+ "m84010_220919_232145_s1/27331211/ccs",
+ "m84010_220919_232145_s1/128258766/ccs",
+ "m84010_220919_232145_s1/166532067/ccs",
+ "m84010_220919_232145_s1/207165008/ccs",
+ "m84010_220919_232145_s1/164697710/ccs",
+ "m84010_220919_232145_s1/254870861/ccs",
+ "m84010_220919_232145_s1/118035415/ccs",
+ "m84010_220919_232145_s1/235275715/ccs",
+ "m84010_220919_232145_s1/74059034/ccs",
+ "m84010_220919_232145_s1/80875866/ccs",
+ "m84010_220919_232145_s1/217453817/ccs",
+ "m84010_220919_232145_s1/137038769/ccs",
+ "m84010_220919_232145_s1/136516591/ccs",
+ "m84010_220919_232145_s1/217318813/ccs",
+ "m84010_220919_232145_s1/249367055/ccs",
+ "m84010_220919_232145_s1/180816751/ccs",
+ "m84010_220919_232145_s1/171445489/ccs",
+ "m84010_220919_232145_s1/196677792/ccs",
+ "m84010_220919_232145_s1/157025376/ccs",
+ "m84010_220919_232145_s1/176493282/ccs",
+ "m84010_220919_232145_s1/21956866/ccs",
+ "m84010_220919_232145_s1/18682974/ccs",
+ "m84010_220919_232145_s1/216269367/ccs",
+ "m84010_220919_232145_s1/169479399/ccs",
+ "m84010_220919_232145_s1/58721001/ccs",
+ "m84010_220919_232145_s1/52171480/ccs",
+ "m84010_220919_232145_s1/187045770/ccs",
+ "m84010_220919_232145_s1/253235773/ccs",
+ "m84010_220919_232145_s1/180683018/ccs",
+ "m84010_220919_232145_s1/110497773/ccs",
+ "m84010_220919_232145_s1/192221852/ccs",
+ "m84010_220919_232145_s1/149821339/ccs",
+ "m84010_220919_232145_s1/26544903/ccs",
+ "m84010_220919_232145_s1/248976641/ccs",
+ "m84010_220919_232145_s1/80549151/ccs",
+ "m84010_220919_232145_s1/12587534/ccs",
+ "m84010_220919_232145_s1/72552357/ccs",
+ "m84010_220919_232145_s1/14289207/ccs",
+ "m84010_220919_232145_s1/94573236/ccs",
+ "m84010_220919_232145_s1/240125323/ccs",
+ "m84010_220919_232145_s1/72026146/ccs",
+ "m84010_220919_232145_s1/237114866/ccs",
+ "m84010_220919_232145_s1/74253073/ccs",
+ "m84010_220919_232145_s1/136973382/ccs",
+ "m84010_220919_232145_s1/129631978/ccs",
+ "m84010_220919_232145_s1/105191321/ccs",
+ "m84010_220919_232145_s1/52823420/ccs",
+ "m84010_220919_232145_s1/14879985/ccs",
+ "m84010_220919_232145_s1/230884747/ccs",
+ "m84010_220919_232145_s1/159453965/ccs",
+ "m84010_220919_232145_s1/14813359/ccs",
+ "m84010_220919_232145_s1/87753411/ccs",
+ "m84010_220919_232145_s1/89784573/ccs",
+ "m84010_220919_232145_s1/104073493/ccs",
+ "m84010_220919_232145_s1/153424741/ccs",
+ "m84010_220919_232145_s1/192154775/ccs",
+ "m84010_220919_232145_s1/249301766/ccs",
+ "m84010_220919_232145_s1/193528961/ccs",
+ "m84010_220919_232145_s1/97125773/ccs",
+ "m84010_220919_232145_s1/136121712/ccs",
+ "m84010_220919_232145_s1/91820624/ccs",
+ "m84010_220919_232145_s1/27723729/ccs",
+ "m84010_220919_232145_s1/153753249/ccs",
+ "m84010_220919_232145_s1/39849936/ccs",
+ "m84010_220919_232145_s1/197265622/ccs",
+ "m84010_220919_232145_s1/99157892/ccs",
+ "m84010_220919_232145_s1/210568702/ccs",
+ "m84010_220919_232145_s1/27000903/ccs",
+ "m84010_220919_232145_s1/52498893/ccs",
+ "m84010_220919_232145_s1/56954229/ccs",
+ "m84010_220919_232145_s1/234558398/ccs",
+ "m84010_220919_232145_s1/176425930/ccs",
+ "m84010_220919_232145_s1/132711500/ccs"
+ ],
+ "0221211111111111122222222222222": [
+ "m84010_220919_232145_s1/155127370/ccs",
+ "m84010_220919_232145_s1/131141751/ccs",
+ "m84010_220919_232145_s1/48431977/ccs",
+ "m84010_220919_232145_s1/221385343/ccs",
+ "m84010_220919_232145_s1/89655025/ccs",
+ "m84010_220919_232145_s1/102501866/ccs",
+ "m84010_220919_232145_s1/158466184/ccs",
+ "m84010_220919_232145_s1/184815905/ccs",
+ "m84010_220919_232145_s1/102829642/ccs",
+ "m84010_220919_232145_s1/163252758/ccs",
+ "m84010_220919_232145_s1/218632779/ccs",
+ "m84010_220919_232145_s1/228329949/ccs",
+ "m84010_220919_232145_s1/70913209/ccs",
+ "m84010_220919_232145_s1/114887705/ccs",
+ "m84010_220919_232145_s1/27657150/ccs",
+ "m84010_220919_232145_s1/48957376/ccs",
+ "m84010_220919_232145_s1/141689061/ccs",
+ "m84010_220919_232145_s1/101649431/ccs",
+ "m84010_220919_232145_s1/233575979/ccs",
+ "m84010_220919_232145_s1/51057252/ccs",
+ "m84010_220919_232145_s1/208540836/ccs",
+ "m84010_220919_232145_s1/104468013/ccs",
+ "m84010_220919_232145_s1/92475988/ccs",
+ "m84010_220919_232145_s1/210043179/ccs",
+ "m84010_220919_232145_s1/67505189/ccs",
+ "m84010_220919_232145_s1/176753643/ccs",
+ "m84010_220919_232145_s1/131206958/ccs",
+ "m84010_220919_232145_s1/33947882/ccs",
+ "m84010_220919_232145_s1/139592182/ccs",
+ "m84010_220919_232145_s1/122687021/ccs"
+ ],
+ "0222222222222222212222112222222": [
+ "m84010_220919_232145_s1/129368388/ccs",
+ "m84010_220919_232145_s1/24317018/ccs",
+ "m84010_220919_232145_s1/198184134/ccs",
+ "m84010_220919_232145_s1/235212752/ccs",
+ "m84010_220919_232145_s1/90115184/ccs",
+ "m84010_220919_232145_s1/27201546/ccs",
+ "m84010_220919_232145_s1/200609047/ccs",
+ "m84010_220919_232145_s1/109445955/ccs",
+ "m84010_220919_232145_s1/263852387/ccs",
+ "m84010_220919_232145_s1/93590711/ccs",
+ "m84010_220919_232145_s1/160566292/ccs",
+ "m84010_220919_232145_s1/42601578/ccs",
+ "m84010_220919_232145_s1/256316517/ccs",
+ "m84010_220919_232145_s1/31195702/ccs",
+ "m84010_220919_232145_s1/67242785/ccs",
+ "m84010_220919_232145_s1/93850194/ccs",
+ "m84010_220919_232145_s1/234488088/ccs",
+ "m84010_220919_232145_s1/123077920/ccs",
+ "m84010_220919_232145_s1/133630596/ccs",
+ "m84010_220919_232145_s1/94311396/ccs",
+ "m84010_220919_232145_s1/205392489/ccs",
+ "m84010_220919_232145_s1/144509377/ccs",
+ "m84010_220919_232145_s1/117835491/ccs",
+ "m84010_220919_232145_s1/161617827/ccs",
+ "m84010_220919_232145_s1/40505388/ccs",
+ "m84010_220919_232145_s1/185862700/ccs",
+ "m84010_220919_232145_s1/239928816/ccs",
+ "m84010_220919_232145_s1/23986245/ccs",
+ "m84010_220919_232145_s1/71374259/ccs",
+ "m84010_220919_232145_s1/90441779/ccs",
+ "m84010_220919_232145_s1/114167967/ccs",
+ "m84010_220919_232145_s1/18875058/ccs",
+ "m84010_220919_232145_s1/32901645/ccs"
+ ]
+ },
+ "het_sites_not_used_in_phasing": [],
+ "homozygous_sites": [
+ "103621198_A_C"
+ ],
+ "haplotype_details": {
+ "AMY2A_hap1": {
+ "variants": [
+ "103621198_A_C"
+ ],
+ "boundary": [
+ 103616000,
+ 103631602
+ ],
+ "boundary_gene2": null,
+ "is_truncated": null
+ },
+ "AMY2A_hap2": {
+ "variants": [
+ "103619306_clip_5p",
+ "103619687_A_T",
+ "103619968_G_T",
+ "103620877_T_C",
+ "103621198_A_C",
+ "103625157_C_T",
+ "103628310_G_A",
+ "103628386_G_A",
+ "103629402_A_G",
+ "103629894_C_T",
+ "103630095_G_A",
+ "103630100_T_A",
+ "103630320_C_A",
+ "103630519_T_C",
+ "103630682_G_A",
+ "103630818_G_C",
+ "103630938_C_T",
+ "103630980_G_A",
+ "103631135_A_G"
+ ],
+ "boundary": [
+ 103619306,
+ 103631602
+ ],
+ "boundary_gene2": null,
+ "is_truncated": [
+ "5p"
+ ]
+ },
+ "AMY2A_hap3": {
+ "variants": [
+ "103619306_clip_5p",
+ "103619687_A_T",
+ "103619968_G_T",
+ "103620513_C_T",
+ "103620877_T_C",
+ "103621198_A_C",
+ "103623201_A_T",
+ "103623290_T_C",
+ "103623441_G_A",
+ "103623445_T_C",
+ "103623474_T_G",
+ "103623496_T_C",
+ "103623508_A_G",
+ "103623521_A_G",
+ "103623702_G_A",
+ "103623955_G_C",
+ "103624855_A_G",
+ "103624986_A_G",
+ "103628310_G_A",
+ "103628386_G_A",
+ "103629402_A_G",
+ "103629894_C_T",
+ "103630320_C_A",
+ "103630519_T_C",
+ "103630682_G_A",
+ "103630818_G_C",
+ "103630938_C_T",
+ "103630980_G_A",
+ "103631135_A_G"
+ ],
+ "boundary": [
+ 103619306,
+ 103631602
+ ],
+ "boundary_gene2": null,
+ "is_truncated": [
+ "5p"
+ ]
+ }
+ },
+ "nonunique_supporting_reads": {
+ "m84010_220919_232145_s1/37032652/ccs": [
+ "0221211111111111122222222222222",
+ "0222222222222222212222112222222"
+ ],
+ "m84010_220919_232145_s1/76153174/ccs": [
+ "0221211111111111122222222222222",
+ "0222222222222222212222112222222"
+ ],
+ "m84010_220919_232145_s1/164628172/ccs": [
+ "0221211111111111122222222222222",
+ "0222222222222222212222112222222"
+ ]
+ },
+ "read_details": {
+ "m84010_220919_232145_s1/105191321/ccs": "111111111111111111xxxxxxxxxxxxx",
+ "m84010_220919_232145_s1/118035415/ccs": "111111111111111xxxxxxxxxxxxxxxx",
+ "m84010_220919_232145_s1/129631978/ccs": "111111111111111111111x111xxxxxx",
+ "m84010_220919_232145_s1/136516591/ccs": "1xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
+ "m84010_220919_232145_s1/136973382/ccs": "1xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
+ "m84010_220919_232145_s1/14813359/ccs": "111x111x1111111111x11xxxxxxxxxx",
+ "m84010_220919_232145_s1/157025376/ccs": "11111111111111111xxxxxxxxxxxxxx",
+ "m84010_220919_232145_s1/166532067/ccs": "111111111111111111xxxxxxxxxxxxx",
+ "m84010_220919_232145_s1/169479399/ccs": "111111111111111111xxxxxxxxxxxxx",
+ "m84010_220919_232145_s1/187045770/ccs": "111x1xxxxxxxxxxxxxxxxxxxxxxxxxx",
+ "m84010_220919_232145_s1/192221852/ccs": "111xxxxxxxxxxxxxxxxxxxxxxxxxxxx",
+ "m84010_220919_232145_s1/217318813/ccs": "11111xxxxxxxxxxxxxxxxxxxxxxxxxx",
+ "m84010_220919_232145_s1/217453817/ccs": "11111xxxxxxxxxxxxxxxxxxxxxxxxxx",
+ "m84010_220919_232145_s1/234558398/ccs": "11111xxxxxxxxxxxxxxxxxxxxxxxxxx",
+ "m84010_220919_232145_s1/254870861/ccs": "11x111111111111xxxxxxxxxxxxxxxx",
+ "m84010_220919_232145_s1/27000903/ccs": "11111111111111111111xxxxxxxxxxx",
+ "m84010_220919_232145_s1/27723729/ccs": "11111xxxxxxxxxxxxxxxxxxxxxxxxxx",
+ "m84010_220919_232145_s1/74059034/ccs": "111111111111111xxxxxxxxxxxxxxxx",
+ "m84010_220919_232145_s1/80875866/ccs": "1111xxxxxxxxxxxxxxxxxxxxxxxxxxx",
+ "m84010_220919_232145_s1/87753411/ccs": "11111111111111111111111111111xx",
+ "m84010_220919_232145_s1/235275715/ccs": "11111xxxxxxxxxxxxxxxxxxxxxxxxxx",
+ "m84010_220919_232145_s1/176425930/ccs": "11xxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
+ "m84010_220919_232145_s1/89784573/ccs": "111xxxxxxxxxxxxxxxxxxxxxxxxxxxx",
+ "m84010_220919_232145_s1/136121712/ccs": "111111111111111111xxxxxxxxxxxxx",
+ "m84010_220919_232145_s1/18682974/ccs": "11111111111111111111111x1111111",
+ "m84010_220919_232145_s1/12587534/ccs": "111111111111111111xxxxxxxxxxxxx",
+ "m84010_220919_232145_s1/74253073/ccs": "1111111111111111xxxxxxxxxxxxxxx",
+ "m84010_220919_232145_s1/44501758/ccs": "11111xxxxxxxxxxxxxxxxxxxxxxxxxx",
+ "m84010_220919_232145_s1/143201723/ccs": "111111xxxxxxxxxxxxxxxxxxxxxxxxx",
+ "m84010_220919_232145_s1/91820624/ccs": "11111x1111111xxxxxxxxxxxxxxxxxx",
+ "m84010_220919_232145_s1/153753249/ccs": "111111111111111111xxxxxxxxxxxxx",
+ "m84010_220919_232145_s1/14879985/ccs": "11111111111111111111xxxxxxxxxxx",
+ "m84010_220919_232145_s1/153424741/ccs": "11x1111111111x111111x1111111111",
+ "m84010_220919_232145_s1/26544903/ccs": "11x111xx1x11x11111xx1xx1xxxxxxx",
+ "m84010_220919_232145_s1/197265622/ccs": "111111111xxxxxxxxxxxxxxxxxxxxxx",
+ "m84010_220919_232145_s1/110497773/ccs": "111x111x11x1111x11xxxxxxxxxxxxx",
+ "m84010_220919_232145_s1/27331211/ccs": "111111111111111111xxxxxxxxxxxxx",
+ "m84010_220919_232145_s1/94573236/ccs": "11x1x11x1xxxxx1111xx1x1xx1xx11x",
+ "m84010_220919_232145_s1/97125773/ccs": "1111111111111111111111111111111",
+ "m84010_220919_232145_s1/180816751/ccs": "1111111111111111111x1xx11111111",
+ "m84010_220919_232145_s1/237114866/ccs": "1111111111111111111111xxxxxxxxx",
+ "m84010_220919_232145_s1/149821339/ccs": "11111111111111xxxxxxxxxxxxxxxxx",
+ "m84010_220919_232145_s1/216269367/ccs": "1x11x1111111111111x1111x11111x1",
+ "m84010_220919_232145_s1/196677792/ccs": "11x111111x111111111111111111111",
+ "m84010_220919_232145_s1/99157892/ccs": "11x111x1111x11x1111x1xxx1x1x11x",
+ "m84010_220919_232145_s1/21956866/ccs": "111111111111111111xxxxxxxxxxxxx",
+ "m84010_220919_232145_s1/240125323/ccs": "1111111111111111111111111111111",
+ "m84010_220919_232145_s1/56954229/ccs": "111111111111111xxxxxxxxxxxxxxxx",
+ "m84010_220919_232145_s1/52498893/ccs": "1111111111111111111111111111111",
+ "m84010_220919_232145_s1/150867304/ccs": "111111111111111111x1111111111x1",
+ "m84010_220919_232145_s1/101914303/ccs": "1111111111111111111111111111111",
+ "m84010_220919_232145_s1/101649431/ccs": "0221211x1111x11112xxxxxxxxxxxxx",
+ "m84010_220919_232145_s1/114887705/ccs": "022121111111111112xxxxxxxxxxxxx",
+ "m84010_220919_232145_s1/117835491/ccs": "022xx222x2xxx22xx1x2xxx12x22222",
+ "m84010_220919_232145_s1/122687021/ccs": "0221x11111111111122222222222222",
+ "m84010_220919_232145_s1/123077920/ccs": "022222222222222xxxxxxxxxxxxxxxx",
+ "m84010_220919_232145_s1/129368388/ccs": "0222222222222222212xxxxxxxxxxxx",
+ "m84010_220919_232145_s1/131141751/ccs": "022x2111111111xxxxxxxxxxxxxxxxx",
+ "m84010_220919_232145_s1/139592182/ccs": "02212111111111xxxxxxxxxxxxxxxxx",
+ "m84010_220919_232145_s1/141689061/ccs": "0221211111111111122222222x22222",
+ "m84010_220919_232145_s1/158466184/ccs": "02212xxxxxxxxxxxxxxxxxxxxxxxxxx",
+ "m84010_220919_232145_s1/160566292/ccs": "022222222222x22221xxxxxxxxxxxxx",
+ "m84010_220919_232145_s1/161617827/ccs": "022222222x222222212222112222x22",
+ "m84010_220919_232145_s1/163252758/ccs": "022121111111111112xxxxxxxxxxxxx",
+ "m84010_220919_232145_s1/176753643/ccs": "022121111111111112xxxxxxxxxxxxx",
+ "m84010_220919_232145_s1/205392489/ccs": "02222222222222222122xxxxxxxxxxx",
+ "m84010_220919_232145_s1/208540836/ccs": "0221211111111111122222222222222",
+ "m84010_220919_232145_s1/210043179/ccs": "02212x11111111111x2222222222222",
+ "m84010_220919_232145_s1/218632779/ccs": "022x211111111111xx222222222xxxx",
+ "m84010_220919_232145_s1/228329949/ccs": "02212111111111xxxxxxxxxxxxxxxxx",
+ "m84010_220919_232145_s1/234488088/ccs": "02222222222xx222212222xxxxxxxxx",
+ "m84010_220919_232145_s1/235212752/ccs": "02222xxxxxxxxxxxxxxxxxxxxxxxxxx",
+ "m84010_220919_232145_s1/23986245/ccs": "0222222222222xxxxxxxxxxxxxxxxxx",
+ "m84010_220919_232145_s1/256316517/ccs": "022222222222x22221xxxxxxxxxxxxx",
+ "m84010_220919_232145_s1/27201546/ccs": "02x2222222222222212222112222222",
+ "m84010_220919_232145_s1/27657150/ccs": "0221211111111111122x22222222222",
+ "m84010_220919_232145_s1/31195702/ccs": "02x22222222222222122221x2222x22",
+ "m84010_220919_232145_s1/32901645/ccs": "02x2xxxxx2xxxxxxxxxxxxxxxxxxxxx",
+ "m84010_220919_232145_s1/33947882/ccs": "0221211111111111122222222222222",
+ "m84010_220919_232145_s1/48957376/ccs": "02212xxxxxxxxxxxxxxxxxxxxxxxxxx",
+ "m84010_220919_232145_s1/51057252/ccs": "022121111111111xxxxxxxxxxxxxxxx",
+ "m84010_220919_232145_s1/67242785/ccs": "022222222222222221xxxxxxxxxxxxx",
+ "m84010_220919_232145_s1/70913209/ccs": "0221211111111111122222222222222",
+ "m84010_220919_232145_s1/71374259/ccs": "022222222222222221xxxxxxxxxxxxx",
+ "m84010_220919_232145_s1/89655025/ccs": "022121111111111112xxxxxxxxxxxxx",
+ "m84010_220919_232145_s1/90115184/ccs": "0222222xxxxxx2222xxxxxxxxxxxxxx",
+ "m84010_220919_232145_s1/92475988/ccs": "0221211111111111122222xxxxxxxxx",
+ "m84010_220919_232145_s1/94311396/ccs": "022222222222222221222211222x222",
+ "m84010_220919_232145_s1/90441779/ccs": "022222222x2x2222x1x22211x222222",
+ "m84010_220919_232145_s1/159453965/ccs": "x111111111111111111111111111111",
+ "m84010_220919_232145_s1/180683018/ccs": "xx11111111111111111111111111111",
+ "m84010_220919_232145_s1/104468013/ccs": "0xx12x1x111111111222222xx222x2x",
+ "m84010_220919_232145_s1/14289207/ccs": "xxx111111111111111xxxxxxxxxxxxx",
+ "m84010_220919_232145_s1/184815905/ccs": "xxx1211111111111122222222222222",
+ "m84010_220919_232145_s1/67505189/ccs": "xxx121111111111112xxxxxxxxxxxxx",
+ "m84010_220919_232145_s1/133630596/ccs": "xxx2222222222222212222112222222",
+ "m84010_220919_232145_s1/185862700/ccs": "xxx222x22x22222xxxx22x1xx22xx22",
+ "m84010_220919_232145_s1/52171480/ccs": "xxxx11111111111111111111x111111",
+ "m84010_220919_232145_s1/193528961/ccs": "xxxxx11111111111111111111111111",
+ "m84010_220919_232145_s1/109445955/ccs": "xxxxx2222222222221xxxxxxxxxxxxx",
+ "m84010_220919_232145_s1/230884747/ccs": "xxxxx1111111111111111111xxxxxxx",
+ "m84010_220919_232145_s1/93850194/ccs": "xxxxx2222x22222221222211x222222",
+ "m84010_220919_232145_s1/144509377/ccs": "xxxxx22222222222212222112222222",
+ "m84010_220919_232145_s1/249367055/ccs": "xxxxx11111111111111111111111111",
+ "m84010_220919_232145_s1/93590711/ccs": "xxxxx2222222222221xxxxxxxxxxxxx",
+ "m84010_220919_232145_s1/164697710/ccs": "xxxxx1111111111111111111x111111",
+ "m84010_220919_232145_s1/72026146/ccs": "xxxxx1111x11111111x1111111x11x1",
+ "m84010_220919_232145_s1/249301766/ccs": "xxxxxx1111111111111111111111111",
+ "m84010_220919_232145_s1/102829642/ccs": "xxxxxxxx1111x111122222222222222",
+ "m84010_220919_232145_s1/42601578/ccs": "xxxxxxxxxxxxxxx2212222112222222",
+ "m84010_220919_232145_s1/104073493/ccs": "xxxxxxxxxxxxxxx1111111111111111",
+ "m84010_220919_232145_s1/72552357/ccs": "xxxxxxxxxxxxxxx111x1111x1111111",
+ "m84010_220919_232145_s1/192154775/ccs": "xxxxxxxxxxxxxxx11111xxxxxxxxxxx",
+ "m84010_220919_232145_s1/200609047/ccs": "xxxxxxxxxxxxxxx2212222112222222",
+ "m84010_220919_232145_s1/233575979/ccs": "xxxxxxxxxxxxxxx1122222222222222",
+ "m84010_220919_232145_s1/131206958/ccs": "xxxxxxxxxxxxxxx11x2222222222222",
+ "m84010_220919_232145_s1/193335832/ccs": "xxxxxxxxxxxxxxxxx11111111111111",
+ "m84010_220919_232145_s1/221385343/ccs": "xxxxxxxxxxxxxxxxx222222x2x22222",
+ "m84010_220919_232145_s1/39849936/ccs": "xxxxxxxxxxxxxxxxxx1111111111111",
+ "m84010_220919_232145_s1/114167967/ccs": "xxxxxxxxxxxxxxxxxx2222112222222",
+ "m84010_220919_232145_s1/48431977/ccs": "xxxxxxxxxxxxxxxxxx2222222222222",
+ "m84010_220919_232145_s1/52823420/ccs": "xxxxxxxxxxxxxxxxxx1111111111111",
+ "m84010_220919_232145_s1/102501866/ccs": "xxxxxxxxxxxxxxxxxx222222x222222",
+ "m84010_220919_232145_s1/149422116/ccs": "xxxxxxxxxxxxxxxxxx1111111111111",
+ "m84010_220919_232145_s1/207165008/ccs": "xxxxxxxxxxxxxxxxxx1111111111111",
+ "m84010_220919_232145_s1/239928816/ccs": "xxxxxxxxxxxxxxxxxx2222112222222",
+ "m84010_220919_232145_s1/80549151/ccs": "xxxxxxxxxxxxxxxxxx1111111111111",
+ "m84010_220919_232145_s1/155127370/ccs": "xxxxxxxxxxxxxxxxxx2222222222222",
+ "m84010_220919_232145_s1/18875058/ccs": "xxxxxxxxxxxxxxxxxx222x1xx22x222",
+ "m84010_220919_232145_s1/263852387/ccs": "xxxxxxxxxxxxxxxxxx2222112222222",
+ "m84010_220919_232145_s1/248976641/ccs": "xxxxxxxxxxxxxxxxxx1111111111111",
+ "m84010_220919_232145_s1/253235773/ccs": "xxxxxxxxxxxxxxxxxx1111x11111111",
+ "m84010_220919_232145_s1/198184134/ccs": "xxxxxxxxxxxxxxxxxx2222112222222",
+ "m84010_220919_232145_s1/58721001/ccs": "xxxxxxxxxxxxxxxxxx111x111111111",
+ "m84010_220919_232145_s1/137038769/ccs": "xxxxxxxxxxxxxxxxxx1111111111xx1",
+ "m84010_220919_232145_s1/176493282/ccs": "xxxxxxxxxxxxxxxxxxx111x111x1111",
+ "m84010_220919_232145_s1/171445489/ccs": "xxxxxxxxxxxxxxxxxxxx11111111111",
+ "m84010_220919_232145_s1/24317018/ccs": "xxxxxxxxxxxxxxxxxxxx221xx222222",
+ "m84010_220919_232145_s1/210568702/ccs": "xxxxxxxxxxxxxxxxxxxx11111111111",
+ "m84010_220919_232145_s1/40505388/ccs": "xxxxxxxxxxxxxxxxxxxxx2112222222",
+ "m84010_220919_232145_s1/128258766/ccs": "xxxxxxxxxxxxxxxxxxxxxxxx1111111",
+ "m84010_220919_232145_s1/37032652/ccs": "xxxxxxxxxxxxxxxxxxxxxxxx2222222",
+ "m84010_220919_232145_s1/76153174/ccs": "xxxxxxxxxxxxxxxxxxxxxxxxx222222",
+ "m84010_220919_232145_s1/132711500/ccs": "xxxxxxxxxxxxxxxxxxxxxxxxx111111",
+ "m84010_220919_232145_s1/164628172/ccs": "xxxxxxxxxxxxxxxxxxxxxxxxxxx2x22"
+ },
+ "genome_depth": 34.0,
+ "region_depth": {
+ "median": 83.0,
+ "percentile80": 91.0
+ },
+ "sample_sex": "female",
+ "fusions_called": null
+ },
+ "ARL17A": {
+ "total_cn": 2,
+ "gene_cn": null,
+ "final_haplotypes": {},
+ "two_copy_haplotypes": [],
+ "alleles_final": [],
+ "hap_links": {},
+ "highest_total_cn": 0,
+ "assembled_haplotypes": [],
+ "sites_for_phasing": [],
+ "unique_supporting_reads": {},
+ "het_sites_not_used_in_phasing": [
+ "46562251_A_T"
+ ],
+ "homozygous_sites": [
+ "46552316_CTTG_C",
+ "46552740_C_T",
+ "46553266_C_T",
+ "46553269_A_G",
+ "46554348_G_A",
+ "46554359_G_A",
+ "46554408_T_C",
+ "46555174_T_C",
+ "46555687_C_A",
+ "46556500_G_A",
+ "46559353_C_T",
+ "46561354_A_G",
+ "46562490_G_A",
+ "46563229_C_A",
+ "46563231_A_T",
+ "46563822_C_T",
+ "46564629_G_A",
+ "46565883_T_C",
+ "46566783_C_T",
+ "46568816_G_A",
+ "46569732_C_T",
+ "46569893_A_G",
+ "46571000_C_G",
+ "46571631_A_G",
+ "46571717_C_T",
+ "46572286_A_T",
+ "46572609_A_G",
+ "46572772_C_T",
+ "46572879_CAAGAAAGAA_C",
+ "46572979_G_A",
+ "46572988_AG_A",
+ "46573447_CAAA_C",
+ "46573920_G_A",
+ "46573927_A_G",
+ "46574067_C_G",
+ "46574418_C_G",
+ "46574982_C_G",
+ "46575195_A_G",
+ "46575285_G_C",
+ "46576563_A_T",
+ "46576641_T_C",
+ "46577000_T_C",
+ "46577050_A_C",
+ "46579335_A_G",
+ "46579462_G_C",
+ "46579788_C_T"
+ ],
+ "haplotype_details": null,
+ "nonunique_supporting_reads": {},
+ "read_details": {},
+ "genome_depth": 34.0,
+ "region_depth": {
+ "median": 26.0,
+ "percentile80": 28.0
+ },
+ "sample_sex": "female",
+ "fusions_called": null
+ },
+ "ikbkg": {
+ "total_cn": 4,
+ "gene_cn": 2,
+ "final_haplotypes": {
+ "1111111111111111111111221221": "ikbkg_hap1",
+ "1111111111111111111111222111": "ikbkg_hap2",
+ "2222222222222222222222333311": "ikbkg_pseudo_hap1",
+ "2222222222222222222222111211": "ikbkg_pseudo_hap2"
+ },
+ "deletion_haplotypes": [
+ "ikbkg_pseudo_hap1"
+ ],
+ "del_read_number": 18,
+ "two_copy_haplotypes": [],
+ "alleles_final": [],
+ "hap_links": {},
+ "highest_total_cn": 4,
+ "assembled_haplotypes": [
+ "1111111111111111111111221221",
+ "1111111111111111111111222111",
+ "2222222222222222222222333311",
+ "2222222222222222222222111211"
+ ],
+ "sites_for_phasing": [
+ "154555705_A_G",
+ "154555717_T_C",
+ "154555720_C_T",
+ "154555724_C_T",
+ "154555726_T_C",
+ "154555727_G_C",
+ "154555728_T_C",
+ "154555737_A_G",
+ "154555740_G_C",
+ "154555752_G_A",
+ "154555761_T_G",
+ "154555785_G_A",
+ "154555790_A_G",
+ "154555792_C_A",
+ "154555799_G_C",
+ "154555800_G_A",
+ "154555802_G_T",
+ "154555823_A_G",
+ "154555831_T_C",
+ "154555846_C_T",
+ "154555860_A_G",
+ "154555882_C_G",
+ "154559514_G_A",
+ "154560134_G_A",
+ "154562256_C_T",
+ "154567239_C_A",
+ "154569374_G_A",
+ "154569800_T_G"
+ ],
+ "unique_supporting_reads": {
+ "1111111111111111111111221221": [
+ "m84010_220919_232145_s1/159323328/ccs",
+ "m84010_220919_232145_s1/31854821/ccs",
+ "m84010_220919_232145_s1/98703109/ccs",
+ "m84010_220919_232145_s1/223941696/ccs",
+ "m84010_220919_232145_s1/249172079/ccs",
+ "m84010_220919_232145_s1/148313002/ccs",
+ "m84010_220919_232145_s1/208667294/ccs",
+ "m84010_220919_232145_s1/114102389/ccs",
+ "m84010_220919_232145_s1/119344939/ccs",
+ "m84010_220919_232145_s1/148964496/ccs",
+ "m84010_220919_232145_s1/266077670/ccs",
+ "m84010_220919_232145_s1/18615736/ccs"
+ ],
+ "1111111111111111111111222111": [
+ "m84010_220919_232145_s1/46012277/ccs",
+ "m84010_220919_232145_s1/239146397/ccs",
+ "m84010_220919_232145_s1/23859927/ccs",
+ "m84010_220919_232145_s1/97063602/ccs",
+ "m84010_220919_232145_s1/12914060/ccs",
+ "m84010_220919_232145_s1/159647632/ccs",
+ "m84010_220919_232145_s1/145555998/ccs",
+ "m84010_220919_232145_s1/82645192/ccs",
+ "m84010_220919_232145_s1/86447936/ccs",
+ "m84010_220919_232145_s1/165610265/ccs",
+ "m84010_220919_232145_s1/140182781/ccs",
+ "m84010_220919_232145_s1/60884695/ccs"
+ ],
+ "2222222222222222222222333311": [
+ "m84010_220919_232145_s1/41418819/ccs",
+ "m84010_220919_232145_s1/41418819/ccs_sup_154555611_2403",
+ "m84010_220919_232145_s1/138350101/ccs_sup_154555611_2403",
+ "m84010_220919_232145_s1/137434300/ccs",
+ "m84010_220919_232145_s1/107154013/ccs",
+ "m84010_220919_232145_s1/30670966/ccs",
+ "m84010_220919_232145_s1/30670966/ccs_sup_154555611_2403",
+ "m84010_220919_232145_s1/135795990/ccs",
+ "m84010_220919_232145_s1/138350101/ccs",
+ "m84010_220919_232145_s1/208339708/ccs_sup_154555611_2403",
+ "m84010_220919_232145_s1/208339708/ccs",
+ "m84010_220919_232145_s1/137434300/ccs_sup_154555611_2403",
+ "m84010_220919_232145_s1/107154013/ccs_sup_154568819_1567",
+ "m84010_220919_232145_s1/103877581/ccs",
+ "m84010_220919_232145_s1/135795990/ccs_sup_154555611_2403",
+ "m84010_220919_232145_s1/18153884/ccs",
+ "m84010_220919_232145_s1/18153884/ccs_sup_154555611_2403",
+ "m84010_220919_232145_s1/103877581/ccs_sup_154555611_2403"
+ ],
+ "2222222222222222222222111211": [
+ "m84010_220919_232145_s1/87231062/ccs",
+ "m84010_220919_232145_s1/38076833/ccs",
+ "m84010_220919_232145_s1/98700600/ccs",
+ "m84010_220919_232145_s1/198316098/ccs",
+ "m84010_220919_232145_s1/253564840/ccs",
+ "m84010_220919_232145_s1/184946835/ccs",
+ "m84010_220919_232145_s1/119083194/ccs",
+ "m84010_220919_232145_s1/125703228/ccs",
+ "m84010_220919_232145_s1/187700047/ccs",
+ "m84010_220919_232145_s1/92804704/ccs",
+ "m84010_220919_232145_s1/163909487/ccs",
+ "m84010_220919_232145_s1/233181539/ccs",
+ "m84010_220919_232145_s1/212077444/ccs",
+ "m84010_220919_232145_s1/260508118/ccs",
+ "m84010_220919_232145_s1/202638903/ccs",
+ "m84010_220919_232145_s1/145228008/ccs",
+ "m84010_220919_232145_s1/109975239/ccs",
+ "m84010_220919_232145_s1/250217560/ccs",
+ "m84010_220919_232145_s1/99946895/ccs",
+ "m84010_220919_232145_s1/237047588/ccs",
+ "m84010_220919_232145_s1/245959299/ccs",
+ "m84010_220919_232145_s1/162595150/ccs",
+ "m84010_220919_232145_s1/12325446/ccs"
+ ]
+ },
+ "het_sites_not_used_in_phasing": [
+ "154555814_AGG_A"
+ ],
+ "homozygous_sites": [
+ "154558768_G_C",
+ "154563953_C_T",
+ "154564829_C_A",
+ "154564863_C_T",
+ "154566050_G_C",
+ "154566528_G_A",
+ "154566738_T_C",
+ "154567828_A_G",
+ "154568154_G_A"
+ ],
+ "haplotype_details": {
+ "ikbkg_hap1": {
+ "variants": [
+ "154558768_G_C",
+ "154559514_G_A",
+ "154560134_G_A",
+ "154563953_C_T",
+ "154564829_C_A",
+ "154564863_C_T",
+ "154566050_G_C",
+ "154566528_G_A",
+ "154566738_T_C",
+ "154567239_C_A",
+ "154567828_A_G",
+ "154568154_G_A",
+ "154569374_G_A"
+ ],
+ "boundary": [
+ 154555700,
+ 154569698
+ ],
+ "boundary_gene2": [
+ 154634732,
+ 154648767
+ ],
+ "is_truncated": null
+ },
+ "ikbkg_hap2": {
+ "variants": [
+ "154558768_G_C",
+ "154559514_G_A",
+ "154560134_G_A",
+ "154562256_C_T",
+ "154563953_C_T",
+ "154564829_C_A",
+ "154564863_C_T",
+ "154566050_G_C",
+ "154566528_G_A",
+ "154566738_T_C",
+ "154567828_A_G",
+ "154568154_G_A"
+ ],
+ "boundary": [
+ 154555700,
+ 154569698
+ ],
+ "boundary_gene2": [
+ 154634732,
+ 154648767
+ ],
+ "is_truncated": null
+ },
+ "ikbkg_pseudo_hap1": {
+ "variants": [
+ "154555705_A_G",
+ "154555717_T_C",
+ "154555720_C_T",
+ "154555724_C_T",
+ "154555726_T_C",
+ "154555727_G_C",
+ "154555728_T_C",
+ "154555737_A_G",
+ "154555740_G_C",
+ "154555752_G_A",
+ "154555761_T_G",
+ "154555785_G_A",
+ "154555790_A_G",
+ "154555792_C_A",
+ "154555799_G_C",
+ "154555800_G_A",
+ "154555802_G_T",
+ "154555814_AGG_A",
+ "154555823_A_G",
+ "154555831_T_C",
+ "154555846_C_T",
+ "154555860_A_G",
+ "154555882_C_G",
+ "154558014_del_11700"
+ ],
+ "boundary": [
+ 154555700,
+ 154569698
+ ],
+ "boundary_gene2": [
+ 154634732,
+ 154648767
+ ],
+ "is_truncated": null
+ },
+ "ikbkg_pseudo_hap2": {
+ "variants": [
+ "154555705_A_G",
+ "154555717_T_C",
+ "154555720_C_T",
+ "154555724_C_T",
+ "154555726_T_C",
+ "154555727_G_C",
+ "154555728_T_C",
+ "154555737_A_G",
+ "154555740_G_C",
+ "154555752_G_A",
+ "154555761_T_G",
+ "154555785_G_A",
+ "154555790_A_G",
+ "154555792_C_A",
+ "154555799_G_C",
+ "154555800_G_A",
+ "154555802_G_T",
+ "154555814_AGG_A",
+ "154555823_A_G",
+ "154555831_T_C",
+ "154555846_C_T",
+ "154555860_A_G",
+ "154555882_C_G",
+ "154558768_G_C",
+ "154563953_C_T",
+ "154564829_C_A",
+ "154564863_C_T",
+ "154566050_G_C",
+ "154566528_G_A",
+ "154566738_T_C",
+ "154567239_C_A",
+ "154567828_A_G",
+ "154568154_G_A"
+ ],
+ "boundary": [
+ 154555700,
+ 154569698
+ ],
+ "boundary_gene2": [
+ 154634732,
+ 154648767
+ ],
+ "is_truncated": null
+ }
+ },
+ "nonunique_supporting_reads": {
+ "m84010_220919_232145_s1/115606361/ccs": [
+ "1111111111111111111111221221",
+ "1111111111111111111111222111"
+ ],
+ "m84010_220919_232145_s1/199297735/ccs": [
+ "1111111111111111111111221221",
+ "1111111111111111111111222111"
+ ],
+ "m84010_220919_232145_s1/257558192/ccs": [
+ "1111111111111111111111221221",
+ "1111111111111111111111222111"
+ ],
+ "m84010_220919_232145_s1/70254885/ccs": [
+ "1111111111111111111111221221",
+ "1111111111111111111111222111"
+ ],
+ "m84010_220919_232145_s1/70452363/ccs": [
+ "1111111111111111111111221221",
+ "1111111111111111111111222111"
+ ],
+ "m84010_220919_232145_s1/247401577/ccs": [
+ "2222222222222222222222333311",
+ "2222222222222222222222111211"
+ ],
+ "m84010_220919_232145_s1/39387817/ccs": [
+ "2222222222222222222222333311",
+ "2222222222222222222222111211"
+ ],
+ "m84010_220919_232145_s1/80418606/ccs": [
+ "2222222222222222222222333311",
+ "2222222222222222222222111211"
+ ],
+ "m84010_220919_232145_s1/30278720/ccs": [
+ "1111111111111111111111221221",
+ "2222222222222222222222111211"
+ ],
+ "m84010_220919_232145_s1/250678442/ccs": [
+ "1111111111111111111111221221",
+ "2222222222222222222222111211"
+ ],
+ "m84010_220919_232145_s1/103551685/ccs": [
+ "1111111111111111111111221221",
+ "2222222222222222222222111211"
+ ],
+ "m84010_220919_232145_s1/99290519/ccs": [
+ "1111111111111111111111221221",
+ "2222222222222222222222111211"
+ ],
+ "m84010_220919_232145_s1/130941687/ccs": [
+ "1111111111111111111111222111",
+ "2222222222222222222222333311",
+ "2222222222222222222222111211"
+ ],
+ "m84010_220919_232145_s1/206376270/ccs": [
+ "1111111111111111111111222111",
+ "2222222222222222222222333311",
+ "2222222222222222222222111211"
+ ],
+ "m84010_220919_232145_s1/46072113/ccs": [
+ "1111111111111111111111222111",
+ "2222222222222222222222333311",
+ "2222222222222222222222111211"
+ ],
+ "m84010_220919_232145_s1/109450413/ccs": [
+ "1111111111111111111111222111",
+ "2222222222222222222222333311",
+ "2222222222222222222222111211"
+ ],
+ "m84010_220919_232145_s1/212079593/ccs": [
+ "1111111111111111111111222111",
+ "2222222222222222222222333311",
+ "2222222222222222222222111211"
+ ],
+ "m84010_220919_232145_s1/225580484/ccs": [
+ "1111111111111111111111221221",
+ "1111111111111111111111222111",
+ "2222222222222222222222333311",
+ "2222222222222222222222111211"
+ ],
+ "m84010_220919_232145_s1/22222503/ccs": [
+ "1111111111111111111111221221",
+ "1111111111111111111111222111",
+ "2222222222222222222222333311",
+ "2222222222222222222222111211"
+ ]
+ },
+ "read_details": {
+ "m84010_220919_232145_s1/115606361/ccs": "1111111111111111111111xxxxxx",
+ "m84010_220919_232145_s1/119344939/ccs": "1111111111111111111111221221",
+ "m84010_220919_232145_s1/145555998/ccs": "1x111x11x11111x111111xx22xxx",
+ "m84010_220919_232145_s1/148964496/ccs": "1111111111111111111111221xxx",
+ "m84010_220919_232145_s1/159647632/ccs": "1111111111111111111111222xxx",
+ "m84010_220919_232145_s1/165610265/ccs": "1111111111111111111111222111",
+ "m84010_220919_232145_s1/199297735/ccs": "111x111111x1111111111122xxxx",
+ "m84010_220919_232145_s1/208667294/ccs": "11111111xxxx11x1xx11112212xx",
+ "m84010_220919_232145_s1/223941696/ccs": "111xxx1111111111111111221xxx",
+ "m84010_220919_232145_s1/23859927/ccs": "1111111111111111111111222xxx",
+ "m84010_220919_232145_s1/257558192/ccs": "11111x11111111111111112xxxxx",
+ "m84010_220919_232145_s1/31854821/ccs": "11111111111111111111112212xx",
+ "m84010_220919_232145_s1/46012277/ccs": "11111111x1111111111x11222111",
+ "m84010_220919_232145_s1/70254885/ccs": "111111xx1x11111x11xx1xx2xxxx",
+ "m84010_220919_232145_s1/70452363/ccs": "111111111111111111111122xxxx",
+ "m84010_220919_232145_s1/135795990/ccs_sup_154555611_2403": "222222222222x2222222223333xx",
+ "m84010_220919_232145_s1/135795990/ccs": "222222222222x222222222333311",
+ "m84010_220919_232145_s1/137434300/ccs_sup_154555611_2403": "22222222222222222222223333xx",
+ "m84010_220919_232145_s1/137434300/ccs": "2222222222222222222222333311",
+ "m84010_220919_232145_s1/41418819/ccs_sup_154555611_2403": "2222x222x222x22222x22x3333xx",
+ "m84010_220919_232145_s1/41418819/ccs": "2222x222x222x22222x22x333311",
+ "m84010_220919_232145_s1/107154013/ccs": "222x22x22222222222222x333311",
+ "m84010_220919_232145_s1/109975239/ccs": "222222222222222222222211xxxx",
+ "m84010_220919_232145_s1/138350101/ccs_sup_154555611_2403": "222222x2x222222222x2223333xx",
+ "m84010_220919_232145_s1/138350101/ccs": "222222x2x222222222x222333311",
+ "m84010_220919_232145_s1/145228008/ccs": "222222x222222222222222111xxx",
+ "m84010_220919_232145_s1/163909487/ccs": "222222x2222222222222x21112xx",
+ "m84010_220919_232145_s1/18153884/ccs_sup_154555611_2403": "22222222222222222222223333xx",
+ "m84010_220919_232145_s1/18153884/ccs": "2222222222222222222222333311",
+ "m84010_220919_232145_s1/187700047/ccs": "222222222222222222222211xxxx",
+ "m84010_220919_232145_s1/198316098/ccs": "222222x222x2222222222211xxxx",
+ "m84010_220919_232145_s1/208339708/ccs_sup_154555611_2403": "22x222x2xx22222222222x3333xx",
+ "m84010_220919_232145_s1/208339708/ccs": "22x222x2xx22222222222x333311",
+ "m84010_220919_232145_s1/237047588/ccs": "2222222222222222222222111xxx",
+ "m84010_220919_232145_s1/245959299/ccs": "222222222222222222222211xxxx",
+ "m84010_220919_232145_s1/247401577/ccs": "2222222222222222222222xxxxxx",
+ "m84010_220919_232145_s1/250217560/ccs": "22222222222222222222221112xx",
+ "m84010_220919_232145_s1/253564840/ccs": "2222222222222222222222111211",
+ "m84010_220919_232145_s1/30670966/ccs_sup_154555611_2403": "22222222222222222222223333xx",
+ "m84010_220919_232145_s1/30670966/ccs": "2222222222222222222222333311",
+ "m84010_220919_232145_s1/39387817/ccs": "2222222222222222222222xxxxxx",
+ "m84010_220919_232145_s1/80418606/ccs": "2222222222222222222222xxxxxx",
+ "m84010_220919_232145_s1/103877581/ccs_sup_154555611_2403": "x222222xx22222222222223333xx",
+ "m84010_220919_232145_s1/103877581/ccs": "x222222xx2222222222222333311",
+ "m84010_220919_232145_s1/260508118/ccs": "xxxxxxxxxxxxxxxxxxxxxx111211",
+ "m84010_220919_232145_s1/239146397/ccs": "xxxxxxxxxxxxxxxxxxxxxx222xxx",
+ "m84010_220919_232145_s1/86447936/ccs": "xxxxxxxxxxxxxxxxxxxxxx222xxx",
+ "m84010_220919_232145_s1/12325446/ccs": "xxxxxxxxxxxxxxxxxxxxxx111211",
+ "m84010_220919_232145_s1/38076833/ccs": "xxxxxxxxxxxxxxxxxxxxxx11xxxx",
+ "m84010_220919_232145_s1/114102389/ccs": "xxxxxxxxxxxxxxxxxxxxxx221221",
+ "m84010_220919_232145_s1/249172079/ccs": "xxxxxxxxxxxxxxxxxxxxxx221221",
+ "m84010_220919_232145_s1/119083194/ccs": "xxxxxxxxxxxxxxxxxxxxxx1xxx11",
+ "m84010_220919_232145_s1/12914060/ccs": "xxxxxxxxxxxxxxxxxxxxxx222111",
+ "m84010_220919_232145_s1/125703228/ccs": "xxxxxxxxxxxxxxxxxxxxxxx11xxx",
+ "m84010_220919_232145_s1/140182781/ccs": "xxxxxxxxxxxxxxxxxxxxxxxx2111",
+ "m84010_220919_232145_s1/92804704/ccs": "xxxxxxxxxxxxxxxxxxxxxxxx1211",
+ "m84010_220919_232145_s1/233181539/ccs": "xxxxxxxxxxxxxxxxxxxxxxxx1211",
+ "m84010_220919_232145_s1/82645192/ccs": "xxxxxxxxxxxxxxxxxxxxxxxxx111",
+ "m84010_220919_232145_s1/18615736/ccs": "xxxxxxxxxxxxxxxxxxxxxxxxx22x",
+ "m84010_220919_232145_s1/162595150/ccs": "xxxxxxxxxxxxxxxxxxxxxxxxx211",
+ "m84010_220919_232145_s1/60884695/ccs": "xxxxxxxxxxxxxxxxxxxxxxxxx111",
+ "m84010_220919_232145_s1/266077670/ccs": "xxxxxxxxxxxxxxxxxxxxxxxxx221",
+ "m84010_220919_232145_s1/98700600/ccs": "xxxxxxxxxxxxxxxxxxxxxxxxx211",
+ "m84010_220919_232145_s1/30278720/ccs": "xxxxxxxxxxxxxxxxxxxxxxxxx2xx",
+ "m84010_220919_232145_s1/99946895/ccs": "xxxxxxxxxxxxxxxxxxxxxxxxx211",
+ "m84010_220919_232145_s1/159323328/ccs": "xxxxxxxxxxxxxxxxxxxxxxxxx221",
+ "m84010_220919_232145_s1/250678442/ccs": "xxxxxxxxxxxxxxxxxxxxxxxxx2xx",
+ "m84010_220919_232145_s1/97063602/ccs": "xxxxxxxxxxxxxxxxxxxxxxxxx111",
+ "m84010_220919_232145_s1/212077444/ccs": "xxxxxxxxxxxxxxxxxxxxxxxxx21x",
+ "m84010_220919_232145_s1/148313002/ccs": "xxxxxxxxxxxxxxxxxxxxxxxxx221",
+ "m84010_220919_232145_s1/202638903/ccs": "xxxxxxxxxxxxxxxxxxxxxxxxx211",
+ "m84010_220919_232145_s1/103551685/ccs": "xxxxxxxxxxxxxxxxxxxxxxxxx2x1",
+ "m84010_220919_232145_s1/184946835/ccs": "xxxxxxxxxxxxxxxxxxxxxxxxx211",
+ "m84010_220919_232145_s1/98703109/ccs": "xxxxxxxxxxxxxxxxxxxxxxxxx221",
+ "m84010_220919_232145_s1/99290519/ccs": "xxxxxxxxxxxxxxxxxxxxxxxxx2x1",
+ "m84010_220919_232145_s1/87231062/ccs": "xxxxxxxxxxxxxxxxxxxxxxxxx211",
+ "m84010_220919_232145_s1/130941687/ccs": "xxxxxxxxxxxxxxxxxxxxxxxxxx11",
+ "m84010_220919_232145_s1/206376270/ccs": "xxxxxxxxxxxxxxxxxxxxxxxxxx11",
+ "m84010_220919_232145_s1/46072113/ccs": "xxxxxxxxxxxxxxxxxxxxxxxxxx11",
+ "m84010_220919_232145_s1/109450413/ccs": "xxxxxxxxxxxxxxxxxxxxxxxxxx11",
+ "m84010_220919_232145_s1/212079593/ccs": "xxxxxxxxxxxxxxxxxxxxxxxxxx11",
+ "m84010_220919_232145_s1/107154013/ccs_sup_154568819_1567": "xxxxxxxxxxxxxxxxxxxxxx333311",
+ "m84010_220919_232145_s1/225580484/ccs": "xxxxxxxxxxxxxxxxxxxxxxxxxxx1",
+ "m84010_220919_232145_s1/22222503/ccs": "xxxxxxxxxxxxxxxxxxxxxxxxxxx1"
+ },
+ "genome_depth": 34.0,
+ "region_depth": {
+ "median": 32.0,
+ "percentile80": 37.0
+ },
+ "sample_sex": "female",
+ "fusions_called": null
+ }
+}
\ No newline at end of file
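The read-to-haplotype assignments in the ikbkg fixture above can be checked by hand. The sketch below is hypothetical: `compatible_haplotypes` is not a Paraphase function. It assumes two things about "read_details". First, an `x` marks a phasing site the read does not cover. Second, a read supports a haplotype only when every covered site matches exactly; Paraphase's own assignment may tolerate mismatches. Under those assumptions the sketch reproduces two entries from the fixture: the two-way nonunique entry for read 30278720, and the unique ikbkg_pseudo_hap2 assignment of read 260508118.

```python
from typing import Dict, List


def compatible_haplotypes(read_hap: str, haplotypes: Dict[str, str]) -> List[str]:
    """Return names of haplotypes consistent with a read's partial haplotype string.

    Hypothetical helper (not Paraphase code). Assumes 'x' marks an uncovered
    phasing site and that every covered site must match exactly.
    """
    matches = []
    for hap_seq, hap_name in haplotypes.items():
        # Strings from the same region must cover the same phasing sites
        if len(hap_seq) != len(read_hap):
            continue
        if all(r == "x" or r == h for r, h in zip(read_hap, hap_seq)):
            matches.append(hap_name)
    return matches


# Same 28-site strings as "final_haplotypes" in the ikbkg fixture
final_haplotypes = {
    "1" * 22 + "221221": "ikbkg_hap1",
    "1" * 22 + "222111": "ikbkg_hap2",
    "2" * 22 + "333311": "ikbkg_pseudo_hap1",
    "2" * 22 + "111211": "ikbkg_pseudo_hap2",
}

# Read 30278720 only covers site 26 (index 25), where it carries a '2'
print(compatible_haplotypes("x" * 25 + "2xx", final_haplotypes))
# prints ['ikbkg_hap1', 'ikbkg_pseudo_hap2']

# Read 260508118 covers the last six sites
print(compatible_haplotypes("x" * 22 + "111211", final_haplotypes))
# prints ['ikbkg_pseudo_hap2']
```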
diff --git a/tests/test_data/vcf/HG004_AMY2A_realigned_tagged.bam b/tests/test_data/vcf/HG004_AMY2A_realigned_tagged.bam
new file mode 100644
index 0000000..bf6855d
Binary files /dev/null and b/tests/test_data/vcf/HG004_AMY2A_realigned_tagged.bam differ
diff --git a/tests/test_data/vcf/HG004_AMY2A_realigned_tagged.bam.bai b/tests/test_data/vcf/HG004_AMY2A_realigned_tagged.bam.bai
new file mode 100644
index 0000000..6283036
Binary files /dev/null and b/tests/test_data/vcf/HG004_AMY2A_realigned_tagged.bam.bai differ
diff --git a/tests/test_data/vcf/HG004_ARL17A_realigned_tagged.bam b/tests/test_data/vcf/HG004_ARL17A_realigned_tagged.bam
new file mode 100644
index 0000000..135d19a
Binary files /dev/null and b/tests/test_data/vcf/HG004_ARL17A_realigned_tagged.bam differ
diff --git a/tests/test_data/vcf/HG004_ARL17A_realigned_tagged.bam.bai b/tests/test_data/vcf/HG004_ARL17A_realigned_tagged.bam.bai
new file mode 100644
index 0000000..5289dbe
Binary files /dev/null and b/tests/test_data/vcf/HG004_ARL17A_realigned_tagged.bam.bai differ
diff --git a/tests/test_data/vcf/HG004_ikbkg_realigned_tagged.bam b/tests/test_data/vcf/HG004_ikbkg_realigned_tagged.bam
new file mode 100644
index 0000000..577c6c6
Binary files /dev/null and b/tests/test_data/vcf/HG004_ikbkg_realigned_tagged.bam differ
diff --git a/tests/test_data/vcf/HG004_ikbkg_realigned_tagged.bam.bai b/tests/test_data/vcf/HG004_ikbkg_realigned_tagged.bam.bai
new file mode 100644
index 0000000..76c3911
Binary files /dev/null and b/tests/test_data/vcf/HG004_ikbkg_realigned_tagged.bam.bai differ
diff --git a/tests/test_f8_phaser.py b/tests/test_f8_phaser.py
index b8f0511..9933193 100644
--- a/tests/test_f8_phaser.py
+++ b/tests/test_f8_phaser.py
@@ -7,7 +7,7 @@
class TestF8Phaser(object):
cur_dir = os.path.dirname(__file__)
- sample_dir = os.path.join(cur_dir, "test_data")
+ sample_dir = os.path.join(cur_dir, "test_data", "f8")
def test_inversion(self):
sample_id = "inv"
@@ -18,7 +18,7 @@ def test_inversion(self):
config = update_config("f8")
phaser.set_parameter(config)
f8_call = phaser.call()
- assert f8_call.sv_called == {"int22h3_hap1": "inversion"}
+ assert f8_call.sv_called == {"f8_int22h3_hap1": "inversion"}
def test_deletion(self):
sample_id = "del"
@@ -29,4 +29,4 @@ def test_deletion(self):
config = update_config("f8")
phaser.set_parameter(config)
f8_call = phaser.call()
- assert f8_call.sv_called == {"int22h2_hap1": "deletion"}
+ assert f8_call.sv_called == {"f8_int22h2_hap1": "deletion"}
diff --git a/tests/test_phaser.py b/tests/test_phaser.py
index a946493..8026da5 100755
--- a/tests/test_phaser.py
+++ b/tests/test_phaser.py
@@ -18,6 +18,7 @@ def update_config(gene):
nchr_old = realign_region.replace(":", "_").replace("-", "_")
config.setdefault("nchr", nchr)
config.setdefault("nchr_old", nchr_old)
+ config.setdefault("nchr_length", 1000000000)
if "data" in config:
data_paths = config.get("data")
else:
@@ -266,3 +267,99 @@ def test_get_directed_links(self):
) = self.phaser.get_directed_links(new_reads, raw_read_haps, ass_haps, False)
assert directed_links == {"hap1-hap2": [1, 1]}
assert nondirected_links == {"hap1-hap2": [1, 1, 1]}
+
+ def test_update_twp_cp_in_fusion_cases(self):
+ haplotypes = {
+ "12121212": "hap1",
+ "01212120": "hap2",
+ "21212121": "hap3",
+ "02121210": "hap4",
+ }
+ two_cp_haps = Phaser.update_twp_cp_in_fusion_cases(haplotypes)
+ assert two_cp_haps == []
+
+ haplotypes = {
+ "12121212": "hap1",
+ "01212120": "hap2",
+ "21212121": "hap3",
+ }
+ two_cp_haps = Phaser.update_twp_cp_in_fusion_cases(haplotypes)
+ assert two_cp_haps == ["hap2"]
+
+ haplotypes = {
+ "01212120": "hap1",
+ "02121210": "hap2",
+ "21212121": "hap3",
+ }
+ two_cp_haps = Phaser.update_twp_cp_in_fusion_cases(haplotypes)
+ assert two_cp_haps == ["hap3"]
+
+ haplotypes = {
+ "0121212x": "hap1",
+ "21212121": "hap2",
+ "02121210": "hap3",
+ }
+ two_cp_haps = Phaser.update_twp_cp_in_fusion_cases(haplotypes)
+ assert two_cp_haps == []
+
+ def test_get_fusion_type(self):
+ self.phaser.call_fusion = "5p"
+ assert self.phaser.get_fusion_type("012121") == "duplication"
+ assert self.phaser.get_fusion_type("121210") == "deletion"
+ assert self.phaser.get_fusion_type("121211") is None
+
+ self.phaser.call_fusion = "3p"
+ assert self.phaser.get_fusion_type("012121") == "deletion"
+ assert self.phaser.get_fusion_type("121210") == "duplication"
+ assert self.phaser.get_fusion_type("121211") is None
+
+ def test_get_fusion_breakpoint_index(self):
+ breakpoint_index = self.phaser.get_fusion_breakpoint_index("121210", "111111122222")
+ assert breakpoint_index == 7
+
+ # PSV sequence does not agree with clips on the original haplotype
+ breakpoint_index = self.phaser.get_fusion_breakpoint_index("121210", "2222211111111")
+ assert breakpoint_index is None
+
+ breakpoint_index = self.phaser.get_fusion_breakpoint_index("012121", "2222211111111")
+ assert breakpoint_index == 5
+
+ # PSV sequence does not agree with clips on the original haplotype
+ breakpoint_index = self.phaser.get_fusion_breakpoint_index("012121", "111111122222")
+ assert breakpoint_index is None
+
+ breakpoint_index = self.phaser.get_fusion_breakpoint_index("112121", "111111122222")
+ assert breakpoint_index is None
+
+
+ def test_new_hap_for_breakpoint(self):
+ self.phaser.fusion_gene_def_variants = [
+ "1_A_T",
+ "3_C_T",
+ "5_A_T",
+ "7_C_T",
+ "9_A_T",
+ "11_C_T",
+ ]
+ self.phaser.homo_sites = ["7_C_T"]
+ self.phaser.het_sites = ["1_A_T", "3_C_T", "11_C_T"]
+ hap = "212"
+ new_hap, all_sites = self.phaser.new_hap_for_breakpoint(hap)
+ assert new_hap == "211212"
+ assert all_sites == self.phaser.fusion_gene_def_variants
+
+
+ self.phaser.fusion_gene_def_variants = []
+ new_hap, all_sites = self.phaser.new_hap_for_breakpoint(hap)
+ assert new_hap == "2122"
+ assert all_sites == ["1_A_T", "3_C_T", "7_C_T", "11_C_T"]
+
+ self.phaser.clip_3p_positions = [10]
+ new_hap, all_sites = self.phaser.new_hap_for_breakpoint(hap)
+ assert new_hap == "212"
+ assert all_sites == ["1_A_T", "3_C_T", "7_C_T"]
+
+ self.phaser.clip_5p_positions = [1]
+ new_hap, all_sites = self.phaser.new_hap_for_breakpoint(hap)
+ assert new_hap == "12"
+ assert all_sites == ["3_C_T", "7_C_T"]
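The new fusion tests describe a simple rule, which can be restated as standalone functions. This is a hypothetical reconstruction based only on the assertions above. In Paraphase these are `Phaser` methods that read `self.call_fusion`. The following are assumptions of this sketch:

- the explicit `call_fusion` argument;
- returning `None` for haplotypes clipped at both ends;
- the regex-based breakpoint search.

A leading or trailing `0` is taken to mark a clipped end of the haplotype. The breakpoint is the index where the PSV sequence switches from one gene's pattern to the other's.

```python
import re
from typing import Optional


def get_fusion_type(hap: str, call_fusion: str) -> Optional[str]:
    """Hypothetical restatement of the get_fusion_type test expectations."""
    if call_fusion not in ("5p", "3p"):
        raise ValueError(f"unexpected call_fusion: {call_fusion}")
    starts_clipped = hap.startswith("0")
    ends_clipped = hap.endswith("0")
    # Assumption: neither end clipped, or both ends clipped, is not a fusion
    if starts_clipped == ends_clipped:
        return None
    if call_fusion == "5p":
        return "duplication" if starts_clipped else "deletion"
    return "deletion" if starts_clipped else "duplication"


def get_fusion_breakpoint_index(hap: str, psv_seq: str) -> Optional[int]:
    """Hypothetical restatement of the get_fusion_breakpoint_index tests.

    A haplotype clipped at its last site should show gene1 PSVs ('1') then
    gene2 PSVs ('2'); one clipped at its first site should show the reverse.
    Returns the index of the first PSV after the switch, or None when the
    PSV sequence disagrees with the clipped end.
    """
    if hap.endswith("0") and not hap.startswith("0"):
        pattern = r"1+(2+)"
    elif hap.startswith("0") and not hap.endswith("0"):
        pattern = r"2+(1+)"
    else:
        return None
    match = re.fullmatch(pattern, psv_seq)
    return match.start(1) if match else None


print(get_fusion_type("012121", "5p"))  # prints duplication
print(get_fusion_breakpoint_index("121210", "111111122222"))  # prints 7
```

Requiring a full match keeps the sketch strict. A single noisy PSV makes it return `None` rather than guess a breakpoint. The real implementation may be more tolerant.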
diff --git a/tests/test_prepare_bam_and_vcf.py b/tests/test_prepare_bam_and_vcf.py
index 156ba09..3a2e8f8 100644
--- a/tests/test_prepare_bam_and_vcf.py
+++ b/tests/test_prepare_bam_and_vcf.py
@@ -1,64 +1,122 @@
import pytest
import os
+import json
from paraphase.prepare_bam_and_vcf import VcfGenerater
+from .test_phaser import update_config
class TestVcfGenerater(object):
cur_dir = os.path.dirname(__file__)
- sample_dir = os.path.join(cur_dir, "test_data")
- sample_id = "HG00733"
- vcf_generater = VcfGenerater(
- sample_id,
- sample_dir,
- None,
- )
+ sample_dir = os.path.join(cur_dir, "test_data", "vcf")
def test_get_var(self):
all_bases = ["A"] * 10
- var_seq, dp, ad, gt, qual = VcfGenerater.get_var(all_bases, "A")
+ var_seq, dp, ad, gt, qual, counter = VcfGenerater.get_var(all_bases, "A")
assert gt == "0"
assert var_seq == "A"
assert dp == 10
- assert ad == 10
+ assert ad == (10, 10)
all_bases = []
- var_seq, dp, ad, gt, qual = VcfGenerater.get_var(all_bases, "A")
+ var_seq, dp, ad, gt, qual, counter = VcfGenerater.get_var(all_bases, "A")
assert gt == "."
assert var_seq == "A"
assert dp == 0
- assert ad == 0
+ assert ad == (0, 0)
all_bases = ["T"] * 2
- var_seq, dp, ad, gt, qual = VcfGenerater.get_var(all_bases, "A")
+ var_seq, dp, ad, gt, qual, counter = VcfGenerater.get_var(all_bases, "A")
assert gt == "1"
assert var_seq == "T"
assert dp == 2
- assert ad == 2
+ assert ad == (0, 2)
all_bases = ["*"] * 3
- var_seq, dp, ad, gt, qual = VcfGenerater.get_var(all_bases, "A")
+ var_seq, dp, ad, gt, qual, counter = VcfGenerater.get_var(all_bases, "A")
assert gt == "1"
assert var_seq == "*"
assert dp == 3
- assert ad == 3
+ assert ad == (0, 3)
all_bases = ["T", "T", "A"]
- var_seq, dp, ad, gt, qual = VcfGenerater.get_var(all_bases, "A")
+ var_seq, dp, ad, gt, qual, counter = VcfGenerater.get_var(all_bases, "A")
assert gt == "1"
assert var_seq == "T"
assert dp == 3
- assert ad == 2
+ assert ad == (1, 2)
all_bases = ["A+2T", "A+2T", "A"]
- var_seq, dp, ad, gt, qual = VcfGenerater.get_var(all_bases, "A")
+ var_seq, dp, ad, gt, qual, counter = VcfGenerater.get_var(all_bases, "A")
assert gt == "1"
assert var_seq == "A+2T"
assert dp == 3
- assert ad == 2
+ assert ad == (1, 2)
all_bases = ["A+2T", "A+2T", "A+2T", "A+2T", "A"]
- var_seq, dp, ad, gt, qual = VcfGenerater.get_var(all_bases, "A")
+ var_seq, dp, ad, gt, qual, counter = VcfGenerater.get_var(all_bases, "A")
assert gt == "1"
assert var_seq == "A+2T"
assert dp == 5
- assert ad == 4
+ assert ad == (1, 4)
+
+ def test_convert_alt_record(self):
+ assert VcfGenerater.convert_alt_record("T", "A") == "A"
+ assert VcfGenerater.convert_alt_record("T", "TACG") == "T+3ACG"
+ assert VcfGenerater.convert_alt_record("TGC", "T") == "T-2NN"
+
+ def test_modify_hapbound(self):
+ assert VcfGenerater.modify_hapbound(1, 2, None) == "1-2"
+ assert VcfGenerater.modify_hapbound(1, 2, ["5p"]) == "1truncated-2"
+ assert VcfGenerater.modify_hapbound(1, 2, ["3p"]) == "1-2truncated"
+ assert (
+ VcfGenerater.modify_hapbound(1, 2, ["5p", "3p"]) == "1truncated-2truncated"
+ )
+
+ def test_run_without_realign(self):
+ sample_id = "HG004"
+ with open(os.path.join(self.sample_dir, "HG004.paraphase.json")) as f:
+ phase_calls = json.load(f)
+
+ # homozygous case
+ config = update_config("ARL17A")
+ vcf_generater = VcfGenerater(
+ sample_id,
+ self.sample_dir,
+ phase_calls["ARL17A"],
+ )
+ vcf_generater.set_parameter(config, tmpdir=self.sample_dir, prog_cmd="test")
+ variants_info, hap_info = vcf_generater.run_without_realign()
+ assert len(hap_info) == 2
+ assert hap_info[0][0] == "ARL17A_homozygous_hap1"
+ assert hap_info[1][0] == "ARL17A_homozygous_hap1_cp2"
+
+ # two-copy haplotypes
+ # truncated haplotypes
+ config = update_config("AMY2A")
+ vcf_generater = VcfGenerater(
+ sample_id,
+ self.sample_dir,
+ phase_calls["AMY2A"],
+ )
+ vcf_generater.set_parameter(config, tmpdir=self.sample_dir, prog_cmd="test")
+ variants_info, hap_info = vcf_generater.run_without_realign()
+ assert hap_info == [
+ ["AMY2A_hap1", 103616000, 103631602, None],
+ ["AMY2A_hap1_cp2", 103616000, 103631602, None],
+ ["AMY2A_hap2", 103619306, 103631602, ["5p"]],
+ ["AMY2A_hap3", 103619306, 103631602, ["5p"]],
+ ]
+
+ # ikbkg deletion, big SV
+ config = update_config("ikbkg")
+ vcf_generater = VcfGenerater(
+ sample_id,
+ self.sample_dir,
+ phase_calls["ikbkg"],
+ )
+ vcf_generater.set_parameter(config, tmpdir=self.sample_dir, prog_cmd="test")
+ variants_info, hap_info = vcf_generater.run_without_realign()
+ assert 154558014 in variants_info
+ assert ["154558014_DEL_154569698", ".", ".", [], "1", None] in variants_info[
+ 154558014
+ ]