Commit eb2e23a

v0.2.6.r296
1 parent 6481e74 commit eb2e23a

6 files changed: +19 -5 lines changed

CHANGES.md

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-## [v0.2.6] - 2025-03-20
+## [v0.2.6] - 2025-03-22
 #### New features
 - `collapse`, rescue the collapsed contigs
 #### Enhancement

bin/cphasing-rs

-21.7 KB
Binary file not shown.

cphasing/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -33,7 +33,7 @@
 __license__ = "BSD"
 __status__ = "Development"
-__version__ = "0.2.6.r295"
+__version__ = "0.2.6.r296"
 __url__ = "https://github.com/wangyibin/CPhasing"
 __doc_url__ = "https://wangyibin.github.io/CPhasing"
 __epilog__ = f"""

cphasing/pqs.py

Lines changed: 5 additions & 2 deletions
@@ -677,6 +677,7 @@ def to_hg_df(self, chunks, contig_idx,
 )

 results = list(filter(lambda x: x is not None, results))
+results = list(filter(lambda x: len(x) > 0, results))
 if len(results) == 0:
     logger.warning("No data found in the given region.")
     return
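
The effect of the added line can be sketched in isolation. This is a minimal illustration that uses plain Python lists as stand-ins for the per-chunk results; the helper name `drop_unusable` and the sample data are hypothetical and do not appear in `pqs.py`.

```python
def drop_unusable(results):
    """Drop None entries first, then entries that have zero length."""
    # Step 1 (already present before this commit): remove missing results.
    results = list(filter(lambda x: x is not None, results))
    # Step 2 (added by this commit): remove results that exist but are empty.
    results = list(filter(lambda x: len(x) > 0, results))
    return results


# Hypothetical per-chunk results: one missing, one empty, two with data.
chunks = [[("ctg1", 3)], None, [], [("ctg2", 5)]]
kept = drop_unusable(chunks)
print(kept)
```

With only the `is not None` filter, a list holding a single empty result would still have length 1, so the `len(results) == 0` check that logs "No data found in the given region." would not fire even though no usable data remains.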
@@ -988,10 +989,10 @@ def process_chunk_to_cool_global(chunk, binsize,
 chunk = chunk.filter(pl.col("mapq") >= min_mapq).drop("mapq")

 bin1_id = (chunk["pos1"] // binsize) + chunk["chrom1"].map_elements(
-    bin_offset_db.get
+    bin_offset_db.get, skip_nulls=False
 ).cast(schema["pos1"])
 bin2_id = (chunk["pos2"] // binsize) + chunk["chrom2"].map_elements(
-    bin_offset_db.get
+    bin_offset_db.get, skip_nulls=False
 ).cast(schema["pos2"])
 chunk = (
     chunk.with_columns([bin1_id.alias("bin1_id"), bin2_id.alias("bin2_id")])
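
The arithmetic in this hunk maps each contact position to a genome-wide bin index: the position is divided by the bin size, and the per-contig bin offset is added. The sketch below restates that calculation in plain Python rather than polars, so it does not exercise `map_elements` or `skip_nulls` itself. The function name `global_bin_ids` and the sample offsets are hypothetical.

```python
def global_bin_ids(positions, chroms, binsize, bin_offset):
    """Return a genome-wide bin index for each (chrom, pos) pair.

    `bin_offset` maps a contig name to the index of its first bin,
    mirroring the role of `bin_offset_db.get` in the diff.
    """
    return [
        pos // binsize + bin_offset[chrom]
        for pos, chrom in zip(positions, chroms)
    ]


# Hypothetical layout: ctg1 owns bins 0-4, ctg2 starts at bin 5.
offsets = {"ctg1": 0, "ctg2": 5}
ids = global_bin_ids([25_000, 9_999], ["ctg1", "ctg2"], 10_000, offsets)
print(ids)
```

In polars, `skip_nulls=False` asks `map_elements` to call the supplied function on null values as well instead of passing them through untouched; the commit does not state why this was needed here.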
@@ -1053,6 +1054,8 @@ def process_chunk_hg(chunk_name, bed_dict, contigsizes,
 columns = ["chrom1", "pos1", "chrom2", "pos2", "mapq"]
 chunk = pl.scan_parquet(chunk_name).select(columns)
 chunk_name = Path(chunk_name).stem
+
+
 if min_mapq > 1:
     chunk = chunk.filter(pl.col("mapq") >= min_mapq)

docs/faq.md

Lines changed: 5 additions & 0 deletions
@@ -1,6 +1,11 @@
 ### The results of the first round partition are unsatisfactory.
 In our two-round partition algorithm, the first-round partition depends on the h-trans errors between homologous chromosomes. If you input a contig assembly with a low level of switch errors, or high-accuracy Pore-C data, the h-trans errors will not be sufficient to cluster all contigs into the correct homologous groups, giving unsatisfactory results. You can set `-q1 0` for `hyperpartition` to increase the rate of h-trans errors. However, this parameter may cause an `out of memory` error when you input very large Pore-C data in a porec table or Hi-C contacts in a pairs file.

+### The total size of the chromosomes is significantly smaller than the estimated genome size
+If the following two conditions exist, you can adjust the mode of `cphasing pipeline` to sensitive (`--preset sensitive`) or very sensitive (`--preset very-sensitive`).
+1. The amount of input data is low. 2. The input genome is relatively complex, with many homozygous or nearly homozygous regions. Note that these two modes will cause some very small contigs to be clustered or ordered incorrectly. In the second case, greedy clustering may occur, in which two highly homologous sets of chromosomes are grouped into one group.
+
+
 ### How to set the `-n` parameter when assembling an aneuploid genome.
 An aneuploid genome, such as modern cultivated sugarcane, contains unequal numbers of homologous chromosomes. The `-n` parameter can be set to zero (`-n 10:0`) to automatically partition contigs into different chromosomes within a homologous group.
 However, we also allow the user to input a file with two columns: the first column is the 1-based index of the first-round partition group, and the second column is the number of chromosomes in each homologous group. Then specify `-n 10:second.number.tsv` in `cphasing pipeline` or `cphasing hyperpartition`.
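
As an illustration of the two-column file described in the last FAQ entry above, the following sketch writes one row per first-round group. It assumes a tab-separated layout with no header (suggested by the `.tsv` name but not confirmed by the text), and the group counts are invented for the example.

```python
import csv
import io

# Hypothetical mapping: 1-based first-round group index -> number of
# chromosomes expected in that homologous group.
chrom_counts = {1: 12, 2: 10, 3: 11}


def write_group_table(counts, handle):
    """Write 'group index<TAB>chromosome number' rows, sorted by index."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for group_idx in sorted(counts):
        writer.writerow([group_idx, counts[group_idx]])


# Written to an in-memory buffer here; a real run would open a file
# such as second.number.tsv instead.
buf = io.StringIO()
write_group_table(chrom_counts, buf)
print(buf.getvalue(), end="")
```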

docs/faq.zh.md

Lines changed: 7 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,11 @@
11
### 第一轮分组的结果不好:
2-
在我们的两轮聚类算法中,第一轮聚类依赖于同源染色体之间比对错误;如果用户输入低水平Switch error的contigs或输入高精度的Pore-c数据,h-*trans*将不足以支撑将来自同源染色体的contig聚到一起,这容易导致结果不理想。用户可以为`hyperpartition``pipeline`设置`-q1 0`以增加h-*trans*错误率。但是,当您在孔表或配对文件中输入大量的Pore-C数据时,此参数可能会引发内存不足的错误。
2+
在我们的两轮聚类算法中,第一轮聚类依赖于同源染色体之间比对错误;如果用户输入低水平Switch error的contigs或输入高精度的Pore-c数据,h-*trans*将不足以支撑将来自同源染色体的contig聚到一起,这容易导致结果不理想。用户可以为`hyperpartition``pipeline`设置`-q1 0`以增加h-*trans*错误率。但是,当您在porec.gz或pairs.gz中输入大量的互作数据时,此参数可能会引发内存不足的错误。
3+
4+
5+
### 挂载上的染色体总大小远低于预估基因组大小
6+
如果存在以下两种情况,可以通过调整 `cphasing pipeline`的模式至敏感(`--preset sensitive`)或者超敏感(`--preset very-sensitive`
7+
1. 输入的数据量低。2. 输入的基因组较为复杂,存在大量的纯合或者近乎纯合的区域。 需要注意的是,以上两种模式会让部分较碎的contig聚类或者排序错误。同时如果属于第二种情况,容易发生贪婪的聚类,即两条高度同源的染色体组被分到一组里面。
8+
39

410
### 如何在组装非整倍体基因组时设置`-n`参数:
511
非整倍体基因组,如现代栽培的甘蔗,包含数目不相等的同源染色体。我们建议`-n`参数可以设置为零(`-n 0:0`),让程序自动判别分组数
