Byte count is wrong for files in non-UTF-8 encodings #985

Open
softmgr opened this issue Mar 10, 2025 · 1 comment

Comments

@softmgr

softmgr commented Mar 10, 2025

Open a file in a non-UTF-8 encoding, for example GB2312. Sample file content:

SciCall(SCI_GETSELTEXT, 0, 0);  //得到的是基于UTF-8编码的选中字节数。
(The Chinese comment means: "returns the selected byte count based on UTF-8 encoding.")

Save the content above as an ANSI-encoded file.

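Select one English character: the 4th field of the status bar shows "选中 1 / 1" (Selected 1 / 1).
Select any one character in the Chinese part: the 4th field shows "选中 1 / 3" (Selected 1 / 3). A count of 3 bytes is clearly wrong, because a Chinese character occupies 2 bytes in GB2312.
The total file length shown at the bottom right of the status bar is also wrong: it counts bytes as encoded in UTF-8, not in GB2312. The correct byte count can be seen in the file's Properties dialog in Windows.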
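To make the 3-versus-2 discrepancy concrete, here is a small C++ sketch (not code from the editor or from Scintilla) that takes the character 中 (U+4E2D) as an example. In UTF-8 it is the 3 bytes E4 B8 AD, and in GB2312 (code page 936) it is the 2 bytes D6 D0. The helper name `Utf8SequenceLength` is hypothetical; it reads the sequence length off a UTF-8 lead byte, which is where the "1 / 3" figure comes from.

```cpp
#include <cstddef>

// Hypothetical helper: length in bytes of the UTF-8 sequence that begins
// with lead byte `b`. Returns 1 for ASCII, 2 to 4 for multi-byte sequences,
// and 0 for a continuation byte (10xxxxxx) or an invalid lead byte.
constexpr std::size_t Utf8SequenceLength(unsigned char b) {
    if (b < 0x80) return 1;            // 0xxxxxxx: ASCII
    if ((b & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((b & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((b & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 0;                          // continuation or invalid
}

// U+4E2D "中" in both encodings (sizeof counts the trailing '\0').
constexpr char kZhongUtf8[] = "\xE4\xB8\xAD";  // UTF-8: 3 bytes
constexpr char kZhongGb2312[] = "\xD6\xD0";    // GB2312 / CP936: 2 bytes
static_assert(sizeof(kZhongUtf8) - 1 == 3, "UTF-8 uses 3 bytes");
static_assert(sizeof(kZhongGb2312) - 1 == 2, "GB2312 uses 2 bytes");
```

An ASCII character is 1 byte in both encodings, which is why selecting an English character correctly shows 1 / 1.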

@softmgr
Author

softmgr commented Mar 11, 2025

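It looks like this can only be fixed by modifying the Scintilla component's source code. I tried it: in editor.cxx, take the text returned by selectedText.Data(), inspect it byte by byte to check for ASCII codes, work out each character's actual length in ANSI, and sum these lengths. That gives the true physical length of the file (or selection). In a test on a 200 MB file, the CPU time was essentially zero, so the performance cost is negligible.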
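The byte-by-byte approach described above can be sketched roughly as follows. This is only an illustration, not the author's actual patch: `EstimateDbcsLength` is a hypothetical name, and the sketch assumes the UTF-8 buffer (such as the one behind selectedText.Data()) is valid, and that every non-ASCII character maps to exactly 2 bytes in the target double-byte code page (for example GBK / CP936). Characters that cannot be represented in that code page, and GB18030 4-byte sequences, are not handled.

```cpp
#include <cstddef>

// Hypothetical helper (not a Scintilla API): estimates how many bytes a
// UTF-8 buffer occupies once saved in a double-byte ANSI code page.
// Assumption: ASCII -> 1 byte, every other character -> 2 bytes.
constexpr std::size_t EstimateDbcsLength(const char* text, std::size_t len) {
    std::size_t total = 0;
    for (std::size_t i = 0; i < len; ++i) {
        const unsigned char b = static_cast<unsigned char>(text[i]);
        if (b < 0x80) {
            total += 1;  // ASCII: 1 byte in both encodings
        } else if ((b & 0xC0) != 0x80) {
            total += 2;  // lead byte of a multi-byte UTF-8 character
        }
        // Continuation bytes (10xxxxxx) add nothing: the character
        // was already counted at its lead byte.
    }
    return total;
}
```

Because it is a single linear pass with no allocation, a scan like this is consistent with the author's observation that the overhead on a 200 MB file is negligible. An exact count would instead need a real conversion into the file's code page, since the 2-byte rule is only an approximation.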
