How to replace pdf font and encoding with PyMuPDF or js port of mupdf? #4474
Replies: 2 comments
-
In general, you cannot do this in the literal sense. The only option is to extract the text and rewrite it with a new font.
-
Thank you very much! I tried https://github.com/pymupdf/PyMuPDF-Utilities/tree/master/font-replacement, but it does not work for my files. I need to study the code, make some modifications, and see whether I can get it to work properly.
-
I have some PDF documents that use Chinese fonts such as Song (宋体), Hei (黑体), and Kai GB2312 (楷体GB2312), but all of them are declared with WinAnsi encoding, so converting them to Word (docx) with a tool produces nothing but garbled text. I then came across the mupdf library, which includes a JS script, fix-s22pdf.js, that can modify fonts and encodings. Although the script's code names both Song and Hei as replacements, in practice every font ends up set to the built-in Song font. Looking into it, I found that addCJKFont always uses the built-in Song font, and the built-in Hei font has no effect.

I also looked at pymupdf/PyMuPDF-Utilities/font-replacement, but it has many problems with this kind of PDF. The biggest is that it neither replaces the original Chinese fonts and encoding nor deletes the original text, so the old and new text overlap.

How can this be solved? The JS approach works, but it only offers the built-in Song font. How can I replace fonts and encodings, using external fonts, with pymupdf?
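The garbling described above can be reproduced in a few lines of plain Python. This is only a sketch under an assumption that is not confirmed in this thread: that the text bytes in these files are really GB2312/GBK codes, while the font dictionary labels them as WinAnsi. Python's cp1252 codec stands in for WinAnsi here, and both function names are hypothetical.

```python
def simulate_winansi_misread(text: str) -> str:
    """Encode Chinese text as GB2312 bytes, then read the bytes as WinAnsi.

    Mimics what a converter sees when double-byte Chinese codes are
    labelled with a single-byte Western encoding.
    """
    return text.encode("gb2312").decode("cp1252")


def recover_gbk(garbled: str) -> str:
    """Undo the misreading: back to the raw bytes, then decode them as GBK.

    Works for the GB2312 range (bytes 0xA1-0xFE). Some GBK lead bytes are
    undefined in cp1252 and would need separate handling.
    """
    return garbled.encode("cp1252").decode("gbk")


original = "宋体和黑体"
garbled = simulate_winansi_misread(original)  # one Western character per byte
restored = recover_gbk(garbled)               # the original Chinese text again
```

If the assumption holds, text extracted from such a file could be passed through a function like recover_gbk before it is rewritten with a Chinese font. Whether the extraction step returns exactly these cp1252 characters depends on the tool, so the result should be checked on a real page first.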