add_epub_dependency #3176

Merged
Dallas98 merged 1 commit into
ModelEngine-Group:develop from
yzAiden:add_epub_dependency
Jun 3, 2026
Conversation

@yzAiden yzAiden commented Jun 2, 2026

Contributor

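1) Change: add the pypandoc dependency.
2) Purpose of pypandoc: the third-party tool unstructured calls it internally to convert .epub files to HTML, from which it then extracts the plain-text content.
3) Potential issue: the Docker image does not include Pandoc, and there is no graceful degradation at runtime, so shipping pypandoc alone is not enough for the conversion to succeed.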
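
To illustrate the graceful degradation mentioned in point 3, here is a minimal sketch. It is not code from this PR: `pandoc_available` and `epub_to_html` are hypothetical helper names, and checking `PATH` with `shutil.which` is a simplifying assumption (pypandoc can also locate Pandoc in other ways). The real conversion path uses `pypandoc.convert_file`, which needs the Pandoc binary at runtime.

```python
import shutil
from typing import Callable, Optional


def pandoc_available(which: Callable[[str], Optional[str]] = shutil.which) -> bool:
    """Return True if a `pandoc` executable is found (assumption: PATH lookup only)."""
    return which("pandoc") is not None


def epub_to_html(
    path: str,
    convert: Optional[Callable[[str], str]] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Hypothetical helper: convert an .epub file to HTML, or return None
    when Pandoc is missing instead of failing at conversion time."""
    if not pandoc_available(which):
        # Degrade gracefully: the caller can skip .epub files or log a warning.
        return None
    if convert is None:
        # Imported lazily so the module still loads when pypandoc is absent.
        import pypandoc

        def convert(p: str) -> str:
            return pypandoc.convert_file(p, "html", format="epub")

    return convert(path)
```

A caller could treat a `None` result as "EPUB support unavailable" and skip the file rather than letting the error surface from inside unstructured.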

@yzAiden yzAiden requested review from Dallas98 and WMC001 as code owners June 2, 2026 04:40
@Dallas98 Dallas98 merged commit 2ffde75 into ModelEngine-Group:develop Jun 3, 2026
12 checks passed
Comment thread sdk/pyproject.toml
"ijson==3.5.0",
"langchain-text-splitters==1.1.2",
"ebooklib==0.20",
"pypandoc==1.17",

Member

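This adds the pypandoc==1.17 dependency. pypandoc is a Python wrapper for Pandoc and needs the Pandoc binary installed on the system to work. Please confirm: 1) Does the Docker image already include Pandoc? 2) If it is not installed, does the runtime degrade gracefully? 3) What scenario is this dependency for (e.g. document format conversion)? Please describe this in the PR description.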

yzAiden commented Jun 28, 2026

Contributor Author

This potential issue was resolved in #3217. The fix: pandoc is installed when the data-process image is built.

3 participants