add_epub_dependency #3176
Merged
JasonW404 reviewed on Jun 24, 2026
  "ijson==3.5.0",
  "langchain-text-splitters==1.1.2",
  "ebooklib==0.20",
+ "pypandoc==1.17",
JasonW404 (Member):
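This adds the pypandoc==1.17 dependency. pypandoc is a Python wrapper around Pandoc, and it only works if the Pandoc binary is installed on the system. Please confirm: 1) Does the Docker image already include Pandoc? 2) If Pandoc is not installed, does the code degrade gracefully at runtime? 3) What is this dependency used for (for example, document format conversion)? Please explain this in the PR description.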
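Questions 1 and 2 can be checked in code. The sketch below is a hedged illustration, not code from this PR: it assumes the conversion is reached through `pypandoc.convert_file`, and the helper names `pandoc_available` and `epub_to_html_or_none` are hypothetical. It looks for a `pandoc` executable on `PATH` with `shutil.which` before calling pypandoc, and returns `None` instead of raising when either Pandoc or pypandoc is missing.

```python
import logging
import shutil
from typing import Optional

logger = logging.getLogger(__name__)


def pandoc_available() -> bool:
    """Return True if a `pandoc` executable can be found on PATH."""
    return shutil.which("pandoc") is not None


def epub_to_html_or_none(path: str) -> Optional[str]:
    """Convert an .epub file to HTML, or return None if Pandoc is unusable.

    Hypothetical helper: it degrades gracefully instead of failing inside
    the conversion call when the Pandoc binary is absent.
    """
    if not pandoc_available():
        logger.warning("pandoc not found on PATH; skipping %s", path)
        return None
    try:
        import pypandoc  # Python wrapper only; still needs the pandoc binary
    except ImportError:
        logger.warning("pypandoc is not installed; skipping %s", path)
        return None
    # With no outputfile argument, convert_file returns the converted text.
    return pypandoc.convert_file(path, "html", format="epub")
```

pypandoc can also use a Pandoc binary it finds outside `PATH`, so the `PATH` check here is deliberately conservative.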
PR author (Contributor):
This potential issue is resolved in #3217: Pandoc is now installed when the data-process image is built.
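A build-time check can make sure the image really ships Pandoc. This is a hypothetical sketch, not part of #3217: the helper names are invented, and it assumes the first line of `pandoc --version` has the form `pandoc X.Y.Z` (or `pandoc.exe X.Y.Z` on Windows). A final build step could call `check_pandoc()` and exit non-zero on `RuntimeError`, so a broken image is rejected at build time rather than at conversion time.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

# Matches e.g. "pandoc 3.1.3" or "pandoc.exe 2.19.2" on the first output line.
_VERSION_RE = re.compile(r"^pandoc(?:\.exe)?\s+(\d+(?:\.\d+)*)")


def parse_pandoc_version(output: str) -> Optional[Tuple[int, ...]]:
    """Extract the version tuple from `pandoc --version` output, or None."""
    lines = output.splitlines()
    if not lines:
        return None
    match = _VERSION_RE.match(lines[0].strip())
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def check_pandoc() -> Tuple[int, ...]:
    """Return the installed pandoc version; raise RuntimeError if unusable."""
    exe = shutil.which("pandoc")
    if exe is None:
        raise RuntimeError("pandoc is not installed in this image")
    result = subprocess.run(
        [exe, "--version"], capture_output=True, text=True, check=False
    )
    if result.returncode != 0:
        raise RuntimeError(f"pandoc --version exited with {result.returncode}")
    version = parse_pandoc_version(result.stdout)
    if version is None:
        raise RuntimeError("could not parse pandoc version output")
    return version
```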
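1) Change: adds the pypandoc dependency.
2) What pypandoc is for: the third-party tool unstructured calls it internally to convert .epub files to HTML, from which unstructured then extracts the plain text.
3) Known issue: the Docker image does not include Pandoc and there is no graceful fallback at runtime, so installing pypandoc alone is not enough for the conversion to succeed.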
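Point 2 describes a two-stage pipeline: Pandoc (through pypandoc) turns the .epub into HTML, and unstructured then extracts plain text from that HTML. The sketch below illustrates only the second stage with the standard-library `html.parser`; it is not unstructured's implementation, and the class and function names are hypothetical. It drops `script`, `style`, and `head` content and emits one whitespace-normalized string per block-level element.

```python
from html.parser import HTMLParser
from typing import List


class _TextBlockExtractor(HTMLParser):
    """Collect visible text, split at block-level element boundaries."""

    SKIP = {"script", "style", "head"}
    BLOCK = {"p", "div", "li", "br", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0           # > 0 while inside a skipped element
        self._buf: List[str] = []      # text fragments of the current block
        self.blocks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self.flush()

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in self.BLOCK:
            self.flush()

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._buf.append(data)

    def flush(self) -> None:
        """Close the current block, collapsing runs of whitespace."""
        text = " ".join("".join(self._buf).split())
        if text:
            self.blocks.append(text)
        self._buf = []


def html_to_text_blocks(html: str) -> List[str]:
    parser = _TextBlockExtractor()
    parser.feed(html)
    parser.close()
    parser.flush()  # emit any trailing text outside a block element
    return parser.blocks
```

For example, `html_to_text_blocks("<body><h1>Title</h1><p>Hello <b>world</b></p></body>")` yields `["Title", "Hello world"]`.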