@TomskDiver Hello there! 👋 I'm here to help you with bug fixes, questions, and contributions.
The regex pattern in the remove_images function of the MarkdownReader class should indeed be corrected to properly remove images from Markdown files. The current pattern r"!{1}\[\[(.*)\]\]" is incorrect. The correct pattern to match the standard Markdown image syntax ![alt_text](path_to_image_file) is:
pattern=r"!\[.*?\]\(.*?\)"
This pattern will accurately capture and remove images from the Markdown content, ensuring that the output is as expected [1].
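A quick way to check the difference between the two patterns is to run both over the same string. The sketch below is only an illustration: the sample text is hypothetical (it is not the test file from this issue), and it calls `re.sub` directly rather than going through `MarkdownReader`.

```python
import re

# Pattern quoted in the issue as the one currently used by remove_images.
OLD_PATTERN = r"!{1}\[\[(.*)\]\]"
# Non-greedy pattern for the standard ![alt_text](path) image syntax.
NEW_PATTERN = r"!\[.*?\]\(.*?\)"

# Hypothetical sample: one standard Markdown image and one ![[...]] style embed.
sample = "Intro ![logo](img/logo.png) text ![[embedded.png]] end"

old_result = re.sub(OLD_PATTERN, "", sample)
new_result = re.sub(NEW_PATTERN, "", sample)

print(old_result)  # the standard image is still present
print(new_result)  # the standard image is gone, the ![[...]] embed remains
```

Under these assumptions, the old pattern only strips the double-bracket form, which is why standard Markdown images survive. Whether the double-bracket form should also keep being removed is a separate decision for the maintainers.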
Bug Description
Images in Markdown files are not removed because of an error in the regex pattern:
pattern = r"!{1}\[\[(.*)\]\]"
Source: https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/readers/llama-index-readers-file/llama_index/readers/file/markdown/base.py#L79
But the Markdown syntax for images is:
![alt_text](path_to_image_file)
(see https://www.markdownguide.org/basic-syntax/#images)
A possibly correct regex is:
!\[.*\]\(.*\)
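One caveat with the greedy form is worth illustrating: when a line contains more than one image, `.*` can span from the first `![` to the last `)`, taking the text in between with it. The sketch below compares it with a non-greedy variant; the sample line is hypothetical and the variable names are only for illustration.

```python
import re

# Greedy pattern as proposed in this report.
GREEDY = r"!\[.*\]\(.*\)"
# Non-greedy variant of the same idea.
LAZY = r"!\[.*?\]\(.*?\)"

# Hypothetical line with two images and text between them.
line = "![a](a.png) keep this ![b](b.png)"

greedy_result = re.sub(GREEDY, "", line)
lazy_result = re.sub(LAZY, "", line)

print(repr(greedy_result))  # the whole line is removed, including the text
print(repr(lazy_result))    # only the two images are removed
```

For single-image lines both patterns behave the same, so the greedy form may pass a simple test file while still dropping content in denser documents.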
FYI @hursh-desai, @jerry
Version
0.12.8
Steps to Reproduce
Test file: test.md
Output:
Must be like this:
Relevant Logs/Tracebacks
No response