-
Notifications
You must be signed in to change notification settings - Fork 87
Fix/parsing logic error for sax #642
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Fix/parsing logic error for sax #642
Conversation
Thank you for the pull request! ❤️The Scribe-Data team will do our best to address your contribution as soon as we can. If you're not already a member of our public Matrix community, please consider joining! We'd suggest that you use the Element client as well as Element X for a mobile app, and definitely join the |
Maintainer ChecklistThe following is a checklist for maintainers to make sure this process goes as well as possible. Feel free to address the points below yourself in further commits if you realize that actions are needed :) |
|
Let's review this in the call we have later, all :) |
|
I will review today. Well done! ✨ |
DeleMike
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All looks okay @priyankaforu!
I would love to test it locally. I did not see the steps to reproduce the issue you had in #641, so it is hard for me to have the complete experience like you describe in the issue.
Could you add this, please?
Hey @DeleMike , did you try downloading autosuggestions from wiki data ? Because for autosuggestions the issue is , parser is not parsing the characters inside the tags, so to fix that we need to let Sax parser to read in between characters, and store them in the array. I just added the missing methods there :) If you still feel this is unclear, I can explain you better connecting with you, if possible whenever you have time |
|
Okay, thanks. I will connect w/ you in Matrix. |
DeleMike
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work @priyankaforu on this! And thanks for the sync. I totally see it all in action!
One suggestion:
- We need a little bit more logs when parsing starts.
- Hopefully, #646 solves this problem.
Contributor checklist
pytestcommand as directed in the testing section of the contributing guideDescription
Fix:
The enwiki's ".ndjson" was always parsing without any data, and this is because the parser is unable to open the file elements to parse like for example "
<page> <title> Heading For The Article </title> </page>is not being read and called back to parse the array of articles/words that we need. So I have added the missing callback triggers for reading the texts inside those tags.added filtering to omit the redirect links to various sources and name space pages
I have also verified it by downloading 2 enwiki_dump files and trying to parse them into words 👇
@axif0 @andrewtavis @DeleMike Please have a look and lemme know if there are any corrections to be made or if my approach in understanding the issue is wrong
#641