173 filter invalid area names #184

Merged
merged 11 commits into from
Oct 31, 2024

Conversation

i-be-snek
Collaborator

@i-be-snek i-be-snek commented Oct 30, 2024

In this PR:

  • Invalid area names such as "country1" or "Location 2", which appear when the model hallucinates placeholder names from its own instructions, are filtered out.

  • Filtered-out locations appear in the log as errors when parsing events:

     normalize_locations: 2024-10-30 16:56:08 ERROR    Input area=country2 type=<class 'str'> is not a valid area name
     normalize_locations: 2024-10-30 16:56:08 ERROR    Input area=country2 type=<class 'str'> is not a valid area name

To test this PR, try:

  • Modifying an LLM output file by swapping some admin areas and location names for "Country22" or "Administrative Area3", then parsing the file to ensure they are filtered out
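
For readers who want to try the idea without the full pipeline, the filtering described above can be sketched roughly as follows. This is not the code merged in this PR: the function names `is_valid_area_name` and `filter_area_names` and the regular expression are hypothetical, and the pattern only covers the placeholder shapes mentioned in this description. The logger name and the error message are modeled on the log excerpt above.

```python
import logging
import re

# Logger name taken from the log excerpt; everything else here is an assumption.
logger = logging.getLogger("normalize_locations")

# Hypothetical pattern: a generic placeholder word, an optional space or
# underscore, then one or more digits (e.g. "country1", "Location 2").
_PLACEHOLDER = re.compile(
    r"^\s*(country|location|administrative[\s_]*areas?)[\s_]*\d+\s*$",
    re.IGNORECASE,
)


def is_valid_area_name(area):
    """Return False for non-strings, blank strings, and placeholder-like names."""
    if not isinstance(area, str) or not area.strip():
        return False
    return _PLACEHOLDER.match(area) is None


def filter_area_names(areas):
    """Keep valid names, logging each rejected value as an error."""
    valid = []
    for area in areas:
        if is_valid_area_name(area):
            valid.append(area)
        else:
            logger.error(
                "Input area=%s type=%s is not a valid area name", area, type(area)
            )
    return valid


if __name__ == "__main__":
    sample = ["Sweden", "Country22", "Administrative Area3", "Skåne County"]
    print(filter_area_names(sample))
```

A pattern this narrow is a deliberate choice for the sketch: it rejects only names that look like numbered placeholders, so real area names are unlikely to be dropped by accident.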

@i-be-snek i-be-snek changed the title WIP: 173 filter invalid area names 173 filter invalid area names Oct 30, 2024
@i-be-snek i-be-snek requested a review from liniiiiii October 30, 2024 17:09
@i-be-snek
Collaborator Author

@liniiiiii this can be reviewed whenever. But if #183 passes and makes it to main first, then I'll rebase this on main. 😃

@liniiiiii
Collaborator

> @liniiiiii this can be reviewed whenever. But if #183 passes and makes it to main first, then I'll rebase this on main. 😃

@i-be-snek, I reviewed #183 and approved that PR. Could you rebase this PR? I will review it later, thanks!

@i-be-snek
Collaborator Author

@liniiiiii done! it's ready for a review now :)

Collaborator

@liniiiiii liniiiiii left a comment

I tested this by manually adding some invalid "Location1" and "Adminstrative_Areas1" entries to the raw file, and the parsing process correctly filtered them out in L1, L2 and L3. Approving this PR for the main branch, thanks!

@liniiiiii liniiiiii merged commit 703ec60 into main Oct 31, 2024
1 check passed
@i-be-snek i-be-snek linked an issue Oct 31, 2024 that may be closed by this pull request
6 tasks
@i-be-snek i-be-snek deleted the 173-filter-invalid-area-names branch January 17, 2025 13:34
Development

Successfully merging this pull request may close these issues.

process full run database (to do branch)