Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Drop events with no L1/L2/L3 impacts (or NULL impacts) #195

Merged
merged 6 commits into from
Nov 27, 2024

Conversation

i-be-snek
Copy link
Collaborator

@i-be-snek i-be-snek commented Nov 26, 2024

This PR:

  • Checks which events have no impacts on L1 (aka all Total_*_Min/Total_*_Max are empty)
  • Check if these events have any data in L2 or L3 -- they should not have any after applying the data gap filling.
  • If the events don't exist in L1, L2, or L3, they are dropped.

While working on this PR, I've discovered a bug: if an event ID doesn't exist in L2, but there is some L3 data, the data is not propagated to L2, and hence never moves up to L1. These events are NOT dropped in this PR, but are instead shown in the logs. In the entire database, there are only 17 instances (from 10 events) of this. This bug can be tackled in another PR.

In total, 94 event IDs are found to have no impact but only 84 event IDs are dropped because of the bug described above.

@i-be-snek i-be-snek changed the title WIP: 🚧 Drop events with no L1/L2/L3 impacts (or NULL impacts) Drop events with no L1/L2/L3 impacts (or NULL impacts) Nov 27, 2024
@i-be-snek i-be-snek requested a review from liniiiiii November 27, 2024 15:32
@i-be-snek
Copy link
Collaborator Author

Sorry, I forgot to push the last database release after this fix :) including now!

@liniiiiii
Copy link
Collaborator

liniiiiii commented Nov 27, 2024

I got this error when I try to open the branch,

Smudge error: Error downloading Database/output/full_run_25_deduplicated_data_gap_inflation_adjustment_drop_missing_event_ids/l1/0000.parquet (5237e47be04bb7f2ce29aead7de5c884b22688d48f3a544e4f81e535ef16dda0): batch response: This repository is over its data quota. Account responsible for LFS bandwidth should purchase more data packs to restore access.

And I checked the account, we over consumed the data storage, I will pay for it now, and to see if the issue solved, thanks!
image

@liniiiiii
Copy link
Collaborator

@i-be-snek , I successfully tested this pr, all events without L1-L3 impacts are filtered, and the events with L3 impacts but no L1 records are keeped which is fine in our case, and the approval in the review page is super slow and I can't type, therefore I approved the pr here, and merged it to the main, thanks!

@liniiiiii liniiiiii merged commit 3ddd749 into main Nov 27, 2024
1 check passed
@i-be-snek i-be-snek deleted the drop-l1-missing-all-impacts branch January 17, 2025 13:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants