Skip to content

refactor(event cache): consolidate logic around retrieving the latest gap #4733

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Feb 28, 2025

Conversation

bnjbvr
Copy link
Member

@bnjbvr bnjbvr commented Feb 27, 2025

This sits on top of #4731, which could be reviewed independently or as part of this PR (since it's only a single commit).

Before this PR, the logic when back-paginating is the following:

  • (1) try to back-paginate from the store
    • if there's a gap in the chunk, indicate so to the caller
    • load a chunk, if it's a gap, indicate so to the caller (if it's an events chunk, return it, and we're done)
    • if we've reached the end of the timeline, and haven't waited for a pagination token, maybe we should wait for one
  • (2) then if we had a gap, we do the following:
    • look at the in-memory linked chunk and try to find a gap
    • depending on whether we found one or not, wait for one from sync

One can see that the gap is looked for in multiple places. Also, each of the major steps involve taking and releasing the state lock, which means that in between, some other things can happen to the room event cache's state. In particular, it can be shrunk after step (1) but before step (2), confusing the code a lot. In particular, the test that got transformed into a regression test by enabling the event cache's storage showed the following:

  • step (1) happens, there are no events: load_more_events_backwards() returns "Gap", meaning "wait for an initial prev-batch token" in this case.
  • the linked chunk got shrunk because a limited sync was observed by the event cache,
  • step (2) observes there's no gap in the in-memory linked chunk but there are events: it concludes we've hit the end of the timeline, while we haven't.

The key here is that step (2) should've called load_more_event_backwards() again, to properly handle the consequences of shrinking (either prepend a new chunk of events, or load a latest gap).

This PR consolidates all this logic, so load_more_event_backwards() outcome gets more information:

  • a gap now includes the prev-batch token, which is passed to the networking step.
  • the outcome may also be "wait for an initial prev-batch token, because i've never seen one". Hopefully the logic in conclude_load_more_for_fully_loaded_chunk is sufficiently documented with the code comment.

When the outcome is WaitForInitialPrevToken, we do the race between waiting 3 seconds or getting a prev-batch token from sync, and restart by calling the load_more_event_backwards() again. This properly handles a linked chunk that's shrunk when the state lock was released, fixing the regression test.

As a result, get_or_wait_for_token becomes useless, as it was only used in tests, and its meaning wasn't quite correct in the presence of the event cache, because of lazy-loading. I've removed it; first I started to port the tests, but they didn't make a lot of sense anymore (the waiting doesn't happen in load_more_event_backwards(), so I would've had to call paginate_backwards_impl, turning those "unit" tests into full integration tests — which I didn't want to). All this code is heavily tested in event cache integration tests, or indirectly by the timeline tests as well, so I'm confident it's safe to remove these tests.

Part of #3280.

@bnjbvr bnjbvr requested a review from Hywan February 27, 2025 16:35
@bnjbvr bnjbvr requested a review from a team as a code owner February 27, 2025 16:35
@bnjbvr bnjbvr force-pushed the bnjbvr/simplify-pagination-conditions branch from bb3825e to 6f1a9c5 Compare February 27, 2025 16:40
Copy link

codecov bot commented Feb 27, 2025

Codecov Report

Attention: Patch coverage is 95.91837% with 2 lines in your changes missing coverage. Please review.

Project coverage is 86.14%. Comparing base (55143e1) to head (32e3975).
Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
crates/matrix-sdk/src/event_cache/room/mod.rs 90.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4733      +/-   ##
==========================================
- Coverage   86.15%   86.14%   -0.02%     
==========================================
  Files         291      291              
  Lines       34287    34276      -11     
==========================================
- Hits        29539    29526      -13     
- Misses       4748     4750       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@Hywan Hywan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. So. Those are relatively important changes and I've no comment about it: not because I don't understand it, but because I think it's more robust to put things this way, and the changes make a lot of sense to me. Thanks.

The removal of get_or_wait_for_token is welcomed, embedding the logic inside LoadMoreEventsBackwardsOutcome is a nice idea.

The new loop in pagination that takes a new (write) lock guard for every iteration makes it more robust again state changes during calls to load_more_events_backwards(). Passing the prev_token directly from LoadMoreEventsBackwardsOutcome (so from what we have in the store!) to paginate_backwards_with_network doesn't solve the problem of concurrent paginations, but it does solve a race about the prev_token. All in all, it's easier to think about this code and its flow.

Thank you.

@bnjbvr bnjbvr force-pushed the bnjbvr/simplify-pagination-conditions branch from 6f1a9c5 to e0b9627 Compare February 28, 2025 11:07
@bnjbvr bnjbvr enabled auto-merge (rebase) February 28, 2025 11:07
@bnjbvr bnjbvr force-pushed the bnjbvr/simplify-pagination-conditions branch from e0b9627 to 32e3975 Compare February 28, 2025 11:46
@bnjbvr bnjbvr merged commit f7297ed into main Feb 28, 2025
42 checks passed
@bnjbvr bnjbvr deleted the bnjbvr/simplify-pagination-conditions branch February 28, 2025 12:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants