Cache content not found #707

jsharkey13 · 2025-07-04T12:55:07Z

Currently we have a problem with optional page fragments, such as the seasonal warning message on the contact us form, or the help fragments which don't have to exist.

In the old world, for one that does not exist: when it is queried for by a user, we go to ElasticSearch and discover is doesn't exist, but the cache contains only valid content and so we cannot add anything to the cache. When the next user loads the page, we ought already to know that this will fail, but since we couldn't cache it, we make the same unnecessary query to ES.

This PR moves the GitContentManager caches to always cache ResultsWrappers; these can always exist even when there are no results, and thus a failed lookup result can be cached. (I did initially experiment with using Optionals back in 2023 - rebasing the commit to more recently - but I decided that ResultsWrappers were cleaner). For the sake of avoiding unsafe casts, move to using two caches separated by type; I don't think this will increase the memory usage since it is the same stuff being cached but I suppose cleanup may be triggered differently due to different access patterns for the two caches.

This also moves to using an atomic get() call; this means that the stats recorded for Prometheus will actually be useful, since now there can actually be cache misses (previously we carefully checked whether a value was present first, non-atomically).

There were three GCM methods to refactor, but one of them was only used by an unnecessary and unused glossary terms endpoint; so to avoid refactoring it I just deleted the endpoint and then the method.

If we attempt to load a non-existent piece of content, say a page fragment used optionally in the frontend, then we will fail to find it in ElasticSearch and return that failure. But repeated requests will continue to go to ElasticSearch rather than be served by the content cache, even though we know they will all fail. Rather than directly cache the Content objects, cache Optionals, which allow Optional.empty() for the "not found" case. Also move to the more atomic .get(key, Callable) form to fetch the missing value if it is absent. This better guarantees that the Optional really does exist. Having built this, I am not actually sure it is cleaner than just having a marker singleton for "not found".

This helps remove one of the cache methods that would otherwise need rewriting.

Rather than using Optionals, instead cache a ResultsWrapper object; this can just contain an empty list in the Not Found case, and a single result in the successful getById case. To avoid unsafe casts, use separate caches for DOs and DTOs; this is perhaps unnecessary but does make the Java a lot cleaner. This also moves the DTO method to using the idiomatic and atomic get() method of the cache, which is again cleaner. By using this method, we should actually get usable stats from the stats collection on the caches; previously we never saw a miss because we always checked before trying to get something from the cache.

codecov · 2025-07-04T12:58:40Z

Codecov Report

Attention: Patch coverage is 75.86207% with 7 lines in your changes missing coverage. Please review.

Project coverage is 36.32%. Comparing base (5f2d34d) to head (55e4bc2).
Report is 24 commits behind head on master.

Files with missing lines	Patch %	Lines
...am/cl/dtg/segue/dao/content/GitContentManager.java	75.86%	5 Missing and 2 partials ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #707      +/-   ##
==========================================
+ Coverage   36.29%   36.32%   +0.02%     
==========================================
  Files         536      536              
  Lines       23737    23722      -15     
  Branches     2868     2864       -4     
==========================================
+ Hits         8615     8616       +1     
+ Misses      14259    14243      -16     
  Partials      863      863

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

src/main/java/uk/ac/cam/cl/dtg/segue/dao/content/GitContentManager.java

barna-isaac

Looks great! I've manually tested the change by starting the server in debug mode locally, sending
curl http://localhost:8080/isaac-api/api/pages/fragments/contact_intro, as well as curl http://localhost:8080/isaac-api/api/pages/fragments/non-existent several times, and observing that the callback only invokes the first time.

I understand what really fixes the caching issue is that we now use cache.get() rather than an if-else that only modifies the cache on success. I'd like to point out that this is just a performance improvement, and the observable behaviour is unchanged, so much so that we still produce log messages each time somebody requests a non-existent fragment. I think this is great, and I'm only mentioning it because the wording on the ticket suggests the log messages are what prompted the PR in the first place.

Regarding the included refactors, I'm glad the cache is now type-safe, and think that separating the caches makes things both safer and easier to understand. I'd also like to highlight that this PR does not increase the cache size -- it just separates what used to be cached twice, under a different key in the same cache, to be stored in two different caches.

Another decision @jsharkey13 mentions is that he's considered storing nulls as well as empty Optionals before settling on ResultsWrapper. I was happy to see the experiment with Optional's documented in a commit! Given that the methods return unwrapped values, and that termSearch returns ResultsWrappers, I think caching ResultsWrappers is a good choice that lets us avoid a larger refactor. Still, if we were to perform a larger refactor, I think it would improve things to consistently use a single wrapper that lets us avoid unwrapping results to frequently. If ResultsWrapper had a map method, we could write code like

 ResultsWrapper<String> rawResults = searchProvider.termSearch(...);
 return rawResults.map(this::mapFromStringListToContentList);

instead of

 ResultsWrapper<String> rawResults = searchProvider.termSearch(...);
 List<Content> searchResults = mapper.mapFromStringListToContentList(rawResults.getResults());
 return new ResultsWrapper<>(searchResults, rawResults.getTotalResults());

. I don't have a strong preference between using our own classes and Optional, although Optional comes with such methods already. If we also managed to extract logging to some different place, these methods would become a lot simpler.

barna-isaac · 2025-07-07T11:47:28Z

src/main/java/uk/ac/cam/cl/dtg/segue/api/GlossaryFacade.java

-     * @return segue version as a string wrapped in a response.
-     */
-    @GET
-    @Path("terms/{term_id}")


Now that this endpoint is gone, I suggest also deleting https://github.com/isaacphysics/isaac-react-app/blob/7b33e847ea1a968d83e1b2dd882091af641a7694/src/app/services/api.ts#L151

Yep, that was on my radar for once this is merged.

src/main/java/uk/ac/cam/cl/dtg/segue/dao/content/GitContentManager.java

jsharkey13 · 2025-07-07T13:01:42Z

I don't see much difference between the two styles of map-ing you show; they both make the same number of new objects, and the existing one is at least clear that it's a new wrapper being returned.

The existing implantation also has some odd behaviour; if the string fails to map, we just silently skip that object and return a shorter list. It is more obvious this could happen when the mapping is explicit, less so when we make a .map() method that isn't quite a true map function 🤷

jsharkey13 · 2025-07-07T13:05:24Z

I gave up on Optional because of the casting issues, and we would still have those even with the separate caches because sometimes we cache a single object and sometimes a list so there is no one type to choose. (This matters less now that I have reduced it to two methods, only one for each cache).

But the ResultsWrapper stores the total number of results, which is displayed to the user in the frontend and is not the number of results returned. So we'd need another class other than List to put inside the Optional anyway to store this metadata; at which point we have reinvented ResultsWrapper.

barna-isaac · 2025-07-07T13:13:08Z

@jsharkey13 thanks for the explanation, I've been wondering about the purpose of ResultsWrapper and this helps!

jsharkey13 added 3 commits July 4, 2025 10:36

Remove unused glossary endpoint and cached method

86cc41b

This helps remove one of the cache methods that would otherwise need rewriting.

github-advanced-security bot found potential problems Jul 4, 2025

View reviewed changes

src/main/java/uk/ac/cam/cl/dtg/segue/dao/content/GitContentManager.java Dismissed Show dismissed Hide dismissed

barna-isaac approved these changes Jul 7, 2025

View reviewed changes

barna-isaac merged commit c9111f9 into master Jul 15, 2025
5 checks passed

barna-isaac deleted the feature/cache-content-not-found branch July 15, 2025 10:28

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Cache content not found #707

Cache content not found #707

Uh oh!

jsharkey13 commented Jul 4, 2025

Uh oh!

codecov bot commented Jul 4, 2025 •

edited

Loading

Uh oh!

Uh oh!

barna-isaac left a comment •

edited

Loading

Uh oh!

barna-isaac Jul 7, 2025

Uh oh!

jsharkey13 Jul 7, 2025

Uh oh!

Uh oh!

jsharkey13 commented Jul 7, 2025

Uh oh!

jsharkey13 commented Jul 7, 2025

Uh oh!

barna-isaac commented Jul 7, 2025

Uh oh!

Uh oh!

Uh oh!

Cache content not found #707

Cache content not found #707

Uh oh!

Conversation

jsharkey13 commented Jul 4, 2025

Uh oh!

codecov bot commented Jul 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

barna-isaac left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

barna-isaac Jul 7, 2025

Choose a reason for hiding this comment

Uh oh!

jsharkey13 Jul 7, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

jsharkey13 commented Jul 7, 2025

Uh oh!

jsharkey13 commented Jul 7, 2025

Uh oh!

barna-isaac commented Jul 7, 2025

Uh oh!

Uh oh!

Uh oh!

codecov bot commented Jul 4, 2025 •

edited

Loading

barna-isaac left a comment •

edited

Loading