-
Notifications
You must be signed in to change notification settings - Fork 7
Cache content not found #707
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
If we attempt to load a non-existent piece of content, say a page fragment used optionally in the frontend, then we will fail to find it in ElasticSearch and return that failure. But repeated requests will continue to go to ElasticSearch rather than be served by the content cache, even though we know they will all fail. Rather than directly cache the Content objects, cache Optionals, which allow Optional.empty() for the "not found" case. Also move to the more atomic .get(key, Callable) form to fetch the missing value if it is absent. This better guarantees that the Optional really does exist. Having built this, I am not actually sure it is cleaner than just having a marker singleton for "not found".
This helps remove one of the cache methods that would otherwise need rewriting.
Rather than using Optionals, instead cache a ResultsWrapper object; this can just contain an empty list in the Not Found case, and a single result in the successful getById case. To avoid unsafe casts, use separate caches for DOs and DTOs; this is perhaps unnecessary but does make the Java a lot cleaner. This also moves the DTO method to using the idiomatic and atomic get() method of the cache, which is again cleaner. By using this method, we should actually get usable stats from the stats collection on the caches; previously we never saw a miss because we always checked before trying to get something from the cache.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #707 +/- ##
==========================================
+ Coverage 36.29% 36.32% +0.02%
==========================================
Files 536 536
Lines 23737 23722 -15
Branches 2868 2864 -4
==========================================
+ Hits 8615 8616 +1
+ Misses 14259 14243 -16
Partials 863 863 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
src/main/java/uk/ac/cam/cl/dtg/segue/dao/content/GitContentManager.java
Dismissed
Show dismissed
Hide dismissed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! I've manually tested the change by starting the server in debug mode locally, sending
curl http://localhost:8080/isaac-api/api/pages/fragments/contact_intro
, as well as curl http://localhost:8080/isaac-api/api/pages/fragments/non-existent
several times, and observing that the callback only invokes the first time.
I understand what really fixes the caching issue is that we now use cache.get()
rather than an if-else
that only modifies the cache on success. I'd like to point out that this is just a performance improvement, and the observable behaviour is unchanged, so much so that we still produce log messages each time somebody requests a non-existent fragment. I think this is great, and I'm only mentioning it because the wording on the ticket suggests the log messages are what prompted the PR in the first place.
Regarding the included refactors, I'm glad the cache is now type-safe, and think that separating the caches makes things both safer and easier to understand. I'd also like to highlight that this PR does not increase the cache size -- it just separates what used to be cached twice, under a different key in the same cache, to be stored in two different caches.
Another decision @jsharkey13 mentions is that he's considered storing null
s as well as empty Optional
s before settling on ResultsWrapper
. I was happy to see the experiment with Optional
's documented in a commit! Given that the methods return unwrapped values, and that termSearch
returns ResultsWrapper
s, I think caching ResultsWrapper
s is a good choice that lets us avoid a larger refactor. Still, if we were to perform a larger refactor, I think it would improve things to consistently use a single wrapper that lets us avoid unwrapping results to frequently. If ResultsWrapper
had a map
method, we could write code like
ResultsWrapper<String> rawResults = searchProvider.termSearch(...);
return rawResults.map(this::mapFromStringListToContentList);
instead of
ResultsWrapper<String> rawResults = searchProvider.termSearch(...);
List<Content> searchResults = mapper.mapFromStringListToContentList(rawResults.getResults());
return new ResultsWrapper<>(searchResults, rawResults.getTotalResults());
. I don't have a strong preference between using our own classes and Optional
, although Optional
comes with such methods already. If we also managed to extract logging to some different place, these methods would become a lot simpler.
* @return segue version as a string wrapped in a response. | ||
*/ | ||
@GET | ||
@Path("terms/{term_id}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that this endpoint is gone, I suggest also deleting https://github.com/isaacphysics/isaac-react-app/blob/7b33e847ea1a968d83e1b2dd882091af641a7694/src/app/services/api.ts#L151
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, that was on my radar for once this is merged.
I don't see much difference between the two styles of The existing implantation also has some odd behaviour; if the string fails to map, we just silently skip that object and return a shorter list. It is more obvious this could happen when the mapping is explicit, less so when we make a |
I gave up on Optional because of the casting issues, and we would still have those even with the separate caches because sometimes we cache a single object and sometimes a list so there is no one type to choose. (This matters less now that I have reduced it to two methods, only one for each cache). But the |
@jsharkey13 thanks for the explanation, I've been wondering about the purpose of |
Currently we have a problem with optional page fragments, such as the seasonal warning message on the contact us form, or the help fragments which don't have to exist.
In the old world, for one that does not exist: when it is queried for by a user, we go to ElasticSearch and discover is doesn't exist, but the cache contains only valid content and so we cannot add anything to the cache. When the next user loads the page, we ought already to know that this will fail, but since we couldn't cache it, we make the same unnecessary query to ES.
This PR moves the
GitContentManager
caches to always cacheResultsWrappers
; these can always exist even when there are no results, and thus a failed lookup result can be cached. (I did initially experiment with using Optionals back in 2023 - rebasing the commit to more recently - but I decided thatResultsWrappers
were cleaner). For the sake of avoiding unsafe casts, move to using two caches separated by type; I don't think this will increase the memory usage since it is the same stuff being cached but I suppose cleanup may be triggered differently due to different access patterns for the two caches.This also moves to using an atomic
get()
call; this means that the stats recorded for Prometheus will actually be useful, since now there can actually be cache misses (previously we carefully checked whether a value was present first, non-atomically).There were three GCM methods to refactor, but one of them was only used by an unnecessary and unused glossary terms endpoint; so to avoid refactoring it I just deleted the endpoint and then the method.