-
Notifications
You must be signed in to change notification settings - Fork 223
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MGMT-14453: Fix bugs in installercache #7205
base: master
Are you sure you want to change the base?
Conversation
@paul-maidment: This pull request references MGMT-14453 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.19.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: paul-maidment The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@paul-maidment: This pull request references MGMT-14453 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.19.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
0369404
to
715734c
Compare
fa9cd8d
to
27013f9
Compare
27013f9
to
285a8bd
Compare
// runParallelTest launches a batch of fetches from the installer cache in order to simulate multiple requests to the same node | ||
// releases are automatically cleaned up as they are gathered | ||
// returns the first error encountered or nil if no error encountered. | ||
runParallelTest := func(params []launchParams) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree this is not a unit test, but we already have many "unit-tests" which are not properly unit.
As long as it's not slowing down the total "unit-test" time, I think it could be a good test to have in case this code needs to be changed in the future (deadlock here would be horrible).
We could argue that we could split properly unit and more "integration" tests by tag, and execute them separately, but I think it'd be outside the scope of this PR.
Wouldn't ginkgo support parallel execution of table tests? That would make it way more readable than currently. If ginkgo does not support it, I think we should improve readability of these functions
cd5ac30
to
55b7676
Compare
Added something to the docs directory to describe the functionality |
ab62cb3
to
65f68d5
Compare
@paul-maidment let's hold this PR or change it to WIP until we finish the code review process - there is no need to run tests every few minutes. |
65f68d5
to
b52b84e
Compare
4239446
to
b67780b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looking much better after the last round of changes.
testGet := func(releaseID, version string, clusterID strfmt.UUID, expectCached bool) (string, string) { | ||
workdir := filepath.Join(manager.config.CacheDir, "quay.io", "release-dev") | ||
fname := filepath.Join(workdir, releaseID) | ||
mockReleaseCalls(releaseID, version) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this depend on expectCached
as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure what you mean exactly?
But the intent here is to be able to pass expectations about the cached status within the test so that assertions may be made.
It seems to be doing the job?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I'm asking if we want to ensure we're not calling the release code if we're expecting the release to be cached.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, ok, makes sense
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, TBH, calling release.Cleanup
is not a huge issue, we need to remember that this is just cleaning up the hardlinks to simulate a normal flow of the system.
Even if something is cached, a hardlink is issued every time someone requests that release via Get
Clearing the hardlink may assist the eviction of that release in some cases (as there will be no hardlinks) but I don't think any of the cases in testGet
are doing anything like this.
In fact, I just tested and for these tests, we should never fail to call release.Cleanup
as some of the tests are checking scenarios where space is at a premium. If we don't clean up, we get deadlocked!
So TLDR. I don't think we need to change this.
_, err = os.Stat(r2) | ||
Expect(os.IsNotExist(err)).To(BeFalse()) | ||
_, err = os.Stat(r3) | ||
Expect(os.IsNotExist(err)).To(BeFalse()) | ||
|
||
By("verify that the links were purged") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this just testing that you called Cleanup
in testGet
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It tests that cleanup actually cleans the links.
Code that was here before in any case, I tried not to interfere too much with existing tests
b67780b
to
b8cf08d
Compare
48e34dc
to
78b6d48
Compare
77da59c
to
6938c2f
Compare
…in the installer cache This PR is for the purpose of resolving multiple bugs within the installer cache, due to the poor condition of the current cache, it makes sense to fix this in a single PR. * https://issues.redhat.com/browse/MGMT-14452 Installer cache removes in-used cached image when out of space * https://issues.redhat.com/browse/MGMT-14453 INSTALLER_CACHE_CAPACITY small value cause to assisted-service crash * https://issues.redhat.com/browse/MGMT-14457 Installer cache - fails to install when running parallel with same version * Additionally, the cache did not respect limits, so this has been addressed here. Fixes: I have implemented fixes for each of the following issues. * Mutex was ineffective as not instantiated corrctly, leading to [MGMT-14452](https://issues.redhat.com//browse/MGMT-14452), [MGMT-14453](https://issues.redhat.com//browse/MGMT-14453). * Naming convention for hardlinks changed to be UUID based to resolve [MGMT-14457](https://issues.redhat.com//browse/MGMT-14457). * Any time we either extract or use a release, the modified time must be updated, not only for cached releases. This was causing premature pruning of hardlinks. * LRU cache order updated to be based on microseconds instead of seconds. * Eviction checks updated to consider max release size and also cache threshold. * We now check there is enough space before writing. * During eviction - releases without hard links will be evicted before releases with hard links.
6938c2f
to
f4842bf
Compare
This PR is for the purpose of resolving multiple bugs within the installer cache, due to the poor condition of the current cache, it makes sense to fix this in a single PR.
Installer cache removes in-used cached image when out of space
INSTALLER_CACHE_CAPACITY small value cause to assisted-service crash
Installer cache - fails to install when running parallel with same version
Fixes:
I have implemented fixes for each of the following issues.