MGMT-17771: Adds enhancement for FIPS with multiple RHEL installer versions #6290
base: master
Conversation
@carbonin: This pull request references MGMT-17771 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.16.0" version, but no target version was set.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: carbonin. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
docs/enhancements/multi-rhel-fips.md
Outdated
## Summary

In order for an OpenShift cluster to be considered FIPS compliant the installer
must be run on a system with FIPS compliant openssl libraries. This means using
and the system running the installer would need to have FIPS mode enabled.
Yes, but that's not really relevant for this topic. Do you still want it added?
Without fips mode enabled, the binary won't use openssl at all, so I think it's a relevant detail to include.
The three parts of the problem are:
- you must use the dynamically-linked binary (`openshift-baremetal-install`/`openshift-install-fips`) because sometimes it will need to do FIPS
- if you use the dynamically-linked binary, you must use a RHEL 9 userspace from 4.16 on and may use one for pre-4.16
- but if you want to do FIPS pre-4.16 you must use a RHEL 8 userspace
docs/enhancements/multi-rhel-fips.md
Outdated
present on our container image. Today this is not a problem because all
`openshift-install` binaries in use have been expecting to link to RHEL 8 based
openssl libraries, but now OpenShift 4.16 will ship an installer that requires
RHEL 9 libraries.
Are we over-emphasizing the details of openssl and dynamic linking in this paragraph? AIUI nothing about how that works is changing. The second sentence seems to imply that there might be a statically-linked binary available, or that it's an option, but is that the case?
Can we boil this down to:
When FIPS mode is enabled on the host running the installer:
- for openshift < 4.16, the installer must run in a RHEL 8 userspace
- for openshift >= 4.16, the installer must run in a RHEL 9 userspace
The second sentence seems to imply that there might be a statically-linked binary available, or that it's an option, but is that the case?
Yes, there is. Actually I'm working on a bug right now that is solved by using the `openshift-install` binary from 4.16: https://issues.redhat.com/browse/OCPBUGS-33227
That said, it is that simple. I just figured it would be easier to do this split in any case.
If we also want to consider the static binary for non-fips cases we just add complexity (3 cases instead of just 2)
Where would someone obtain a statically vs dynamically linked build?
As of https://issues.redhat.com/browse/CORS-3418 `openshift-install` is statically linked, and as of https://issues.redhat.com/browse/CORS-3419 it supports baremetal installs.
`openshift-baremetal-install` is now only intended for use with FIPS use-cases as that's the only time you'd need a dynamically linked binary. https://issues.redhat.com/browse/WRKLDS-1171 added a new command to `oc release extract` so it could be referenced as `openshift-install-fips`, which more accurately represents its purpose.
docs/enhancements/multi-rhel-fips.md
Outdated
assisted-service will take over as usual from there.

Each of these containers should maintain its own cache of installers as
assisted-service does now and should be built with the required packages to run
What packages are required? Is this just implying that one image will have a RHEL 8 base, and the other a RHEL 9 base, each of which comes with different content? Or are there literally specific packages that need to be installed with dnf in our Containerfiles?
I would guess the `openshift-client` (`oc`) CLI would? Or is that something we'd extract from the release image specified?
I'm not actually sure. I'll add this to open questions, but I'd expect the required libraries to be documented with the installer somewhere.
### Open Questions

1. What does the API look like for the runner containers? What data should be
A simple start would be to have a shared volume among all three containers.
- assisted-service would invoke an installer within the correct "runner container" and wait
- the installer would generate content and write it to the shared volume
- assisted-service would get the signal that the installer run was complete and get a reference to the location for its output
The shared storage could be ephemeral; it just needs to be persistent for the life of the pod.
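To make that flow concrete, here is a rough sketch of what the assisted-service side could look like, assuming a hypothetical local endpoint on the runner container and a made-up JSON payload (the field names, route, and timeout are illustrative, not decided):

```go
package runnerclient

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// manifestsRequest is a hypothetical payload; the real API shape is one of the open questions.
type manifestsRequest struct {
	ClusterID    string `json:"cluster_id"`
	ReleaseImage string `json:"release_image"`
	// OutputDir is a directory on the volume shared by all containers in the pod.
	OutputDir string `json:"output_dir"`
}

// generateManifests asks the appropriate runner container to run the installer and
// blocks until it responds, keeping a single synchronous request as discussed below.
func generateManifests(runnerBaseURL string, req manifestsRequest) error {
	body, err := json.Marshal(req)
	if err != nil {
		return err
	}
	client := &http.Client{Timeout: 10 * time.Minute} // generous timeout instead of an async task API
	resp, err := client.Post(runnerBaseURL+"/manifests", "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusNoContent {
		return fmt.Errorf("installer run failed with status %s", resp.Status)
	}
	// On success the generated files are already sitting in req.OutputDir on the shared volume.
	return nil
}
```

The runner would return an error body (or a non-2xx status) if the installer failed, matching the "respond with any error" behaviour described in the proposal.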
How long do we expect the openshift-install binary to take? It should be quick enough that we could keep an http request open, and write the response to the client when it's done, right? No need for an asynchronous task API?
Maybe the exception would be if the installer binary isn't already in the local cache. Should assisted-service continue managing one cache that gets mounted into each installer-runner container?
I think the cache will be managed by the installer runner container and this container will also upload the resulting artifacts to the required location (either the shared data volume or s3).
I don't expect the operation to take too long. I think we could do a single http request for this. Even if we need to tweak the timeout parameters I'd rather that than build an async system.
Also regarding the storage side of this.
The cache and the place where the generated files are stored after the installer is run are already on a volume (specifically a PV), and this is what I intended to use for the shared storage, so I think that's a non-issue.
My question here was what exactly will we need to pass over the API call.
docs/enhancements/multi-rhel-fips.md
Outdated
## Design Details [optional]

The new runner containers will expose an HTTP server listening only for local
Does this look reasonable?
- A new installer-runner service will be created, written in go.
- The installer-runner will be compiled twice: once in a RHEL 8 builder image, and once in a RHEL 9 builder image, with each resulting binary being placed into a RHEL base image of the corresponding version.
docs/enhancements/multi-rhel-fips.md
Outdated
the installer. This seems possible, but doesn't make much sense if we want to
run a persistent service rather than a job per cluster. Running a job per
cluster wouldn't scale particularly well and would also be an even larger
architectural change.
Requiring k8s to run Jobs would also break compatibility for running assisted-installer with podman.
I don't think we support assisted-installer on podman.
Also at some point podman will get it: containers/podman#17011.
The assisted-service can still be persistent; it can use k8s jobs for executing the `openshift-install` create manifests/ignition commands.
Why is using a job considered "a larger architectural change"?
We don't offer normal product support for it. But as a project: it works, we invite people to try it, and people use and depend on it.
Why is using a job considered "a larger architectural change"?
I suppose it's not that much larger really, but it makes the runtime more complicated.
Running jobs over the kube API feels heavier to me than making an http call to a persistently running container in the same pod (effectively an RPC).
It would be more difficult to arrange for a shared filesystem with a Job.
You also have to account for all of the things that could happen when scheduling the job, including that it might end up on a node that fails, or on a node that gets drained right as it starts, etc. It would definitely require an asynchronous approach to watching and waiting for the work to be completed.
If we do go with Jobs then both ABI and podman would either need to stick with the current method of running the installer on the service container (which might be fine for ABI since it will always be using the matching image, and podman because we'll never support a FIPS cluster that way) or find some other way to spin up the image from the release to run the installer. In any case we end up needing to support and maintain two methods for running the installer, which isn't ideal.
I'm also worried about just how many jobs we might end up running at once.
I don't know exactly what the peak number of installs at a time is on the SaaS, but I know it's at least several hundred simultaneously and several thousand in a few hours when the scale team is testing the on-prem deployment through MCE, @akrzos likely has those numbers.
While we don't offer normal product support for the podman runtime method, I don't see a reason why someone couldn't use it on a FIPS-enabled system to install a FIPS-enabled cluster.
While we don't offer normal product support for the podman runtime method, I don't see a reason why someone couldn't use it on a FIPS-enabled system to install a FIPS-enabled cluster.
I don't disagree, but the question is more whether we need the upstream project to support installing both pre-4.16 and post-4.16 clusters with FIPS from the same podman deployment and not ask the user to create separate podman deployments. That sounds like the kind of requirement that products have but which upstream projects can dispense with.
For what it's worth I added some more detail about these alternatives to the actual enhancement
We don't build a different assisted-installer service to install one version of openshift vs. another. It's just one service that theoretically covers them all.
In theory we could make two parallel builds available and tell users to run one stack for versions <=4.15 and the other for versions >=4.16, but that seems like more work than just solving the problem properly for everyone whether they're running the upstream or downstream build.
docs/enhancements/multi-rhel-fips.md
Outdated
## Design Details [optional]

The new runner containers will expose an HTTP server listening only for local
connections. assisted-service will POST to this server when it needs manifests
Speaking of local connections, it's worth considering a unix socket instead of the network stack. We did that for communicating with ansible-runner from an operator process. It's a simpler approach when all communication is local; no TLS, no auth, no DNS, no network protocols, ...
Yup, I agree. I think a socket would work well for this.
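For illustration, a minimal sketch of what the runner's listener could look like over a unix socket shared between the containers; the socket path, route, request shape, and installer invocation are assumptions, not the agreed design:

```go
package runner

import (
	"encoding/json"
	"net"
	"net/http"
	"os"
	"os/exec"
)

type manifestsRequest struct {
	// WorkDir is a directory on the shared volume containing the prepared install-config.yaml.
	WorkDir string `json:"work_dir"`
}

// Serve listens on a local unix socket; no TLS, auth, or DNS is involved since only
// containers in the same pod can reach the socket file.
func Serve(socketPath string) error {
	_ = os.Remove(socketPath) // clear a stale socket left over from a previous container run
	l, err := net.Listen("unix", socketPath)
	if err != nil {
		return err
	}
	mux := http.NewServeMux()
	mux.HandleFunc("/manifests", func(w http.ResponseWriter, r *http.Request) {
		var req manifestsRequest
		if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		// Run the cached installer binary that matches this container's RHEL userspace.
		cmd := exec.Command("openshift-install", "create", "manifests", "--dir", req.WorkDir)
		if out, err := cmd.CombinedOutput(); err != nil {
			http.Error(w, string(out), http.StatusInternalServerError)
			return
		}
		w.WriteHeader(http.StatusNoContent)
	})
	return http.Serve(l, mux)
}
```

On the assisted-service side the standard net/http client can reuse this by plugging a DialContext that dials the unix socket into an http.Transport.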
but that still requires us to publish a new one for the architecture
assisted-service will not be using.

## Design Details [optional]
This service closely matches what we need to implement. It could almost be borrowed as-is, mostly modifying what action it takes when receiving a request. Its purpose is to receive events from ansible-runner on a local unix socket shared between the containers.
Yup, looks about right to me, thanks for the reference
docs/enhancements/multi-rhel-fips.md
Outdated
- Dynamically determining a given release's RHEL version. The RHEL version will
be a part of configuration tied to the minor release
Out of curiosity, how will this look? Will it be an index we maintain like the default_os/release images files?
I was thinking that, but now I'm rethinking it.
Maybe it would be better to just switch on whether the version is 4.16 or later.
@mhrivnak mentioned OKD in this context. What do the cluster versions look like when OKD is in use?
Image list trimmed:
oc adm release info quay.io/openshift/okd@sha256:46b462be1e4c15ce5ab5fba97e713e8824bbb9f614ac5abe1be41fda916920cc
Name: 4.15.0-0.okd-2024-03-10-010116
Digest: sha256:46b462be1e4c15ce5ab5fba97e713e8824bbb9f614ac5abe1be41fda916920cc
Created: 2024-03-10T06:53:44Z
OS/Arch: linux/amd64
Manifests: 707
Metadata files: 1
Pull From: quay.io/openshift/okd@sha256:46b462be1e4c15ce5ab5fba97e713e8824bbb9f614ac5abe1be41fda916920cc
Release Metadata:
Version: 4.15.0-0.okd-2024-03-10-010116
Upgrades: <none>
Metadata:
Component Versions:
kubernetes 1.28.7
machine-os 39.20240210.3 Fedora CoreOS
So we'd still get `4.15.0-0.okd-2024-03-10-010116` in this case.
I mean I suppose if no one is doing FIPS with OKD then we should be able to apply the same rule and it will work out. (4.16 OKD would go to the el9 runner)
Why not always use the FIPS compliant installer?
We will, but which FIPS compliant installer we use (rhel8 or rhel9) will be determined by the release version.
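As a sketch of that switch, something like the following could route a release to the matching runner. The socket paths are hypothetical and the hashicorp/go-version module is used purely for illustration; OKD-style versions such as `4.15.0-0.okd-2024-03-10-010116` parse the same way since only major.minor matters:

```go
package runner

import (
	"fmt"

	goversion "github.com/hashicorp/go-version"
)

const (
	el8RunnerSocket = "/run/installer-runner-el8.sock" // hypothetical paths
	el9RunnerSocket = "/run/installer-runner-el9.sock"
)

// runnerForRelease picks the runner whose userspace matches the installer for this release:
// releases before 4.16 go to the el8 runner, 4.16 and later go to the el9 runner.
func runnerForRelease(openshiftVersion string) (string, error) {
	v, err := goversion.NewVersion(openshiftVersion)
	if err != nil {
		return "", fmt.Errorf("parsing version %q: %w", openshiftVersion, err)
	}
	segments := v.Segments() // padded to [major, minor, patch]
	if segments[0] > 4 || (segments[0] == 4 && segments[1] >= 16) {
		return el9RunnerSocket, nil
	}
	return el8RunnerSocket, nil
}
```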
docs/enhancements/multi-rhel-fips.md
Outdated
Two additional containers will run alongside assisted-service in the same pod.
These "installer-runner" containers will, when triggered by assisted-service,
Are these containers expected to always run? Like even during non-fips mode?
That's the idea, yeah.
My intention is to use these containers for both FIPS and non-FIPS cases since there shouldn't really be a difference, and if we were to only use them for FIPS they would get tested very infrequently.
generated. The runner container will respond with any error that occurred while
generating the manifests.

### Open Questions
Do we want the new microservice to have its own repo? That may be a good end-state for the sake of isolated testing and such.
I briefly called this out in New Images under Implementation Details, but if you think it would be good to put it somewhere else or just talk about it more prominently then we can do that.
I don't think it's an open question though. For this pass I'm going to create the service in assisted-service using the existing code to run the installer. Extracting and splitting everything up would definitely come later.
- Allow for a single installation of assisted-service to install FIPS-compliant
clusters using `openshift-install` binaries built against RHEL 8 or RHEL 9

- Allow for FIPS compliant clusters to be installed from the SaaS offering or
Does the SaaS also require App SRE to chime in to install FIPS compliant clusters?
Yes.
we have an open task for app-sre https://issues.redhat.com/browse/SDE-3692
Right, also this comment was more about the solution allowing for this possibility, not necessarily that, once implemented, we will be able to install FIPS from the SaaS. The main goal is to make FIPS work for both rhel8 and rhel9 installer releases.
Creating, maintaining, and releasing additional images is a non-trivial amount
of additional work.

## Alternatives
If we're saying that <4.16 is not FIPS compliant anyhow, can we say that for those releases we're going to use the statically linked one and for >=4.16, dynamically linked with rhel9? Can we use a single existing container then?
I don't think we're saying that. OpenShift can be installed in FIPS mode today, and that's been the case for a while.
in ACM/SaaS world? or in general?
In general and in ACM IIUC
Sources for "OpenShift can be installed in FIPS mode today" include 4.12 docs for FIPS installation. And ROSA FedRAMP docs:
ROSA only uses FIPS-validated modules to process cryptographic libraries.
docs/enhancements/multi-rhel-fips.md
Outdated
## Proposal

Two additional containers will run alongside assisted-service in the same pod.
These "installer-runner" containers will, when triggered by assisted-service,
Triggered how?
Plan right now is for an http API, I'll add it here.
docs/enhancements/multi-rhel-fips.md
Outdated
the installer. This seems possible, but doesn't make much sense if we want to
run a persistent service rather than a job per cluster. Running a job per
cluster wouldn't scale particularly well and would also be an even larger
architectural change.
I don't think we support assisted-installer on podman.
Also at some point podman will get it: containers/podman#17011.
The assisted-service can still be persistent; it can use k8s jobs for executing the `openshift-install` create manifests/ignition commands.
Why is using a job considered "a larger architectural change"?
What about ABI? Won't it be affected by this change?
ABI doesn't actually have this problem since they build assisted-service on the base image that corresponds with the installer, but making this change might affect them if we don't also continue to support the current way we run the installer (in the assisted-service container). Also it depends on what you mean by "affect". If, after this change, they also deploy assisted as described in the enhancement (with the sidecar containers) then things should just work for them, but this will affect them in that they will likely need to update the deployment code for their assisted-service instance.
What about ABI? Won't it be affected by this change?
We'll want to keep the same implementation, so we'll likely need to deploy the other container in ABI as well.
The actual issue doesn't affect ABI because we only deploy one version of OpenShift from each assisted-service, so we make sure it always matches.
docs/enhancements/multi-rhel-fips.md
Outdated
Creating, maintaining, and releasing additional images is a non-trivial amount
of additional work.

## Alternatives
Have you considered installing the RHEL 8 OpenSSL 1.1 into the RHEL 9 container alongside OpenSSL 3.0?
(Note: I haven't tried this to see what conflicts.)
I considered this, but I'm a bit worried about landing this in time and this approach feels like it has more unknown potential pitfalls than the approach described here.
I did bring up the alternative, which should probably be added to this doc, of extracting an entire rhel 8 userspace within a rhel 9 userspace, and using chroot to run an installer that needs rhel 8. But that brings other complexities around image build and management that probably aren't worthwhile compared to just running two copies of a small process in two different container images.
I don't need chroot. We can use the LD_PRELOAD env to point to the relevant path when installing <4.16.
Is it possible that this could be as simple as installing a separate ssl version somewhere else and setting some env vars when running the installer?
I can give that a try if there's a chance it might work.
Will ssl be the only library I need to override in this way? Everything else will be backward compatible? Will this actually be FIPS compliant? Are there any other requirements we would also need to satisfy?
Running a rhel8 container (our current released image) the files provided by the older openssl rpm are these:
bash-4.4$ rpm -ql openssl-libs-1.1.1k-12.el8_9.x86_64
/etc/pki/tls
/etc/pki/tls/certs
/etc/pki/tls/ct_log_list.cnf
/etc/pki/tls/misc
/etc/pki/tls/openssl.cnf
/etc/pki/tls/private
/usr/lib/.build-id
/usr/lib/.build-id/14
/usr/lib/.build-id/14/0abbb0c09726652dd61128b74b8bb5010f5542
/usr/lib/.build-id/42
/usr/lib/.build-id/42/9301d8c47d78f29d7a39fe01d8076e7b656e4e
/usr/lib/.build-id/6f
/usr/lib/.build-id/6f/619a1566052cd522eb945d91197ad60852a8f8
/usr/lib/.build-id/bb
/usr/lib/.build-id/bb/0f1c2857f6f9e4f68c7c5dc36a41c441318eca
/usr/lib/.build-id/e1
/usr/lib/.build-id/e1/1d63787a0230c919507d7ebc73e8af7e8e8a2b
/usr/lib64/.libcrypto.so.1.1.1k.hmac
/usr/lib64/.libcrypto.so.1.1.hmac
/usr/lib64/.libssl.so.1.1.1k.hmac
/usr/lib64/.libssl.so.1.1.hmac
/usr/lib64/engines-1.1
/usr/lib64/engines-1.1/afalg.so
/usr/lib64/engines-1.1/capi.so
/usr/lib64/engines-1.1/padlock.so
/usr/lib64/libcrypto.so.1.1
/usr/lib64/libcrypto.so.1.1.1k
/usr/lib64/libssl.so.1.1
/usr/lib64/libssl.so.1.1.1k
/usr/share/licenses/openssl-libs
/usr/share/licenses/openssl-libs/LICENSE
I suppose I could use a multi-stage build to copy those files into a special directory on the final (rhel9) image then try using LD_PRELOAD to push those library versions instead of the ones installed on the system.
I'd need a fully FIPS enabled environment to test if it's working though.
LD_PRELOAD (or rather, LD_LIBRARY_PATH, which I think is what you're going for here) is not really the issue - libssl.so.1.1 and libssl.so.3.0 are different major versions of the shared library, so AIUI the binary will know to load the right one. The issue is what other files the RPM installs that might conflict. Mostly just those .cnf files by the looks of it?
I'm not sure you can install different versions of the same RPM on the same system.
Also I don't know if the older ssl package is even available in the el9 repos.
My idea was to use a multi-stage container build, install the ssl rpm on an el8 image then copy the files directly into a directory in a later stage. Then when we know we need to run the el8 binary we'd set whatever envs we need to allow it to find the old .so.
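As a sketch of the idea being floated here (note that further down the thread this whole family of approaches is found not to be FIPS-compliant), running the pre-4.16 binary against libraries copied to a side directory might look roughly like this; the /opt path is made up for illustration:

```go
package runner

import (
	"os"
	"os/exec"
)

// runEL8Installer runs a pre-4.16 installer binary, pointing the dynamic loader at the
// el8 libssl/libcrypto shared objects copied into the image by a multi-stage build.
func runEL8Installer(installerPath, workDir string) error {
	cmd := exec.Command(installerPath, "create", "manifests", "--dir", workDir)
	// LD_LIBRARY_PATH (rather than LD_PRELOAD) is what actually steers library lookup here.
	cmd.Env = append(os.Environ(), "LD_LIBRARY_PATH=/opt/openssl-el8/usr/lib64")
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd.Run()
}
```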
Hopefully I'll have an environment to test this in this morning and then if it works I'll run it by the FIPS experts to figure out if this process would be considered compliant.
In order for an OpenShift cluster to be considered FIPS compliant the installer must be run on a system with FIPS mode enabled and with FIPS compliant openssl libraries installed. This means using a dynamically linked `openshift-install` binary against the openssl libraries present on our container image. Today this is not a problem because all `openshift-install` binaries in use have been expecting to link to RHEL 8 based openssl libraries, but now OpenShift 4.16 will ship an installer that requires RHEL 9 libraries. This will require assisted-service to maintain a way to run the `openshift-install` binary in a compatible environment for multiple openssl versions. Specifically FIPS-enabled installs for pre-4.16 releases will need to be run on an el8 image and 4.16 and later releases will need to be run on an el9 image (regardless of FIPS). Resolves https://issues.redhat.com/browse/MGMT-17771
1. Create a separate userspace for el8 libraries and chroot when those libraries
are required.
- This seems a bit complicated and it will likely make our image quite a bit
larger than it already is (~1.3G).
I tried this out and was able to get both installers to run in a container on a FIPS-enabled RHEL9 host using chroot for the rhel8 one. I only needed to copy /usr from the el8 image and create the top-level symlinks in the chroot target dir. Based on my tests with the centos images this adds about 200M to the image size.
I still need to get assurance that this would actually result in a FIPS-compliant cluster, but if it does it seems like the most promising option to me.
I think this is worth a POC. @mhrivnak @romfreiman What do you think?
Would this approach cause a problem for image scanning and tracking of packages for CVEs and such?
Asked about this in the FIPS-related internal channels and it turns out nothing like this would be considered FIPS-compliant so this disqualifies any approach that doesn't run the installer on a base container image that it is expecting to run on.
I'll add this to the enhancement text.
chroot :)
image *must* match the installer for the resulting cluster to be considered
FIPS-compliant so none of these multi-library options are valid.

### Publish multiple assisted-service images
@romfreiman does this look like what you were talking about?
Tried to capture it here.
### Publish multiple assisted-service images

It's likely that a user installing in FIPS mode will only be interested in
Why do we think this is true?
Everyone starts with a single version. But existing clusters and their use cases lag behind new clusters and new use cases. I would expect that most users end up with a mix of openshift versions over time.
Keep in mind that users need the ability to re-provision their clusters at any time, for recovery from all kind of unexpected events. And a growing number of customers depend on regular "cluster as a service" provisioning. Telling such a user that they can only (re-)provision a subset of openshift versions would generally be a big disadvantage.
Even a large-scale user with relatively uniform versions is not going to upgrade all of their clusters at once. It would not be ideal to put them in a position where they can only provision clusters for the newest openshift versions in their fleet.
Is there reason to think that a FIPS user would have different patterns and expectations?
You're right that this is an assumption on my part, but my experience with even moderately conservative customers is that they have a single version of OCP that is vetted and allowed in the org (at least in production).
My guess was that any user interested in full FIPS compliance would be even more strict than that, but I have no actual users to back any of this up.
As for upgrades and reprovisioning: I do consider this option a temporary solution that buys us time to properly think through and vet something that would work more generically. Ideally, in the version following the one this alternative is implemented in, we'd implement the main body of this enhancement (or something similar).
They start with a single vetted version, then that progresses over time, but upgrades lag.
I think it's still better to implement this enhancement as described, which would fully preserve the existing use cases and user experience, and then optimize it later as needed.
@mhrivnak all good. Can we find such a customer now? We'll have it as an RFE and improve in the next release. I don't think we should restructure the whole service 2 weeks before the release.
For the next release, let's reconsider.
choose which to deploy based on configuration (likely an annotation since a more
robust solution would be preferred in the future).

For example, in the case that a user knew they wanted to deploy OCP 4.14 from a
This assumes that the user who owns the configuration of the management cluster itself is the same user who is provisioning clusters. But often that role is separate. The "Cluster Creator" would need to work with the cluster admin to re-configure assisted-installer to switch it from el8 mode to el9 mode. Then what if they need to switch back? This could become a real hassle.
It would help to have automation that recognizes when the config doesn't work for a desired install operation, but resolving that could be a pain. Even for cases where it is the same person doing both personas, it's inconvenient.
This is already something that the application admin is choosing to a degree when they set the available OS images, so I don't think it's too much of a stretch for them to also communicate which OCP versions will be in use.
Fair point. So then the time will come when the cluster creator asks the admin to add a new version that requires removal of all other versions. Then they'll have to negotiate an upgrade plan with all stakeholders to get the whole fleet safely over the hump, etc etc. Doesn't that seem like a pain? If we can avoid creating that burden for our customers, we'll all be better-off.
@carbonin: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
In order for an OpenShift cluster to be considered FIPS compliant the installer must be run on a system with FIPS compliant openssl libraries. This means using a dynamically linked `openshift-install` binary against the openssl libraries present on our container image. Today this is not a problem because all `openshift-install` binaries in use have been expecting to link to RHEL 8 based openssl libraries, but now OpenShift 4.16 will ship an installer that requires RHEL 9 libraries. This will require assisted-service to maintain a way to run the `openshift-install` binary in a compatible environment for multiple openssl versions.

List all the issues related to this PR
https://issues.redhat.com/browse/MGMT-17771
What environments does this code impact?
How was this code tested?
Checklist
docs, README, etc)
Reviewers Checklist