KMS as Tapp #91
base: master
Support for getting app keys from local key provider
Add workaround for the network issue of the keyprovider
Force-pushed from 7fb876c to c70a6ed
make docker restart always and run as daemon
```rust
#[derive(Deserialize, Serialize, Debug, Clone)]
pub struct LocalConfig {
    pub kms_url: Option<String>,
    pub tproxy_url: Option<String>,
    pub pccs_url: Option<String>,
    pub docker_registry: Option<String>,
    pub host_api_url: String,
}
```
for what purpose and where is this struct being used?
The information is generated in the host teepod and passed into the CVM via a config.json. Then it is used in the CVM:
- `kms_url` is used to connect to the KMS to get app keys.
- `pccs_url` is used when verifying the quote of the key-provider.
- `tproxy_url` is used to set up the tproxy net if enabled.
- `docker_registry` is used as the official Docker Hub registry cache.
- `host_api_url` is a VSOCK address used to notify teepod of the CVM boot progress.
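For illustration, a hypothetical config.json carrying these fields might look as follows (all values below are made up, not taken from the PR):

```json
{
  "kms_url": "https://kms.example.com:8000",
  "tproxy_url": "https://tproxy.example.com:8010",
  "pccs_url": "https://pccs.example.com/sgx/certification/v4",
  "docker_registry": "https://registry.example.com",
  "host_api_url": "vsock://2:9000"
}
```

Each field deserializes into the corresponding member of `LocalConfig`, with the optional fields omitted when the feature is disabled.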
I have follow-up questions here:
- Does it make sense to have an attestation over this initial configuration hash passed by the host, so that the dev can make sure the initial boot used the correct config? Because IIUC, this config is not part of the CVM's measurements (RTMR0-2). So it would probably be either extended into RTMR3, or included in the user-report-data when requesting an attestation quote, to verify the boot-up process of the CVM, correct?
- If the answer to the above question is yes, could you refer me to that piece of code please?
- A slightly unrelated question, just for my understanding: if a Tapp is deployed on a similar CVM base image with the same config, how does the dev differentiate their instance from others trying to forge the same identity? Does this require pre-deployment setup, such as calculating the Tapp measurements and deploying an AppAuth contract for each Tapp, after which the light client within the CVM image checks the owner and compose hash of the app and then pulls it from the docker_registry? And vice versa, does the CVM need to verify the identity of the owner before the light client checks and pulls the Tapp container? I am asking because I am not sure whether the dev/owner does the deployment themselves on bare metal, or whether there is an intermediary host, like Azure/GCP, that initiates the deployments and has control over the initial host-provided config.
Very good questions.
Answers to questions 1 and 2:
We currently don't measure the LocalConfig. Instead of trusting these URLs, we do more concrete verifications individually.
- For `kms_url`: instead of trusting the URL, the CVM connects to the KMS by RA-TLS. The KMS verifies the CVM's RA quote, and the CVM measures the root public key of the KMS into RTMR3; the public key is also registered on-chain. Users can compare the root public key in RTMR3 against the on-chain value to ensure the CVM gets keys from the given KMS.
- For `pccs_url`: the URL is never trusted; responses are verified against the root CA certificate in the verification library.
- For `tproxy_url`: it is also protected by bidirectional RA-TLS verification.
- For `docker_registry`: the URL is also not trusted. The best practice is to always use image hashes in the docker-compose.yaml, so that the docker engine guarantees image integrity after pulling the images.
- For `host_api_url`: we don't need to trust it, as it is only used to emit boot progress and to act as a proxy for requests to the key-provider.
Regarding question 3:
Before deploying a Tapp CVM, an `app-id` must be registered in the KmsAuth contract and assigned an AppAuth contract that controls which compose hashes are allowed for the Tapp.
CVMs request app keys from the KMS at boot time. The KMS asks KmsAuth/AppAuth whether the compose hash and other MRs are allowed to boot. If either KmsAuth or AppAuth rejects the boot request, the KMS will not distribute the keys and the CVM will abort the boot.
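The gate described above can be sketched roughly as follows. This is not the actual dstack code; the struct and field names are illustrative stand-ins for the on-chain KmsAuth/AppAuth state:

```rust
use std::collections::HashSet;

// Illustrative mirror of the KmsAuth contract state: which image
// measurements are allowed to boot at all.
pub struct KmsAuth {
    pub allowed_mr_images: HashSet<String>,
}

// Illustrative mirror of a per-app AppAuth contract: which compose
// hashes are allowed for this app-id.
pub struct AppAuth {
    pub allowed_compose_hashes: HashSet<String>,
}

pub struct BootInfo {
    pub mr_image: String,
    pub compose_hash: String,
}

// Keys are released only if both contracts approve; on rejection the
// KMS withholds the keys and the CVM aborts the boot.
pub fn check_boot(kms: &KmsAuth, app: &AppAuth, info: &BootInfo) -> Result<(), String> {
    if !kms.allowed_mr_images.contains(&info.mr_image) {
        return Err("KmsAuth: image measurement not allowed".into());
    }
    if !app.allowed_compose_hashes.contains(&info.compose_hash) {
        return Err("AppAuth: compose hash not allowed".into());
    }
    Ok(())
}
```

The real checks involve more fields (device-id, app-id, other MRs), but the two-stage accept/reject shape is the same.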
Add attestation doc
```typescript
    return '0x' + hex;
}

async checkBoot(bootInfo: BootInfo, isKms: boolean): Promise<BootResponse> {
```
Review hint: This is where the KMS asks the smart contract for permission.
```rust
impl OnboardRpc for OnboardHandler {
    async fn bootstrap(self, request: BootstrapRequest) -> Result<BootstrapResponse> {
        let keys = Keys::generate(&request.domain).context("Failed to generate keys")?;
```
Review hint: Generating KMS root keys
```rust
    })
}

async fn get_kms_key(self) -> Result<KmsKeyResponse> {
```
Review hint: Here the running KMS shares its keys with a new KMS instance.
```rust
AuthApi::Dev { dev } => Ok(BootResponse {
    is_allowed: true,
    reason: "".to_string(),
    tproxy_app_id: dev.tproxy_app_id.clone(),
}),
```
Review hint: Allow any app to boot for a local dev instance.
```rust
AuthApi::Webhook { webhook } => {
    let client = reqwest::Client::new();
    let path = if is_kms {
        "bootAuth/kms"
    } else {
        "bootAuth/app"
    };
    let url = url_join(&webhook.url, path);
    let response = client.post(&url).json(&boot_info).send().await?;
    if !response.status().is_success() {
        bail!("Failed to check boot auth: {}", response.text().await?);
    }
    Ok(response.json().await?)
}
```
Review Hint: Asking the webhook for boot authority.
```rust
let cert_pair = generate_ra_cert(tmp_ca.temp_ca_cert, tmp_ca.temp_ca_key)?;
let ra_client = RaClientConfig::builder()
    .tls_no_check(false)
    .tls_built_in_root_certs(false)
    .remote_uri(kms_url.clone())
    .tls_client_cert(cert_pair.cert_pem)
    .tls_client_key(cert_pair.key_pem)
    .tls_ca_cert(tmp_ca.ca_cert.clone())
    .cert_validator(Box::new(|cert| {
        let Some(cert) = cert else {
            bail!("Missing server cert");
        };
        let Some(usage) = cert.special_usage else {
            bail!("Missing server cert usage");
        };
        if usage != "kms:rpc" {
            bail!("Invalid server cert usage: {usage}");
        }
        Ok(())
    }))
    .build()
    .into_client()
    .context("Failed to create client")?;
let kms_client = kms_rpc::kms_client::KmsClient::new(ra_client);
let response = kms_client
    .get_app_key(GetAppKeyRequest {
        app_compose: fs::read_to_string(host_shared.dir.app_compose_file())
            .context("Failed to read app compose file")?,
    })
    .await
    .context("Failed to get app key")?;
{
    let (_, ca_pem) = x509_parser::pem::parse_x509_pem(tmp_ca.ca_cert.as_bytes())
        .context("Failed to parse ca cert")?;
    let x509 = ca_pem.parse_x509().context("Failed to parse ca cert")?;
    let id = hex::encode(x509.public_key().raw);
    let provider_info = KeyProviderInfo::new("kms".into(), id);
    emit_key_provider_info(&provider_info)?;
};
```
Review hint (important): Requesting app keys from the KMS here. The public key in the CA cert, used by the remote endpoint of the TLS connection, is written to RTMR3 as provider info.
```rust
let Some(usage) = cert.special_usage else {
    bail!("Missing server cert usage");
};
if usage != "kms:rpc" {
```
Review hint: A real KMS only issues certs with usage=`kms:rpc` to itself, to serve RPC; it never issues them to apps.
```rust
let provision = host
    .get_sealing_key()
    .await
    .context("Failed to get sealing key")?;
// write to fs
let app_keys =
    gen_app_keys_from_seed(&provision.sk).context("Failed to generate app keys")?;
let keys_json = serde_json::to_string(&app_keys).context("Failed to serialize app keys")?;
fs::write(self.app_keys_file(), keys_json).context("Failed to write app keys")?;

// write to RTMR
let provider_info = KeyProviderInfo::new("local-sgx".into(), hex::encode(provision.mr));
emit_key_provider_info(&provider_info)?;
```
Review hint: For the local key provider, the SGX `mr_enclave` is written to RTMR3. The SGX quote is verified in `fn get_sealing_key`.
```rust
let key_provider = &self.app.config.key_provider;
if !key_provider.enabled {
    bail!("Key provider is not enabled");
}
let response = get_key(request.quote, key_provider.address, key_provider.port)
    .await
    .context("Failed to get sealing key from key provider")?;

Ok(GetSealingKeyResponse {
    encrypted_key: response.encrypted_key,
    provider_quote: response.provider_quote,
})
```
Review hint: A request from a CVM is delegated to the SGX key provider.
```rust
let ra_validator = RaValidator {
    remote_app_id: self.app_keys.tproxy_app_id.clone(),
};
let cert_client =
    CertRequestClient::create(&self.app_keys, self.local_config.pccs_url.as_deref())
        .await
        .context("Failed to create cert client")?;
let client_key =
    KeyPair::generate_for(&PKCS_ECDSA_P256_SHA256).context("Failed to generate key")?;
let config = CertConfig {
    org_name: None,
    subject: "tappd".to_string(),
    subject_alt_names: vec![],
    usage_server_auth: false,
    usage_client_auth: true,
    ext_quote: true,
};
let client_certs = cert_client
    .request_cert(&client_key, config)
    .await
    .context("Failed to request cert")?;
let client_cert = client_certs.join("\n");
let ca_cert = self.app_keys.ca_cert.clone();
let client = RaClientConfig::builder()
    .remote_uri(url)
    .maybe_pccs_url(self.local_config.pccs_url.clone())
    .tls_client_cert(client_cert)
    .tls_client_key(client_key.serialize_pem())
    .tls_ca_cert(ca_cert)
    .tls_built_in_root_certs(false)
    .tls_no_check(self.app_keys.tproxy_app_id == "any")
    .cert_validator(Box::new(move |cert| ra_validator.validate(cert)))
    .build()
    .into_client()
    .context("Failed to create RA client")?;
```
Review hint: Connect to tproxy here. If the KMS (smart contract) allows connecting to any tproxy, we don't validate the remote cert; otherwise, we do bidirectional RA verification and ensure the remote app-id matches the one the KMS returned.
@h4x3rotab I've added some review hints.
```solidity
// SPDX-License-Identifier: MIT
```
Review Hint: And please pay attention to the smart contracts.
```solidity
}

// Function to check if KMS is allowed to boot
function isKmsAllowed(
```
Not something for right now, but something we have spent a lot of time discussing is censorship resistance of de-registering measurements.
The worst case for kms is one where:
- a vulnerability is discovered, and the corresponding measurement de-registered
- however, existing kms instances are blocked from processing new blocks, and therefore don't actually see the de-registration transaction
- an attacker can spend as long as they need in order to extract the root secret, since the kms instance will never see the de-registration transaction
- the de-registration might as well not have happened at all
To counteract this scenario, the kms instance needs to make sure it's not being censored, in either the short term or the long term. What we have arrived at as a possible solution is interacting onchain (the kms at startup, and then periodically requiring a random nonce to appear onchain), and limiting the number of actions allowed without forcing an onchain interaction. This way you never allow unlimited attempts at breaking a kms instance: there are only so many attempts before you either need to reboot the instance (which requires an onchain interaction) or are blocked by the action limit (which also requires an onchain interaction).
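The action-limit part of that idea can be sketched as below. This is not code from the PR; the names and budget size are illustrative, and observing the onchain nonce is abstracted to a single callback:

```rust
// Counts security-sensitive actions since the last confirmed on-chain
// interaction; refuses further actions once the budget is exhausted.
pub struct ActionLimiter {
    pub since_last_checkin: u32,
    pub max_actions: u32,
}

impl ActionLimiter {
    // Returns false when an on-chain check-in is required before any
    // further action may be served.
    pub fn try_action(&mut self) -> bool {
        if self.since_last_checkin >= self.max_actions {
            return false;
        }
        self.since_last_checkin += 1;
        true
    }

    // Called once the expected random nonce has been observed on-chain,
    // proving the instance sees fresh chain state.
    pub fn on_chain_checkin(&mut self) {
        self.since_last_checkin = 0;
    }
}
```

The point is that an attacker blocking the instance from new blocks also blocks the check-in, so the number of offline extraction attempts is bounded by `max_actions`.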
That's an excellent consideration.
Let me see if I understand the proposed solution correctly:
The KMS node needs to interact with the chain both at startup and periodically by sending transactions, to verify it has access to up-to-date chain data.
I'm wondering if we could also consider using trusted time sources like NTS as an additional verification mechanism? Specifically, we could compare the block timestamp with the trusted timestamp - if the difference is within an acceptable threshold, it would help ensure the node is operating with fresh data rather than being fed stale blocks by an attacker.
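The comparison proposed here is a one-liner; the following sketch (not from the PR, and with an illustrative threshold) shows the shape of it:

```rust
// Illustrative freshness threshold; the acceptable lag would need tuning.
pub const MAX_BLOCK_LAG_SECS: u64 = 600;

// Returns true when the chain head's block timestamp is recent enough
// relative to a trusted (e.g. NTS-derived) timestamp; a large lag
// suggests the node is being fed stale blocks by a censoring host.
pub fn chain_is_fresh(block_timestamp: u64, trusted_timestamp: u64) -> bool {
    trusted_timestamp.saturating_sub(block_timestamp) <= MAX_BLOCK_LAG_SECS
}
```

Note that a block timestamp ahead of the trusted clock is treated as fresh here; only staleness beyond the threshold is rejected.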
The KMS node needs to interact with the chain both at startup and periodically by sending transactions, to verify it has access to up-to-date chain data.
I think that's one way of doing it, yes
I'm wondering if we could also consider using trusted time sources like NTS
I think so. Interacting with the blockchain is of course involved and costly, so it'd be best to avoid it. NTS specifically would be interesting, because we avoid relying on local time and only consider what we get from NTS plus the timestamp of the current chain head. I guess the thing to make sure of is that NTS can't be replayed or delayed in a way that defeats the verification (my guess is that it can't: as long as it's https it could only be delayed, never reordered), and then the verification is as good as NTS itself (pretty good, if you ask me, in this scenario).
For what it's worth I think resolving this issue could be done at a later time
Cc @MoeMahhouk since you were thinking about nts recently
"OK" | ||
} | ||
|
||
if !kms_config.onboard.auto_bootstrap_domain.is_empty() { |
I wonder if each instance should be whitelisted on chain before being allowed to onboard. The added visibility might be well worth it, right?
The device-id (which is implemented as a hash of the ppid) is whitelisted. Do you think this is enough?
I'd guess that onboarding to kms (sharing the secret) is the most security-critical component, so I'd think that it'd be worth having an onchain trace of it happening — ideally sharing the secret requires an interactive onchain challenge (which would also force up-to-date view of the chain), but I'd imagine at least that all honest KMS instances would make it known that a given device requested the kms key (be it on chain or elsewhere) for transparency
```rust
    cert_validator: Option<CertValidator>,
}

impl RaClientConfig {
```
While I don't think using ratls is an issue, we have lately been moving away from relying on RA. Rather we exchange the regular TLS certificates along with their attestations (an example alternative implementation would have each kms instance register their locally generated TLS cert+attestation on boot). If registration prior to onboarding is anyway something you'd want, then this could simplify the implementation a bit
We need the RA at either the TLS level, the HTTP level, or the application level. Each has its advantages and disadvantages.
- The benefit of the application level is that it's easy to implement in different languages and web frameworks.
- The benefits of the TLS level are that it is easier to abstract, the attestation does not need to be defined in each RPC schema, and once a TLS connection is attested, all data transferred over the connection is reliable. We can easily reuse the same framework across several services; so far:
  - KMS: onboarding RPC, key-providing RPC
  - tproxy: registering RPC
  - CVM: inter-app secure communication (WIP)

The client side can simply use a `curl` command with an RA-TLS enabled cert to request those different services.
We may do the abstraction at the application level, but that requires consideration of potential MITM attacks, which is still complex.
One nice thing about doing it at the application level is that it's more front-and-center, while RA-TLS hides the details a bit.
CVM: Inter-App Secure Communication (WIP)
Do keep in mind that establishing the RA-TLS channel takes considerably longer (~1s on Azure, for example; if you need to fetch PCCS data it will also take a while). Performance considerations are one thing that pushed us towards exchanging attested TLS certs beforehand and using regular TLS for all but the initial connection. RA-TLS is, I think, fine for registering and other non-performance-critical use cases, but inter-app communication could need connections to be established faster.
This PR implements an onchain-governed KMS which supports deploying itself as a Tapp on dstack. See the doc for the details.
Key rotation is not implemented in this PR. Since dstack supports base image upgrading, I think we can implement key rotation in future versions.
@Leechael Please pay attention to the RPC changes in KMS and Tappd.
The reproducibility of the Tapp docker image is not done yet. Here is a temporary alternative which builds the image on-site: