MSC4048: Authenticated key backup #4048
@@ -0,0 +1,275 @@
# MSC4048: Authenticated key backup

[Server-side key
backups](https://spec.matrix.org/unstable/client-server-api/#server-side-key-backups)
allow clients to store event decryption keys so that when the user logs in to
a new device, they can decrypt old messages. The current algorithm encrypts
the event keys using an asymmetric algorithm, allowing clients to upload keys to
the backup without necessarily giving them the ability to read from the
backup. For example, this allows a partially-trusted client to
read (and save the keys for) current messages, but not read old messages.

However, since the event decryption keys are encrypted using an asymmetric
algorithm, anyone who knows the public key can write to the backup.
As a result, keys loaded from the backup must be marked as unauthenticated,
leading to [usability
issues](https://github.com/vector-im/element-web/issues/14323).

[MSC3270](https://github.com/matrix-org/matrix-spec-proposals/pull/3270) tries
to fix this issue by using a symmetric, authenticated encryption algorithm,
which ensures that only someone who knows the secret key can write to the
backup. However, this removes the ability for a client to write to
the backup without being able to read from it.
Comment on lines +21 to +22:

> This MSC could really do with explaining why the ability to write to the backup but not read from it is useful. Something something bots, I gather?

> Not just bots, but also as a prerequisite component for less privileged clients which don't have the ability to read all of history while still being able to participate in a room.

> Do any such clients or bots exist currently? I feel like a bot or client writing but not reading from backup really should be an implementation requirement of this MSC to prove that it is actually useful to keep the ability to write but not read from the backup.

> I'm not sure whether they exist at the moment. But retaining the ability to have clients of reduced power (rather than requiring that they are all of equal, maximal power) seems like an obviously useful property. Can you expand on how an example implementation would prove this further than a thought experiment? Or why the benefits are not obvious from the thought experiment?

> It's not obvious to me that a non-maximally-powerful client is useful. Maybe I have not been privy to some discussions about such potential use-cases? Even the bot case that @richvdh mentioned doesn't really make sense to me. Why would I provide the bot access to my account? I also don't know of any other chat network which has the concept of less-privileged clients (so as far as I know we are not copying prior art). It feels like such a client would be very confusing to end users. I'm just a bit wary of doing things because it's a "useful property" when the property has not actually been shown to be useful (think: the overcomplicated megolm ratchet which has a mechanism for skipping by multiples of

> As far as I know, the power asymmetry was deliberately factored into the current key backup mechanism, hence its asymmetric design. So while I fully sympathise with overengineering concerns, the switch to a symmetric design would lead to a loss of functionality. Which makes me think arguments to the contrary would be helpful: why an asymmetric design is not useful. In this particular case, I feel the utility of the asymmetry is easy to demonstrate. Privacy oriented applications such as Signal deliberately provide limited history support, which makes them naturally more suitable for highly sensitive environments and provides additional assurance against your entire conversation history falling into the wrong hands. On the other hand, Matrix goes all in and history is first class replicated to all devices. A way to disable history access on a particular device dramatically reduces this gap by giving you the ability to, for example, provision a phone from which you can talk securely with your contacts without the risk of your entire conversation history being disclosed if it's lost. However, given the asymmetric design, you still retain the ability to view those conversations from a more secure device such as a workstation with good physical security.
>
> There is one large plot hole in the above: currently typical clients provide each other verified client with its own copy of the cryptographic user identity, negating the benefits described above (if you have the MSK, you can just ask for the write part of the backup key). So to actually benefit from this, this would need to be coupled with a way to withhold the identity from some devices.

> With MSC4153 and the general push for Invisible Crypto, there will basically never be a case where you don't perform cross-signing and share the master key. I still think that having a client in the wild demonstrating an actual use-case for asymmetry would be ideal here, however I'm not willing to die on this hill.
|
||
We propose to continue using an asymmetric encryption algorithm in the backup,
but to ensure authenticity by producing a MAC using a key derived from the
backup's decryption key.

## Proposal

A user who has a key backup derives a new backup MAC key by performing HKDF on
the backup decryption key (as raw unencoded bytes) with no salt and an info
parameter of `"MATRIX_BACKUP_MAC_KEY"`, generating 32 bytes (256 bits):

    backup_mac_key = HKDF("", decryption_key, "MATRIX_BACKUP_MAC_KEY", 32)
|
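The derivation above can be sketched with the Python standard library. This is an illustration only: the proposal text does not name the hash used for this HKDF, so SHA-256 is an assumption here, and `hkdf_sha256` and `derive_backup_mac_key` are hypothetical helper names.

```python
import hashlib
import hmac


def hkdf_sha256(salt: bytes, ikm: bytes, info: bytes, length: int) -> bytes:
    """HKDF as described in RFC 5869, using SHA-256 (an assumption, see above)."""
    hash_len = hashlib.sha256().digest_size
    if length > 255 * hash_len:
        raise ValueError("requested output is too long for HKDF-SHA-256")
    # Extract: RFC 5869 treats an absent salt as hash_len zero bytes.
    prk = hmac.new(salt or bytes(hash_len), ikm, hashlib.sha256).digest()
    # Expand: chain HMAC blocks T(1), T(2), ... until enough output exists.
    okm = b""
    block = b""
    counter = 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_backup_mac_key(decryption_key: bytes) -> bytes:
    """Derive the 32-byte backup MAC key from the raw backup decryption key."""
    return hkdf_sha256(b"", decryption_key, b"MATRIX_BACKUP_MAC_KEY", 32)
```

Because the derivation is deterministic, any client holding the backup decryption key obtains the same MAC key without further coordination.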
||
The backup MAC key can be shared using [the Secrets
module](https://spec.matrix.org/unstable/client-server-api/#secrets) using the
name `m.megolm_backup.v1.mac`. Note that if the backup decryption key (the
secret using the name `m.megolm_backup.v1`) is shared, then the backup MAC key
does not need to be shared as it can be derived from the backup decryption
key. Since the backup decryption key is usually stored in Secret Storage, the
backup MAC key does not need to be stored.

> Can we have a bit more detail on that, and maybe the use case? WDYT?

> See also #4048 (comment) for some more nuance on this. TL;DR, I would:
>
> Avoiding the storage of the MAC key will lower complexity and ease reasoning, because we will avoid hard-to-debug situations, such as the SSSS containing only the MAC key, and therefore all clients being able to write to the backup but none of them being able to read it.

> I agree that it feels weird to store an intermediate value (the MAC key) in SSSS.
|
||
### `m.backup.v2.curve25519-aes-sha2`

A new backup algorithm is defined, identified by the name
"`m.backup.v2.curve25519-aes-sha2`". In addition to incrementing the version
number, this name drops the "megolm", as it is expected that other types of
keys may be stored in it, for example [MLS
groups](https://github.com/matrix-org/matrix-spec-proposals/pull/4038).

The intention of creating a new backup algorithm is to prevent an attacker from
uploading additional keys that cannot be authenticated.

The `auth_data` is the same as with `m.megolm_backup.v1.curve25519-aes-sha2`.
|
||
The `session_data` is constructed as follows:

1. Encode the session key to be backed up as a JSON object using the
   `SessionDataV2` format defined below.
2. Generate an ephemeral Curve25519 key, and perform an ECDH with the ephemeral
   key and the backup’s public key to generate a shared secret. The public half
   of the ephemeral key, encoded using unpadded base64, becomes the `ephemeral`
   property of the `session_data`.
3. Using the shared secret, generate 80 bytes by performing an HKDF using
   SHA-256 as the hash, with a salt of 32 bytes of 0, and with the empty string
   as the info. The first 32 bytes are used as the AES key, the next 32 bytes
   are discarded, and the last 16 bytes are used as the AES initialization
   vector. (This is the same as the key generation for
   `m.megolm_backup.v1.curve25519-aes-sha2`, except that the generated MAC key
   is discarded since it is unused.)
4. Stringify the JSON object, and encrypt it using AES-CBC-256 with PKCS#7
   padding. This encrypted data, encoded using unpadded base64, becomes the
   `ciphertext` property of the `session_data`.
5. Encode the `session_data` as canonical JSON, as would be done when [signing
   JSON](https://spec.matrix.org/unstable/appendices/#signing-details), and
   calculate the HMAC-SHA-256 MAC using the backup MAC key. The MAC is
   base64-encoded (unpadded), and becomes the `backup_mac` property of the
   `unsigned` property of `session_data`.

Comment on lines +65 to +71 (step 3):

> Is there any reason to derive 80 bytes when we are discarding 32 of them? It seems it's just for compatibility with the (broken)
|
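Step 3 of the construction above can be sketched as follows, using only the standard library. The ECDH shared secret is taken as an input, since Curve25519 is not available in the standard library; `hkdf_sha256`, `SessionKeys` and `derive_session_keys` are hypothetical names chosen for this illustration.

```python
import hashlib
import hmac
from typing import NamedTuple


class SessionKeys(NamedTuple):
    aes_key: bytes  # 32 bytes, used for AES-CBC-256
    aes_iv: bytes   # 16 bytes, used as the AES initialization vector


def hkdf_sha256(salt: bytes, ikm: bytes, info: bytes, length: int) -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract, then expand to `length` bytes."""
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_session_keys(shared_secret: bytes) -> SessionKeys:
    """Derive 80 bytes from the ECDH shared secret and split them as in step 3."""
    okm = hkdf_sha256(bytes(32), shared_secret, b"", 80)
    # Bytes 32..63 correspond to the v1 MAC key; this format discards them.
    return SessionKeys(aes_key=okm[:32], aes_iv=okm[64:80])
```

Keeping the 80-byte layout means the AES key and IV are the same values the `m.megolm_backup.v1.curve25519-aes-sha2` derivation would produce for the same shared secret.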
||
Thus the `session_data` property has `ephemeral`, `ciphertext`, and `unsigned`
properties, with the `unsigned` property having a `backup_mac` property.
Keys without an `unsigned`.`backup_mac` property, or with an incorrect MAC,
must be ignored.
|
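A minimal sketch of step 5, assuming `ephemeral` and `ciphertext` have already been produced. The `canonical_json` helper only approximates Matrix canonical JSON (sorted keys, no insignificant whitespace, UTF-8) and all function names are hypothetical.

```python
import base64
import hashlib
import hmac
import json


def canonical_json(value: dict) -> bytes:
    """Approximate canonical JSON: sorted keys, compact separators, UTF-8."""
    return json.dumps(
        value, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def unpadded_base64(data: bytes) -> str:
    """Base64-encode and strip the trailing '=' padding."""
    return base64.b64encode(data).decode("ascii").rstrip("=")


def add_backup_mac(session_data: dict, backup_mac_key: bytes) -> dict:
    """Return a copy of `session_data` with `unsigned.backup_mac` filled in."""
    # As when signing JSON, `signatures` and `unsigned` are left out of the input.
    to_mac = {k: v for k, v in session_data.items() if k not in ("signatures", "unsigned")}
    mac = hmac.new(backup_mac_key, canonical_json(to_mac), hashlib.sha256).digest()
    result = dict(session_data)
    result["unsigned"] = {**session_data.get("unsigned", {}), "backup_mac": unpadded_base64(mac)}
    return result
```

For example, `add_backup_mac({"ephemeral": "...", "ciphertext": "..."}, backup_mac_key)` returns an object with the three properties listed above.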
||
When verifying the MAC, the `session_data` is encoded as canonical JSON,
following the same procedure as when signing JSON. That is, any additional
properties, other than `signatures` and `unsigned`, are included. Putting
the MAC in `unsigned` allows clients to reuse existing code used for
serializing JSON for signing.
|
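Verification can be sketched in the same way. This is an illustration under the same assumptions as before (approximate canonical JSON, hypothetical function name); it uses a constant-time comparison and rejects keys that lack the MAC.

```python
import base64
import hashlib
import hmac
import json


def verify_backup_mac(session_data: dict, backup_mac_key: bytes) -> bool:
    """Return True only if `unsigned.backup_mac` is present and correct."""
    unsigned = session_data.get("unsigned")
    claimed = unsigned.get("backup_mac") if isinstance(unsigned, dict) else None
    if not isinstance(claimed, str):
        return False  # keys without a MAC must be ignored
    # Everything except `signatures` and `unsigned` is covered by the MAC.
    to_mac = {k: v for k, v in session_data.items() if k not in ("signatures", "unsigned")}
    canonical = json.dumps(
        to_mac, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    expected = hmac.new(backup_mac_key, canonical, hashlib.sha256).digest()
    expected_b64 = base64.b64encode(expected).rstrip(b"=")
    # Compare as bytes in constant time.
    return hmac.compare_digest(expected_b64, claimed.encode("utf-8"))
```

Because the MAC covers the ciphertext rather than the plaintext, this check runs before any decryption is attempted.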
||
The `SessionDataV2` has algorithm-dependent and algorithm-independent
properties. The algorithm-independent properties are:

- `algorithm`: (required string) the end-to-end message encryption algorithm that the
  key is for. The values are the same as for the `algorithm` property in the
  `m.room_key` event. For example, for Megolm keys, this is
  `m.megolm.v1.aes-sha2`.
- `unauthenticated`: (optional string) if not present, the key is considered to
  be authenticated, that is, the device that uploaded the key to the backup
  believes that the key belongs to the recorded sender, as defined by the key
  algorithm (with `m.megolm.v1.aes-sha2`, the sender is given in the
  `sender_key` property). A key is considered to be authenticated if: a) the
  key was received via an Olm-encrypted `m.room_key` event from the
  `sender_key`, b) the key was received via a trusted key forward
  ([MSC3879](https://github.com/matrix-org/matrix-spec-proposals/pull/3879)),
  or c) the key was downloaded from the key backup where it is marked as
  authenticated, and the data can be authenticated (for example using the
  method from this proposal).

  If the key is not considered to be authenticated, this property indicates the
  source of the key. Currently defined values are: `m.undefined`, which
  indicates that the source is not specified; `m.legacy-v1`, which indicates
  that the key was an unauthenticated key from a
  `m.megolm_backup.v1.curve25519-aes-sha2` backup ([see
  below](#migrating-keys)); and `m.forwarded_room_key`, which indicates that
  the key came from an untrusted key forward. (FIXME: do we also want to
  encode the source of the key forward?) Clients may create other values to
  specify other sources, using the Java package naming convention; clients
  should treat unknown values as `m.undefined`.

For the `m.megolm.v1.aes-sha2` algorithm, the algorithm-dependent properties
are the `forwarding_curve25519_key_chain`, `sender_claimed_keys`, `sender_key`,
and `session_key` properties defined for
`m.megolm_backup.v1.curve25519-aes-sha2`.
|
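As an illustration of how a client might read the `unauthenticated` property of a decrypted `SessionDataV2` object, the sketch below normalises unknown values to `m.undefined`. The function name and the example values are hypothetical, and the sketch assumes the MAC on the enclosing `session_data` has already been verified.

```python
from typing import Optional

# Values defined in the text above; anything else is treated as `m.undefined`.
KNOWN_SOURCES = {"m.undefined", "m.legacy-v1", "m.forwarded_room_key"}


def unauthenticated_source(session_data_v2: dict) -> Optional[str]:
    """Return None for an authenticated key, otherwise the normalised source."""
    if "unauthenticated" not in session_data_v2:
        return None  # property absent: the uploader considered the key authenticated
    source = session_data_v2["unauthenticated"]
    if isinstance(source, str) and source in KNOWN_SOURCES:
        return source
    return "m.undefined"


# A Megolm key as it might appear before encryption (placeholder values only).
example = {
    "algorithm": "m.megolm.v1.aes-sha2",
    "forwarding_curve25519_key_chain": [],
    "sender_claimed_keys": {"ed25519": "placeholder"},
    "sender_key": "placeholder",
    "session_key": "placeholder",
    "unauthenticated": "m.forwarded_room_key",
}
print(unauthenticated_source(example))
```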
||
### `m.megolm_backup.v1.curve25519-aes-sha2`

> I just want to clarify my understanding here. We are modifying the definition of the existing Why? I don't see any reason for uploading new keys to an old backup.

Megolm keys may be uploaded to a `m.megolm_backup.v1.curve25519-aes-sha2`
backup using the `m.backup.v2.curve25519-aes-sha2` format, provided the
`session_data` also contains the `mac` property as required for the
`m.megolm_backup.v1.curve25519-aes-sha2` algorithm.

The [construction of the `session_data`
property](https://spec.matrix.org/unstable/client-server-api/#backup-algorithm-mmegolm_backupv1curve25519-aes-sha2)
thus becomes:
|
||
1. Encode the session key to be backed up as a JSON object using the
   `SessionData` format.
2. Generate an ephemeral Curve25519 key, and perform an ECDH with the ephemeral
   key and the backup’s public key to generate a shared secret. The public half
   of the ephemeral key, encoded using unpadded base64, becomes the `ephemeral`
   property of the `session_data`.
3. Using the shared secret, generate 80 bytes by performing an HKDF using
   SHA-256 as the hash, with a salt of 32 bytes of 0, and with the empty string
   as the info. The first 32 bytes are used as the AES key, the next 32 bytes
   are used as the MAC key, and the last 16 bytes are used as the AES
   initialization vector.
4. Stringify the JSON object, and encrypt it using AES-CBC-256 with PKCS#7
   padding. This encrypted data, encoded using unpadded base64, becomes the
   `ciphertext` property of the `session_data`.
5. Pass the raw encrypted data (prior to base64 encoding) through HMAC-SHA-256
   using the MAC key generated above. The first 8 bytes of the resulting MAC
   are base64-encoded, and become the `mac` property of the `session_data`.
6. Encode the `session_data` as canonical JSON, as would be done when [signing
   JSON](https://spec.matrix.org/unstable/appendices/#signing-details), and
   calculate the HMAC-SHA-256 MAC using the backup MAC key. The MAC is
   base64-encoded (unpadded), and becomes the `backup_mac` property of the
   `unsigned` property of `session_data`.

Comment on lines +152 to +154 (step 5):

> This is not how the current algorithm works. See matrix-org/matrix-spec#1712. It passes an empty byte string instead of the raw encrypted data.
|
||
FIXME: should the server compare the `unsigned`.`backup_mac` property when a
client uploads a key to the backup, when deciding whether to keep the existing
key or replace it with a new key?

To simplify logic, clients may treat `m.backup.v2.curve25519-aes-sha2`-format
keys with the same semantics as `m.megolm_backup.v1.curve25519-aes-sha2` keys
when they are in a `m.megolm_backup.v1.curve25519-aes-sha2` backup. That is,
clients may treat all keys in a `m.megolm_backup.v1.curve25519-aes-sha2` backup
as being unauthenticated, regardless of the presence or absence of the
`unsigned`.`backup_mac` property in the cleartext `session_data` property.
|
||
#### Migrating keys

When migrating keys from a `m.megolm_backup.v1.curve25519-aes-sha2` backup to a
`m.backup.v2.curve25519-aes-sha2` backup, keys without an
`unsigned`.`backup_mac` property in the cleartext `session_data` property, or
with an invalid MAC, must have the `unauthenticated` property set to
`m.legacy-v1` in the encrypted `SessionData`, regardless of whether the key
originally had an `unauthenticated` property, and must have an
`unsigned`.`backup_mac` property added to the cleartext `session_data`. If the
same backup decryption key is used for the old and new backups, keys that have
an existing `unsigned`.`backup_mac` property with a valid MAC may be uploaded
to the new backup unchanged, as they will be valid
`m.backup.v2.curve25519-aes-sha2`-format keys.
|
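The migration rule can be sketched as two small decisions. This is an illustration only: the function names are hypothetical, the caller is assumed to have already checked the MAC and decrypted the key, and filling in a missing `algorithm` for keys taken from a v1 backup is an assumption that the text above does not spell out.

```python
def can_upload_unchanged(mac_valid: bool, same_decryption_key: bool) -> bool:
    """A key may be copied as-is only if its MAC is valid and the key pair is reused."""
    return mac_valid and same_decryption_key


def migrate_payload(decrypted_session: dict, mac_valid: bool) -> dict:
    """Build the payload to encrypt and MAC for the new backup."""
    migrated = dict(decrypted_session)
    # Assumption: v1 backups only hold Megolm keys, so a missing `algorithm`
    # is filled in with the Megolm algorithm name.
    migrated.setdefault("algorithm", "m.megolm.v1.aes-sha2")
    if not mac_valid:
        # Overrides any `unauthenticated` value the key previously carried.
        migrated["unauthenticated"] = "m.legacy-v1"
    return migrated
```

The returned payload still has to be encrypted and given a fresh `unsigned`.`backup_mac` using the steps defined for `m.backup.v2.curve25519-aes-sha2`.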
||
## Potential issues

For users with existing backups, in order to start storing backup keys using
this format, the user may need to enter their Secret Storage key so that the
client can obtain the backup decryption key, if it does not already have it
cached, in order to derive the backup MAC key. If a user has multiple clients,
one client may try to obtain the backup MAC key from other clients using Secret
Sharing, but it does not have a way of knowing which clients, if any, have the
backup MAC key.
|
||
## Alternatives

As mentioned above, we could switch to using a symmetric encryption algorithm
for the key backup. However, this is not backwards-compatible, and does not
allow for clients that can write to the backup without reading.

Rather than using a new MAC key, we could use an existing signing key, such as
one of the cross-signing keys. This would remove the need for users to enter
their Secret Storage key to add the new signing key. However, this means that
a user cannot create a key backup without also using cross-signing. Using a
separate key also allows the user to give someone else (such as a bot)
permission to write to their backups without allowing them to perform any
cross-signing operations.

A previous version of this MSC used a signing key that was generated randomly.
The method presented in the current version has the following advantages:

- No changes to `AuthData` are necessary, so a new backup version is not
  required.
- A MAC is faster to calculate. The main advantage of a signature is that it
  allows one to verify the signature without knowing the private key, but in
  this case, reading is a more privileged action than writing, and writers
  already need to know the private/secret key.
- Since the MAC key is derived from the decryption key, two clients can be
  upgraded at the same time without interfering with each other, as they will
  derive the same MAC key.
- The MAC is calculated after encryption, and hence is verified before
  decryption, so we know that it is authenticated before we do any processing
  on it.

A disadvantage of the currently-proposed method versus the previous proposal is
that migration requires that the user gives the client access to the backup
decryption key in order to derive the MAC key. However, in both proposals,
most clients would require that the user enter their default SSSS key, which
would give them access to the decryption key anyway.
|
||
## Security considerations

Being able to prove authenticity of keys may affect the deniability of
messages: if a user has a Megolm session in their key backup that is MAC'ed by
their backup MAC key, and the session data indicates that it originated from
one of their devices, this could be used as evidence that the Megolm session
did in fact come from them.

This is somewhat mitigated by the fact that obtaining the Megolm session
requires the decryption key for the backup. In addition, the deniability
property mainly refers to the fact that a recipient cannot prove the
authenticity of the message to a third party, and usually is not concerned with
preventing self-incrimination. And in fact, a confiscated device may already
have enough information to sufficiently prove that the device's owner sent a
message.

> Hmm. Just because I created a megolm session doesn’t mean that I was the one who encrypted the messages in it, as megolm is symmetric? So proving I own the creation of a key doesn’t achieve much in terms of deniability aiui; a given message could have been fabricated by the other party? (at least until you try to send a msg with the same ratchet key - but i guess the same would be true if the megolm session was entirely fabricated, in terms of happening at the wrong place relative to other megolm sessions)

> The Megolm session has a signing key that only the creator knows the private part. So while anyone can encrypt a message with the Megolm session, they won't be able to produce a correct signature, so the message won't be validated.
|
||
## Unstable prefix

Until this MSC is accepted, the following unstable names should be used:

- the algorithm name `org.matrix.msc4048.curve25519-aes-sha2` should
  be used in place of the name `m.backup.v2.curve25519-aes-sha2`,
- the property name `org.matrix.msc4048.unauthenticated` should be used in place
  of `unauthenticated` in the `SessionData` object,
- the property name `org.matrix.msc4048.backup_mac` should be used in place of
  the `backup_mac` property in the `unsigned` property,
- the SSSS identifier `org.matrix.msc4048.mac` should be used in place of
  `m.megolm_backup.v1.mac`.
|
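One way a client might keep these names in a single place is a lookup table, sketched below. The table contents come from the list above; the `name_for` helper and its `msc_accepted` flag are hypothetical.

```python
# Stable name -> unstable name, as listed in this section.
UNSTABLE_NAMES = {
    "m.backup.v2.curve25519-aes-sha2": "org.matrix.msc4048.curve25519-aes-sha2",
    "unauthenticated": "org.matrix.msc4048.unauthenticated",
    "backup_mac": "org.matrix.msc4048.backup_mac",
    "m.megolm_backup.v1.mac": "org.matrix.msc4048.mac",
}


def name_for(stable: str, msc_accepted: bool = False) -> str:
    """Return the identifier to put on the wire for a given stable name."""
    if msc_accepted:
        return stable
    # Names this proposal does not introduce are passed through unchanged.
    return UNSTABLE_NAMES.get(stable, stable)


print(name_for("backup_mac"))
```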
||
### Migration to stable names

After this MSC is accepted, clients that understand the
`org.matrix.msc4048.curve25519-aes-sha2` algorithm name should
migrate the user to a backup using the accepted version of the
`m.backup.v2.curve25519-aes-sha2` algorithm. Keys that use the unstable
property names should be re-uploaded using the stable names.

This includes migrating
`org.matrix.msc4048.curve25519-aes-sha2`-format keys uploaded to
`m.megolm_backup.v1.curve25519-aes-sha2` backups.

## Dependencies

None