Add manifest extension AoT #14128
base: main
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/14128
Note: Links to docs will display an error until the docs builds have been completed.
❌ 3 New Failures, 4 Cancelled Jobs, 2 Unrelated Failures
As of commit 0b0880b with merge base 66639e4:
NEW FAILURES - The following jobs have failed:
CANCELLED JOBS - The following jobs were cancelled. Please retry:
FLAKY - The following job failed but was likely due to flakiness present on trunk:
BROKEN TRUNK - The following job failed but was present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This pull request was exported from Phabricator. Differential Revision: D82052721
This PR needs a
extension/manifest/_manifest.py
def to_bytes(self) -> bytes:
    """Returns the binary representation of the Manifest. Written
    bottom up.
Nit: explain why it's written bottom up.
"Returns the binary representation of the Manifest. Written bottom up to allow for BC considerations. The compatibility-preserving way to make changes is to increase the header's length field and add new fields at the top. This means we can always check the last n bytes for the magic and size, and then load the full footer."
extension/manifest/_manifest.py
+self.padding_size.to_bytes(4, byteorder=_MANIFEST_BYTEORDER)
# uint32_t: Size of this manifest. This makes it easier to add new
# fields to this header in the future. Always use the proper size
# (i.e., ignore self.length) since there's no reason to create an
What is self.length used for?
Oh, the padding isn't needed: alignment doesn't matter, since we reconstruct the data byte by byte anyway. I'll remove it.
Is there a strong reason not to include this in the core AoT code (as opposed to a dedicated extension)? I don't have a super strong opinion on this, but I do worry that the growing number of fine-grained extensions will confuse users and compromise UX.
e82daa2 to 9c31b7a
Summary: Add some infrastructure so we can optionally append key structured data to the end of a PTE file. This diff focuses on enabling users to easily tag their model with a cryptographic signature, and it leaves room to expand later. A key design motivation is that this should ideally be transparent to the rest of the extensions we have today. I'm claiming the prime footer real estate for this, which nothing in tree uses today. This should also make it composable with other formats such as bundledProgram. Differential Revision: D82052721
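A minimal sketch of why appending at the end can stay transparent, assuming other loaders locate their data through offsets from the start of the file. The helper names `append_manifest` and `strip_manifest` and the tail layout (uint32 length followed by the `em00` magic) are hypothetical, chosen for illustration. Data without a trailing manifest passes through unchanged.

```python
import struct

MAGIC = b"em00"
_TAIL = struct.Struct("<I4s")  # hypothetical: manifest length (uint32 LE) + magic


def append_manifest(pte_data: bytes, payload: bytes) -> bytes:
    """Append a manifest after the PTE without modifying any PTE bytes."""
    return pte_data + payload + _TAIL.pack(len(payload) + _TAIL.size, MAGIC)


def strip_manifest(data: bytes) -> bytes:
    """Return the original PTE bytes; input without a manifest is returned as-is."""
    if len(data) >= _TAIL.size:
        length, magic = _TAIL.unpack(data[-_TAIL.size:])
        if magic == MAGIC and _TAIL.size <= length <= len(data):
            return data[: len(data) - length]
    return data
```

Because the original PTE is an untouched prefix, a reader that knows nothing about manifests sees the same bytes at the same offsets. One caveat of this simple check: a file that happens to end in the magic plus a plausible length would be misread.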
9c31b7a to 22ff2b7
22ff2b7 to 6466f05
I could put the AoT code in core and the runtime reader in an extension? The modularity is a feature for embedded: we shouldn't expose so many options in mobile builds, which is why we have the presets.
6466f05 to 03be89a
03be89a to 0b0880b
Write a docblock with an example usage of the manifest file for higher-layer consumers. Also mention that the manifest is a mechanism, not a security policy, and explicitly state that consumers must implement appropriate security for their threat model.
Signing Process
# 1. Generate the PTE file
pte_data = serialize_pte_binary(program)
# 2. Create a cryptographic signature of the PTE data
signature = sign_with_private_key(pte_data, private_key)  # e.g., RSA, ECDSA
# 3. Append a manifest containing the signature
manifest = Manifest(signature=signature)
pte_with_manifest = append_manifest(pte_data, manifest)
Verification Process
# 1. Extract the manifest from the end of the file
manifest = Manifest.from_bytes(file_data)
# 2. Recover the PTE data (everything before the manifest)
pte_data = file_data[:-manifest_size]  # manifest_size: total size of the appended manifest
# 3. Verify the signature with the public key
is_valid = verify_signature(pte_data, manifest.signature, public_key)
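A runnable version of the flow above, under stated assumptions. The Python standard library has no RSA or ECDSA, so HMAC-SHA256 (`hmac`, `hashlib`) stands in for the signature; it is a symmetric MAC, not a public-key signature. `append_manifest`, `split_manifest`, `sign`, and `verify` are hypothetical helpers using a simplified layout (signature, uint32 manifest length, `em00` magic), not the PR's `Manifest` API. In line with the review comment, the manifest only carries the bytes; deciding what to trust is up to the consumer.

```python
import hashlib
import hmac
import struct
from typing import Tuple

MAGIC = b"em00"
_TAIL = struct.Struct("<I4s")  # hypothetical: manifest length (uint32 LE) + magic


def append_manifest(pte_data: bytes, signature: bytes) -> bytes:
    return pte_data + signature + _TAIL.pack(len(signature) + _TAIL.size, MAGIC)


def split_manifest(file_data: bytes) -> Tuple[bytes, bytes]:
    """Return (pte_data, signature); raise if there is no valid manifest."""
    if len(file_data) < _TAIL.size:
        raise ValueError("file too small to hold a manifest")
    length, magic = _TAIL.unpack(file_data[-_TAIL.size:])
    if magic != MAGIC or not _TAIL.size <= length <= len(file_data):
        raise ValueError("no valid manifest")
    end = len(file_data)
    return file_data[: end - length], file_data[end - length : end - _TAIL.size]


# Stand-in for RSA/ECDSA: HMAC-SHA256 is a symmetric MAC, not a signature.
def sign(pte_data: bytes, key: bytes) -> bytes:
    return hmac.new(key, pte_data, hashlib.sha256).digest()


def verify(pte_data: bytes, signature: bytes, key: bytes) -> bool:
    # compare_digest avoids leaking timing information about the match.
    return hmac.compare_digest(sign(pte_data, key), signature)


if __name__ == "__main__":
    key = b"demo-key"  # hypothetical; real keys come from the consumer's key management
    pte_data = b"...serialized program..."  # placeholder for serialize_pte_binary(program)
    file_data = append_manifest(pte_data, sign(pte_data, key))

    recovered, signature = split_manifest(file_data)
    print(verify(recovered, signature, key))
```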
# Unique ID for the data the manifest was appended to. Often this might contain
# a cryptographic signature for the data.
signature: bytes
Add version
I can imagine people can use this for other use-cases besides security, such as saving arbitrary serializable metadata.
For instance, saving the location of a tokenizer.json file.
I can imagine people can use this for other use-cases besides security, such as saving arbitrary serializable metadata.
That wasn't really the intent. I chose this implementation because I wanted a very lightweight way to attach security information or other core metadata about the PTE.
If we want to store arbitrary user-defined data such as JSON, I don't think appending it to the .pte is the right solution; in my opinion, we should just put it all in a zip.
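For comparison, a minimal sketch of the zip-based alternative for arbitrary user metadata, using the standard `zipfile` module. The entry names `model.pte` and `metadata.json` and the `bundle`/`unbundle` helpers are hypothetical. `ZIP_STORED` keeps the PTE uncompressed inside the archive.

```python
import io
import json
import zipfile


def bundle(pte_data: bytes, metadata: dict) -> bytes:
    """Store the PTE and arbitrary user metadata side by side in one archive."""
    buf = io.BytesIO()
    # ZIP_STORED: entries are stored without compression.
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        zf.writestr("model.pte", pte_data)
        zf.writestr("metadata.json", json.dumps(metadata))
    return buf.getvalue()


def unbundle(archive: bytes) -> tuple:
    """Return (pte_data, metadata) from an archive created by bundle()."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return zf.read("model.pte"), json.loads(zf.read("metadata.json"))
```

The appeal of this split is that user metadata can grow freely without touching the PTE footer, which stays reserved for small, fixed-purpose data such as a signature.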
# Unique ID for the data the manifest was appended to. Often this might contain
# a cryptographic signature for the data.
signature: bytes
Should you add a timestamp field too?
EXPECTED_MAGIC: ClassVar[bytes] = b"em00"

MAX_SIGNATURE_SIZE: ClassVar[int] = 512
Why is this fixed?
So that the runtime can read it with a single load, instead of two loads or a stream.
512 bytes should also cover the vast majority of the cryptographic signature algorithms I looked at.
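A sketch of why a fixed `MAX_SIGNATURE_SIZE` allows a single load. Assuming (hypothetically) that the signature is zero-padded into a fixed 512-byte slot followed by its real size and the magic, the footer always has the same size, so the runtime can read exactly that many bytes from the end without first reading a length field. `pack_footer` and `unpack_footer` are illustrative names.

```python
import struct

MAX_SIGNATURE_SIZE = 512  # value from the PR
MAGIC = b"em00"
_TRAILER = struct.Struct("<I4s")  # hypothetical: real signature size (uint32 LE) + magic
FOOTER_SIZE = MAX_SIGNATURE_SIZE + _TRAILER.size  # constant: 520 bytes


def pack_footer(signature: bytes) -> bytes:
    """Zero-pad the signature into a fixed 512-byte slot."""
    if len(signature) > MAX_SIGNATURE_SIZE:
        raise ValueError(f"signature exceeds {MAX_SIGNATURE_SIZE} bytes")
    slot = signature.ljust(MAX_SIGNATURE_SIZE, b"\x00")
    return slot + _TRAILER.pack(len(signature), MAGIC)


def unpack_footer(tail: bytes) -> bytes:
    """Parse the last FOOTER_SIZE bytes of a file, obtained with one read."""
    if len(tail) != FOOTER_SIZE:
        raise ValueError("expected exactly FOOTER_SIZE bytes")
    sig_size, magic = _TRAILER.unpack(tail[-_TRAILER.size:])
    if magic != MAGIC or sig_size > MAX_SIGNATURE_SIZE:
        raise ValueError("no valid manifest")
    return tail[:sig_size]
```

The trade-off is a few hundred bytes of padding for short signatures in exchange for a single fixed-size read at runtime.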
return data

@staticmethod
def from_bytes(data: bytes) -> "_ManifestLayout":
For large files, do you have to read the whole thing?
We could add more methods, such as a from-file variant that loads only the last MAX_SIZE bytes, or a streaming reader.
I don't really expect people to verify the signature in Python, though. It's mostly there for testing.
# Unique ID for the data the manifest was appended to. Often this might contain
# a cryptographic signature for the data.
signature: bytes
Something like this?
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class Manifest:
    type: str  # "signature", "checksum", "metadata", etc.
    version: int
    payload: bytes
    timestamp: Optional[int]
    attributes: Dict[str, str]
Is version the version of the manifest struct, or is it user-specified?
If a user wanted to store multiple things, would you expect them to daisy-chain manifests?
What is attributes for?
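To make the daisy-chaining question concrete, here is a minimal sketch assuming each manifest is self-delimiting: payload, then a uint32 length, then the `em00` magic (a hypothetical layout, not the PR's). A reader peels manifests off the end until it no longer finds the magic. `append_manifest` and `read_chain` are illustrative names.

```python
import struct
from typing import List, Tuple

MAGIC = b"em00"
_TAIL = struct.Struct("<I4s")  # hypothetical: manifest length (uint32 LE) + magic


def append_manifest(data: bytes, payload: bytes) -> bytes:
    """Each manifest is self-delimiting, so another one can be appended later."""
    return data + payload + _TAIL.pack(len(payload) + _TAIL.size, MAGIC)


def read_chain(data: bytes) -> Tuple[bytes, List[bytes]]:
    """Peel manifests off the end until no magic remains.

    Returns (base_data, payloads), with payloads ordered outermost
    (last appended) first.
    """
    payloads: List[bytes] = []
    while len(data) >= _TAIL.size:
        length, magic = _TAIL.unpack(data[-_TAIL.size:])
        if magic != MAGIC or not _TAIL.size <= length <= len(data):
            break
        end = len(data)
        payloads.append(data[end - length : end - _TAIL.size])
        data = data[: end - length]
    return data, payloads
```

One design point this exposes: with chaining, a signature in an outer manifest would need to define whether it covers the manifests beneath it.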