JacobSzwejbka
Contributor

Summary:
Add some infra for optionally appending key structured data to the end of a .pte file. This diff focuses on enabling users to easily tag their model with a cryptographic signature, and it has room to expand later.

A key design motivation is that this should ideally be transparent to the rest of the extensions we have today. I'm claiming the prime footer real estate for this, which nothing in tree uses today. This should also make it composable with other formats like BundledProgram.

Differential Revision: D82052721

pytorch-bot bot commented Sep 9, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/14128

Note: Links to docs will display an error until the docs builds have been completed.

❌ 3 New Failures, 4 Cancelled Jobs, 2 Unrelated Failures

As of commit 0b0880b with merge base 66639e4:

NEW FAILURES - The following jobs have failed:

CANCELLED JOBS - The following jobs were cancelled. Please retry:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed label Sep 9, 2025
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D82052721

Copy link

github-actions bot commented Sep 9, 2025

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.


def to_bytes(self) -> bytes:
"""Returns the binary representation of the Manifest. Written
bottom up.
Contributor

nit, explain why it's written bottom up

"Returns the binary representation of the Manifest. Written bottom up to allow for BC considerations. The compatibility-preserving way to make changes is to increase the header's length field and add new fields at the top. This means we can always check the last n bytes for the magic and size, and then load the full footer."
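A minimal sketch of why the bottom-up order keeps older readers working. It assumes a hypothetical footer made only of little-endian uint32 fields, followed by a uint32 total length and the 4-byte magic. Only `b"em00"` comes from this PR; the field layout, byte order, and the helper names `write_footer` and `read_known_fields` are illustrative assumptions, not the PR's code. A newer writer prepends a field at the top, and a reader that knows only the older fields still finds them at the same offsets from the end of the file:

```python
import struct

MAGIC = b"em00"   # magic value from this PR; everything else here is hypothetical
TRAILER_SIZE = 8  # uint32 total length + 4-byte magic, always the last bytes


def write_footer(fields_top_down: list[int]) -> bytes:
    """Serialize uint32 fields, then the total footer length, then the magic.

    New fields are added at the top (front of the list), so older fields keep
    their offsets measured from the end of the file.
    """
    body = b"".join(struct.pack("<I", f) for f in fields_top_down)
    length = len(body) + TRAILER_SIZE
    return body + struct.pack("<I", length) + MAGIC


def read_known_fields(blob: bytes, num_known: int) -> list[int]:
    """Return the bottom-most `num_known` fields, nearest-to-the-end first."""
    # Step 1: the last TRAILER_SIZE bytes are always [length][magic].
    if len(blob) < TRAILER_SIZE or blob[-4:] != MAGIC:
        raise ValueError("no manifest footer found")
    (length,) = struct.unpack("<I", blob[-8:-4])
    if length > len(blob) or TRAILER_SIZE + 4 * num_known > length:
        raise ValueError("footer is truncated or older than this reader")
    # Step 2: now that the size is known, load the whole footer.
    footer = blob[-length:]
    # Step 3: walk upward from the trailer. Fields added by newer writers sit
    # above the ones this reader knows, so they are never visited.
    fields = []
    end = len(footer) - TRAILER_SIZE
    for _ in range(num_known):
        (value,) = struct.unpack("<I", footer[end - 4:end])
        fields.append(value)
        end -= 4
    return fields
```

For example, `read_known_fields(b"pte" + write_footer([64]), 1)` and `read_known_fields(b"pte" + write_footer([7, 64]), 1)` both return `[64]`: the extra field `7` written by the newer writer is simply ignored by the older reader.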

+self.padding_size.to_bytes(4, byteorder=_MANIFEST_BYTEORDER)
# uint32_t: Size of this manifest. This makes it easier to add new
# fields to this header in the future. Always use the proper size
# (i.e., ignore self.length) since there's no reason to create an
Contributor

What is self.length used for?

@JacobSzwejbka
Contributor Author

JacobSzwejbka commented Sep 9, 2025

Oh, the padding isn't needed: alignment doesn't matter, since we reconstruct the values byte by byte anyway. I'll remove it.
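The alignment point can be illustrated with a small sketch: if each integer is assembled from individual bytes, no aligned multi-byte load ever happens, so a field can start at any offset and no padding is required. This assumes little-endian encoding (the value of the PR's `_MANIFEST_BYTEORDER` is not shown here), and `read_u32_le` is a hypothetical helper written in Python for illustration; a C++ runtime reader could use the same shifts and ORs.

```python
def read_u32_le(buf: bytes, offset: int) -> int:
    """Assemble a little-endian uint32 one byte at a time.

    Only single-byte reads are performed, so `offset` does not need to be a
    multiple of 4, which is why alignment padding is unnecessary.
    """
    if offset < 0 or offset + 4 > len(buf):
        raise ValueError("read past end of buffer")
    value = 0
    for i in range(4):
        value |= buf[offset + i] << (8 * i)
    return value
```

Reading at an odd offset works: with `buf = b"\x00" + (0x12345678).to_bytes(4, "little")`, `read_u32_le(buf, 1)` returns `0x12345678`, the same as `int.from_bytes(buf[1:5], "little")`.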

@GregoryComer
Member

Is there a strong reason not to include this in the core AOT code (as opposed to a dedicated extension)? I don't have a super strong opinion on this, but I do worry that the growing number of fine-grained extensions is confusing to users and compromises UX.

JacobSzwejbka added a commit to JacobSzwejbka/executorch-1 that referenced this pull request Sep 10, 2025
@JacobSzwejbka
Contributor Author

Is there a strong reason to not include this in the core AOT code (as opposed to a dedicated extension?). I don't have a super strong opinion on this

I could put the AOT pieces in core and the runtime reader in an extension? The modularity is a feature for embedded. We shouldn't expose that many options in mobile builds, which is why we have the presets.


Comment on lines +11 to +12


Contributor

Write a docblock with an example usage of the manifest file for higher-layer consumers. Also mention that the manifest is a mechanism, not a security policy, and explicitly say that consumers must implement appropriate security for their threat model.

# 1. Generate PTE file
pte_data = serialize_pte_binary(program)

# 2. Create cryptographic signature of PTE data
signature = sign_with_private_key(pte_data, private_key)  # e.g., RSA, ECDSA

# 3. Append manifest with signature
manifest = Manifest(signature=signature)
pte_with_manifest = append_manifest(pte_data, manifest)

Verification Process

# 1. Extract manifest from end of file
manifest = Manifest.from_bytes(file_data)

# 2. Extract PTE data (everything before the manifest bytes)
pte_data = file_data[:-manifest_length]

# 3. Verify signature with public key
is_valid = verify_signature(pte_data, manifest.signature, public_key)
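A runnable version of the signing and verification flow above, as a minimal sketch under several assumptions. `append_manifest` and `split_manifest` are hypothetical helpers (not the PR's API), the footer layout `[signature][uint32 signature length][uint32 footer length][b"em00"]` is assumed, and HMAC-SHA256 from the standard library stands in for the RSA/ECDSA signature mentioned above. HMAC uses a shared secret rather than a public/private key pair, so it is only a placeholder for a real signing library. In line with the review comment, the manifest only carries the bytes; what to trust remains the consumer's security policy.

```python
import hashlib
import hmac
import struct

MAGIC = b"em00"    # magic value from this PR; the layout below is hypothetical
TRAILER_SIZE = 12  # uint32 signature length + uint32 footer length + magic


def append_manifest(pte_data: bytes, signature: bytes) -> bytes:
    """Append [signature][u32 sig_len][u32 footer_len][magic] to the PTE bytes."""
    footer_len = len(signature) + TRAILER_SIZE
    return pte_data + signature + struct.pack("<II", len(signature), footer_len) + MAGIC


def split_manifest(file_data: bytes) -> tuple[bytes, bytes]:
    """Return (pte_data, signature) by parsing the footer from the end."""
    if len(file_data) < TRAILER_SIZE or file_data[-4:] != MAGIC:
        raise ValueError("no manifest footer found")
    sig_len, footer_len = struct.unpack("<II", file_data[-12:-4])
    if footer_len != sig_len + TRAILER_SIZE or footer_len > len(file_data):
        raise ValueError("corrupt manifest footer")
    signature = file_data[-footer_len:-TRAILER_SIZE]
    return file_data[:-footer_len], signature


def sign(data: bytes, key: bytes) -> bytes:
    # Placeholder: HMAC-SHA256 is a symmetric MAC, not an RSA/ECDSA signature.
    return hmac.new(key, data, hashlib.sha256).digest()


def verify(data: bytes, signature: bytes, key: bytes) -> bool:
    # compare_digest compares in constant time to avoid timing leaks.
    return hmac.compare_digest(sign(data, key), signature)
```

For example, with `key = b"demo-key"` and `pte = b"fake pte bytes"`, `split_manifest(append_manifest(pte, sign(pte, key)))` returns the original `pte` and a 32-byte tag that `verify` accepts; changing any byte of the payload makes `verify` return `False`.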

# Unique ID for the data the manifest was appended to. Often this might contain
# a cryptographic signature for the data.
signature: bytes

Contributor

Add version

Contributor

I can imagine people can use this for other use-cases besides security, such as saving arbitrary serializable metadata.

For instance, saving the location of a tokenizer.json file.

Contributor Author

I can imagine people can use this for other use-cases besides security, such as saving arbitrary serializable metadata.

That wasn't really the intent. I chose this implementation because I wanted a really lightweight way to attach security information or other core metadata to the .pte.

If we want to store arbitrary user-defined things like JSON, I don't think appending to the .pte is the right solution; my opinion is to just shove it all in a zip.
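A minimal sketch of the zip alternative suggested here, using only the standard library's `zipfile` module. The member names `model.pte` and `metadata.json` and the helpers `bundle` and `unbundle` are hypothetical; the point is that arbitrary user metadata (such as a tokenizer.json location) can travel alongside the .pte without changing the .pte bytes.

```python
import io
import json
import zipfile


def bundle(pte_bytes: bytes, metadata: dict) -> bytes:
    """Pack a .pte and a JSON metadata dict into an in-memory zip archive."""
    buf = io.BytesIO()
    # ZIP_STORED skips compression, so the .pte is stored byte-for-byte.
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        zf.writestr("model.pte", pte_bytes)
        zf.writestr("metadata.json", json.dumps(metadata))
    return buf.getvalue()


def unbundle(archive: bytes) -> tuple[bytes, dict]:
    """Return (pte_bytes, metadata) from an archive produced by bundle()."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return zf.read("model.pte"), json.loads(zf.read("metadata.json"))
```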

# Unique ID for the data the manifest was appended to. Often this might contain
# a cryptographic signature for the data.
signature: bytes

Contributor

Should you add a timestamp field too?


EXPECTED_MAGIC: ClassVar[bytes] = b"em00"

MAX_SIGNATURE_SIZE: ClassVar[int] = 512
Contributor

why is this fixed?

Contributor Author

So we can do just one load at runtime, instead of two loads or a stream.

Contributor Author

512 should also cover the vast majority of cryptographic signature algorithms I saw.
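To illustrate the single-load argument: because the signature is capped at `MAX_SIGNATURE_SIZE`, a reader knows the footer can never exceed a fixed number of bytes, so it can seek near the end, read that bounded tail once, and parse everything from memory. The same approach addresses the large-file concern raised below, since the rest of the file is never read. Only `b"em00"` and the 512-byte cap come from this PR; the trailer layout and `read_signature` are hypothetical.

```python
import io
import os
import struct
from typing import BinaryIO

MAGIC = b"em00"           # magic value from this PR
MAX_SIGNATURE_SIZE = 512  # cap from this PR
TRAILER_SIZE = 12         # hypothetical: u32 sig_len + u32 footer_len + magic
MAX_FOOTER_SIZE = MAX_SIGNATURE_SIZE + TRAILER_SIZE


def read_signature(f: BinaryIO) -> bytes:
    """Read the signature with one seek and one bounded read from the end."""
    file_size = f.seek(0, os.SEEK_END)
    chunk = min(file_size, MAX_FOOTER_SIZE)
    f.seek(file_size - chunk)
    tail = f.read(chunk)  # the only read; earlier bytes are never loaded
    if len(tail) < TRAILER_SIZE or tail[-4:] != MAGIC:
        raise ValueError("no manifest footer found")
    sig_len, footer_len = struct.unpack("<II", tail[-12:-4])
    if sig_len > MAX_SIGNATURE_SIZE or footer_len != sig_len + TRAILER_SIZE:
        raise ValueError("corrupt manifest footer")
    if footer_len > len(tail):
        raise ValueError("file too small for the declared footer")
    return tail[-footer_len:-TRAILER_SIZE]


if __name__ == "__main__":
    sig = bytes(range(64))
    footer = sig + struct.pack("<II", len(sig), len(sig) + TRAILER_SIZE) + MAGIC
    blob = b"\x00" * 100_000 + footer  # stand-in for a large .pte
    assert read_signature(io.BytesIO(blob)) == sig
```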

return data

@staticmethod
def from_bytes(data: bytes) -> "_ManifestLayout":
Contributor

For large files you have to read the whole thing?

Contributor Author

We could add more methods, like a "from file" variant that only loads the last MAX_SIZE bytes, or a streaming reader.

Contributor Author

@JacobSzwejbka JacobSzwejbka Sep 10, 2025

I don't really expect people to verify the signature in Python, though. It's mostly there for testing.

# Unique ID for the data the manifest was appended to. Often this might contain
# a cryptographic signature for the data.
signature: bytes

Contributor

Something like this?

  from dataclasses import dataclass
  from typing import Dict, Optional

  @dataclass
  class Manifest:
      type: str  # "signature", "checksum", "metadata", etc.
      version: int
      payload: bytes
      timestamp: Optional[int]
      attributes: Dict[str, str]

Contributor Author

Is version the version of the manifest struct, or is it user-specified?

If a user wanted to store multiple things, would you expect them to daisy-chain manifests?

What is attributes?

Labels
CLA Signed
fb-exported
5 participants