Skip to content

feat!: [RFC] Preliminaries for linking by name #2143

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

acl-cqc
Copy link
Contributor

@acl-cqc acl-cqc commented Apr 29, 2025

Not ready yet, but hoping for feedback ASAP; I expect this to be a prereq for #2108 as it defines how linking should operate.

  • Add link_name: Option<String> to FuncDefn (closes Decide on which functions are "public" #1752). This must be None if the FuncDefn isn't direct child of the Module root.
  • validate checks that the public children of a Module have unique names (also FuncDecls - i.e. don't allow exporting a FuncDefn and importing the same name)
    • TODO: test this
  • Use FuncDefns as entrypoints in dead code / dead function elimination, some breaking interface changes here
    • TODO: dead code elimination not really tested here (not at all for exported Consts)
    • It might/probably make(s) sense to use these in all dataflow analyses that care about entry points...e.g. that would include constant folding (i.e. default to joining inputs of public functions with Top unless explicitly disabled)
  • Various faffing around with factory methods and builder define_function methods
    • define_function_link_name is very long; maybe define_function_ln? (ln is the Linux linker after all!) Maybe also add ModuleBuilder::define_private_function
  • Mark FuncDefn as #[non_exhaustive]
  • TODO Also for now I have bodged the hugr-model import.rs (@zrho)
  • TODO: Update hugr-py

BREAKING CHANGE: FuncDefn has an extra field, recommend using new, new_public or new_private

@hugrbot
Copy link
Collaborator

hugrbot commented Apr 29, 2025

This PR contains breaking changes to the public Rust API.

cargo-semver-checks summary

--- failure struct_marked_non_exhaustive: struct marked #[non_exhaustive] ---

Description:
A public struct has been marked #[non_exhaustive], which will prevent it from being constructed using a struct literal outside of its crate. It previously had no private fields, so a struct literal could be used to construct it outside its crate.
      ref: https://doc.rust-lang.org/cargo/reference/semver.html#attr-adding-non-exhaustive
     impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.41.0/src/lints/struct_marked_non_exhaustive.ron

Failed in:
struct FuncDefn in /home/runner/work/hugr/hugr/PR_BRANCH/hugr-core/src/ops/module.rs:57
struct FuncDefn in /home/runner/work/hugr/hugr/PR_BRANCH/hugr-core/src/ops/module.rs:57

--- failure function_parameter_count_changed: pub fn parameter count changed ---

Description:
A publicly-visible function now takes a different number of parameters.
      ref: https://doc.rust-lang.org/cargo/reference/semver.html#fn-change-arity
     impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.41.0/src/lints/function_parameter_count_changed.ron

Failed in:
hugr_passes::remove_dead_funcs now takes 1 parameters instead of 2, in /home/runner/work/hugr/hugr/PR_BRANCH/hugr-passes/src/dead_funcs.rs:147

@acl-cqc acl-cqc force-pushed the acl/linking_prelims branch from 5788e0f to f15eccb Compare April 29, 2025 16:28
Copy link

codecov bot commented Apr 29, 2025

Codecov Report

Attention: Patch coverage is 85.16484% with 27 lines in your changes missing coverage. Please review.

Project coverage is 82.06%. Comparing base (5108297) to head (3e52b41).

Files with missing lines Patch % Lines
hugr-core/src/builder/dataflow.rs 59.09% 8 Missing and 1 partial ⚠️
hugr-core/src/hugr/validate.rs 60.86% 9 Missing ⚠️
hugr-passes/src/dead_code.rs 66.66% 4 Missing ⚠️
hugr-core/src/builder/build_traits.rs 87.50% 0 Missing and 2 partials ⚠️
hugr-core/src/builder/module.rs 90.90% 2 Missing ⚠️
hugr-passes/src/monomorphize.rs 96.77% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2143      +/-   ##
==========================================
- Coverage   82.10%   82.06%   -0.04%     
==========================================
  Files         230      230              
  Lines       40943    41014      +71     
  Branches    37044    37115      +71     
==========================================
+ Hits        33616    33660      +44     
- Misses       5363     5392      +29     
+ Partials     1964     1962       -2     
Flag Coverage Δ
python 85.63% <ø> (ø)
rust 81.69% <85.16%> (-0.04%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@acl-cqc acl-cqc requested review from zrho and doug-q April 30, 2025 09:09
@doug-q
Copy link
Collaborator

doug-q commented Apr 30, 2025

I've not reviewed the code, but the description is so good I can give initial feedback from that"

So when a FuncDefn with public == true is "lifted" to a module the public flag "turns on"? This seems a bit dangerous. Lifting functions to modules is very normal and useful. I don't see a better path though. This is a case for emitting a warning (non-module-child has link name)

  • An alternative would be to have a separate field link_name: Option<String>. Preferences? Factory/builder methods might default to using the same string for both kinds of name. However, name then has no use in the code, so might be better replaced by name/description metadata....?

This is my preference. I expect the "String" to grow in complexity over time, linking is complicated. One might want to link two hugrs together, i.e. one uses a "public" function of the other, but that function doesn't want to be actually exposed as a public symbol after conversion to llvm. There is weak linking, ODR. I expect we will want more than zero of these features.

Yes name can be moved to metadata. Or it can stay where it is because it's Very Important Metadata.

Nice to have: it would be great if we were able to be unconcerned with naming generated functions (e.g. monomorphise)

  • Or even just make name an Option - all (Module-child) functions are exported unless they are anonymous?

I think there is benefit in maintaining the distinction between "link name" and "user-written-name".

  • validate checks that the public children of a Module have unique names (also FuncDecls - i.e. don't allow exporting a FuncDefn and importing the same name)
  • Duplicate "name"s should be allowed.
  • The link names of module child FuncDefns should be unique, and distinct from link names of module child FuncDecls
  • multiple FuncDecls with the same signature should be able to share link names (the two funcdecls are aliased)
  • It should be valid for FuncDecls to not have link names. Obviously you can't codegen this yet, but I expect such HUGRs to be useful.
  • TODO: this should also incorporate AliasDefn/AliasDecl
    We don't use these, let's delete them
  • It might/probably make(s) sense to use these in all dataflow analyses that care about entry points...e.g. that would include constant folding (i.e. default to joining inputs of public functions with Top unless explicitly disabled)

Yes, once we have a definition of public dataflow analysis should use this

  • Various faffing around with factory methods and builder define_function methods

Unclear whether this would be part of 0.16.0 release but I suspect "too late for that deadline".
.

  • TODO Also for now I have bodged the hugr-model import.rs (@zrho)
  • TODO: Update hugr-py

BREAKING CHANGE: FuncDefn has an extra field, recommend using new, new_public or new_private

Perhaps we should mark FuncDefn (and everything else?) #[non_exhaustive] ?

Do you think we should change "FuncDecls must be children of module" here?

I do not think we should attempt to link Consts. One can just write a function that returns the const. Yes we need const de-duplication, but we need it without linking too.

@acl-cqc
Copy link
Contributor Author

acl-cqc commented May 1, 2025

Thanks @doug-q, great points! :)

So when a FuncDefn with public == true is "lifted" to a module the public flag "turns on"? This seems a bit dangerous. Lifting functions to modules is very normal and useful.

Yes, good point. I've made it an error to have an exported FuncDefn that isn't at the top-level.

  • An alternative would be to have a separate field link_name: Option<String>. Preferences? Factory/builder methods might default to using the same string for both kinds of name. However, name then has no use in the code, so might be better replaced by name/description metadata....?

This is my preference. I expect the "String" to grow in complexity over time, linking is complicated. One might want to link two hugrs together, i.e. one uses a "public" function of the other, but that function doesn't want to be actually exposed as a public symbol after conversion to llvm. There is weak linking, ODR. I expect we will want more than zero of these features.

Yes name can be moved to metadata. Or it can stay where it is because it's Very Important Metadata.

Yup ok, I've kept name but not enforced uniqueness, and added optional link_name, with uniqueness requirement

Nice to have: it would be great if we were able to be unconcerned with naming generated functions (e.g. monomorphise)

Right. So monomorphise now mangles link_name only, and leaves name unchanged. (Tests need updating here.)

  • multiple FuncDecls with the same signature should be able to share link names (the two funcdecls are aliased)

yup, good point, done

  • It should be valid for FuncDecls to not have link names. Obviously you can't codegen this yet, but I expect such HUGRs to be useful.

This is the one I haven't really seen the use case for. Seems to me that whereas a FuncDefn has both name and optionally link-name, a FuncDecl has only (but always) link-name. What else does a FuncDecl do except open the way for linking?

Perhaps we should mark FuncDefn (and everything else?) #[non_exhaustive] ?

Yup, done

Do you think we should change "FuncDecls must be children of module" here?

Now that every Hugr has a module root, no, I think we can insist every FuncDecl lives there.

I do not think we should attempt to link Consts. One can just write a function that returns the const. Yes we need const de-duplication, but we need it without linking too.

Removed.

@acl-cqc acl-cqc force-pushed the acl/linking_prelims branch from 7fb8962 to aaeffa6 Compare May 12, 2025 17:52
@zrho
Copy link
Contributor

zrho commented May 14, 2025

I think it is important that functions, aliases, etc. can be referred to by a name. This is not immediately clear when just concentrating on writing the in-memory data structure of the compiler, but becomes much more apparent when considering other aspects, including but not limited to linking. These other aspects have been neglected so far, which I think can be felt very acutely. Names are in particular essential for the text format. That hugr-core has this design choice in contrast to every other IR known to me I think is caused by it being designed with only the in-memory representation in mind.

How about this: the current name becomes name_hint (or title or something like that) and is "Very Important Metadata". The proposed link_name becomes name. Both are optional. There are no rules for name_hint, but when a name is given then it must follow uniqueness/scoping rules. When exporting from core to model everything gets a name; if it didn't have one already, one is created from node id and name_hint. Stuff that is marked as public for linking must have a name in core. With that we can still avoid having to deal with the apparently intractable problem of alpha renaming in core, which appears to be the motivation behind the non-functional names so far, and at the same time acknlowedge that names are useful for more than just linking.

@acl-cqc
Copy link
Contributor Author

acl-cqc commented May 16, 2025

Thanks, both. I like Lukas' suggestion (rename "Metadata" name to title or even description; and link_name to name). That name must be provided when the thing is public is what I'm aiming for here ;) but @lukas do you mean you want to allow a (ex-link_)name even when not public?

Also it seems odd for metadata that is used only by humans to be compulsory. I guess (if there is no uniqueness requirement) one can provide an empty string (which, incidentally, would be another way to declare that a function has no link-name; empty string is, after all, not a legal name).

Finally @doug-q the idea that monomorphization leaves human-name unchanged but mangles link-name is kinda annoying in that monomorphization still acts on functions that have no link-name, but then the human-name does not give any way to tell them apart. I guess one then has to inspect the type of the function, which is much more cumbersome.

@zrho
Copy link
Contributor

zrho commented May 16, 2025

Yes, there's an argument to allow for names on non-public things. This keeps names as the general mechanism to refer to symbols such as functions, but allows to internally go for a nameless representation when convenient, e.g. while rewriting. The philosophy then would be that everything has a name, but for some things we don't care what it is.

The human readable title should be optional. If hugr-core had a nicer interface for metadata already, it probably wouldn't even need to be blessed metadata.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Decide on which functions are "public"
4 participants