Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent handling of potentially build failing changes between different rust teams #145

Open
weiznich opened this issue Jan 12, 2025 · 24 comments
Labels
A-policy Area: Issues related to Project policy. disposition-close This PR / issue is in PFCP or FCP with a disposition to close it. I-council-nominated This issue nominated for discussion in a meeting. proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. T-leadership-council Team: Leadership Coucil

Comments

@weiznich
Copy link

Based on a suggestion from @oli-obk I would like to fill the following topic for discussion by the leadership council:

In several recent discussion with the cargo team I observed that from my point of view this team has a rather different approach to deal with potential build breaking changes than other teams. From previous interactions with the compiler and the language team I'm used to my feedback/concerns are recognized and mostly addressed with at least some future incompatibility warning. From a more broad look at how these two teams act this seems to be often the desired way to try to warn about potential code that might stop compiling in future versions/with different settings. Sometimes it even seems like certain features are blocked on figuring out a better diagnostic solution. A recent example here is the discussion around proc-macro diagnostics in rust-lang/rust#54140, which indicates that future steps to stabilization requires figuring out the how to improve user facing ways to suppress warning generated by proc-macros.

I personally feel that the cargo team has a different approach to handle such concerns. Based on my previous interactions with that team it feels for me like they are more interested in shipping features, instead of covering all edge cases. That's surely a good goal and teams should certainly have a large freedom over what they are allowed to decide to ship, but I personally feel that this becomes problematic as soon as it results in for some users hard to understand build failures caused by these new features. In my view this becomes questionable if these decisions are in direct contradiction with previous rust project wide decisions. The explanation I got privately so far from the cargo team indicates that it is hard to impossible to ship future compatibility warnings for the features in question.

I'm personally aware of at least the following problematic changes:

  • Change the default resolver to 2 with the 2021 edition, without providing a general warning mechanism for potential broken builds (there were warnings focused on specific crates, shipped by the compiler team)
  • The upcoming change of the default resolver version to 3 with the 2024 edition
  • The proposed new feature unification changes in RFC-3692

First of all I would to clarify that all of these features have there use case and there are good reasons for changing these defaults. I'm not proposing to roll back any of that, I rather want to emphasis that the cargo team should take more care around the rough edges there and follow the examples given by other teams. For each of the changes I presented examples of code that would fail to compile under the new proposed defaults/rules and that did not produce warnings with the old defaults. For each of these examples I essentially got the feedback from the cargo team that there is not much they are willing to do there.

From my understanding of what is allowed and what is not allowed to change via an edition based on what's described in RFC-2052 the first two changes violate these requirements as the RFC clearly state: "Warning-free code on edition N must compile on edition N+1 and have the same behavior.". Additionally the RFC also states that crates can opt in into the edition on their own pace. This is very much not the case for resolver related changes, as the final application crate decides which resolver version is used. To be fair here: The initial edition RFC seems to be written very much from a compiler/lang/libs team perspective and not from a cargo perspective with no way to ever incorporate these kind of changes. Additionally it would be much more transparent for the user that changes the edition at the top of the tree to see potential problems before they change the edition and then fill issues with upstream crates (as the compilation errors happen there).

That all written I want to end this comment with emphasizing again that I do not call for rolling back any of these changes, as it's up to the relevant teams to decide which features they ship and which they won't ship. What I would like to start with this comment is a project wide discussion around minimal standards for shipping such build breaking changes consciously.

@oli-obk oli-obk added A-policy Area: Issues related to Project policy. I-council-nominated This issue nominated for discussion in a meeting. labels Jan 12, 2025
@traviscross
Copy link
Contributor

traviscross commented Jan 19, 2025

Some personal thoughts on what might help us when taking up this potentially-challenging issue:

@weiznich: Since a set of alleged failings by the Cargo team is used as the sole motivation here, this could benefit, in my view, from quantitative and qualitative details about the effect of these changes on users. E.g., how many crates were affected and needed to migrate, how substantial was that migration effort, what mitigating or migration steps were taken by the Cargo team, and what mitigations or other steps do you feel were unreasonably not taken and how would those have affected, quantitatively and qualitatively, the cost to users? Is that analysis something you might be able to provide for our review?

@Manishearth / @rust-lang/devtools: Does T-devtools, the parent team, wish to handle this in some manner before it reaches the council? Otherwise, does the parent team wish to state for our benefit either its support or lack thereof for the reasonableness of the Cargo team's actions?

As I know this has already been a frustrating matter for the Cargo team in other venues, I won't ask for a response here to the claims or the framing, but the team should certainly have that opportunity, if it so wishes, so:

cc @rust-lang/cargo

@Manishearth
Copy link
Member

Generally, my feeling here is that different Rust artifacts may need to differ on different stability policies, and that's often fine. For example, "will not fail the build" is explicitly not a part of Clippy's stability policy, since it would completely freeze clippy in time, and clippy is mostly a tool you run on your own code.

The compiler, lang, and clippy teams do have an explicitly written down stability policy. As I understand it, no other teams do. I think @weiznich should bring up their concerns directly to the Cargo team first (unless they have done so already and it's not been linked here?) and the Cargo team can try to formulate a consistent position on this, ideally ending up in an RFC. I don't actually think bubbling it up to the council is the right call unless it cannot be resolved at the Cargo venue. I do not think we should have a norm of bubbling things up to the council when a team does something that someone disagrees with and it hasn't yet been discussed in public at the team level.

The devtools team is a thin umbrella team and as such I don't think is the appropriate venue for this discussion either, if it does need to bubble up the devtools team can help but council should be involved.

I do think there is interesting fodder for a council discussion here: what are the broad principles behind Rust's stability policy that we do want subteams to care about? For example, a general thread that is commonly found with Rust's overall attitude towards stability is that things can break but generally the final user of the code is empowered to fix things (except for future incompat breakages). There's a distinction between "someone else's code broke and I can't fix it" and "my code broke" or "someone else's code broke but I can fix it". Clippy builds breaking is an example of "my code broke". Resolver changes are also an example of "someone else's code broke but I can fix it".

The idea behind this general principle is to look at the point of stability, which is to avoid major headaches for the person triggering a build or whatever. Clippy broke? That's fine, that's a minor headache, your build is not blocked by clippy. Dependency suddenly stopped compiling? Not fine, you're forced to go fix that dependency. Updating the edition broke a dependency due to resolver changes? You're not forced to update the edition, this is an equivalent headache to that of updating a dependency and dealing with incompatible transitive deps.

It might be good to get that general idea written down so that teams can be encouraged to discuss and publish stability policies. If it turns out that teams are writing stability policies more liberal than the council desires, then that's a good time to start talking about consistency and stuff, but I think the way to start this is to let teams figure out their needs and try to RFC something first (perhaps with help from the council on general principles), rather than decisions at the council level.

@Manishearth
Copy link
Member

Also I do want to highlight @weiznich's comment about the edition policy not including Cargo from the get-go. Yeah, when we first planend the edition it was more from a lang/libs perspective, with some Cargo stuff but not much. If the Cargo team is penning a stability policy, it should also pen an edition policy as an addendum to the existing one. In general the idea of "lint for things that are breaking" is quite good to stick to once you have a definition of what is breaking, and "require autofixes for most cases" is also quite good.

Resolver changes are kinda interesting here because Cargo absolutely could lint "hey you're on an old resolver, set the resolver to 3", I'm not sure if cargo fix's infrastructure currently supports fixing that kind of thing, but it should probably be explored within the edition lint mechanisms. But since changing your default resolver only affects your own local builds, not users of the crates, it's fundamentally different from code-based edition changes where the change if improperly handled will break crate users. And perhaps that's enough to justify avoiding linting, but it's an explicit discussion Cargo should have if they RFC how editions work for them.

I do not think reverse dependencies opting in to these things should be considered breaking, however. I don't really see how the feature unification RFC is breaking. It lets crate users opt in to something new, that may break deps. That's the normal state of being: you can do that with a bad call to cargo update, or by pulling in a dep with incompatible version specifications, and when you do that you may ask your dependencies to change in certain ways, and all of that is fine.

@oli-obk
Copy link

oli-obk commented Jan 20, 2025

I don't actually think bubbling it up to the council is the right call unless it cannot be resolved at the Cargo venue.

To the best of my knowledge this is how it is from weiznich's perspective, which is why they want clarification with pretty much what you told them (different teams have different policies) (which I also stated). At some point individual team member statements do not suffice, and a team statement is warranted.

@weiznich
Copy link
Author

As @oli-obk already pointed out: I already raised this to the cargo team several times and I'm personally more than disappointed with their response.

I would like to address a few questions that are already raised to hopefully clear up some of the details of the previous discussions:


@traviscross

Since a set of alleged failings by the Cargo team is used as the sole motivation here, this could benefit, in my view, from quantitative and qualitative details about the effect of these changes on users. E.g., how many crates were affected and needed to migrate, how substantial was that migration effort, what mitigating or migration steps were taken by the Cargo team, and what mitigations or other steps do you feel were unreasonably not taken and how would those have affected, quantitatively and qualitatively, the cost to users? Is that analysis something you might be able to provide for our review?

It's hard for and outstanding person to give hard ecosystem wide numbers here. What I can do is to report the issues I personally encountered or that I'm aware of and give my assessment on the impact. Given that I only have a partial view of what the cargo team actually does and what reports they receive I would expect my view to be a rather incomplete and represent only a smaller fraction of similar issues.

I will list that for each change separately, as it differs based on the state of the relevant feature.

Change the default resolver to 2 with the 2021 edition, without providing a general warning mechanism for potential broken builds (there were warnings focused on specific crates, shipped by the compiler team)

This change broke the build of diesel 1.4.7 back then. There was no warning about this until I were made aware about that change due to a diesel user reporting a compilation failure in diesel using a beta or nightly compiler. Afterwards I implemented a fix for the broken build in diesel + raised a discussion at internals.rust-lang.org. As result of this discussion a targeted warning for diesel was implemented in cargo (rust-lang/cargo#9602). I already suggested to go with a more general warning about this case back then as could be seen by the comments on the PR.

This likely also affected other crates as indicated for example by this crater run: https://github.com/rust-lang/rust/pull/87190#issuecomment-905250803.t

Debugging such feature issues is still a problem in the ecosystem today, see for example https://users.rust-lang.org/t/why-doesnt-cargo-compile-with-a-union-of-features/120716 which affects other popular crates as well.

The upcoming change of the default resolver version to 3 with the 2024 edition

As pointed out during the stabilization this change can break the build of any dependency chain that contains a non-correct minimal version of a sub-dependency. Depending on the existence of a Cargo.lock file that might only happen after a cargo update run, or already after changing the edition. Now it's unquestionable a bug in the dependency to specify a not correct minimal version of a sub dependency, but that's something that was silently ignored in the past, even with no stable way to test for it. From my personal experience with testing minimal versions with the unstable -Z minimal-version even popular crates fail to declare the correct minimal version of their dependencies. Based on that I would expect this to effect rather a lot of crates.

The proposed new feature unification changes in rust-lang/rfcs#3692

That feature is not implemented yet. Given the proposed changes are similar to what the resolver="2" change did, I expect that to have a similar effect. I provided an example of a broken build during the RFC process. It's unclear how common this kind of built-failures would be without future research. During the RFC process I also suggested several times to address these broken builds with better diagnostics, which publicly got ignored by the team.

For me the common thing of all these changes are their failure mode: The user changes something in their workspace setting and then something breaks in a dependency. From my experience as diesel maintainer that often means these users will go and complain to the author of the crate with the build error, which requires these maintainers to spend quite a lot of time to explain these users that their configuration is causing this build problem. So from my point of view it looks like the cargo team is trying to avoid the work for implementing "proper" diagnostics for these cases in favor of just letting crate maintainers do that additional support work.


@Manishearth

Also I do want to highlight @weiznich's comment about the edition policy not including Cargo from the get-go. Yeah, when we first planend the edition it was more from a lang/libs perspective, with some Cargo stuff but not much. If the Cargo team is penning a stability policy, it should also pen an edition policy as an addendum to the existing one. In general the idea of "lint for things that are breaking" is quite good to stick to once you have a definition of what is breaking, and "require autofixes for most cases" is also quite good.

I would like to highlight this section as well. In my opinion it would have been the task to the cargo team to figure out exactly this stability policy and how it want's to handle edition changes since the last edition, as this is demonstrably a known issue. Currently it just feels like the cargo team doesn't have a policy on their own, but also ignores existing more general policies like the edition RFC.

But since changing your default resolver only affects your own local builds, not users of the crates, it's fundamentally different from code-based edition changes where the change if improperly handled will break crate users. And perhaps that's enough to justify avoiding linting, but it's an explicit discussion Cargo should have if they RFC how editions work for them.

I do not think reverse dependencies opting in to these things should be considered breaking, however. I don't really see how the feature unification RFC is breaking. It lets crate users opt in to something new, that may break deps. That's the normal state of being: you can do that with a bad call to cargo update, or by pulling in a dep with incompatible version specifications, and when you do that you may ask your dependencies to change in certain ways, and all of that is fine.

As written before: I kind of disagree about the assessment of breakage here. Sure it doesn't break all builds of all crates depending on for example diesel. Nevertheless the point is that from the perspective of the user that made that local change it breaks the build of something outside their perceived control and so they go to the maintainer of the crate that's broken from their perceptive and report an issue there. For sufficiently popular crates that effectively means they must support all possible build configurations, which might be unfortunately impossible due to other limitations (e.g there is no feature sharing between a proc-macro crate and the parent crate reexporting it, so any feature resolver change has the ability to break assumptions there).

I also want to highlight that this could easily lead to situations where a popular crate decides to drop support for resolver = "1" and relay on features from resolver="2", while different popular crate decides not to support resolver = "2" because the relay on features from resolver="1". That sounds like going straight to a ecosystem split where you cannot combine different parts seamlessly anymore. Fortunately that didn't happen back then, but I would like to emphasis that this is a possible result of such changes.

@oli-obk
Copy link

oli-obk commented Jan 20, 2025

I think the discussion here is not useful. The high level ask of the council is

"does the edition RFC apply to non-libs/lang/compiler teams in the way that weiznich believes that it does, or do teams make their own decisions as in this case the cargo team has stated". While I believe it's the latter, my personal opinion on its own doesn't matter here on its own, so the whole thing got escalated further.

@Manishearth
Copy link
Member

Manishearth commented Jan 20, 2025

As @oli-obk already pointed out: I already raised this to the cargo team several times

I think it's highly unproductive to bring up this conversation without this context from the get go, FWIW.

Nevertheless the point is that from the perspective of the user that made that local change it breaks the build of something outside their perceived control and so they go to the maintainer of the crate that's broken from their perceptive and report an issue there.

Yes, but there's a crucial difference: this happened due to a user's change, not a compiler change. The user changed their code, and then they needed a dependency fix. That is not at all an unusual process. The user isn't forced to change their code, they are able to wait out the fixes.

It happens right now with trait changes, and it happens with crate resolution already. It's quite normal: sometimes a user wants to do something and has to wait for some dependencies to be on board to get it.

The contrast with breakages is that "you can always update your toolchain" is a property that Rust really really cares about, and does not want that to be gated on fixing deps.

From my experience as diesel maintainer that often means these users will go and complain to the author of the crate with the build error, which requires these maintainers to spend quite a lot of time to explain these users that their configuration is causing this build problem.

I do recognize there is a perspective issue here, but to some extent IMO this is just part and parcel of being a maintainer. I've had people complain to me about "breakage" on my repos that is not breakage according to the Rust ecosystem norms, and ended up sinking a lot of time to explain to them that they were using the API incorrectly or making incorrect assumptions about its behavior. I don't think "this feels like a breakage to crate users" automatically makes it an actual stability breakage from Rust's perspective.

@oli-obk
Copy link

oli-obk commented Jan 20, 2025

That is my fault. I asked for the removal of something in the original text, but that contained the context, too

@Manishearth
Copy link
Member

Ah, okay, thanks!

@Manishearth
Copy link
Member

Manishearth commented Jan 20, 2025

Anyway, as a non council member, here are the next steps I would suggest, based on what I understand as the intent of the edition and stability policies (having helped work on parts of both):

  • Cargo starts work on an RFC detailing their own stability policy and how it interacts with the edition policy. It covers more detail on "action at a distance" issues with dependencies. Doesn't have to be an RFC, even documenting their current de facto policy would be useful, and if people wish to have public debate on it then people can open their own RFCs asking for change.
  • Council briefly discusses what bounds such policies need to stay within, if any, and what needs to happen for that to change. My suggestion is that there are no such bounds, the point of the RFC process is to figure them out, and council members can weigh in on the Cargo RFC if needed.
  • Council briefly discusses if any ongoing change in Cargo needs to be stopped. My suggestion is that the answer here is no, it appears to all fit into extant policy with ambiguity, and to me that is insufficient for emergency council intervention, a thing that our governance is not really designed for.
  • Council at some point documents principles behind the stability policy.
  • Council at some point encourages the rest of the Rust project to build stability policies

Remember, the teams are largely autonomous when it comes to running their projects. The council is not designed as the higher up that reviews and changes decisions made by a subteam unless things have gotten really bad. It can take a birds eye view and come up with cross cutting policies, or state opinions, but I don't actually think the current governance is designed for council to tell a subteam "no we think you should do this a different way" in any binding way.

@weiznich
Copy link
Author

I would like to reiterate again that I do not call for any changes to be stopped. I think having a RFC that details the stability policy of Cargo would already help.

What I personally would like to see discussed (at some point) in addition to the points @Manishearth wrote down is how change that might be perceived as breaking for whatever reason should be softened. From my perspective the compiler and the language team does a good job there by providing future compatibility lints/edition lints to point to problematic parts of the code. It might be meaningful to adopt as general recommendation, after there is likely an important reason why RFC's like the original edition RFC did choose to require such warnings.

Finally I would like to disagree with not putting any bounds on stability guarantees of the team policies. From an adversary stand point this would allow a team to basically do anything, so to choose a drastic unrealistic example: Nobody could stop the cargo team for dropping toml as file format for Cargo.toml files and replacing it with brainfuck if the team itself decides to do that. To be clear: I don't expect any team to do something like that, that's merely an extreme example to demonstrate the implications of literally no bounds. In my opinion there should be a coarse cross cutting policies that sets some bounds what's considered within and outside of the stability guarantee given by the overall project. One could even argue that such a cross cutting set of polices already exists with the edition RFC's and https://github.com/rust-lang/rfcs/blob/master/text/1122-language-semver.md. In that case it would be up to the teams own stability policies to detail how the existing framework should be applied to their unit of work.

@Manishearth
Copy link
Member

Manishearth commented Jan 20, 2025

From an adversary stand point this would allow a team to basically do anything

Yes, we trust our teams, our governance is built on that principle.

The idea is that when it comes to stewarding the artifacts they are responsible for, these teams know best, and we have existing processes designed to keep decisionmaking open in that context. If something extreme happens it is an anomaly and can be dealt with1, but that should only be for things that are an anomaly. The governance is not designed or equipped to deal with the situation where the council wishes to ask a team to make a different decision. Council is in charge of various cross cutting matters like communications and inter team relations, but it's not otherwise a "higher up". One of the main impetuses behind the new governance model was to deemphasize the central leadership team as being "above" other teams, it wasn't in the past either, but people did trip over that implication.

Footnotes

  1. Litmus test: when a team does something that more or less everyone else disagrees with, as opposed to something that reasonable people may agree or disagree on.

@weiznich
Copy link
Author

@Manishearth I think the fundamental disagreement point here is: Is the stability guarantee a cross team concern or not. For me it's closely related to communication and one of the Rust core values, therefore I would consider it as cross team concern. At least at the level of some basic guidelines that the Rust project is stable and what that means. I agree that the exact details (like adding new trait implementations is not considered to be breaking) are up to the individual teams, but I disagree about delegating everything to the teams.

For me personally that's also a matter of trust, especially that I already got the argument "it's better for the majority of users, so it's fine to break you" as argument a few times. If it turns out that stability is purely a thing teams can decide on their own on, that's for me the point where I substantially loose trust in the stability guarantee. I will then likely adjust diesel stability guarantees to exempt changes required to fix language/tooling build breaking changes from diesel stability guarantees. In the end that decision is up to the council.

@Manishearth
Copy link
Member

Yeah, I guess it's fair to ask if the council wishes to start having such cross cutting guidelines. We currently have none in the project around this kind of thing.

If it turns out that stability is purely a thing teams can decide on their own on, that's for me the point where I substantially loose trust in the stability guarantee. I will then likely adjust diesel stability guarantees to exempt changes required to fix language/tooling build breaking changes from diesel stability guarantees. In the end that decision is up to the council.

"adjust diesel's stability guarantees to exempt changes required ..." is exactly the right type of outcome from this IMO. It makes perfect sense for me for Diesel (and other crates) to not consider "your crate doesn't work with resolver 3 in my deptree" to be a high priority "breakage", though ideally Diesel should still consider it something that should be fixed at some pace. Again, this is the standard policy most crates already have for incompatible deps: people will come up to you and ask you to update your deps, and typically you do it (barring MSRV/etc concerns), but it's not a breakage either even though the impetus is fixing broken builds downstream. I don't see this as particularly a new problem.

Having a published stability policy from Rust and Cargo makes it easier for crates to make such determinations and cite the policy when it comes up, certainly.

IMO the question of trust is not that relevant: teams should publish stability policies, and then you can trust those, and build policies around that. The policy you suggest for Diesel does not, to me, read as a lack of trust in Rust's stability policy, but rather the logical separation of responsibilities when it comes to things breaking. When something breaks, it might be the responsibility of the person holding the thing, their dependencies, or the toolchain. In this case, Cargo appears to be stating that it is not their responsibility, and since Cargo is not declaring any antipatterns around this, it's not the responsibility of the dependencies either. So it's the responsibility of the person holding the thing.

@traviscross
Copy link
Contributor

traviscross commented Jan 21, 2025

Here's my concern. I don't think it's fair to the Cargo team for us to engage in a broad policy discussion on this topic under a cloud of specific unsettled claims against the Cargo team.

The narrative above claims that the Cargo team's actions are out of alignment with those of other teams, that the team is "more interested in shipping features, instead of covering all edge cases", that this resulted in "for some users hard to understand build failures caused by these new features", and that these actions "are in direct contradiction with previous rust project wide decisions".

If we want to have a broad policy discussion, we could do that at any time, and a separate issue for that could be filed. This issue is unmistakably putting forward that we should institute a policy so as to fix a particular alleged problem with the behavior of the Cargo team.

In deciding whether or not there is a problem, we should of course not second-guess details. The standard of review I'd suggest is to ask whether the Cargo team acted reasonably (that is, not unreasonably) and in reasonable alignment with our shared project values. If ever a top-level team (or one under a thin parent team) routinely weren't, that probably would be a matter for the council.

To me, the key ingredient to demonstrating such a failure in this case would be to show that these actions caused foreseeable widespread net harm and disruption to users to such a degree that it would contravene our values and make the actions of the team, in failing to avoid or mitigate that foreseeable harm, or in failing to recognize that harm as a problem and themselves take corrective steps, seem objectively unreasonable to us.

That's what I'm not yet seeing any evidence of, and if such evidence can't be put forward, then I'd like us to close this.

@Manishearth
Copy link
Member

Yes, I 100% agree. Unless a sub team has done something super egregious, this type of thing should not bubble up to the council in this way. I don't think it's particularly productive. Like I said, the council is not the manager of the teams, they are relatively autonomous around this.

It's totally fine for the council to have a discussion about the broader policy here, and they probably should, but I think that's up to them. I'm in favor of closing this issue.

@traviscross traviscross added the T-leadership-council Team: Leadership Coucil label Jan 21, 2025
@traviscross
Copy link
Contributor

traviscross commented Jan 21, 2025

For the reasons above, I propose:

@rfcbot fcp close

@rfcbot
Copy link
Collaborator

rfcbot commented Jan 21, 2025

Team member @traviscross has proposed to close this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. disposition-close This PR / issue is in PFCP or FCP with a disposition to close it. labels Jan 21, 2025
@weiznich
Copy link
Author

weiznich commented Jan 21, 2025

I would like to begin this comment with noting that I explicitly wanted to not derail this discussion into specific incidences with the cargo team. My goal was to focus on what could be improved to avoid such situations in the future. That's the main reason why the initial issue is light on details what exactly happened. Now that it seems to be decided that it is explicitly not wanted to keep the discussion on the broader topic and that the council want's to focus on the specific incidents here let me add some more context. Please note that the following is my perspective as non-team member. Given that I feel that opinions from team members are higher valued than outsider perspectives: Please also note that I'm a long term community member, with a significant contribution history to the language, the compiler, and the overall ecosystem. Or to word that later part differently: Maybe take a step back and consider what it would mean for the ecosystem if it turns out that I'm a rough actor and compare that to e.g. the impacts a cargo team member would have.

That written I would like to flag the following content as concern(s):

To me, the key ingredient to demonstrating such a failure in this case would be to show that these actions caused foreseeable widespread net harm and disruption to users to such a degree that it would contravene our values and make the actions of the team, in failing to avoid or mitigate that foreseeable harm, or in failing to recognize that harm as a problem and themselves take corrective steps, seem objectively unreasonable to us.

What would be required to demonstrate such failure? Or to word it differently, why doesn't the following incidences not qualify as such failures?

To outline the actual impact of the changes mentioned above:

Change the default resolver to 2 with the 2021 edition, without providing a general warning mechanism for potential broken builds (there were warnings focused on specific crates, shipped by the compiler team)

This change modified the way cargo unifies features between build and target dependencies, which causes issues with proc-macro crates. The overall goal of this change was to make it easier to disable features for target dependencies, while they are used for build time dependencies. That's important for embedded targets, so there is a clear conflict of interest between different user groups here.

  • Broke diesel <= 1.4.7, which at that time was a popular common crate in widespread usage. The underlying issue was only noticed by non-team members and also addressed without much help or input from the crates.io team. The linked warning was only added in last minute after a long discussion whether this is
  • Later on it caused issues with other popular crates as for example shown by https://users.rust-lang.org/t/why-doesnt-cargo-compile-with-a-union-of-features/120716 (That involves sqlx and url, which are again popular common crates with widespread usage, the user asking this question is Henri Sivonen, which is also a long term rust contributor, so it does seem like this isn't something that's obvious even to users that "know" the language.)
  • The 2021 edition crater run also lists mime_guess as affected crate, which again is a popular common crate in widespread usage.

More importantly: RFC-3085 outlines the corner stones of the 2021 edition. It's in my opinion important to first highlight that this is an RFC that's scoped to the at that point existing core team. In my understanding that team was the direct precursor of todays leader ship council, which implies that this RFC contains the edition related policy that should be followed by all teams. Otherwise that would be mean that the core team/leader ship council is fine with teams directly ignoring decisions of the core team/leader ship council.
I would like to highlight a few sections of RFC-3085:

This ensures that the decision to migrate to a newer edition is a "private one" that the crate can make without affecting others, apart from the fact that it affects the version of rustc that is required, akin to making use of any new feature.

A resolver change is objectively outside this scope as it effects the whole dependency chain.

Edition changes cannot make arbitrary changes. First and foremost, they must always respect the constraint that crates in different editions can interoperate; this implies that the changes must be "skin deep".

This is objectively violated by a resolver change as again this does affect the whole dependency tree

90% of crater-seen crates migrate without manual intervention.
90% of crates that were migrated to the newer edition could be migrated in less than 1 hour

https://github.com/rust-lang/rust/pull/87190#issuecomment-905250803.t list 1840 regressions, roughly halve of them related to the resolver change. I did not find old package numbers from that time, but I'm not sure if that ratio holds. Especially the second part might be challenging given that migrating the resolver changes required significant work (figuring out which features exactly caused this, which still has bad diagnostics).

Overall I would conclude that this change:

  • Caused significant breakage as demonstrated by the affected large crates.
  • Is in direct conflict with a core team decision from that point in time without public recorded decision to from the core team to go forward with this

Now I personally have the opinion: If something happens once that might just happened by mistake, if it happens more than one there is likely a deeper problem. That brings me to the resolver = "3" change:

The upcoming change of the default resolver version to 3 with the 2024 edition

This change modifies the way cargo resolves dependency versions in such a way that it respects the declared rust-version of packages and the used rust version to build the package. Overall the goal is to improve the ecosystem compatibility to older rust versions.

This change has the potential to break any dependency chain that contains dependencies with imprecise minimal version constraints. Think of cases like serde = "1" instead of serde = "1.0.100", but then use a item that's only there in serde = "1.0.100". Now arguably this change only surfaces bugs in the corresponding crates, but it doesn't provide any way to test for this bug (the -Z minimal-versions flag that could be used is unstable), nor does it provide diagnostics for cases that might break (which could for example point to imprecise minimal version declarations, as they are almost always not what users want).

At this point it is in my opinion hard to provide actual numbers of how many crates are effected by this as this requires:

  • A relatively new rust version to actually use the new resolver implementation
  • At the same time an older rust version to trigger such breakages to trick the resolver into resolving to older crate versions.

I think it should be rather obvious that this will change in the future if the resolver becomes the the default.

To estimate how many crates might be affected by this I just went to crates.io and clicked through the ten newly uploaded crates at the time of writing this to get a random sample of crates to estimate how common imprecise minimal versions are. These crates are:

Out of these ten crates 3 don't provide a source link (art_library_SFU, robolox-rs-cli, danielq_art). The ketje crate is empty, the networker-rs crate doesn't have dependencies. Out of the other five crates the following crates do contain imprecise minimal version declarations:

So we are at two out of five crates that might be affected by this change. Obviously that's not a scientific measurement of all crate versions on crates.io, as there might be effects that shift that number in one or the other direction such as:

  • Older crates might use more imprecise versions as tools like cargo add were not widespread back in the days
  • More established crates might use less imprecise versions because somebody tested with -Z minimal-versions or otherwise found the problem.

I also want to write that not every imprecise version is directly causing a build failure, it's just a potential place for such an error to happen. In addition this breakage also requires other factors (using older rust version, etc), so it might be slightly less relevant.

I would argue that these numbers at least demonstrate that the issue exists and effects a significant number of crates. The exact number is up to discussion, but I would see the cargo team obliged to provide accurate numbers here. In my opinion it still would warrant to require better diagnostics for this cases.

Now the 2024 edition is as well as the 2021 edition policed by an RFC: RFC-3501

Again I would like to highlight certain parts of the RFC:

  • It's an leader ship council RFC, so in my understanding of the previous discussion this is a binding policy to all teams
  • It clearly states:

This RFC is an amendment to RFC 3085 to declare that the Rust Project intends to produce a new edition in 2024

So in my understanding anything in RFC-3085 that's not explicitly amendment applies to the 2024 edition as well. As previously discussed with the resolver = "2" changes this directly rules out any resolver changes as it does affect the whole dependency tree.

This time there exists an RFC that outlines the behavior of the new resolver: RFC-3537. It misses the required migration section, so it's hard to tell what's the expected migration story here.

Again this change seems to be in direct conflict with the boundaries set by RFC-3501 and RFC-3085.

As written before: If it happens once it might be a mistake, if it happens twice that's unlikely.

For me this boils down to the following question: Is the leader ship council aware of the fact that other teams ignore their policies and if that's the case what's even the point of writing such policies in the first place?

The third change in question is RFC-3692. From the impact this is closely related to the resolver = "2" change, but without changing the default behavior. The potential breakage if something changes this setting should be similar as far as I can asses that. This also includes the hard to debug part discussed earlier, which is one of the reasons why I asked for explicitly including the diagnostic requirement into the RFC.


After presenting the technical failure points I also would like to highlight a few communication related problems I had in the past with this topic. On the one hand that's not strictly related to the topic, on the other hand I feel that this might give more context and it might also fall under the topics the leadership council cares about (communication, moderation).

As other pointed out: I should raise this with the cargo team first. I did this, e.g here and here or here. I'm sorry to exaggerate this a bit, but all the responses I got from them are basically: No that's fine, that's no problem, without any public estimation how large the impact of that change could be or anything in that direction. I raised this issue again in rust-lang/rust#133541, but there someone with moderation power shut down the discussion quickly as "too heated" without any heated discussion happening. This still feels like a misuse of moderation power to me.

In other discussions cargo team members even brought up technical wrong arguments (like diesel uses mutually exclusive features, and therefore is expected to be broken) repeatably. It's not trust building if you must correct team members that make decisions on that basis several times, especially if the decision process of the team is opaque for an outsider and you only get told that your concerns are not valid. Again: If that happens once that might be just a mistake, but in that case the same argument was brought up by the same person several times, even after it was pointed out to be incorrect.

In context of the discussion around RFC-3692 I also want to highlight that I suggested relatively early in the discussion to address some of the concerns there by requiring better diagnostics. I've never got an official response from any team members to this. I got told by one of the team members in a private channel that they don't feel like taking that proposal serious. That was coupled with accusations to spread fake news, while I provided examples that exactly demonstrated what I brought up as concerns. Given that this is an Ad-homini attack, which I consider as something that doesn't have any place in technical discussions, this was reported to the moderation team by me, but I haven't got any response from them yet on actions taken. And to be clear here: I know more than well there is never enough time to do everything everyone wants, so there needs to be compromises on what's actually implemented. Part of this is communcating that certain things are just not possible due to capacity reasons or similar. I also know that discussions can sometimes escalate and not all people necessarily agree on the same decisions, but if you want to have a decision process that allows for outsiders to raise their opinions and feel them valued something seems to need changing at least for this particular team. It currently doesn't work for the cargo team and me (and likely others, as for example can be seen in rust-lang/cargo#1734, or https://internals.rust-lang.org/t/rust-projects-attitude-to-the-community/22192 to just link random discussions that came up today!).

@oli-obk
Copy link

oli-obk commented Jan 21, 2025

  • Is in direct conflict with a core team decision from that point in time without public recorded decision to from the core team to go forward with this

just an important side note: the council is not equivalent to the core team. We (the council) are aggregating up the information from the teams and handling things in their interest. We do not prescribe things, we ensure teams communicate and can effectively work together where they overlap.

@weiznich
Copy link
Author

weiznich commented Jan 21, 2025

just an important side note: the council is not equivalent to the core team.

I'm sorry to correct you here, but states this RFC-3392 differently.

It contains beside many other things the following explicit statement:

The Leadership Council serves as the successor to the core team in all capacities.

Source

So yes, the leadership council is not the core team, but all that the core team was + more.

We (the council) are aggregating up the information from the teams and handling things in their interest. We do not prescribe things, we ensure teams communicate and can effectively work together where they overlap.

RFC-3392 lists as duties of the leadership council:

Coordinating Project-wide changes to teams, structures, or processes.

Source

Releasing an new edition is clearly a project-wide process, and therefore it would allow the council to take actions there if it wants. To be fair here: The RFC also says that the council must delegate any additional work to whats listed in the RFC to other teams, so it is a valid argument saying that these decisions are delegated to the cargo team. In the end it boils down to: Is this still project level coordination or not, which likely ends up being one of that undecidable grey zone things where you can reasonably argue for both positions.

@Manishearth
Copy link
Member

Manishearth commented Jan 22, 2025

Couple thoughts

First, to give an idea of my perspective here: I was involved in the planning of the original 2018 edition, especially around the migration lint work. I was also on core during the 2021 edition, though not super involved. I did ensure that some of the problems from the 2018 edition were avoided by getting more explicit wording about edition RFCs needing to have solid migration plans. I am not currently involved in toplevel leadership: I head up the Clippy and Devtools teams, but as previously noted the Devtools team exists more to coordinate between teams, we don't really get involved in what individual teams are doing. I'm commenting here mostly because I think I have a fair amount of relevant background on editions.

Where and how this should be discussed

There's been a bit of back and forth here about what the focus of the issue should have been, and I think I see what caused that. Speaking for myself, I think it makes sense to discuss the possibility of broad cross cutting policy at the council level, but if it's motivated by something, then that background is necessary to understand the issue, and crucially the affected teams should agree that this should be discussed. In other words if a team disagrees with a proposal I do not think the council should be the "next step" to get that disagreement "reversed".

This is in part because I do not see the council deciding on a policy over the desires of a particular team. That's not how Rust's governance functions; the council could decide to pursue a new policy but it would be incorporating the needs of subteams.

This doesn't mean that there needs to be full agreement or anything to bubble something up to the Council: A likely outcome I see happening every now and then is a team going "We don't agree with you but also we think this needs wider discussion, let's ask the Council about this". So basically this type of issue I would expect to come from a team in part.

If a team is egregiously not following project practices: for example deciding on RFCs without paying heed to the "no new rationale" rule around FCPs, or otherwise doing Bad Things, that is something that might be brought up to the council, though I'd treat that as something closer to a moderation issue that should be done in private. We have levers for dealing with egregious situations, the default way things are handled, however, trusts our team members to be doing their duties appropriately.

I think that's why there's a bit of bouncing around about what this issue should be about: it's not the first case, since the team doesn't appear to be involved in the ask, and it does not appear to be the second case either from what is talked about here (and like I said, if it's the second case, it should not be discussed in public).

The "direction" of breakage

I think this is the crux of the issue at hand, and it's somewhat of a terminology issue.

From the edition RFC:

This ensures that the decision to migrate to a newer edition is a "private one" that the crate can make without affecting others, apart from the fact that it affects the version of rustc that is required, akin to making use of any new feature.

This statement, when it was written, was generally understood to be about crates depending on the crate that upgrades editions. It's a bit easy to see why, as previously noted that RFC was written from a lang/libs POV, where it just doesn't make sense for something to apply "the other way".

(Caveat: macros get weird around this. Rust editions have always been handwavy about macros)

The overall goal was that a dependency tree can update piecemeal instead of wholesale.

This is the core of this discussion: the "breakage" with the resolver is where a downstream crate opts into an edition and its dependencies break. That's different. And it's a pretty easy argument to make that, just like regular edition changes where you may need to edit code a bit to get stuff to compile, an edition change to Cargo may need you to tweak your dep specifications.

A lot of the RFC text you quote is reliant on this particular meaning of breakage.

I would also not say that resolver changes "objectively" split the ecosystem since the choice of resolver isn't a meaningful part of the uploaded artifact at all. Similar to a lockfile, a resolver is a toplevel choice the build makes, it does not affect crates that depend on the the current crate. There's an interpretation of "ecosystem split" where this counts, but I don't think that's the one that was used for designing editions.

Crates not working with "new features" is not the type of ecosystem split being talked about here, otherwise the introduction of features like async would also be considered to split the ecosystem. Choosing to use async also requires getting your dependencies on the same page, there are many changes one might make like that, including ones to Cargo.toml like updating specific dependencies. It's a pretty standard part of having dependencies: making changes to your code might require you to get things fixed in dependencies.

The main thing potentially not being followed by the resolver changes is that there is no automatic migration tooling as generally recommended. I put a bunch of work in the 2021 edition RFC to ensure that there was some stronger requirement for migration planning, but as a whole I don't think that panned out. @m-ou-se who drove the actual edition may have further comments as to how consistent we were about migration.

But! It's worth noting that the requirement is mostly that migration be considered and proportional help be provided. We realized early on during the design of editions that automatic migration is not always going to be possible (especially with macros, etc), and what we wanted was that migrations should be as automated as possible, with additional help provided to deal with situations where automatic stuff doesn't or can't work.

At least for resolver = 2, it does seem like there was quite a bit of work done on easing migration: the edition tool gives helpful information about how the build will change. That's pretty good: people upgrading their code will be told of potential issues with dep changes, and they can try to mitigate it, or hold off on updating til things get fixed upstream. Hopefully there's similar work being done for resolver = 3.

The "regressed: dependencies" category of the Crater regressions you note is also a matter of this breakage "direction" thing, it's a separate category in Crater for a reason. It's a useful signal, but still ultimately okay in this context.

I'll note: @m-ou-se is not a Cargo team member, and she stewarded the 2021 edition, including helping decide if the regressions are fine. It's not a case of the Cargo team having a different idea of breakage than everyone else, everyone involved appeared to be on the same page about whether the issues with the new resolver are okay.

Perhaps we should have a stronger standard for cases where changing the edition in your Cargo.toml changes dependency resolution in a way that causes dependencies to no longer compile. To me, that is solidly a question for the Cargo team.

@weiznich
Copy link
Author

Thanks for providing this background information from someone that was involved in designing the previous editions. I would like to comment on a few points.

This statement, when it was written, was generally understood to be about crates depending on the crate that upgrades editions. It's a bit easy to see why, as previously noted that RFC was written from a lang/libs POV, where it just doesn't make sense for something to apply "the other way".

That's totally understandable, but I think we also agree on that the current statement does not necessarily transport that understanding. And to be honest here: As explained before, that's not what I (and possibly a lot of other users) read there. What is one of my main critic points here is that this was not changed with RFC-3501 to clarify that, especially as we had very similar discussions on this before.

The overall goal was that a dependency tree can update piecemeal instead of wholesale.

I would still argue that this goal is violated by a resolver change. After all, by definition, the resolver affects the whole dependency tree. So you cannot update piecemeal, at least not without ensuring that all crates in your dependency tree are compatible.

Crates not working with "new features" is not the type of ecosystem split being talked about here, otherwise the introduction of features like async would also be considered to split the ecosystem. Choosing to use async also requires getting your dependencies on the same page, there are many changes one might make like that, including ones to Cargo.toml like updating specific dependencies. It's a pretty standard part of having dependencies: making changes to your code might require you to get things fixed in dependencies.

I disagree that async is even a comparable feature to the resolver changes here, for the following reasons:

  • You totally can mix async and sync code without causing compilation errors. It's fine to have async and sync dependencies in your dependency chain and there are tools that you can use in your crate to go from one to the other and back as often as you want.
  • At introduction async was a new feature at introduction, not like the resolver change which changes behavior of an existing feature. That implies that you need to do large changes to your code anyway to use it. In contrast editions (and therefore the resolver changes) are promoted as "small" changes, where you only need to run cargo fix + adjust a field in your Cargo.toml. That's in my opinion a completely different scope.
  • And most importantly the failure mode: If you end up with a sync and an async crate in your dependency chain nothing happens, other than that you need to figure out how to call the relevant functions. If you end up with a crate that only supports resolver = "2" and one that requires resolver = "1" you cannot use both, because one of them will always fail to compile. From my point of view that's exactly how you would define "ecosystem spilt", as you suddenly cannot mix and match dependencies on the latest rust version anymore without future considerations.

Or to word this more strongly: Given this definition it would be totally fine for me to detect resolver = "2" in diesel, refuse to compile with this settings and emit an error message that points people to the cargo issue tracker to raise an issue there?

At least for resolver = 2, it does seem like there was quite a bit of work done on easing migration: the edition tool gives helpful information about how the build will change. That's pretty good: people upgrading their code will be told of potential issues with dep changes, and they can try to mitigate it, or hold off on updating til things get fixed upstream. Hopefully there's similar work being done for resolver = 3.

As this is also mentioned by the linked migration guide I would like to highlight one from my point of view important part:

The 2.0 release of diesel (currently in development) does not have this problem as it has been restructured to not have this dependency requirement

The resolver = "2" change did nearly require the diesel team to issue a new major release just to workaround the changed feature unification.
I only managed to avoid that putting in a lot of work to figure out a workaround to avoid this kind of breakage. Anything that was suggested from the cargo team to fix that in diesel itself would have lead to a semver incompatible change. From my point of view updating to a new major dependency version doesn't simplify the migration for the downstream user, but rather makes things more complicated for them. Again the RFC's pointed out that easy migration was an explicit goal of any previous edition.

A similar case could be constructed for the resolver = "3" change where fixing the incompatibility would require bumping a *-sys crate (by definition public dependencies, as only one version of a *-sys crate could exist in your dependency tree) minimal version, which again is a semver major change.

At least for resolver = 2, it does seem like there was quite a bit of work done on easing migration: the edition tool gives helpful information about how the build will change. That's pretty good: people upgrading their code will be told of potential issues with dep changes, and they can try to mitigate it, or hold off on updating til things get fixed upstream. Hopefully there's similar work being done for resolver = 3.

That's exactly the thing I've asked the cargo team for and the response I got was essentially something between: "We don't care" and "We don't believe that's necessary". The relevant PR's are linked, go and have a look on your own, but at least I cannot see how I would interpret their statements as: "We are working on introducing better diagnostics" or anything that would make migrations easier. Ultimately that's the main reason for filling this issue initially. I even suggested ways to implement such a warning.

@Manishearth
Copy link
Member

I would still argue that this goal is violated by a resolver change. After all, by definition, the resolver affects the whole dependency tree. So you cannot update piecemeal, at least not without ensuring that all crates in your dependency tree are compatible.

You mostly can, though, it's only certain deptrees that have some issues. This is similar to the problems with macros and editions as well: editions aren't trying to be perfect around the goal of piecemeal-ness.

And I think the thing that makes this tricky is that unlike changes to Rust code, changes to Cargo.toml are about affecting your deptree. That's part of what makes this a bit different.

It's kind of clear to me that the general position of the Cargo team here, of many others in those discussions, and what I have myself assumed to be the case for ages, is that while the contents of dependencies are "dependencies", the behavior of dependency resolution itself is something that is controlled by the final crate, and if the final crate selects things that causes incompatible resolution, it's up to them to fix, even if it means they need to work with upstream crates. And yes, that does have the unfortunate social aspect that sometimes people see that as an upstream breakage, but educating your users is part and parcel of maintaining a package with well-designed guarantees. Sometimes you have to tell your users "no, this is not a guarantee we provide you", and see what routes you have to help them anyway. This is far, far, from a new dynamic.

That's exactly the thing I've asked the cargo team for and the response I got was essentially something between: "We don't care" and "We don't believe that's necessary".

Generally to be included in the edition the expectation is that thought be put into this and documented in the edition. It is ultimately fine for the thought to be: "we don't think this is necessary, here's why" (the "here's why" should go somewhere, but probably not in the migration docs). I haven't seen the entire discussion, so I haven't seen exactly what you're referring to, but tbh I can see a simple argument for "resolver 3 doesn't need migration hints": the change to resolver 3 has about the same impact as running cargo update (or cargo build on a lockfile-less crate), and it also only matters when you run cargo update.

The resolver = "2" change did nearly require the diesel team to issue a new major release just to workaround the changed feature unification.

That's unfortunate. Glad it could be avoided.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-policy Area: Issues related to Project policy. disposition-close This PR / issue is in PFCP or FCP with a disposition to close it. I-council-nominated This issue nominated for discussion in a meeting. proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. T-leadership-council Team: Leadership Coucil
Projects
None yet
Development

No branches or pull requests

5 participants