Change `.~` to use `filldist` rather than a loop #824

Draft · wants to merge 3 commits into base: main
Conversation

@mhauru (Member) commented Feb 27, 2025

I wrote the new implementation of `.~` as a loop over `~`s in #804 because I was originally thinking of covering complex cases where the RHS is a multivariate distribution. Now that we in fact restrict the RHS to be univariate, we could instead turn each `.~` into a simple call to `filldist`. This should be more performant.

That's what this PR does. Except I'm not done fixing pointwise_logdensities, and there's some strange test error that seems unrelated.

The question is: in what ways does this change the semantics of `.~`, and is it a smaller or a bigger change to the old behaviour, from when `.~` still had its own dot_tilde_obssume pipeline? `VarInfo` would now see `x .~ Normal()` as `x` being a single multivariate, which I think is what the old version did. `pointwise_logdensities` used to see it as multiple univariates, which this PR would then have to change. Are there any other notable cases where this makes a difference?
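For concreteness, a minimal sketch of the two spellings being compared here (model names are illustrative, not from the PR):

```julia
using Turing

# The #804 behaviour: `.~` expands to a loop of univariate `~` statements.
@model function demo_dot()
    x = Vector{Float64}(undef, 3)
    x .~ Normal()
end

# This PR: `.~` becomes a single multivariate `~` via `filldist`.
@model function demo_fill()
    x ~ filldist(Normal(), 3)
end
```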

Thoughts, @yebai?

@mhauru (Member, Author) commented Feb 27, 2025

Same benchmark results as in #779 (comment):

```
2-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "evaluation" => 2-element BenchmarkTools.BenchmarkGroup:
      tags: []
      "linked" => Trial(2.291 μs)
      "standard" => Trial(2.166 μs)
  "gradient" => 4-element BenchmarkTools.BenchmarkGroup:
      tags: []
      "AutoReverseDiff(compile=true)" => 2-element BenchmarkTools.BenchmarkGroup:
          tags: ["ReverseDiff [compiled]"]
          "linked" => Trial(4.167 μs)
          "standard" => Trial(4.083 μs)
      "AutoForwardDiff()" => 2-element BenchmarkTools.BenchmarkGroup:
          tags: ["ForwardDiff"]
          "linked" => Trial(1.982 ms)
          "standard" => Trial(1.922 ms)
      "AutoMooncake{Nothing}(nothing)" => 2-element BenchmarkTools.BenchmarkGroup:
          tags: ["AutoMooncake{Nothing}(nothing)"]
          "linked" => Trial(591.167 μs)
          "standard" => Trial(588.958 μs)
      "AutoReverseDiff()" => 2-element BenchmarkTools.BenchmarkGroup:
          tags: ["ReverseDiff"]
          "linked" => Trial(11.375 μs)
          "standard" => Trial(11.459 μs)
2-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "evaluation" => 2-element BenchmarkTools.BenchmarkGroup:
      tags: []
      "linked" => Trial(16.291 μs)
      "standard" => Trial(16.208 μs)
  "gradient" => 4-element BenchmarkTools.BenchmarkGroup:
      tags: []
      "AutoReverseDiff(compile=true)" => 2-element BenchmarkTools.BenchmarkGroup:
          tags: ["ReverseDiff [compiled]"]
          "linked" => Trial(31.875 μs)
          "standard" => Trial(31.875 μs)
      "AutoForwardDiff()" => 2-element BenchmarkTools.BenchmarkGroup:
          tags: ["ForwardDiff"]
          "linked" => Trial(343.613 ms)
          "standard" => Trial(363.815 ms)
      "AutoMooncake{Nothing}(nothing)" => 2-element BenchmarkTools.BenchmarkGroup:
          tags: ["AutoMooncake{Nothing}(nothing)"]
          "linked" => Trial(5.533 ms)
          "standard" => Trial(5.456 ms)
      "AutoReverseDiff()" => 2-element BenchmarkTools.BenchmarkGroup:
          tags: ["ReverseDiff"]
          "linked" => Trial(63.125 μs)
          "standard" => Trial(62.000 μs)
```

The difference is so drastic that I may need to double-check this, but it definitely seems to help performance.

codecov bot commented Feb 27, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 84.60%. Comparing base (e6a42dd) to head (67d3629).

Additional details and impacted files
```
@@              Coverage Diff              @@
##           release-0.35     #824   +/-   ##
=============================================
  Coverage         84.60%   84.60%
=============================================
  Files                34       34
  Lines              3832     3832
=============================================
  Hits               3242     3242
  Misses              590      590
```


```diff
-This is often more performant as well. Note that using `~` rather than `.~` does change the internal storage format a bit: With `.~` `x[i]` are stored as separate variables, with `~` as a single multivariate variable `x`. In most cases this does not change anything for the user, but if it does cause issues, e.g. if you are dealing with `VarInfo` objects directly and need to keep the old behavior, you can always expand into a loop, such as
+This is often more performant as well.
+
+The new implementation of `x .~ ...` is just a short-hand for `x ~ filldist(...)`, which means that `x` will be seen as a single multivariate variable. In most cases this does not change anything for the user, with the one notable exception being `pointwise_loglikelihoods`, which previously treated `.~` assignments as assigning multiple univariate variables. If you _do_ want a variable to be seen as an array of univariate variables rather than a single multivariate variable, you can always expand into a loop, such as
```
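A minimal sketch of the kind of loop that text alludes to (names illustrative, assuming Turing.jl):

```julia
using Turing

@model function demo(n)
    x = Vector{Float64}(undef, n)
    for i in eachindex(x)
        x[i] ~ Normal()  # each x[i] is stored as its own univariate variable
    end
end
```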
A review comment on this docs change:

It's okay to clearly state that each `.~` and `~` defines a single random variable. For more flexible conditioning, I think you could try to support `model | x = [missing, 1., 2., missing]`, which would allow users to condition on a subset of elements in `x` but still treat `x` as a single random variable. Again, please document this clearly in breaking changes.

@yebai (Member) commented Feb 27, 2025

Overall, @mhauru, this PR looks good, but let's avoid dependence on DistributionsAD. Hopefully, the DistributionsAD rules for ReverseDiff and ForwardDiff will be upstreamed later.

@torfjelde (Member) commented Feb 27, 2025

> `VarInfo` would now see `x .~ Normal()` as `x` being a single multivariate, which I think is what the old version did

Is this the case? AFAIK `VarInfo` would see this as multiple univariate rvs, no? On the latest release I'm seeing

```julia
julia> @model function demo()
           x = Vector{Float64}(undef, 2)
           x .~ Normal()
       end
demo (generic function with 2 methods)

julia> model = demo()
Model{typeof(demo), (), (), (), Tuple{}, Tuple{}, DefaultContext}(demo, NamedTuple(), NamedTuple(), DefaultContext())

julia> DynamicPPL.typed_varinfo(model)
TypedVarInfo{@NamedTuple{x::DynamicPPL.Metadata{Dict{VarName{:x, Accessors.IndexLens{Tuple{Int64}}}, Int64}, Vector{Normal{Float64}}, Vector{VarName{:x, Accessors.IndexLens{Tuple{Int64}}}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}}, Float64}((x = DynamicPPL.Metadata{Dict{VarName{:x, Accessors.IndexLens{Tuple{Int64}}}, Int64}, Vector{Normal{Float64}}, Vector{VarName{:x, Accessors.IndexLens{Tuple{Int64}}}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}(Dict{VarName{:x, Accessors.IndexLens{Tuple{Int64}}}, Int64}(x[1] => 1, x[2] => 2), VarName{:x, Accessors.IndexLens{Tuple{Int64}}}[x[1], x[2]], UnitRange{Int64}[1:1, 2:2], [-0.9327222546433018, -1.3546249869727593], Normal{Float64}[Normal{Float64}(μ=0.0, σ=1.0), Normal{Float64}(μ=0.0, σ=1.0)], Set{DynamicPPL.Selector}[Set(), Set()], [0, 0], Dict{String, BitVector}("del" => [0, 0], "trans" => [0, 0])),), Base.RefValue{Float64}(-3.190366896228262), Base.RefValue{Int64}(0))

julia> keys(DynamicPPL.typed_varinfo(model))
2-element Vector{VarName{:x, Accessors.IndexLens{Tuple{Int64}}}}:
 x[1]
 x[2]
```

Wasn't this the OG reason for going with the for-loop in #779?

Or am I maybe misunderstanding what that statement meant? 👀

@yebai (Member) commented Feb 28, 2025

I now think we should drop `.~` entirely in favour of `~`. Discussion in #825.

@mhauru (Member, Author) commented Feb 28, 2025

I switched to `product_distribution` and dropped the dep on DistributionsAD.

@torfjelde, my bad, you're right. I had gotten myself confused about how things used to work. (Thanks to @penelopeysm, who actually checked this for me before I got to it.)

I guess the question then is, which is preferable: the loss of some performance, as currently in #824, or the gain in performance but a change to whether `x .~ Normal()` is treated as a multivariate or an array of univariates, as in this PR.

What are all the things that are affected by the multivariate/array-of-univariates distinction? (The conditioning case is sketched after this list.)

- Conditioning
- Fixing
- `pointwise_logdensities`
- What people see if they inspect a `VarInfo` directly (questionable whether this is considered part of the external interface, but some people are certainly doing this)
- Something else?
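A hedged sketch of how the distinction shows up for conditioning (model names and values are illustrative; the exact `condition` call signatures may differ between DynamicPPL versions):

```julia
using DynamicPPL, Distributions

@model function demo_loop()
    x = Vector{Float64}(undef, 2)
    for i in eachindex(x)
        x[i] ~ Normal()  # x[1] and x[2] are separate univariate variables
    end
end

@model function demo_mv()
    x ~ product_distribution(fill(Normal(), 2))  # x is one multivariate variable
end

# Array-of-univariates: a single element can be conditioned on.
demo_loop_cond = condition(demo_loop(), @varname(x[1]) => 1.0)

# Multivariate: only `x` as a whole has a matching VarName.
demo_mv_cond = condition(demo_mv(), (; x = [1.0, 2.0]))
```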

@torfjelde (Member)

I would personally suggest that if we were to do such a change, we should do it separately. I get the motivation, but I think complete removal would still be annoying for users. The functionality removed in what's about to be released isn't really seen in the wild, but I know there are tons of cases in the wild of dot-tilde statements used to achieve these things (because I've recommended it to people on several occasions) 😕

@penelopeysm (Member)
@torfjelde: Do you have any insight as to whether the filldist-vs-loop thing in this PR could matter for end users? I get that it would be a breaking change in DPPL since the varinfo has different keys, but if people are just calling `sample()` and getting a `Chains` it won't make a difference.

@penelopeysm (Member)
Oh, sorry, I didn't see Markus's comment. Yes, conditioning and fixing would be different.

A suggested change to the same docs text:

```diff
-The new implementation of `x .~ ...` is just a short-hand for `x ~ filldist(...)`, which means that `x` will be seen as a single multivariate variable. In most cases this does not change anything for the user, with the one notable exception being `pointwise_loglikelihoods`, which previously treated `.~` assignments as assigning multiple univariate variables. If you _do_ want a variable to be seen as an array of univariate variables rather than a single multivariate variable, you can always expand into a loop, such as
+The new implementation of `x .~ ...` is just a short-hand for `x ~ product_distribution(...)`, which means that `x` will be seen as a single multivariate variable. In most cases this does not change anything for the user, with the one notable exception being `pointwise_loglikelihoods`, which previously treated `.~` assignments as assigning multiple univariate variables. If you _do_ want a variable to be seen as an array of univariate variables rather than a single multivariate variable, you can always expand into a loop, such as
```
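For reference, a minimal illustration (not from the PR) of why the two spellings are interchangeable:

```julia
using Distributions

# filldist(Normal(), 3) from DistributionsAD builds the same iid product;
# product_distribution lives in Distributions itself, avoiding the extra dep.
d = product_distribution(fill(Normal(), 3))
x = rand(d)
logpdf(d, x) ≈ sum(logpdf.(Normal(), x))  # true: components are independent
```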

@torfjelde (Member)
All those that Markus listed, and some more, e.g. `generated_quantities`, `predict`.

Over the years we've often recommended writing two models: one for inference, where you use something like `filldist`, and one for analysis, where you use `.~` or for-loops. Given that this has been our response to many issues on these things, it feels like a bit of a rug-pull to unnecessarily drop that 😕 Moreover, we've always recommended against `.~` for performance reasons, so I feel less bad about changing the performance characteristics of it.
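A sketch of that two-model pattern (all names illustrative, assuming Turing.jl):

```julia
using Turing, LinearAlgebra

# Inference model: one multivariate variable, cheaper to evaluate and differentiate.
@model function inference_model(y)
    μ ~ product_distribution(fill(Normal(), length(y)))
    y ~ MvNormal(μ, I)
end

# Analysis model: per-element variables, so pointwise_loglikelihoods and
# per-element conditioning/fixing see each μ[i] and y[i] separately.
@model function analysis_model(y)
    μ = Vector{Float64}(undef, length(y))
    for i in eachindex(μ)
        μ[i] ~ Normal()
        y[i] ~ Normal(μ[i], 1.0)
    end
end
```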

@mhauru (Member, Author) commented Feb 28, 2025

Given that this is more breaking than I initially thought, and more breaking than what is currently in #779, and that there's now a wholly new proposal in play of dropping .~ completely, would it make sense to release v0.35 as is in #779 right now, and leave any further changes to .~ for later?

@yebai (Member) commented Feb 28, 2025

> release v0.35 as is in #779 right now, and leave any further changes to `.~` for later?

Sounds good.

Base automatically changed from release-0.35 to master February 28, 2025 15:52