Change .~ to use filldist rather than a loop #824
Same benchmark results as in #779 (comment):
The difference is so drastic I may need to double check this, but this definitely seems to help performance.
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## release-0.35 #824 +/- ##
=============================================
Coverage 84.60% 84.60%
=============================================
Files 34 34
Lines 3832 3832
=============================================
Hits 3242 3242
Misses 590 590
This is often more performant as well. Note that using `~` rather than `.~` does change the internal storage format a bit: With `.~` `x[i]` are stored as separate variables, with `~` as a single multivariate variable `x`. In most cases this does not change anything for the user, but if it does cause issues, e.g. if you are dealing with `VarInfo` objects directly and need to keep the old behavior, you can always expand into a loop, such as

This is often more performant as well.

The new implementation of `x .~ ...` is just a short-hand for `x ~ filldist(...)`, which means that `x` will be seen as a single multivariate variable. In most cases this does not change anything for the user, with the one notable exception being `pointwise_loglikelihoods`, which previously treated `.~` assignments as assigning multiple univariate variables. If you _do_ want a variable to be seen as an array of univariate variables rather than a single multivariate variable, you can always expand into a loop, such as
It's okay to clearly state that each `.~` and `~` defines a single random variable. For more flexible conditioning, I think you could try to support `model | x = [missing, 1., 2., missing]`, which would allow users to condition on a subset of elements in `x` but still treat `x` as a single random variable. Again, please document this clearly in breaking changes.
Overall, @mhauru, this PR looks good, but let's avoid dependence on |
Is this the case? AFAIK
julia> @model function demo()
           x = Vector{Float64}(undef, 2)
           x .~ Normal()
       end
demo (generic function with 2 methods)
julia> model = demo()
Model{typeof(demo), (), (), (), Tuple{}, Tuple{}, DefaultContext}(demo, NamedTuple(), NamedTuple(), DefaultContext())
julia> DynamicPPL.typed_varinfo(model)
TypedVarInfo{@NamedTuple{x::DynamicPPL.Metadata{Dict{VarName{:x, Accessors.IndexLens{Tuple{Int64}}}, Int64}, Vector{Normal{Float64}}, Vector{VarName{:x, Accessors.IndexLens{Tuple{Int64}}}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}}, Float64}((x = DynamicPPL.Metadata{Dict{VarName{:x, Accessors.IndexLens{Tuple{Int64}}}, Int64}, Vector{Normal{Float64}}, Vector{VarName{:x, Accessors.IndexLens{Tuple{Int64}}}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}(Dict{VarName{:x, Accessors.IndexLens{Tuple{Int64}}}, Int64}(x[1] => 1, x[2] => 2), VarName{:x, Accessors.IndexLens{Tuple{Int64}}}[x[1], x[2]], UnitRange{Int64}[1:1, 2:2], [-0.9327222546433018, -1.3546249869727593], Normal{Float64}[Normal{Float64}(μ=0.0, σ=1.0), Normal{Float64}(μ=0.0, σ=1.0)], Set{DynamicPPL.Selector}[Set(), Set()], [0, 0], Dict{String, BitVector}("del" => [0, 0], "trans" => [0, 0])),), Base.RefValue{Float64}(-3.190366896228262), Base.RefValue{Int64}(0))
julia> keys(DynamicPPL.typed_varinfo(model))
2-element Vector{VarName{:x, Accessors.IndexLens{Tuple{Int64}}}}:
x[1]
 x[2]
Wasn't this the OG reason for going with the for-loop in #779? Or am I maybe misunderstanding what that statement meant? 👀
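For readers following the thread, here is a minimal sketch of the distinction being discussed. It is not code from this PR. The model names `demo_loop` and `demo_joint` are made up for illustration, and it uses `VarInfo(model)` and `product_distribution` rather than `typed_varinfo` and `filldist`, on the assumption that both are available in the installed DynamicPPL and Distributions versions.

```julia
using DynamicPPL, Distributions

# An explicit loop over `~`: each element is its own univariate variable.
@model function demo_loop()
    x = Vector{Float64}(undef, 2)
    for i in eachindex(x)
        x[i] ~ Normal()
    end
end

# A single `~` with a product distribution: `x` is one multivariate variable.
@model function demo_joint()
    x ~ product_distribution(fill(Normal(), 2))
end

# Inspect which variable names each model registers in a VarInfo.
loop_keys = keys(VarInfo(demo_loop()))
joint_keys = keys(VarInfo(demo_joint()))

println(loop_keys)   # expected to list x[1] and x[2] separately
println(joint_keys)  # expected to list only x
```

Anything that keys on variable names, such as conditioning on `x[1]` alone, would see these two models differently.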
I now think we should drop |
I switched to @torfjelde, my bad, you're right. I had gotten myself confused as to how things used to work. (Thanks @penelopeysm who actually checked this for me before I got to it.) I guess the question then is, which is preferable: The loss of some performance as currently in #824, or the gain of performance but change to whether What are all the things that are affected by the multivariate/array of univariates distinction?
|
I would personally suggest that if we were to do such a change, we should do that separately. I get the motivation, but I think complete removal would still be annoying for users. The functionality removed in what's about to be released isn't really seen in the wild, but I know there are tons of cases in the wild of dot tilde statements to achieve these things (because I've recommended it to people on several occasions) 😕
@torfjelde: Do you have any insight as to whether the filldist-vs-loop thing in this PR could matter for end users? I get it would be a breaking change in DPPL since the varinfo has different keys, but if people are just calling |
Oh, sorry, I didn't see Markus's comment. Yes, conditioning and fixing would be different.
The new implementation of `x .~ ...` is just a short-hand for `x ~ filldist(...)`, which means that `x` will be seen as a single multivariate variable. In most cases this does not change anything for the user, with the one notable exception being `pointwise_loglikelihoods`, which previously treated `.~` assignments as assigning multiple univariate variables. If you _do_ want a variable to be seen as an array of univariate variables rather than a single multivariate variable, you can always expand into a loop, such as

The new implementation of `x .~ ...` is just a short-hand for `x ~ product_distribution(...)`, which means that `x` will be seen as a single multivariate variable. In most cases this does not change anything for the user, with the one notable exception being `pointwise_loglikelihoods`, which previously treated `.~` assignments as assigning multiple univariate variables. If you _do_ want a variable to be seen as an array of univariate variables rather than a single multivariate variable, you can always expand into a loop, such as
All those that Markus listed and some more, e.g. |
Sounds good.
I wrote the new implementation of `.~` as a loop over `~`s in #804 because I was originally thinking of covering complex cases where the RHS is a multivariate variable. Now that we in fact restrict the RHS to be univariate, we could instead turn each `.~` into a simple call to `filldist`. This should be more performant.

That's what this PR does. Except I'm not done fixing `pointwise_logdensities`, and there's some strange test error that seems unrelated.

The question is, in what ways does this change the semantics of `.~`, and is it a smaller or a bigger change to the old behaviour when `.~` still had its own `dot_tilde_obssume` pipeline. `VarInfo` would now see `x .~ Normal()` as `x` being a single multivariate, which I think is what the old version did. `pointwise_logdensities` used to see it as multiple univariates, which this PR would then have to change. Are there any other notable cases where this makes a difference?

Thoughts, @yebai?
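As a sanity check on the semantics question, the sketch below illustrates that the joint log-density is the same whether the elements are treated as one multivariate variable or as several univariate ones, so the difference lies in how variables are named and stored, not in the total density. It is an illustration only, not code from this PR: the data values are arbitrary, and it uses `product_distribution` from Distributions as a stand-in for `filldist`.

```julia
using Distributions

# Arbitrary example values standing in for the elements of `x`.
x = [0.3, -1.2, 0.8]

# Log-density of `x` as a single multivariate variable made of iid Normals.
joint = logpdf(product_distribution(fill(Normal(), length(x))), x)

# Sum of per-element log-densities, as if each `x[i]` were its own variable.
elementwise = sum(logpdf.(Normal(), x))

# The two should agree up to floating-point error.
println(isapprox(joint, elementwise))
```

What does differ is the granularity at which per-variable quantities can be reported, which is why functions like `pointwise_logdensities` are affected.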