-
Hey, I'm new to RxInfer and somewhat new to practical Bayesian modeling. I want to implement a model for a positive semi-definite matrix that was proposed in this paper: https://ieeexplore.ieee.org/document/5495105. As the paper is behind a paywall, I will attach a screenshot of the model definition. The hyperparameters b, c, d are set to very small values (10e-6), and a depends on the matrix size N via a = 0.05 * N. The x_ij are the entries of the PSD matrix / kernel matrix. I've implemented this model in Turing (hard to sample from and very slow) and wanted to try RxInfer; the code is at the end of this post. Unfortunately, I have run into some issues that I can't resolve myself due to my limited understanding of RxInfer and the implemented inference methods. Some of the issues I have encountered:
In the paper the authors say that they infer this model using Variational Bayes by maximizing the log joint posterior w.r.t. each individual parameter while taking expectations over the remaining ones. This sounds like expectation propagation to me. Is that right? Any idea how I could get that working in RxInfer? Maybe you can refer me to an example that is similar enough for me to start from. I really appreciate your help and hope that I can learn something from it.
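To make the hyperparameter settings concrete, here is a minimal sketch in Julia (the Gamma-prior reading of a, b, c, d is my assumption based on how such hyperparameter patterns are typically used, not a quote from the paper):

```julia
using Distributions

N = 10                 # size of the PSD / kernel matrix
a = 0.05 * N           # shape hyperparameter, scales with N
b = c = d = 10e-6      # remaining hyperparameters, set very small

# Assumption: a and b act as shape/rate of a vague Gamma prior on a
# precision parameter. Distributions.jl uses shape–scale, so the
# rate b becomes the scale 1 / b.
α_prior = Gamma(a, 1 / b)
```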
-
Hi @h-spiess! This is an easy fix. You should try using … @HoangMHNguyen will take it from here to help you out ;)
-
Hi @h-spiess!

```julia
if i == j
    mu_kernel[n] ~ norm_squared(ϕ[i]) where {meta = UT()}
else
    mu_kernel[n] ~ softdot(ϕ[i], ϕ[j], tiny) where {q = MeanField()}
end
```
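`norm_squared` is not a built-in RxInfer node; the snippet above assumes a plain Julia function that RxInfer wraps as a deterministic (delta) node and approximates with the unscented transform via the `UT()` meta. A minimal sketch of such a helper (name and definition inferred from its use above, not taken from the library):

```julia
using LinearAlgebra

# Squared Euclidean norm of the factor vector ϕᵢ, i.e. ϕᵢ'ϕᵢ,
# used here as the mean of the diagonal kernel entries.
norm_squared(x) = dot(x, x)
```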
Hope this helps! :)
-
Hey @h-spiess, thanks for trying out RxInfer! I think there's also some confusion you should be aware of in this model definition. In your code, you try to use the broadcasting operator in Julia:

```julia
ϕ .~ MvNormalMeanPrecision(zeros(N), α)
```

The semantics you imply are: I have a vector of size N, and each of its elements should receive this multivariate normal prior. That is not how broadcasting works in Julia. Consider the following function:

```julia
broadcastable(x, y) = x + y
in_1 = [1.0, 2.0, 3.0]
in_2 = [2.0, 3.0, 4.0]
broadcastable.(in_1, in_2)
```

returns you the following:

```julia
[3.0, 5.0, 7.0]
```

So, when you write `ϕ .~ MvNormalMeanPrecision(zeros(N), α)`, the arguments `zeros(N)` and `α` are broadcast element by element together with `ϕ`, which is not the N identical multivariate priors you have in mind. To create the factor graph that you want to create:

```julia
for i in 1:N
    ϕ[i] ~ MvNormalMeanPrecision(zeros(N), α)
end
```

However, there's one more caveat: you now specify a model where we have a …
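Putting the loop into a full model block, a minimal sketch assuming the RxInfer 2.x syntax used elsewhere in this thread (`psd_model` and its arguments are illustrative, not the complete model from the paper):

```julia
using RxInfer

@model function psd_model(N, α)
    # one N-dimensional latent factor vector per row/column of the kernel
    ϕ = randomvar(N)
    for i in 1:N
        ϕ[i] ~ MvNormalMeanPrecision(zeros(N), α)
    end
    # ... likelihood over the kernel entries x_ij goes here ...
end
```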
-
Hey @h-spiess, I've uploaded the node you need in https://github.com/biaslab/DiagonalMvNormalNode. In `experiments` you can see a short demo of the node in action. In `node.jl` I have adapted the Gaussian Mixture node to your setting such that it accepts a variable number of inputs. In `rules.jl` I have written the inference rules for message computations, and for that I needed a scaled chi-squared distribution. I haven't thoroughly tested this node though; I've only run inference, as you can see in `experiments.jl`, and validated that the inferred results are close to the diagonal of the `prec` matrix I use to generate the data. In the model specification you can see that you have to convert your ran…
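As an aside, a scaled chi-squared distribution need not be implemented from scratch: if X ~ χ²(ν), then s·X ~ Gamma(ν/2, 2s) in the shape–scale parameterization. A minimal sketch using Distributions.jl (the function name is mine, not the one in the linked repo):

```julia
using Distributions

# s * Chisq(ν) has the same law as Gamma(ν/2, 2s), since
# Chisq(ν) == Gamma(ν/2, 2) and scaling a Gamma scales its scale.
scaled_chisq(ν, s) = Gamma(ν / 2, 2 * s)
```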