Skip to content

Add ZeroInflatedPoisson distribution #1393

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

emfeltham
Copy link

In the wake of a discussion over on StatisticalRethinkingTuring, I figured that it would be a good idea to put this together. It is used fairly extensively in the social sciences, and is probably not something that researchers should have to construct themselves. Thanks again.

@emfeltham
Copy link
Author

Hi again, I ran and added tests, and changed the code according to the test criteria. It should pass now, at least it does locally. Apologies, and thanks again.

λ::T
p::T

function ZeroInflatedPoisson{T}(λ::T, p::T) where {T <: Real}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you adapt your indentation to 4 spaces?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will do.

@mschauer
Copy link
Member

mschauer commented Sep 4, 2021

Can’t we have the entire thing as actual MixtureDistribution of Dirac at 0 and Poisson, or make that work?

@@ -6,7 +6,9 @@ version = "0.25.15"
[deps]
ChainRulesCore = "d360d2e6-b24c-11e9-a2a3-2a2ae2dbcce4"
FillArrays = "1a297f60-69ca-5386-bcde-b61e274b549b"
LambertW = "984bce1d-4616-540c-a9ee-88d1112d94c9"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it would make sense to define this distribution in StatsFuns rather than using a separate package?

@jlapeyre Would you be OK with that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC there was a discussion about moving it to SpecialFunctions and there even exists a PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes that's JuliaMath/SpecialFunctions.jl#84. Though it's quite outdated now and there have been new commits in LambertW since then.

@emfeltham Do you feel like reviving this PR (or opening a new one)? There seems to be lots of interest in it.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nalimilan (sorry, I just saw a new email ping) Yes, I am definitely OK with moving LambertW into another package. I think the appropriate package is indeed SpecialFunctions. I don't have a pressing interest in doing it myself at the moment. I'm not sure a new attempt at a PR wouldn't fizzle out as well ;)

@@ -239,6 +240,7 @@ export
quantile, # inverse of cdf (defined for p in (0,1))
qqbuild, # build a paired quantiles data structure for qqplots
rate, # get the rate parameter
excessprob, # get the exess probability of zeros parameter (ZeroInflatedPoison)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if we should introduce and in particular export a new function here just for ZeroInflatedPoisson.

)

# weighted
function suffstats(::Type{<:ZeroInflatedPoisson}, x::AbstractArray{T}, w::AbstractArray{Float64}) where T<:Integer
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Float64 type parameter seems a bit restrictive.

return LL
end

function cdf(d::ZeroInflatedPoisson, x::Real)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is type unstable (eg. if the parameters and x are of type Float32). The best approach is to perform all calculations and just return zero(result) or oftype(result, NaN) if necessary for some values of x in the end.

function cdf(d::ZeroInflatedPoisson, x::Real)
pd = Poisson(d.λ)

deflat_limit = -1.0 / expm1(d.λ)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should avoid harcoded Float64 literals since they might cause unwanted promotions.

end

# quantile
function quantile(d::ZeroInflatedPoisson, q::Real)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function has the same problems as cdf.

if (q <= d.p)
out = 0
elseif (d.p < deflat_limit)
out = convert(Int64, NaN)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will throw an InexactError. In general, one should also avoid hardcoding Int64 and use Int (defaults to Int64 on 64bit) if possible since Int64 can lead to a mixup of Int32 and Int64 on 32bit machines.

@codecov-commenter
Copy link

codecov-commenter commented Sep 5, 2021

Codecov Report

Merging #1393 (2619337) into master (39f9899) will decrease coverage by 0.91%.
The diff coverage is 5.12%.

❗ Current head 2619337 differs from pull request most recent head 424b3d4. Consider uploading reports for the commit 424b3d4 to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1393      +/-   ##
==========================================
- Coverage   82.54%   81.63%   -0.92%     
==========================================
  Files         116      117       +1     
  Lines        6950     7001      +51     
==========================================
- Hits         5737     5715      -22     
- Misses       1213     1286      +73     
Impacted Files Coverage Δ
src/univariates.jl 72.82% <ø> (ø)
src/univariate/discrete/zeroinflatedpoisson.jl 5.12% <5.12%> (ø)
src/univariate/discrete/discretenonparametric.jl 98.84% <0.00%> (-0.20%) ⬇️
src/quantilealgs.jl 82.41% <0.00%> (ø)
src/mixtures/mixturemodel.jl 69.60% <0.00%> (+1.56%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 39f9899...424b3d4. Read the comment docs.

@mschauer
Copy link
Member

mschauer commented Sep 5, 2021

So let’s make it ZeroInflated{Poisson}? For the same amount of code we get a couple of related zero inflated distributions, e.g ZeroInflated{NegativeBinomial}.

@azev77
Copy link
Contributor

azev77 commented Oct 5, 2021

Building on @mschauer we can generically allow creating all kinds of transformations (Zero-inflated/truncated/modified):
(from my discourse post)

using Distributions
λ = 2.0; #Poisson parameter
zp = 0.4; #zero prob. 
dpoi = Poisson(λ)

dzip = MixtureModel([Dirac(0.0), dpoi], [zp, 1.0-zp])  #ZeroInflated ZIP
dztp = Truncated(dpoi, 1, Inf)                         #ZeroTruncated ZTP 
dzmp = MixtureModel([Dirac(0.0), dztp], [zp,1.0-zp])   #ZeroModified ZMP

# A function might look something like
ZeroInflated(d, zp) = MixtureModel([Dirac(0.0), d], [zp, 1.0-zp])
ZeroTruncated(d) = Truncated(d, 1, Inf)
ZeroModified(d, zp) = MixtureModel([Dirac(0.0), ZeroTruncated(d)], [zp,1.0-zp])

Btw, there are also one-inflated Binomial/Poisson/Beta etc which can be handled similarly.
This really shows the amazing power of Julia & Distributions.jl!

# x_inflated/x_truncated might look something like
XInflated(d, xp, x) = MixtureModel([Dirac(x), d], [xp, 1.0-xp])
XTruncated(d, x) = Truncated(d, x+1, Inf) # note cts/discrete issue +1...
XModified(d, xp, x) = MixtureModel([Dirac(x), XTruncated(d, x)], [xp,1.0-xp])

# 
ZeroInflated(d, zp) = XInflated(d, zp, 0.0)
ZeroTruncated(d) = XTruncated(d, 0.0)
ZeroModified(d, zp) = XModified(d, zp, 0.0)

We can look at ZeroInflatedDistributions.jl & LRMoE.jl
@jkbest2 & @sparktseung do you have any feedback on how to create Zero Inflated random variables?

@sparktseung
Copy link

@azev77

  • Currently in LRMoE.jl v0.2.0, zero-inflated distributions are implemented as a separate object. e.g. there is PoissonExpert(λ) and there is ZIPoissonExpert(p0, λ). I don't this is a good way and I may need to change it up in the future.

  • I think your example using the MixtureModel in Distributions.jl is much better and more maintainable.

  • For discrete distributions with zero inflation & modification, I'd suggest adding some notes in the documentation, to explicitly specify what is the actual zero probability. For zero-inflated Poisson, it should be p0+(1-p0)*exp(-λ). A lot of people tend forget about the second term.

  • For continuous distributions, some may have infinite density at zero, e.g. Gamma with shape<1. One should be careful about writing and interpreting pdf/logpdf in such case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants