Add ZeroInflatedPoisson distribution #1393

emfeltham · 2021-09-01T16:28:32Z

In the wake of a discussion over on StatisticalRethinkingTuring, I figured it would be a good idea to put this together. The zero-inflated Poisson is used fairly extensively in the social sciences, and it is probably not something researchers should have to construct themselves. Thanks again.

dependencies for Lambert's W, Log/Exp functions. name change from ZIPoisson to ZeroInflatedPoisson

changed default excess prob of zero to zero (to match R functions), added excessprob() function, correction to variance calculation

emfeltham · 2021-09-03T23:16:04Z

Hi again, I ran and added tests, and changed the code according to the test criteria. It should pass now, at least it does locally. Apologies, and thanks again.

mschauer · 2021-09-04T06:32:51Z

src/univariate/discrete/zeroinflatedpoisson.jl

+  λ::T
+  p::T
+
+  function ZeroInflatedPoisson{T}(λ::T, p::T) where {T <: Real}


Can you adapt your indentation to 4 spaces?

Sure, will do.

mschauer · 2021-09-04T06:38:53Z

Can’t we have the entire thing as an actual MixtureDistribution of a Dirac at 0 and a Poisson, or make that work?

nalimilan · 2021-09-04T13:48:34Z

Project.toml

@@ -6,7 +6,9 @@ version = "0.25.15"
 [deps]
 ChainRulesCore = "d360d2e6-b24c-11e9-a2a3-2a2ae2dbcce4"
 FillArrays = "1a297f60-69ca-5386-bcde-b61e274b549b"
+LambertW = "984bce1d-4616-540c-a9ee-88d1112d94c9"


Maybe it would make sense to define this function (Lambert's W) in StatsFuns rather than using a separate package?

@jlapeyre Would you be OK with that?

IIRC there was a discussion about moving it to SpecialFunctions and there even exists a PR.

Ah yes that's JuliaMath/SpecialFunctions.jl#84. Though it's quite outdated now and there have been new commits in LambertW since then.

@emfeltham Do you feel like reviving this PR (or opening a new one)? There seems to be lots of interest in it.

@nalimilan (sorry, I just saw a new email ping) Yes, I am definitely OK with moving LambertW into another package. I think the appropriate package is indeed SpecialFunctions. I don't have a pressing interest in doing it myself at the moment. I'm not sure a new attempt at a PR wouldn't fizzle out as well ;)


devmotion · 2021-09-05T00:45:20Z

src/Distributions.jl

@@ -239,6 +240,7 @@ export
    quantile,           # inverse of cdf (defined for p in (0,1))
    qqbuild,            # build a paired quantiles data structure for qqplots
    rate,               # get the rate parameter
+    excessprob,         # get the excess probability of zeros parameter (ZeroInflatedPoisson)


I am not sure if we should introduce and in particular export a new function here just for ZeroInflatedPoisson.


devmotion · 2021-09-05T00:53:17Z

src/univariate/discrete/zeroinflatedpoisson.jl

+  )
+
+# weighted
+function suffstats(::Type{<:ZeroInflatedPoisson}, x::AbstractArray{T}, w::AbstractArray{Float64}) where T<:Integer


The Float64 type parameter seems a bit restrictive.
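
One way the restriction could be lifted is to accept any real-valued weights and let the accumulator type follow them. The following is only a sketch: `weighted_zero_stats` is a hypothetical stand-alone helper, not the PR's `suffstats` method, and the three statistics it returns are assumed for illustration.

```julia
# Hypothetical helper (not the PR's `suffstats`): accepts any real-valued weights.
function weighted_zero_stats(x::AbstractArray{<:Integer}, w::AbstractArray{<:Real})
    length(x) == length(w) || throw(DimensionMismatch("x and w must have the same length"))
    W = float(zero(eltype(w)))   # accumulator type follows the weights
    sx = W                       # weighted sum of the observations
    tw = W                       # total weight
    w0 = W                       # total weight carried by zero observations
    for (xi, wi) in zip(x, w)
        sx += wi * xi
        tw += wi
        if iszero(xi)
            w0 += wi
        end
    end
    return (sx = sx, tw = tw, w0 = w0)
end
```

With `Float32` weights the results stay `Float32`, whereas a signature fixed to `AbstractArray{Float64}` would reject them outright.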

devmotion · 2021-09-05T00:55:54Z

src/univariate/discrete/zeroinflatedpoisson.jl

+  return LL
+end
+
+function cdf(d::ZeroInflatedPoisson, x::Real)


This function is type unstable (e.g., if the parameters and x are of type Float32). The best approach is to perform all calculations first and, only at the end, return zero(result) or oftype(result, NaN) for the values of x that need special handling.
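
A minimal sketch of that pattern, written without Distributions.jl so that it is self-contained. `poisson_cdf` and `zip_cdf` are hypothetical names, the Poisson CDF is computed by direct summation (only reasonable for small x), and p is assumed to be a mixture weight in [0, 1] rather than the PR's deflation-capable parameter.

```julia
# Hypothetical stand-alone functions, not the PR's `cdf` method.
# Poisson CDF by direct summation; adequate for small k only.
function poisson_cdf(λ::Real, k::Integer)
    term = exp(-float(λ))   # pmf at 0, in the floating-point type of λ
    total = term
    for i in 1:k
        term *= λ / i       # pmf at i from pmf at i - 1
        total += term
    end
    return total
end

function zip_cdf(λ::Real, p::Real, x::Real)
    # Clamp to a valid count so the generic computation always runs
    # and fixes the result type from λ and p alone.
    k = (isfinite(x) && x >= 0) ? floor(Int, x) : 0
    result = p + (1 - p) * poisson_cdf(λ, k)
    # Special cases are patched in afterwards with values of that same type.
    if isnan(x)
        return oftype(result, NaN)
    elseif x < 0
        return zero(result)
    elseif isinf(x)
        return one(result)
    else
        return result
    end
end
```

Every branch returns a value of the type of `result`, so `Float32` inputs give a `Float32` answer whether x is negative, NaN, or an ordinary count.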

devmotion · 2021-09-05T00:56:20Z

src/univariate/discrete/zeroinflatedpoisson.jl

+function cdf(d::ZeroInflatedPoisson, x::Real)
+  pd = Poisson(d.λ)
+
+  deflat_limit = -1.0 / expm1(d.λ)


We should avoid hardcoded Float64 literals since they might cause unwanted promotions.
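
To illustrate the promotion with the expression from the hunk above (the two function names are hypothetical and exist only for this comparison):

```julia
# Same formula written two ways; only the literal differs.
deflat_limit_literal(λ) = -1.0 / expm1(λ)   # Float64 literal promotes a Float32 λ to Float64
deflat_limit_generic(λ) = -inv(expm1(λ))    # no literal: the result keeps the type of expm1(λ)
```

For a `Float32` rate the first version returns a `Float64` and the second a `Float32`. Writing `-1 / expm1(λ)` with an integer literal would also avoid the promotion.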

devmotion · 2021-09-05T00:56:53Z

src/univariate/discrete/zeroinflatedpoisson.jl

+end
+
+# quantile
+function quantile(d::ZeroInflatedPoisson, q::Real)


This function has the same problems as cdf.

devmotion · 2021-09-05T00:58:58Z

src/univariate/discrete/zeroinflatedpoisson.jl

+  if (q <= d.p)
+    out = 0
+  elseif (d.p < deflat_limit)
+    out = convert(Int64, NaN)


This will throw an InexactError. In general, one should also avoid hardcoding Int64 and use Int (which defaults to Int64 on 64-bit systems) where possible, since Int64 can lead to a mix-up of Int32 and Int64 on 32-bit machines.
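
The failure is easy to reproduce in isolation. The snippet below captures the error and then sketches one possible alternative, which is an assumption rather than the fix adopted in the PR: reject the invalid parameter combination explicitly instead of trying to encode it as an integer. `check_deflation` is a hypothetical helper.

```julia
# NaN has no integer representation, so the conversion throws
# instead of returning a sentinel value.
nan_conversion_error = try
    convert(Int, NaN)
    nothing
catch err
    err   # this is an InexactError
end

# Hypothetical helper: signal an invalid parameter combination with an
# exception rather than a NaN-like integer, which cannot exist.
function check_deflation(p::Real, deflat_limit::Real)
    p < deflat_limit && throw(DomainError(p, "p is below the deflation limit"))
    return nothing
end
```

Using `Int` rather than `Int64` in the remaining branches keeps the return type equal to the platform's native integer.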


codecov-commenter · 2021-09-05T07:54:30Z

Codecov Report

Merging #1393 (2619337) into master (39f9899) will decrease coverage by 0.91%.
The diff coverage is 5.12%.

❗ Current head 2619337 differs from pull request most recent head 424b3d4. Consider uploading reports for the commit 424b3d4 to get more accurate results

@@            Coverage Diff             @@
##           master    #1393      +/-   ##
==========================================
- Coverage   82.54%   81.63%   -0.92%     
==========================================
  Files         116      117       +1     
  Lines        6950     7001      +51     
==========================================
- Hits         5737     5715      -22     
- Misses       1213     1286      +73

Impacted Files	Coverage Δ
src/univariates.jl	`72.82% <ø> (ø)`
src/univariate/discrete/zeroinflatedpoisson.jl	`5.12% <5.12%> (ø)`
src/univariate/discrete/discretenonparametric.jl	`98.84% <0.00%> (-0.20%)`	⬇️
src/quantilealgs.jl	`82.41% <0.00%> (ø)`
src/mixtures/mixturemodel.jl	`69.60% <0.00%> (+1.56%)`	⬆️


mschauer · 2021-09-05T10:43:31Z

So let’s make it ZeroInflated{Poisson}? For the same amount of code we get a couple of related zero-inflated distributions, e.g., ZeroInflated{NegativeBinomial}.
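
A rough sketch of what such a parametric wrapper could look like. To keep it runnable without Distributions.jl it uses a toy stand-in, `ToyPoisson`, and a free function `pmf`; both are hypothetical. In the package itself the base would be a discrete univariate distribution and the methods would be `pdf`, `cdf`, `rand`, and so on. The weight p is assumed to lie in [0, 1].

```julia
# Toy stand-in for a count distribution, so the sketch runs on its own.
struct ToyPoisson{T<:Real}
    λ::T
end

function pmf(d::ToyPoisson, k::Integer)
    k < 0 && return zero(float(d.λ))
    term = exp(-float(d.λ))   # pmf at 0
    for i in 1:k
        term *= d.λ / i       # pmf at i from pmf at i - 1
    end
    return term
end

# Generic wrapper: the base distribution is a type parameter.
struct ZeroInflated{D,T<:Real}
    base::D   # any count distribution with a `pmf` method
    p::T      # probability of a structural zero, assumed in [0, 1]
end

# Written once, this works for every base distribution.
function pmf(d::ZeroInflated, k::Integer)
    scaled = (1 - d.p) * pmf(d.base, k)
    return iszero(k) ? d.p + scaled : scaled
end
```

A zero-inflated negative binomial would then only need a `pmf` method for its base type; the wrapper code itself would not change.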


azev77 · 2021-10-05T19:06:30Z

Building on @mschauer's suggestion, we can generically create all kinds of transformations (zero-inflated/truncated/modified), as in my Discourse post:

using Distributions
λ = 2.0; #Poisson parameter
zp = 0.4; #zero prob. 
dpoi = Poisson(λ)

dzip = MixtureModel([Dirac(0.0), dpoi], [zp, 1.0-zp])  #ZeroInflated ZIP
dztp = truncated(dpoi, 1, Inf)                         #ZeroTruncated ZTP
dzmp = MixtureModel([Dirac(0.0), dztp], [zp,1.0-zp])   #ZeroModified ZMP

# A function might look something like
ZeroInflated(d, zp) = MixtureModel([Dirac(0.0), d], [zp, 1.0-zp])
ZeroTruncated(d) = truncated(d, 1, Inf)
ZeroModified(d, zp) = MixtureModel([Dirac(0.0), ZeroTruncated(d)], [zp,1.0-zp])

Btw, there are also one-inflated Binomial/Poisson/Beta etc which can be handled similarly.
This really shows the amazing power of Julia & Distributions.jl!

# x_inflated/x_truncated might look something like
XInflated(d, xp, x) = MixtureModel([Dirac(x), d], [xp, 1.0-xp])
XTruncated(d, x) = truncated(d, x+1, Inf) # note cts/discrete issue +1...
XModified(d, xp, x) = MixtureModel([Dirac(x), XTruncated(d, x)], [xp,1.0-xp])

# The zero-specific versions are then special cases with x = 0
ZeroInflated(d, zp) = XInflated(d, zp, 0.0)
ZeroTruncated(d) = XTruncated(d, 0.0)
ZeroModified(d, zp) = XModified(d, zp, 0.0)

We can look at ZeroInflatedDistributions.jl & LRMoE.jl
@jkbest2 & @sparktseung do you have any feedback on how to create Zero Inflated random variables?

sparktseung · 2021-10-05T20:34:55Z

@azev77

Currently in LRMoE.jl v0.2.0, zero-inflated distributions are implemented as a separate object, e.g. there is PoissonExpert(λ) and there is ZIPoissonExpert(p0, λ). I don't think this is a good approach and I may need to change it in the future.
I think your example using the MixtureModel in Distributions.jl is much better and more maintainable.
For discrete distributions with zero inflation & modification, I'd suggest adding some notes in the documentation to explicitly specify what the actual zero probability is. For zero-inflated Poisson, it should be p0+(1-p0)*exp(-λ). A lot of people tend to forget about the second term.
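
Written out for the zero-inflated Poisson, with p_0 the inflation weight and λ the Poisson rate as in the expression above, the probability mass function is:

```latex
\begin{aligned}
P(X = 0) &= p_0 + (1 - p_0)\,e^{-\lambda}, \\
P(X = k) &= (1 - p_0)\,\frac{\lambda^{k} e^{-\lambda}}{k!}, \qquad k = 1, 2, \dots
\end{aligned}
```

As a worked example with p_0 = 0.4 and λ = 2, the actual zero probability is 0.4 + 0.6 · e^{-2} ≈ 0.4812, noticeably larger than the 0.4 one gets by dropping the second term.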
For continuous distributions, some may have infinite density at zero, e.g. Gamma with shape<1. One should be careful about writing and interpreting pdf/logpdf in such case.

emfeltham added 11 commits August 31, 2021 13:39

add zeroinflatedpoisson.jl

daac4e0

added dependencies, changed name

ee4539a

dependencies for Lambert's W, Log/Exp functions. name change from ZIPoisson to ZeroInflatedPoisson

added to docs

89544ee

remove various comments

aa02051

update project.toml

bb6f3cd

updated docstring

74bf23c

type fixes, replace LogPoisson with Poisson

5b1ff99

minor change to comment

0455c59

fix comment

c1e6c8d

correction to var, type chngs, default chng

63a912a

changed default excess prob of zero to zero (to match R functions), added excessprob() function, correction to variance calculation

added param combos to test, zero infl distr tests

2619337


Apply suggestions from code review

424b3d4

Co-authored-by: David Widmann <[email protected]>
