Add ZeroInflatedPoisson distribution #1393
base: master
Changes from all commits
daac4e0
ee4539a
89544ee
aa02051
bb6f3cd
74bf23c
5b1ff99
0455c59
c1e6c8d
63a912a
2619337
424b3d4
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -23,7 +23,7 @@ import StatsBase: kurtosis, skewness, entropy, mode, modes, | |
|
||
import PDMats: dim, PDMat, invquad | ||
|
||
using SpecialFunctions | ||
using SpecialFunctions, LambertW | ||
|
||
import ChainRulesCore | ||
|
||
|
@@ -161,6 +161,7 @@ export | |
WalleniusNoncentralHypergeometric, | ||
Weibull, | ||
Wishart, | ||
ZeroInflatedPoisson, | ||
ZeroMeanIsoNormal, | ||
ZeroMeanIsoNormalCanon, | ||
ZeroMeanDiagNormal, | ||
|
@@ -239,6 +240,7 @@ export | |
quantile, # inverse of cdf (defined for p in (0,1)) | ||
qqbuild, # build a paired quantiles data structure for qqplots | ||
rate, # get the rate parameter | ||
excessprob, # get the excess probability of zeros parameter (ZeroInflatedPoisson) | ||
I am not sure if we should introduce and in particular export a new function here just for |
||
sampler, # create a Sampler object for efficient samples | ||
scale, # get the scale parameter | ||
scale!, # provide storage for the scale parameter (used in multivariate distribution mvlognormal) | ||
|
@@ -334,7 +336,7 @@ Supported distributions: | |
QQPair, Rayleigh, Skellam, Soliton, StudentizedRange, SymTriangularDist, TDist, TriangularDist, | ||
Triweight, Truncated, TruncatedNormal, Uniform, UnivariateGMM, | ||
VonMises, VonMisesFisher, WalleniusNoncentralHypergeometric, Weibull, | ||
Wishart, ZeroMeanIsoNormal, ZeroMeanIsoNormalCanon, | ||
Wishart, ZeroInflatedPoisson, ZeroMeanIsoNormal, ZeroMeanIsoNormalCanon, | ||
ZeroMeanDiagNormal, ZeroMeanDiagNormalCanon, ZeroMeanFullNormal, | ||
ZeroMeanFullNormalCanon | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,165 @@ | ||
""" | ||
ZeroInflatedPoisson(λ, p) | ||
A *Zero-Inflated Poisson distribution* is a mixture distribution in which data arise from two processes. The first process is a Poisson distribution with mean λ that describes the number of independent events occurring within a unit time interval. For nonzero counts, the mixture gives: | ||
```math | ||
P(X = k) = (1 - p) \\frac{\\lambda^k}{k!} e^{-\\lambda}, \\quad \\text{ for } k = 1,2,\\ldots. | ||
``` | ||
Zeros may arise from this process or from an additional Bernoulli process, in which the probability of observing an excess zero is p: | ||
```math | ||
P(X = 0) = p + (1 - p) e^{-\\lambda} | ||
``` | ||
As p approaches 0, the distribution tends toward Poisson(λ). | ||
```julia | ||
ZeroInflatedPoisson() # Zero-Inflated Poisson distribution with rate parameter 1 and excess-zero probability 0 | ||
ZeroInflatedPoisson(λ) # Zero-Inflated Poisson distribution with rate parameter λ and excess-zero probability 0 | ||
params(d) # Get the parameters, i.e. (λ, p) | ||
mean(d) # Get the mean of the mixture distribution | ||
var(d) # Get the variance of the mixture distribution | ||
``` | ||
External links: | ||
* [Zero-inflated Poisson Regression on UCLA IDRE Statistical Consulting](https://stats.idre.ucla.edu/stata/dae/zero-inflated-poisson-regression/) | ||
* [Zero-inflated model on Wikipedia](https://en.wikipedia.org/wiki/Zero-inflated_model) | ||
* McElreath, R. (2020). Statistical Rethinking: A Bayesian Course with Examples in R and Stan (2nd ed.). Chapman and Hall/CRC. https://doi.org/10.1201/9780429029608 | ||
|
||
""" | ||
struct ZeroInflatedPoisson{T<:Real} <: DiscreteUnivariateDistribution | ||
λ::T | ||
p::T | ||
|
||
function ZeroInflatedPoisson{T}(λ::T, p::T) where {T <: Real} | ||
Can you adapt your indentation to 4 spaces?
Sure, will do. |
||
return new{T}(λ, p) | ||
end | ||
end | ||
|
||
function ZeroInflatedPoisson(λ::T, p::T; check_args = true) where {T <: Real} | ||
if check_args | ||
@check_args(ZeroInflatedPoisson, λ >= zero(λ)) | ||
@check_args(ZeroInflatedPoisson, zero(p) <= p <= one(p)) | ||
end | ||
return ZeroInflatedPoisson{T}(λ, p) | ||
end | ||
|
||
ZeroInflatedPoisson(λ::Real, p::Real) = ZeroInflatedPoisson(promote(λ, p)...) | ||
ZeroInflatedPoisson(λ::Integer, p::Integer) = ZeroInflatedPoisson(float(λ), float(p)) | ||
ZeroInflatedPoisson(λ::Real) = ZeroInflatedPoisson(λ, 0.0) | ||
ZeroInflatedPoisson() = ZeroInflatedPoisson(1.0, 0.0, check_args = false) | ||
|
||
@distr_support ZeroInflatedPoisson 0 (d.λ == zero(typeof(d.λ)) ? 0 : Inf) | ||
|
||
### Statistics | ||
|
||
mean(d::ZeroInflatedPoisson) = (1 - d.p) * d.λ | ||
|
||
var(d::ZeroInflatedPoisson) = d.λ * (1 - d.p) * (1 + d.p * d.λ) | ||
|
||
#### Conversions | ||
|
||
function convert(::Type{ZeroInflatedPoisson{T}}, λ::Real, p::Real) where {T<:Real} | ||
return ZeroInflatedPoisson(T(λ), T(p)) | ||
end | ||
|
||
function convert(::Type{ZeroInflatedPoisson{T}}, d::ZeroInflatedPoisson{S}) where {T <: Real, S <: Real} | ||
return ZeroInflatedPoisson(T(d.λ), T(d.p), check_args = false) | ||
end | ||
|
||
#### Parameters | ||
|
||
params(d::ZeroInflatedPoisson) = (d.λ, d.p,) | ||
partype(::ZeroInflatedPoisson{T}) where {T} = T | ||
|
||
rate(d::ZeroInflatedPoisson) = d.λ | ||
excessprob(d::ZeroInflatedPoisson) = d.p | ||
|
||
#### Evaluation | ||
|
||
function logpdf(d::ZeroInflatedPoisson, y::Real) | ||
lp = if iszero(y) | ||
logaddexp(log(d.p), log1p(-d.p) - d.λ) | ||
else | ||
log1p(-d.p) + logpdf(Poisson(d.λ), y) | ||
end | ||
return lp | ||
end | ||
|
||
function cdf(d::ZeroInflatedPoisson, x::Real) | ||
This function is type unstable (eg. if the parameters and |
||
pd = Poisson(d.λ) | ||
|
||
deflat_limit = -1.0 / expm1(d.λ) | ||
We should avoid hardcoded |
||
|
||
if x < 0 | ||
out = 0.0 | ||
elseif d.p < deflat_limit | ||
out = NaN | ||
else | ||
out = d.p + (1 - d.p) * cdf(pd, x) | ||
end | ||
return out | ||
end | ||
|
||
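One way to address the two review comments on `cdf` is to derive every return value from the parameter type instead of writing `Float64` literals. The following is a minimal sketch under stated assumptions, not the PR's implementation: `ZIPSketch`, `poisson_cdf_sketch`, and `cdf_sketch` are hypothetical names, the Poisson CDF is summed directly so that the block needs only Base, and the `deflat_limit` branch is left out because the argument check already restricts `p` to `[0, 1]`.

```julia
# Hypothetical stand-in for the PR's struct; only Base is used so the sketch runs alone.
struct ZIPSketch{T<:AbstractFloat}
    λ::T
    p::T
end

# Poisson CDF by direct summation: fine for illustration, not for large λ or k.
function poisson_cdf_sketch(λ::T, k::Integer) where {T<:AbstractFloat}
    term = exp(-λ)        # P(X = 0)
    total = term
    for i in 1:k
        term *= λ / i     # P(X = i) obtained from P(X = i - 1)
        total += term
    end
    return total
end

# Every branch returns a T, so the result type follows the parameter type.
function cdf_sketch(d::ZIPSketch{T}, x::Real) where {T<:AbstractFloat}
    x < 0 && return zero(T)
    return d.p + (one(T) - d.p) * poisson_cdf_sketch(d.λ, floor(Int, x))
end
```

With `Float32` parameters the result stays `Float32`, whereas the literal `0.0` in the version under review would return a `Float64` from the `x < 0` branch. In the package itself the summation would presumably be replaced by `cdf(Poisson(d.λ), x)`.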
# quantile | ||
function quantile(d::ZeroInflatedPoisson, q::Real) | ||
This function has the same problems as |
||
|
||
deflat_limit = -1.0 / expm1(d.λ) | ||
|
||
if (q <= d.p) | ||
out = 0 | ||
elseif (d.p < deflat_limit) | ||
out = convert(Int64, NaN) | ||
This will throw an |
||
elseif (d.p < q) & (deflat_limit <= d.p) & (q < 1.0) | ||
qp = (q - d.p) / (1.0 - d.p) | ||
pd = Poisson(d.λ) | ||
out = quantile(pd, qp) # handles d.p == 1 as InexactError(Inf) | ||
end | ||
return out | ||
end | ||
|
||
#### Fitting | ||
|
||
struct ZeroInflatedPoissonStats <: SufficientStats | ||
sx::Float64 # (weighted) sum of x | ||
p0::Float64 # observed proportion of zeros | ||
tw::Float64 # total sample weight | ||
Comment on lines +119 to +121
The types seem a bit restrictive. |
||
end | ||
|
||
suffstats(::Type{<:ZeroInflatedPoisson}, x::AbstractArray{T}) where {T<:Integer} = ZeroInflatedPoissonStats( | ||
sum(x), | ||
mean(iszero, x), | ||
length(x) | ||
) | ||
|
||
# weighted | ||
function suffstats(::Type{<:ZeroInflatedPoisson}, x::AbstractArray{T}, w::AbstractArray{Float64}) where T<:Integer | ||
The |
||
n = length(x) | ||
n == length(w) || throw(DimensionMismatch("Inconsistent array lengths.")) | ||
sx = 0. | ||
tw = 0. | ||
p0 = 0. | ||
for i = 1 : n | ||
@inbounds wi = w[i] | ||
@inbounds sx += x[i] * wi | ||
tw += wi | ||
@inbounds p0i = (x[i] == 0) * wi | ||
p0 += p0i | ||
end | ||
return ZeroInflatedPoissonStats(sx, p0 / tw, tw) # p0 is stored as a proportion, matching the unweighted method | ||
end | ||
|
||
function fit_mle(::Type{<:ZeroInflatedPoisson}, ss::ZeroInflatedPoissonStats) | ||
m = ss.sx / ss.tw | ||
s = m / (1 - ss.p0) | ||
|
||
λhat = lambertw(-s * exp(-s)) + s | ||
phat = 1 - (m / λhat) | ||
|
||
return ZeroInflatedPoisson(λhat, phat) | ||
end | ||
|
||
function fit_mle(::Type{<:ZeroInflatedPoisson}, x::AbstractArray{T}) where T<:Real | ||
pstat = suffstats(ZeroInflatedPoisson, x) | ||
return fit_mle(ZeroInflatedPoisson, pstat) | ||
end | ||
|
||
function fit_mle(::Type{<:ZeroInflatedPoisson}, x::AbstractArray{T}, w::AbstractArray{Float64}) where T<:Real | ||
pstat = suffstats(ZeroInflatedPoisson, x, w) | ||
return fit_mle(ZeroInflatedPoisson, pstat) | ||
end |
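The closed form used in `fit_mle` appears to follow from the standard zero-inflated Poisson likelihood equations. The derivation below is a sketch, assuming that `m` is the sample mean, `p0` the observed proportion of zeros, and `W_0` the principal branch of the Lambert W function (taken here to be what `lambertw` returns by default).

```math
\begin{aligned}
s &= \frac{m}{1 - p_0} = \frac{\hat\lambda}{1 - e^{-\hat\lambda}}
  && \text{(mean of the positive counts equals the zero-truncated Poisson mean)} \\
\hat\lambda - s &= -s\, e^{-\hat\lambda} \\
(\hat\lambda - s)\, e^{\hat\lambda - s} &= -s\, e^{-s}
  && \text{(multiply both sides by } e^{\hat\lambda - s}\text{)} \\
\hat\lambda &= s + W_0\!\left(-s\, e^{-s}\right) \\
\hat p &= 1 - \frac{m}{\hat\lambda}
  && \text{(from } (1 - \hat p)\,\hat\lambda = m\text{)}
\end{aligned}
```

The other real branch, `W_{-1}`, returns `-s` and so yields only the trivial root `λ = 0`. When the sample contains fewer zeros than a plain Poisson fit would predict, the resulting estimate of `p` is negative, which the constructor's argument check would reject.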
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,32 @@ | ||
|
||
ZeroInflatedPoisson <- R6Class("ZeroInflatedPoisson", | ||
inherit = DiscreteDistribution, | ||
public = list( | ||
names = c("lambda", "p"), | ||
lambda = NA, | ||
p = NA, | ||
initialize = function(lambda = 1, p = 0) { | ||
self$lambda <- lambda | ||
self$p <- p | ||
}, | ||
supp = function() { c(0, Inf) }, | ||
properties = function() { | ||
lam <- self$lambda | ||
p <- self$p | ||
list(rate = lam, | ||
excessprob = p, | ||
mean = (1 - p) * lam, | ||
var = lam * (1 - p) * (1 + p * lam) | ||
) | ||
}, | ||
pdf = function(x, log=FALSE) { | ||
VGAM::dzipois(x, self$lambda, pstr0 = self$p, log = log) | ||
}, | ||
cdf = function(x) { | ||
VGAM::pzipois(x, self$lambda, pstr0 = self$p) | ||
}, | ||
quan = function(v) { | ||
VGAM::qzipois(v, self$lambda, pstr0 = self$p) | ||
} | ||
) | ||
) |
Maybe it would make sense to define this distribution in StatsFuns rather than using a separate package?
@jlapeyre Would you be OK with that?
IIRC there was a discussion about moving it to SpecialFunctions and there even exists a PR.
Ah yes that's JuliaMath/SpecialFunctions.jl#84. Though it's quite outdated now and there have been new commits in LambertW since then.
@emfeltham Do you feel like reviving this PR (or opening a new one)? There seems to be lots of interest in it.
@nalimilan (sorry, I just saw a new email ping) Yes, I am definitely OK with moving LambertW into another package. I think the appropriate package is indeed SpecialFunctions. I don't have a pressing interest in doing it myself at the moment. I'm not sure a new attempt at a PR wouldn't fizzle out as well ;)