-
Notifications
You must be signed in to change notification settings - Fork 429
Add ZeroInflatedPoisson distribution #1393
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
dependencies for Lambert's W, Log/Exp functions. name change from from ZIPoisson to ZeroInflatedPoisson
changed default excess prob of zero to zero (to match R functions), added excessprob() function, correction to variance calculation
Hi again, I ran and added tests, and changed the code according to the test criteria. It should pass now, at least it does locally. Apologies, and thanks again. |
λ::T | ||
p::T | ||
|
||
function ZeroInflatedPoisson{T}(λ::T, p::T) where {T <: Real} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you adapt your indentation to 4 spaces?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, will do.
Can’t we have the entire thing as actual MixtureDistribution of Dirac at 0 and Poisson, or make that work? |
@@ -6,7 +6,9 @@ version = "0.25.15" | |||
[deps] | |||
ChainRulesCore = "d360d2e6-b24c-11e9-a2a3-2a2ae2dbcce4" | |||
FillArrays = "1a297f60-69ca-5386-bcde-b61e274b549b" | |||
LambertW = "984bce1d-4616-540c-a9ee-88d1112d94c9" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it would make sense to define this distribution in StatsFuns rather than using a separate package?
@jlapeyre Would you be OK with that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC there was a discussion about moving it to SpecialFunctions and there even exists a PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes that's JuliaMath/SpecialFunctions.jl#84. Though it's quite outdated now and there have been new commits in LambertW since then.
@emfeltham Do you feel like reviving this PR (or opening a new one)? There seems to be lots of interest in it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nalimilan (sorry, I just saw a new email ping) Yes, I am definitely OK with moving LambertW into another package. I think the appropriate package is indeed SpecialFunctions. I don't have a pressing interest in doing it myself at the moment. I'm not sure a new attempt at a PR wouldn't fizzle out as well ;)
@@ -239,6 +240,7 @@ export | |||
quantile, # inverse of cdf (defined for p in (0,1)) | |||
qqbuild, # build a paired quantiles data structure for qqplots | |||
rate, # get the rate parameter | |||
excessprob, # get the exess probability of zeros parameter (ZeroInflatedPoison) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if we should introduce and in particular export a new function here just for ZeroInflatedPoisson
.
) | ||
|
||
# weighted | ||
function suffstats(::Type{<:ZeroInflatedPoisson}, x::AbstractArray{T}, w::AbstractArray{Float64}) where T<:Integer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Float64
type parameter seems a bit restrictive.
return LL | ||
end | ||
|
||
function cdf(d::ZeroInflatedPoisson, x::Real) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is type unstable (eg. if the parameters and x
are of type Float32
). The best approach is to perform all calculations and just return zero(result)
or oftype(result, NaN)
if necessary for some values of x
in the end.
function cdf(d::ZeroInflatedPoisson, x::Real) | ||
pd = Poisson(d.λ) | ||
|
||
deflat_limit = -1.0 / expm1(d.λ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should avoid harcoded Float64
literals since they might cause unwanted promotions.
end | ||
|
||
# quantile | ||
function quantile(d::ZeroInflatedPoisson, q::Real) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function has the same problems as cdf
.
if (q <= d.p) | ||
out = 0 | ||
elseif (d.p < deflat_limit) | ||
out = convert(Int64, NaN) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will throw an InexactError
. In general, one should also avoid hardcoding Int64
and use Int
(defaults to Int64
on 64bit) if possible since Int64
can lead to a mixup of Int32
and Int64
on 32bit machines.
Codecov Report
@@ Coverage Diff @@
## master #1393 +/- ##
==========================================
- Coverage 82.54% 81.63% -0.92%
==========================================
Files 116 117 +1
Lines 6950 7001 +51
==========================================
- Hits 5737 5715 -22
- Misses 1213 1286 +73
Continue to review full report at Codecov.
|
So let’s make it ZeroInflated{Poisson}? For the same amount of code we get a couple of related zero inflated distributions, e.g ZeroInflated{NegativeBinomial}. |
Co-authored-by: David Widmann <[email protected]>
Building on @mschauer we can generically allow creating all kinds of transformations (Zero-inflated/truncated/modified): using Distributions
λ = 2.0; #Poisson parameter
zp = 0.4; #zero prob.
dpoi = Poisson(λ)
dzip = MixtureModel([Dirac(0.0), dpoi], [zp, 1.0-zp]) #ZeroInflated ZIP
dztp = Truncated(dpoi, 1, Inf) #ZeroTruncated ZTP
dzmp = MixtureModel([Dirac(0.0), dztp], [zp,1.0-zp]) #ZeroModified ZMP
# A function might look something like
ZeroInflated(d, zp) = MixtureModel([Dirac(0.0), d], [zp, 1.0-zp])
ZeroTruncated(d) = Truncated(d, 1, Inf)
ZeroModified(d, zp) = MixtureModel([Dirac(0.0), ZeroTruncated(d)], [zp,1.0-zp]) Btw, there are also one-inflated Binomial/Poisson/Beta etc which can be handled similarly. # x_inflated/x_truncated might look something like
XInflated(d, xp, x) = MixtureModel([Dirac(x), d], [xp, 1.0-xp])
XTruncated(d, x) = Truncated(d, x+1, Inf) # note cts/discrete issue +1...
XModified(d, xp, x) = MixtureModel([Dirac(x), XTruncated(d, x)], [xp,1.0-xp])
#
ZeroInflated(d, zp) = XInflated(d, zp, 0.0)
ZeroTruncated(d) = XTruncated(d, 0.0)
ZeroModified(d, zp) = XModified(d, zp, 0.0) We can look at ZeroInflatedDistributions.jl & LRMoE.jl |
|
In the wake of a discussion over on StatisticalRethinkingTuring, I figured that it would be a good idea to put this together. It is used fairly extensively in the social sciences, and is probably not something that researchers should have to construct themselves. Thanks again.