adjustments to the negative binomial tutorial - #884

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open
Changes from 1 commit
26 changes: 17 additions & 9 deletions docs/notebooks/negative_binomial.ipynb
@@ -18,28 +18,36 @@
"cell_type": "markdown",
@tomicapretto (Collaborator) · Jan 30, 2025

I would replace "we can use the following definition:" with "the probability mass function (pmf) results:" and directly show the first pmf you list.

Then I find it a bit repetitive that you say twice, and very close, that Y starts at zero when modeling failures, but it's fine.

@tomicapretto (Collaborator) · Jan 30, 2025

Can you replace PyMC3 with PyMC and put an up-to-date link?

@tomicapretto (Collaborator) · Jan 30, 2025

SciPy people write it with capital P :)

"metadata": {},
"source": [
"I always experience some kind of confusion when looking at the negative binomial distribution after a while of not working with it. There are so many different definitions that I usually need to read everything more than once. The definition I've first learned, and the one I like the most, says as follows: The negative binomial distribution is the distribution of a random variable that is defined as the number of independent Bernoulli trials until the k-th \"success\". In short, we repeat a Bernoulli experiment until we observe k successes and record the number of trials it required.\n",
"The negative binomial distribution is flexible with multiple possible formulations. For example, it can model the number of *trials* or *failures* in a sequence of independent Bernoulli trials with probability of success (or failure) $p$ until the $k$-th \"success\". If we want to model the number of trials until the $k$-th success, we can use the following definition:\n",
"\n",
"$$\n",
"Y \\sim \\text{NB}(k, p)\n",
"$$\n",
"\n",
"where $0 \\le p \\le 1$ is the probability of success in each Bernoulli trial, $k > 0$, usually integer, and $y \\in \\{k, k + 1, \\cdots\\}$\n",
"where $0 \\le p \\le 1$ is the probability of success in each Bernoulli trial, $k > 0$, usually integer, $y \\in \\{k, k + 1, \\cdots\\}$ and $Y$ is the number of trials until the $k$-th success.\n",
"\n",
"The probability mass function (pmf) is \n",
"\n",
"$$\n",
"p(y | k, p)= \\binom{y - 1}{y-k}(1 -p)^{y - k}p^k\n",
"$$\n",
"\n",
"If you, like me, find it hard to remember whether $y$ starts at $0$, $1$, or $k$, try to think twice about the definition of the variable. But how? First, recall we aim to have $k$ successes. And success is one of the two possible outcomes of a trial, so the number of trials can never be smaller than the number of successes. Thus, we can be confident to say that $y \\ge k$."
"In this case, since we are modeling the number of *trials* until the $k$-th success, $y$ starts at $k$ and can be any integer greater than or equal to $k$. If instead we want to model the number of *failures* until the $k$-th success, we can use the same definition but $Y$ represents failures and starts at $0$ and there's a slightly different pmf:\n",
"\n",
"$$\n",
"p(y | k, p)= \\binom{y + k - 1}{k-1}(1 -p)^{y}p^k\n",
"$$\n",
"\n",
"In this case, $y$ starts at $0$ and can be any integer greater than or equal to $0$. When modeling failures, $y$ starts at 0, when modeling trials, $y$ starts at $k$.\n",
"\n",
"\n"
]
},
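A quick SciPy check of the relationship between the two pmfs above (a sketch; the values of `k`, `p`, and `t` are arbitrary):

```python
# SciPy's nbinom counts *failures* until the k-th success.
from scipy import stats

k, p = 3, 0.5
failures = stats.nbinom(n=k, p=p)

# P(Y = 0) for Y = number of failures: p**k = 0.125
print(failures.pmf(0))

# The "number of trials" version is the same distribution shifted by k:
# P(trials = t) = P(failures = t - k)
t = 5
print(failures.pmf(t - k))  # probability the 3rd success lands on trial 5
```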
{
"cell_type": "markdown",
"metadata": {},
"source": [
"But this is not the only way of defining the negative binomial distribution, there are plenty of options! One of the most interesting, and the one you see in [PyMC3](https://docs.pymc.io/api/distributions/discrete.html#pymc3.distributions.discrete.NegativeBinomial), the library we use in Bambi for the backend, is as a continuous mixture. The negative binomial distribution describes a Poisson random variable whose rate is also a random variable (not a fixed constant!) following a gamma distribution. Or in other words, conditional on a gamma-distributed variable $\\mu$, the variable $Y$ has a Poisson distribution with mean $\\mu$.\n",
"These are not the only ways of defining the negative binomial distribution, there are plenty of options! One of the most interesting, and the one you see in [PyMC3](https://docs.pymc.io/api/distributions/discrete.html#pymc3.distributions.discrete.NegativeBinomial), the library we use in Bambi for the backend, is as a continuous mixture. The negative binomial distribution describes a Poisson random variable whose rate is also a random variable (not a fixed constant!) following a gamma distribution. Or in other words, conditional on a gamma-distributed variable $\\mu$, the variable $Y$ has a Poisson distribution with mean $\\mu$.\n",
"\n",
"Under this alternative definition, the pmf is\n",
"\n",
@@ -88,7 +96,7 @@
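The mixture definition is easy to verify by simulation. A sketch, assuming the common parameterization where the gamma rate has shape $\alpha$ and mean $\mu$:

```python
# Draw a rate from a gamma distribution, then a count from a Poisson with that
# rate, and compare the empirical frequencies with the matching NB pmf.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1234)
alpha, mu = 3.0, 4.0  # arbitrary shape and mean of the mixing gamma

rates = rng.gamma(shape=alpha, scale=mu / alpha, size=200_000)
counts = rng.poisson(rates)

# Gamma-Poisson mixture matches NB(n=alpha, p=alpha / (alpha + mu)) in SciPy terms
nb = stats.nbinom(n=alpha, p=alpha / (alpha + mu))
for y in range(5):
    print(y, f"simulated={np.mean(counts == y):.4f}", f"pmf={nb.pmf(y):.4f}")
```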
"cell_type": "markdown",
"metadata": {},
"source": [
"In SciPy, the definition of the negative binomial distribution differs a little from the one in our introduction. They define $Y$ = Number of failures until k successes and then $y$ starts at 0. In the following plot, we have the probability of observing $y$ failures before we see $k=3$ successes. "
"Scipy uses the number of *failures* until $k$ successes definition, therefore $y$ starts at 0. In the following plot, we have the probability of observing $y$ failures before we see $k=3$ successes. "
]
},
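Something like the following could produce such a plot (a minimal sketch; the `p` values are illustrative, not necessarily those used in the notebook):

```python
# P(Y = y) failures before the k-th success, for k = 3.
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

k = 3
y = np.arange(0, 25)
for p in (0.3, 0.5):
    plt.bar(y, stats.nbinom.pmf(y, k, p), alpha=0.5, label=f"p = {p}")
plt.xlabel("Number of failures until 3 successes")
plt.ylabel("Probability")
plt.legend()
plt.show()
```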
{
@@ -163,7 +171,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, if one wants to show this probability mass function as if we are following the first definition of negative binomial distribution we introduced, we just need to shift the whole thing to the right by adding $k$ to the $y$ values."
"To change the definition to the number of *trials* until $k$ successes, we just need to shift the whole thing to the right by adding $k$ to the $y$ values."
]
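In code, the shift is just plotting the same pmf values at `y + k` (a self-contained sketch):

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

k, p = 3, 0.5
y = np.arange(0, 25)                        # failures until the k-th success
plt.bar(y + k, stats.nbinom.pmf(y, k, p))   # identical probabilities, shifted by k
plt.xlabel("Number of trials until 3 successes")
plt.ylabel("Probability")
plt.show()
```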
},
{
Expand Down Expand Up @@ -226,7 +234,7 @@
"\n",
"School administrators study the attendance behavior of high school juniors at two schools. Predictors of the **number of days of absence** include the **type of program** in which the student is enrolled and a **standardized test in math**. We have attendance data on 314 high school juniors.\n",
"\n",
"The variables of insterest in the dataset are\n",
"The variables of interest in the dataset are\n",
"\n",
"* daysabs: The number of days of absence. It is our response variable.\n",
"* progr: The type of program. Can be one of 'General', 'Academic', or 'Vocational'.\n",
@@ -551,7 +559,7 @@
"\n",
"But then, why negative binomial? Can't we just use a Poisson likelihood?\n",
"\n",
"Yes, we can. However, using a Poisson likelihood implies that the mean is equal to the variance, and that is usually an unrealistic assumption. If it turns out the variance is either substantially smaller or greater than the mean, the Poisson regression model results in a poor fit. Alternatively, if we use a negative binomial likelihood, the variance is not forced to be equal to the mean, and there's more flexibility to handle a given dataset, and consequently, the fit tends to better."
"Yes, we can. However, using a Poisson likelihood implies that the mean is equal to the variance, and that is usually an unrealistic assumption. If it turns out the variance is either substantially smaller or greater than the mean, the Poisson regression model results in a poor fit. Alternatively, if we use a negative binomial likelihood, the variance is not forced to be equal to the mean, and there's more flexibility to handle a given dataset, and consequently, the fit tends to be better."
]
},
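Concretely, under the mean/dispersion parameterization commonly used for regression, the negative binomial variance is $\mu + \mu^2 / \alpha$, which always exceeds the mean. A quick numeric illustration with arbitrary values:

```python
# With mean mu and dispersion alpha, NB variance is mu + mu**2 / alpha,
# always larger than the Poisson's, which forces variance == mean.
mu, alpha = 6.0, 2.0
print("Poisson variance:", mu)                   # 6.0, equal to the mean
print("NegBinom variance:", mu + mu**2 / alpha)  # 6 + 36/2 = 24.0
```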
{
@@ -608,7 +616,7 @@
"\\log(\\mathbb{E}[Y_i]) = \\beta_3 + \\beta_4 \\text{Math\\_std}_i\n",
"$$\n",
"\n",
"And one last thing to note is we've decided not to inclide an intercept term, that's why you don't see any $\\beta_0$ above. This choice allows us to represent the effect of each program directly with $\\beta_1$, $\\beta_2$, and $\\beta_3$."
"And one last thing to note is we've decided not to include an intercept term, that's why you don't see any $\\beta_0$ above. This choice allows us to represent the effect of each program directly with $\\beta_1$, $\\beta_2$, and $\\beta_3$."
]
},
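For reference, a sketch of how such a model might be written in Bambi; the formula and column names (`daysabs`, `progr`, `math`) follow the variable descriptions above, but the exact call in the notebook may differ:

```python
# A hypothetical sketch, not the notebook's exact code: "0 +" drops the global
# intercept so each program level gets its own coefficient, per the text above.
import bambi as bmb
import pandas as pd

data = pd.read_csv("school_attendance.csv")  # hypothetical path

model = bmb.Model(
    "daysabs ~ 0 + progr + scale(math)",  # assumed column names
    data,
    family="negativebinomial",
)
idata = model.fit()
```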
{