adjustments to the negative binomial tutorial - #884

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open
Changes from 1 commit
26 changes: 17 additions & 9 deletions docs/notebooks/negative_binomial.ipynb
@@ -18,28 +18,36 @@
"cell_type": "markdown",
@tomicapretto (Collaborator) · Jan 30, 2025

I would replace "we can use the following definition:" with "the probability mass function (pmf) results:" and directly show the first pmf you list.

Then I find it a bit repetitive that you say twice, and very close, that Y starts at zero when modeling failures, but it's fine.

@tomicapretto (Collaborator) · Jan 30, 2025

Can you replace PyMC3 with PyMC and put an up-to-date link?

@tomicapretto (Collaborator) · Jan 30, 2025

SciPy people write it with capital P :)

"metadata": {},
"source": [
"I always experience some kind of confusion when looking at the negative binomial distribution after a while of not working with it. There are so many different definitions that I usually need to read everything more than once. The definition I've first learned, and the one I like the most, says as follows: The negative binomial distribution is the distribution of a random variable that is defined as the number of independent Bernoulli trials until the k-th \"success\". In short, we repeat a Bernoulli experiment until we observe k successes and record the number of trials it required.\n",
"The negative binomial distribution is flexible with multiple possible formulations. For example, it can model the number of *trials* or *failures* in a sequence of independent Bernoulli trials with probability of success (or failure) $p$ until the $k$-th \"success\". If we want to model the number of trials until the $k$-th success, we can use the following definition:\n",
"\n",
"$$\n",
"Y \\sim \\text{NB}(k, p)\n",
"$$\n",
"\n",
"where $0 \\le p \\le 1$ is the probability of success in each Bernoulli trial, $k > 0$, usually integer, and $y \\in \\{k, k + 1, \\cdots\\}$\n",
"where $0 \\le p \\le 1$ is the probability of success in each Bernoulli trial, $k > 0$, usually integer, $y \\in \\{k, k + 1, \\cdots\\}$ and $Y$ is the number of trials until the $k$-th success.\n",
"\n",
"The probability mass function (pmf) is \n",
"\n",
"$$\n",
"p(y | k, p)= \\binom{y - 1}{y-k}(1 -p)^{y - k}p^k\n",
"$$\n",
"\n",
"If you, like me, find it hard to remember whether $y$ starts at $0$, $1$, or $k$, try to think twice about the definition of the variable. But how? First, recall we aim to have $k$ successes. And success is one of the two possible outcomes of a trial, so the number of trials can never be smaller than the number of successes. Thus, we can be confident to say that $y \\ge k$."
"In this case, since we are modeling the number of *trials* until the $k$-th success, $y$ starts at $k$ and can be any integer greater than or equal to $k$. If instead we want to model the number of *failures* until the $k$-th success, we can use the same definition but $Y$ represents failures and starts at $0$ and there's a slightly different pmf:\n",
"\n",
"$$\n",
"p(y | k, p)= \\binom{y + k - 1}{k-1}(1 -p)^{y}p^k\n",
"$$\n",
"\n",
"In this case, $y$ starts at $0$ and can be any integer greater than or equal to $0$. When modeling failures, $y$ starts at 0, when modeling trials, $y$ starts at $k$.\n",
"\n",
"\n"
]
},
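A quick SciPy check of the relationship between the two pmfs above (a sketch; the values of `k`, `p`, and `t` are arbitrary):

```python
# SciPy's nbinom counts *failures* until the k-th success.
from scipy import stats

k, p = 3, 0.5
failures = stats.nbinom(n=k, p=p)

# P(Y = 0) for Y = number of failures: p**k = 0.125
print(failures.pmf(0))

# The "number of trials" version is the same distribution shifted by k:
# P(trials = t) = P(failures = t - k)
t = 5
print(failures.pmf(t - k))  # probability the 3rd success lands on trial 5
```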
{
"cell_type": "markdown",
"metadata": {},
"source": [
"But this is not the only way of defining the negative binomial distribution, there are plenty of options! One of the most interesting, and the one you see in [PyMC3](https://docs.pymc.io/api/distributions/discrete.html#pymc3.distributions.discrete.NegativeBinomial), the library we use in Bambi for the backend, is as a continuous mixture. The negative binomial distribution describes a Poisson random variable whose rate is also a random variable (not a fixed constant!) following a gamma distribution. Or in other words, conditional on a gamma-distributed variable $\\mu$, the variable $Y$ has a Poisson distribution with mean $\\mu$.\n",
"These are not the only ways of defining the negative binomial distribution, there are plenty of options! One of the most interesting, and the one you see in [PyMC3](https://docs.pymc.io/api/distributions/discrete.html#pymc3.distributions.discrete.NegativeBinomial), the library we use in Bambi for the backend, is as a continuous mixture. The negative binomial distribution describes a Poisson random variable whose rate is also a random variable (not a fixed constant!) following a gamma distribution. Or in other words, conditional on a gamma-distributed variable $\\mu$, the variable $Y$ has a Poisson distribution with mean $\\mu$.\n",
"\n",
"Under this alternative definition, the pmf is\n",
"\n",
@@ -88,7 +96,7 @@
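The mixture definition is easy to verify by simulation. A sketch, assuming the common parameterization where the gamma rate has shape $\alpha$ and mean $\mu$:

```python
# Draw a rate from a gamma distribution, then a count from a Poisson with that
# rate, and compare the empirical frequencies with the matching NB pmf.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1234)
alpha, mu = 3.0, 4.0  # arbitrary shape and mean of the mixing gamma

rates = rng.gamma(shape=alpha, scale=mu / alpha, size=200_000)
counts = rng.poisson(rates)

# Gamma-Poisson mixture matches NB(n=alpha, p=alpha / (alpha + mu)) in SciPy terms
nb = stats.nbinom(n=alpha, p=alpha / (alpha + mu))
for y in range(5):
    print(y, f"simulated={np.mean(counts == y):.4f}", f"pmf={nb.pmf(y):.4f}")
```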
"cell_type": "markdown",
"metadata": {},
"source": [
"In SciPy, the definition of the negative binomial distribution differs a little from the one in our introduction. They define $Y$ = Number of failures until k successes and then $y$ starts at 0. In the following plot, we have the probability of observing $y$ failures before we see $k=3$ successes. "
"Scipy uses the number of *failures* until $k$ successes definition, therefore $y$ starts at 0. In the following plot, we have the probability of observing $y$ failures before we see $k=3$ successes. "
]
},
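Something like the following could produce such a plot (a minimal sketch; the `p` values are illustrative, not necessarily those used in the notebook):

```python
# P(Y = y) failures before the k-th success, for k = 3.
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

k = 3
y = np.arange(0, 25)
for p in (0.3, 0.5):
    plt.bar(y, stats.nbinom.pmf(y, k, p), alpha=0.5, label=f"p = {p}")
plt.xlabel("Number of failures until 3 successes")
plt.ylabel("Probability")
plt.legend()
plt.show()
```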
{
@@ -163,7 +171,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, if one wants to show this probability mass function as if we are following the first definition of negative binomial distribution we introduced, we just need to shift the whole thing to the right by adding $k$ to the $y$ values."
"To change the definition to the number of *trials* until $k$ successes, we just need to shift the whole thing to the right by adding $k$ to the $y$ values."
]
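In code, the shift is just plotting the same pmf values at `y + k` (a self-contained sketch):

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

k, p = 3, 0.5
y = np.arange(0, 25)                        # failures until the k-th success
plt.bar(y + k, stats.nbinom.pmf(y, k, p))   # identical probabilities, shifted by k
plt.xlabel("Number of trials until 3 successes")
plt.ylabel("Probability")
plt.show()
```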
},
{
Expand Down Expand Up @@ -226,7 +234,7 @@
"\n",
"School administrators study the attendance behavior of high school juniors at two schools. Predictors of the **number of days of absence** include the **type of program** in which the student is enrolled and a **standardized test in math**. We have attendance data on 314 high school juniors.\n",
"\n",
"The variables of insterest in the dataset are\n",
"The variables of interest in the dataset are\n",
"\n",
"* daysabs: The number of days of absence. It is our response variable.\n",
"* progr: The type of program. Can be one of 'General', 'Academic', or 'Vocational'.\n",
@@ -551,7 +559,7 @@
"\n",
"But then, why negative binomial? Can't we just use a Poisson likelihood?\n",
"\n",
"Yes, we can. However, using a Poisson likelihood implies that the mean is equal to the variance, and that is usually an unrealistic assumption. If it turns out the variance is either substantially smaller or greater than the mean, the Poisson regression model results in a poor fit. Alternatively, if we use a negative binomial likelihood, the variance is not forced to be equal to the mean, and there's more flexibility to handle a given dataset, and consequently, the fit tends to better."
"Yes, we can. However, using a Poisson likelihood implies that the mean is equal to the variance, and that is usually an unrealistic assumption. If it turns out the variance is either substantially smaller or greater than the mean, the Poisson regression model results in a poor fit. Alternatively, if we use a negative binomial likelihood, the variance is not forced to be equal to the mean, and there's more flexibility to handle a given dataset, and consequently, the fit tends to be better."
]
},
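Concretely, under the mean/dispersion parameterization commonly used for regression, the negative binomial variance is $\mu + \mu^2 / \alpha$, which always exceeds the mean. A quick numeric illustration with arbitrary values:

```python
# With mean mu and dispersion alpha, NB variance is mu + mu**2 / alpha,
# always larger than the Poisson's, which forces variance == mean.
mu, alpha = 6.0, 2.0
print("Poisson variance:", mu)                   # 6.0, equal to the mean
print("NegBinom variance:", mu + mu**2 / alpha)  # 6 + 36/2 = 24.0
```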
{
@@ -608,7 +616,7 @@
"\\log(\\mathbb{E}[Y_i]) = \\beta_3 + \\beta_4 \\text{Math\\_std}_i\n",
"$$\n",
"\n",
"And one last thing to note is we've decided not to inclide an intercept term, that's why you don't see any $\\beta_0$ above. This choice allows us to represent the effect of each program directly with $\\beta_1$, $\\beta_2$, and $\\beta_3$."
"And one last thing to note is we've decided not to include an intercept term, that's why you don't see any $\\beta_0$ above. This choice allows us to represent the effect of each program directly with $\\beta_1$, $\\beta_2$, and $\\beta_3$."
]
},
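For reference, a sketch of how such a model might be written in Bambi; the formula and column names (`daysabs`, `progr`, `math`) follow the variable descriptions above, but the exact call in the notebook may differ:

```python
# A hypothetical sketch, not the notebook's exact code: "0 +" drops the global
# intercept so each program level gets its own coefficient, per the text above.
import bambi as bmb
import pandas as pd

data = pd.read_csv("school_attendance.csv")  # hypothetical path

model = bmb.Model(
    "daysabs ~ 0 + progr + scale(math)",  # assumed column names
    data,
    family="negativebinomial",
)
idata = model.fit()
```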
{