diff --git a/docs/notebooks/negative_binomial.ipynb b/docs/notebooks/negative_binomial.ipynb
index d0aaff1e..d5dbcdfa 100644
--- a/docs/notebooks/negative_binomial.ipynb
+++ b/docs/notebooks/negative_binomial.ipynb
@@ -18,36 +18,28 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "The negative binomial distribution is flexible with multiple possible formulations. For example, it can model the number of *trials* or *failures* in a sequence of independent Bernoulli trials with probability of success (or failure) $p$ until the $k$-th \"success\". If we want to model the number of trials until the $k$-th success, we can use the following definition:\n",
+    "The negative binomial distribution is flexible with multiple possible formulations. For example, it can model the number of *trials* or *failures* in a sequence of independent Bernoulli trials with probability of success (or failure) $p$ until the $k$-th \"success\". If we want to model the number of trials until the $k$-th success, the probability mass function (pmf) is:\n",
     "\n",
     "$$\n",
-    "Y \\sim \\text{NB}(k, p)\n",
+    "p(y | k, p)= \\binom{y - 1}{y-k}(1 -p)^{y - k}p^k\n",
     "$$\n",
     "\n",
     "where $0 \\le p \\le 1$ is the probability of success in each Bernoulli trial, $k > 0$, usually integer, $y \\in \\{k, k + 1, \\cdots\\}$ and $Y$ is the number of trials until the $k$-th success.\n",
     "\n",
-    "The probability mass function (pmf) is \n",
-    "\n",
-    "$$\n",
-    "p(y | k, p)= \\binom{y - 1}{y-k}(1 -p)^{y - k}p^k\n",
-    "$$\n",
-    "\n",
     "In this case, since we are modeling the number of *trials* until the $k$-th success, $y$ starts at $k$ and can be any integer greater than or equal to $k$. If instead we want to model the number of *failures* until the $k$-th success, we can use the same definition but $Y$ represents failures and starts at $0$ and there's a slightly different pmf:\n",
     "\n",
     "$$\n",
     "p(y | k, p)= \\binom{y + k - 1}{k-1}(1 -p)^{y}p^k\n",
     "$$\n",
     "\n",
-    "In this case, $y$ starts at $0$ and can be any integer greater than or equal to $0$. When modeling failures, $y$ starts at 0, when modeling trials, $y$ starts at $k$.\n",
-    "\n",
-    "\n"
+    "In this case, $y$ starts at $0$ and can be any integer greater than or equal to $0$. When modeling failures, $y$ starts at $0$; when modeling trials, $y$ starts at $k$."
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "These are not the only ways of defining the negative binomial distribution, there are plenty of options! One of the most interesting, and the one you see in [PyMC3](https://docs.pymc.io/api/distributions/discrete.html#pymc3.distributions.discrete.NegativeBinomial), the library we use in Bambi for the backend, is as a continuous mixture. The negative binomial distribution describes a Poisson random variable whose rate is also a random variable (not a fixed constant!) following a gamma distribution. Or in other words, conditional on a gamma-distributed variable $\\mu$, the variable $Y$ has a Poisson distribution with mean $\\mu$.\n",
+    "These are not the only ways of defining the negative binomial distribution; there are plenty of options! One of the most interesting, and the one you see in [PyMC](https://www.pymc.io/projects/docs/en/stable/api/distributions/generated/pymc.NegativeBinomial.html), the library we use in Bambi for the backend, is as a continuous mixture. The negative binomial distribution describes a Poisson random variable whose rate is also a random variable (not a fixed constant!) following a gamma distribution. Or in other words, conditional on a gamma-distributed variable $\\mu$, the variable $Y$ has a Poisson distribution with mean $\\mu$.\n",
     "\n",
     "Under this alternative definition, the pmf is\n",
     "\n",
@@ -96,7 +88,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Scipy uses the number of *failures* until $k$ successes definition, therefore $y$ starts at 0. In the following plot, we have the probability of observing $y$ failures before we see $k=3$ successes. "
+    "SciPy uses the number of *failures* until $k$ successes definition; therefore, $y$ starts at $0$. The following plot shows the probability of observing $y$ failures before we see $k=3$ successes."
    ]
   },
   {
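
As a quick sanity check of the two parameterizations touched by this diff (SciPy's `nbinom` counting *failures* until the $k$-th success, and the gamma-Poisson mixture), here is a minimal sketch; it is not part of the notebook, and the values of `k`, `p`, and the number of simulated draws are arbitrary illustrative choices.

```python
# Sketch: compare the closed-form "failures until the k-th success" pmf with
# scipy.stats.nbinom, and with draws from the gamma-Poisson mixture.
import numpy as np
from scipy import stats
from scipy.special import comb

k, p = 3, 0.4      # illustrative values only
y = np.arange(15)  # number of failures, starting at 0

# Closed-form pmf: C(y + k - 1, k - 1) * (1 - p)^y * p^k
pmf_closed = comb(y + k - 1, k - 1) * (1 - p) ** y * p ** k

# SciPy's nbinom uses the same "failures" parameterization
np.testing.assert_allclose(stats.nbinom.pmf(y, k, p), pmf_closed)

# Gamma-Poisson mixture: mu ~ Gamma(shape=k, scale=(1 - p) / p), Y | mu ~ Poisson(mu)
rng = np.random.default_rng(1234)
mu = rng.gamma(shape=k, scale=(1 - p) / p, size=200_000)
samples = rng.poisson(mu)
empirical = np.bincount(samples, minlength=y.size)[: y.size] / samples.size
print(np.abs(empirical - pmf_closed).max())  # close to zero (Monte Carlo error only)
```

With shape $k$ and scale $(1 - p)/p$, the gamma prior on the Poisson rate has mean $k(1 - p)/p$, which is exactly the expected number of failures under $\text{NB}(k, p)$, so the simulated draws should reproduce the closed-form pmf up to sampling noise.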