We can easily plot both functions using `r ttcode("R")`. Since the probability equals $1/6$ for each outcome, we set up the vector `r ttcode("probability")` by using the function `r ttcode("rep()")` which replicates a given value a specified number of times.
For the cumulative probability distribution we need the cumulative probabilities, i.e., we need the cumulative sums of the vector `r ttcode("probability")`. These sums can be computed using `r ttcode("cumsum()")`.
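A minimal sketch of both plots (the axis labels and the use of `r ttcode("plot()")` here are illustrative choices, not prescribed by the text):

```{r}
# outcomes and their probabilities for a fair dice
outcomes <- 1:6
probability <- rep(1/6, 6)

# plot the probability distribution
plot(outcomes, probability,
     ylim = c(0, 1),
     xlab = "Outcome",
     ylab = "Probability",
     main = "Probability Distribution")

# cumulative probabilities via cumulative sums
cum_probability <- cumsum(probability)

# plot the cumulative probability distribution
plot(outcomes, cum_probability,
     xlab = "Outcome",
     ylab = "Cumulative Probability",
     main = "Cumulative Probability Distribution")
```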
For the standard normal distribution we have $\mu=0$ and $\sigma=1$. Standard normal variates are often denoted by $Z$. Usually, the standard normal PDF is denoted by $\phi$ and the standard normal CDF is denoted by $\Phi$. Hence,
$$ \phi(c) = \Phi'(c) \ \ , \ \ \Phi(c) = P(Z \leq c) \ \ , \ \ Z \sim \mathcal{N}(0,1).$$ Note that the notation $X \sim Y$ reads as "$X$ is distributed as $Y$". In `r ttcode("R")`, we can conveniently obtain densities of normal distributions using the function `r ttcode("dnorm()")`. Let us draw a plot of the standard normal density function using `r ttcode("curve()")` together with `r ttcode("dnorm()")`.
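For example, a sketch of such a plot (the axis limits are an assumed choice):

```{r}
# plot the standard normal density on the interval [-3.5, 3.5]
curve(dnorm(x),
      xlim = c(-3.5, 3.5),
      ylab = "Density",
      main = "Standard Normal Density Function")
```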
Similar to the PDF, we can plot the standard normal CDF using `r ttcode("curve()")`. We could use `r ttcode("dnorm()")` for this but it is much more convenient to rely on `r ttcode("pnorm()")`.
```{r}
# plot the standard normal CDF
curve(pnorm(x),
      xlim = c(-3.5, 3.5),
      ylab = "Probability",
      main = "Standard Normal Cumulative Distribution Function")
```
We can also use `r ttcode("R")` to calculate the probability of events associated with a standard normal variate.
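For instance, `r ttcode("pnorm()")` returns values of the standard normal CDF, so probabilities of events follow directly (the cutoff values below are arbitrary examples):

```{r}
# P(Z <= 1.337)
p1 <- pnorm(1.337)
p1

# P(|Z| > 1.96), using the symmetry of the standard normal distribution
p2 <- 2 * (1 - pnorm(1.96))
p2  # approximately 0.05
```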
$$ Z_1^2+Z_2^2+Z_3^2 \sim \chi^2_3. \tag{2.3} $$
Using the code below, we can display the PDF and the CDF of a $\chi^2_3$ random variable in a single plot. This is achieved by setting the argument `r ttcode("add = TRUE")` in the second call of `r ttcode("curve()")`. Further, we adjust the limits of both axes using `r ttcode("xlim")` and `r ttcode("ylim")` and choose different colors to make both functions easier to distinguish. The plot is completed by adding a legend with the help of `r ttcode("legend()")`.
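A sketch along these lines (the axis limits and the colors are assumed choices):

```{r}
# plot the PDF of a chi^2_3 random variable
curve(dchisq(x, df = 3),
      xlim = c(0, 10),
      ylim = c(0, 1),
      col = "blue",
      ylab = "",
      main = "PDF and CDF of a Chi-Squared Random Variable, M = 3")

# add the CDF to the same plot
curve(pchisq(x, df = 3),
      add = TRUE,
      col = "red")

# add a legend
legend("topleft",
       legend = c("PDF", "CDF"),
       col = c("blue", "red"),
       lty = 1)
```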
Since the outcomes of a $\chi^2_M$ distributed random variable are always positive, the support of the related PDF and CDF is $\mathbb{R}_{\geq0}$.
As expectation and variance depend (solely!) on the degrees of freedom, the distribution's shape changes drastically if we vary the number of squared standard normals that are summed up. This relation is often depicted by overlaying densities for different $M$, see the <a href="https://en.wikipedia.org/wiki/Chi-squared_distribution">Wikipedia article</a>.
We reproduce this here by plotting the density of the $\chi_1^2$ distribution on the interval $[0,15]$ with `r ttcode("curve()")`. In the next step, we loop over degrees of freedom $M=2,...,7$ and add a density curve for each $M$ to the plot. We also adjust the line color for each iteration of the loop by setting `r ttcode("col = M")`. Finally, we add a legend that displays the degrees of freedom and the associated colors.
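A sketch of this loop (the interval and `r ttcode("col = M")` follow the description above; the remaining settings are assumptions):

```{r}
# plot the density of the chi^2_1 distribution on [0, 15]
curve(dchisq(x, df = 1),
      xlim = c(0, 15),
      ylim = c(0, 1),
      col = 1,
      ylab = "Density",
      main = "Chi-Squared Densities for Different M")

# loop over degrees of freedom M = 2, ..., 7 and add a density curve for each M
for (M in 2:7) {
  curve(dchisq(x, df = M),
        add = TRUE,
        col = M)
}

# legend displaying degrees of freedom and the associated colors
legend("topright",
       legend = paste("M =", 1:7),
       col = 1:7,
       lty = 1)
```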
Increasing the degrees of freedom shifts the distribution to the right (the mode becomes larger) and increases the dispersion (the distribution's variance grows).
A $t_M$ distributed random variable $X$ has an expectation if $M>1$ and it has a variance if $M>2$.
Let us plot some $t$ distributions with different $M$ and compare them to the standard normal distribution.
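One way to sketch this comparison (the chosen degrees of freedom and colors are assumptions; the dashed line marks the standard normal density):

```{r}
# plot the standard normal density as a dashed reference curve
curve(dnorm(x),
      xlim = c(-4, 4),
      lty = 2,
      ylab = "Density",
      main = "t Distributions Compared to the Standard Normal")

# add t densities for M = 2, 4 and 25
for (M in c(2, 4, 25)) {
  curve(dt(x, df = M),
        add = TRUE,
        col = M)
}

# legend for all four curves
legend("topright",
       legend = c("N(0, 1)", "M = 2", "M = 4", "M = 25"),
       col = c(1, 2, 4, 25),
       lty = c(2, 1, 1, 1))
```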
The plot illustrates what has been said in the previous paragraph: as the degrees of freedom increase, the shape of the $t$ distribution comes closer to that of a standard normal bell curve. Already for $M=25$ we find little difference from the standard normal density. If $M$ is small, we find the distribution to have heavier tails than a standard normal, i.e., it has a "fatter" bell shape.
```{r}
# define coordinate vectors for vertices of the polygon
x <- c(2, seq(2, 10, 0.01), 10)
y <- c(0, df(seq(2, 10, 0.01), 3, 14), 0)

# draw the density of the F(3, 14) distribution
curve(df(x, 3, 14),
      xlim = c(0, 10),
      ylab = "Density")

# draw the polygon
polygon(x, y, col = "orange")
```
The $F$ distribution is related to many other distributions. An important special case encountered in econometrics arises if the denominator degrees of freedom are large such that the $F_{M,n}$ distribution can be approximated by the $F_{M,\infty}$ distribution, which turns out to be simply the distribution of a $\chi^2_M$ random variable divided by its degrees of freedom $M$,

$$ \chi^2_M/M \sim F_{M,\infty}. $$
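We can check this numerically with `r ttcode("pf()")` and `r ttcode("pchisq()")` (the degrees of freedom and the cutoff below are arbitrary example values):

```{r}
M <- 3
n <- 1e5  # large denominator degrees of freedom

# P(Y <= 2) for Y ~ F(M, n) ...
p_f <- pf(2, df1 = M, df2 = n)

# ... is close to P(chi^2_M / M <= 2) = P(chi^2_M <= 2 * M)
p_chi <- pchisq(2 * M, df = M)

c(p_f, p_chi)
```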
So the distribution of $S$ is known. It is also evident that its distribution differs considerably from the marginal distribution, i.e., the distribution of a single dice roll's outcome, $D$. Let us visualize this using bar plots.
```{r}
# divide the plotting area into one row with two columns
par(mfrow = c(1, 2))

# (the bar plot for the distribution of S is constructed analogously)

# bar plot of the outcomes of a single dice roll
probability <- rep(1/6, 6)
names(probability) <- 1:6

barplot(probability,
        ylim = c(0, 0.2),
        space = 0,
        main = "Outcome of a Single Dice Roll")
```
Many econometric procedures deal with averages of sampled data. It is typically assumed that observations are drawn randomly from a larger, unknown population. As demonstrated for the sample function $S$, computing an average of a random sample has the effect that the average is a random variable itself. This random variable in turn has a probability distribution, called the sampling distribution. Knowledge about the sampling distribution of the average is therefore crucial for understanding the performance of econometric procedures.
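A minimal Monte Carlo sketch of this idea, using dice rolls (the sample size and the number of replications are assumed choices):

```{r}
set.seed(1)

# draw 10000 random samples of two dice rolls and compute the sample average each time
sample_avgs <- replicate(10000, mean(sample(1:6, size = 2, replace = TRUE)))

# the average is itself a random variable: it varies from sample to sample,
# and its empirical moments approximate those of the sampling distribution
mean(sample_avgs)  # close to E(D) = 3.5
var(sample_avgs)   # close to Var(D) / 2
```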
Using `r ttcode("curve()")`, we overlay the histogram with a red line, the theoretical density of a $\mathcal{N}(0, 0.1)$ random variable. Remember to use the argument `r ttcode("add = TRUE")` to add the curve to the current plot. Otherwise, `r ttcode("R")` will open a new graphics device and discard the previous plot!^[*Hint:* `r ttcode("T")` and `r ttcode("F")` are alternatives for `r ttcode("TRUE")` and `r ttcode("FALSE")`.]
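A sketch of this overlay, assuming the simulated data are sample means of $n = 10$ draws from $\mathcal{N}(0, 1)$, so that the theoretical sampling distribution is $\mathcal{N}(0, 0.1)$ (the number of replications is an assumed choice):

```{r}
set.seed(1)

# simulate 10000 sample means of n = 10 standard normal draws
sample_means <- replicate(10000, mean(rnorm(10)))

# density histogram of the simulated means
hist(sample_means,
     freq = FALSE,
     breaks = 40,
     main = "Sampling Distribution of the Sample Mean")

# overlay the theoretical N(0, 0.1) density; note sd = sqrt(0.1)
curve(dnorm(x, mean = 0, sd = sqrt(0.1)),
      add = TRUE,
      col = "red")
```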
The sampling distribution of $\overline{Y}$ is indeed very close to that of a $\mathcal{N}(0, 0.1)$ distribution, so the Monte Carlo simulation supports the theoretical claim.
Again, we produce a density estimate for the distribution underlying our simulated data using a density histogram and overlay it with a line graph of the theoretical density function of the $\chi^2_3$ distribution.
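A sketch of this simulation (the number of replications and the histogram breaks are assumed choices):

```{r}
set.seed(1)

# simulate Z_1^2 + Z_2^2 + Z_3^2 for independent standard normals
chi2_draws <- replicate(10000, sum(rnorm(3)^2))

# density histogram of the simulated data
hist(chi2_draws,
     freq = FALSE,
     breaks = 40,
     main = "Simulated vs. Theoretical Chi-Squared(3) Density")

# overlay the theoretical chi^2_3 density
curve(dchisq(x, df = 3),
      add = TRUE,
      col = "red")
```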