\documentclass[a4paper,10pt, notitlepage]{report}
\usepackage[utf8]{inputenc}
\usepackage{natbib}
\usepackage{amssymb}
\usepackage{amsmath}
\usepackage[shortlabels]{enumitem}
% \usepackage[portuguese]{babel}


% Title Page
\title{Assignment II: Advanced simulation techniques.}
\author{Computational Statistics \\ Instructor: Luiz Max de Carvalho}

\begin{document}
\maketitle

\textbf{Hand-in date: 15/12/2022.}

\section*{General guidance}
\begin{itemize}
 \item State and prove all non-trivial mathematical results necessary to substantiate your arguments;
 \item Do not forget to add appropriate scholarly references~\textit{at the end} of the document;
 \item Mathematical expressions also receive punctuation;
 \item All computational implementations must be ``from scratch'', i.e., you may not employ a ready-made package to implement the technique in question.
 You may, however, (a) employ pre-packaged routines for things like random variate generation and MCMC diagnostics and (b) use a package implementation against which to check your own.
 \item Please hand in a single PDF file as your final main document.
 Code appendices are welcome,~\textit{in addition} to the main PDF document.
 \end{itemize}


\section*{Background}

We have by now hopefully acquired a solid theoretical understanding of simulation techniques, including Markov chain Monte Carlo (MCMC).
In this assignment, we shall revisit some of the main techniques in the field of simulation.
The goal is to broaden your knowledge of the field by implementing one of the many variations on the general theme of simulation algorithms.

Each method/paper brings its own advantages and pitfalls, and each explores a slightly different aspect of Computational Statistics.
You should pick~\textbf{one} of the listed papers and answer the associated questions.

In what follows, ESS stands for effective sample size and is similar to the $n_{\text{eff}}$ we have encountered before: it measures the number of effectively uncorrelated samples in a given collection of random variates.

\newpage

\section*{Paper 1: MCMC using Hamiltonian dynamics~\citep{Neal2011}}

As discussed in class, performance suffers massively as the dimensionality of the space over which integrals need to be taken grows -- this is the so-called ``curse of dimensionality''.
In our quest to compute expectations efficiently, we might want to draw on all of the available information in order to find pockets of high probability mass.

Clever proposal mechanisms in MCMC use local information, usually in the form of gradients of the (log) target.
In this seminal 2011 review, Radford Neal lays out a complete treatment of a technique known as Hybrid or Hamiltonian Monte Carlo (HMC), which works by constructing a Markov chain on an augmented state-space comprising both position and momentum variables.
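
To fix ideas, here is a minimal sketch of a single HMC transition based on the leapfrog integrator.
The functions \verb|log_post()| and \verb|grad_log_post()| stand for a user-supplied log-target and its gradient, and the step size \verb|eps| and number of leapfrog steps \verb|L| are illustrative defaults, not tuned values.

\begin{verbatim}
## Minimal sketch of one HMC transition using the leapfrog integrator.
## `log_post` and `grad_log_post` are assumed, user-supplied functions.
hmc_step <- function(theta, log_post, grad_log_post, eps = 0.1, L = 20) {
  p <- rnorm(length(theta))                          # draw auxiliary momenta
  theta_prop <- theta
  p_prop <- p + 0.5 * eps * grad_log_post(theta_prop)     # half step (momentum)
  for (l in seq_len(L)) {
    theta_prop <- theta_prop + eps * p_prop                # full step (position)
    if (l < L) p_prop <- p_prop + eps * grad_log_post(theta_prop)
  }
  p_prop <- p_prop + 0.5 * eps * grad_log_post(theta_prop) # final half step
  ## accept or reject based on the change in the Hamiltonian
  log_alpha <- (log_post(theta_prop) - 0.5 * sum(p_prop^2)) -
               (log_post(theta)      - 0.5 * sum(p^2))
  if (log(runif(1)) < log_alpha) theta_prop else theta
}
\end{verbatim}

Negating the momentum at the end of the trajectory is omitted here because the Gaussian kinetic energy is symmetric in the momenta and they are discarded after the accept/reject step.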

\begin{enumerate}
 \item Describe how to apply Hamiltonian dynamics to MCMC;
 \item Implementation: reproduce Figure 6 of~\cite{Neal2011}
\begin{enumerate}[(a)]
 \item Supplement the analyses presented therein with ESS/hour computations in order to gauge the real gain of applying HMC.
 \textit{Hint:} Use the function \verb|effectiveSize()| from the~\textbf{coda} package in R~\citep{Plummer2006} -- a minimal usage sketch is given at the end of this section;
 \end{enumerate}
 \item Why does HMC avoid random walk behaviour? What are the advantages of such an algorithm?
\end{enumerate}
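
For item 2(a), a hypothetical usage of \verb|effectiveSize()| is sketched below; \verb|chain| is assumed to be a matrix of posterior draws (one column per parameter) and \verb|secs| the wall-clock sampling time in seconds.

\begin{verbatim}
library(coda)
ess <- effectiveSize(as.mcmc(chain))   # one ESS value per parameter
ess_per_hour <- ess / (secs / 3600)    # effective samples per hour of computation
\end{verbatim}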

\section*{Paper 2: Bootstrap~\citep{Efron1986}}

In orthodox (frequentist) Statistics, it is common to want to ascertain long-run (frequency) properties of estimators, including coverage of confidence intervals and standard errors.
Unfortunately, for the models of interest in actual practice, constructing confidence intervals directly (exactly) is difficult.
The bootstrap method is a re-sampling technique that allows for a simple yet theoretically grounded way of constructing confidence intervals and assessing standard errors in quite complex situations.
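
As a concrete illustration only (not one of the analyses in the paper), a minimal nonparametric bootstrap for the standard error of a sample mean might look as follows; the toy data \verb|x| and the number of resamples \verb|B| are arbitrary choices made for the example.

\begin{verbatim}
## Minimal sketch: bootstrap standard error of the sample mean.
boot_se_mean <- function(x, B = 2000) {
  n <- length(x)
  boot_means <- replicate(B, mean(sample(x, size = n, replace = TRUE)))
  sd(boot_means)                  # bootstrap estimate of the standard error
}
set.seed(1)
x <- rexp(50)                     # toy data set
c(bootstrap = boot_se_mean(x), formula = sd(x) / sqrt(length(x)))
\end{verbatim}

For the sample mean the two estimates should be close; the interest of the bootstrap lies in statistics for which no simple standard-error formula exists.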

For this assignment, you are encouraged to consult the seminal 1986 review by stellar statisticians Bradley Efron and Robert Tibshirani~\citep{Efron1986}.

\textit{Hint:} Brush up on your Normal theory before delving in.
The book by~\cite{Schervish2012} -- especially Chapter 5 -- is a great resource.

\begin{enumerate}
 \item Define and explain the bootstrap technique;
 \item Define and explain the jackknife technique;
 \item Implementation:
\begin{enumerate}[(a)]
 \item Reproduce the results in Table I of~\cite{Efron1986};
 \item Show what happens if one increases/decreases the value of $B$;
 \end{enumerate}
 \item Why is it important to draw exactly $n$ samples in each bootstrap iteration? Can this be relaxed?
 \item (bonus) Propose an alternative bootstrap method to the one proposed in the paper and discuss the situations where the new method is expected to perform better.
\end{enumerate}

\section*{Paper 3: Blocked Gibbs sampling~\citep{Tan2009}}

The so-called Gibbs sampler is a workhorse of Computational Statistics.
It depends on decomposing a target distribution into conditional densities from which new values of a given coordinate can be drawn.

One of the difficulties one might encounter with the Gibbs sampler is that it might be slow to converge, especially for highly correlated targets.
In Statistics, multilevel models (also called hierarchical or random-effects models) are extremely useful in modelling data coming from stratified structures (e.g. individuals within a city and cities within a state) and typically present highly correlated posterior distributions.

One way to counteract the correlation between coordinates in the Gibbs sampler is to~\textbf{block} them together and sample correlated coordinates jointly.
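
As an illustration of the blocking idea only (this is \textit{not} the model of~\cite{Tan2009}), the sketch below targets a trivariate normal with known covariance matrix \verb|Sigma|, updating coordinates 1 and 2 jointly as one block and coordinate 3 as the other.

\begin{verbatim}
## Minimal sketch of a two-block Gibbs sampler for a trivariate normal
## target N(0, Sigma): coordinates (1, 2) form one block, coordinate 3 the other.
blocked_gibbs <- function(n_iter, Sigma) {
  A <- 1:2; B <- 3
  ## conditional covariances do not depend on the current state
  V_A <- Sigma[A, A] - Sigma[A, B] %*% t(Sigma[A, B]) / Sigma[B, B]
  R_A <- chol(V_A)
  v_B <- Sigma[B, B] - Sigma[B, A] %*% solve(Sigma[A, A], Sigma[A, B])
  x <- numeric(3)
  out <- matrix(NA_real_, n_iter, 3)
  for (i in seq_len(n_iter)) {
    ## block 1: draw (x1, x2) | x3 jointly from its bivariate normal conditional
    m_A  <- Sigma[A, B] * x[B] / Sigma[B, B]
    x[A] <- m_A + as.numeric(t(R_A) %*% rnorm(2))
    ## block 2: draw x3 | (x1, x2) from its univariate normal conditional
    m_B  <- Sigma[B, A] %*% solve(Sigma[A, A], x[A])
    x[B] <- rnorm(1, as.numeric(m_B), sqrt(as.numeric(v_B)))
    out[i, ] <- x
  }
  out
}
\end{verbatim}

Comparing the autocorrelations of this chain with those of a one-coordinate-at-a-time sampler on the same target illustrates why blocking strongly correlated coordinates helps.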

For this assignment you are referred to the 2009~\textit{Journal of Computational and Graphical Statistics} paper by Tan and Hobert~\citep{Tan2009}.

\begin{enumerate}
 \item Precisely describe the so-called blocked Gibbs sampler;
 \textit{Hint:} you do not need to describe theoretical properties of the algorithm given in this paper; a general description of the algorithm should suffice.
 \item Explain the advantages -- both theoretical and practical -- of a clever blocking scheme;
 \item Would it be possible to apply the ``simple'' Gibbs sampler in this example? Why?
 \item Implementation:
 \begin{enumerate}[(a)]
 \item Implement the blocked Gibbs sampler discussed in the paper in order to fit the model of Section 1 of~\cite{Tan2009} to the data described in Section 5 therein.
 \item Assess convergence (or lack thereof) and mixing of the resulting chain.
 \item Confirm your results agree with those given by the original authors up to Monte Carlo error.
 \end{enumerate}
 \item Comment on the significance of geometric ergodicity for the blocked Gibbs sampler proposed by~\cite{Tan2009}.
\end{enumerate}

\section*{Paper 4: Approximate Bayesian computation~\citep{Beaumont2002}}

Bayesian inference relies on computing a posterior distribution of a set of unknowns, $\boldsymbol{\theta}$, conditional on the observed data, $\boldsymbol{x}$.
This posterior distribution, $p(\boldsymbol{\theta} \mid \boldsymbol{x})$, is proportional to a likelihood function times a prior distribution, i.e.,
\begin{equation}
 \label{eq:posterior}
 p(\boldsymbol{\theta} \mid \boldsymbol{x}) \propto l(\boldsymbol{x} \mid \boldsymbol{\theta})\pi(\boldsymbol{\theta}).
\end{equation}

In many situations, however, our models are so complex that the likelihood function in~(\ref{eq:posterior}) might be either very costly to compute or computationally intractable.
Examples of models which fall into this class include epidemiological models, population genetics models and Gibbs random fields.

In such cases, one can use so-called likelihood-free methods, which either replace the ``true'' likelihood function with a surrogate or eschew computing it altogether.
The so-called Approximate Bayesian Computation (ABC) class of algorithms has enjoyed great success in recent years because it allows inference about very complex stochastic models that are inaccessible to other methods.
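
A minimal sketch of the basic rejection flavour is given below; \verb|rprior()| (draw from the prior), \verb|simulate()| (simulate a data set of the same size as the observed one) and \verb|summary_stat()| are assumed, user-supplied functions, and \verb|eps| is the tolerance.

\begin{verbatim}
## Minimal sketch of ABC rejection sampling with a scalar summary statistic.
abc_reject <- function(n_keep, x_obs, rprior, simulate, summary_stat, eps) {
  s_obs <- summary_stat(x_obs)
  kept  <- numeric(0)
  while (length(kept) < n_keep) {
    theta <- rprior()                      # 1. draw a candidate from the prior
    x_sim <- simulate(theta)               # 2. simulate pseudo-data
    if (abs(summary_stat(x_sim) - s_obs) <= eps)
      kept <- c(kept, theta)               # 3. keep theta if the summaries match
  }
  kept
}
\end{verbatim}

In the Beta-Bernoulli setting of item 2(a) below, one natural choice is \verb|rprior = function() rbeta(1, alpha, beta)|, \verb|simulate = function(th) rbinom(n, 1, th)| and \verb|summary_stat = mean|.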

\textit{Hint:} I strongly suggest you consult the recent review of~\cite{Beaumont2019} for extra details.
\begin{enumerate}
 \item Describe the (basic) ABC rejection algorithm;
 \item Implementation:
\begin{enumerate}[(a)]
 \item Suppose one has data $\boldsymbol{x} = (x_1, x_2, \ldots, x_n)$ on a binary outcome, i.e., $x_i \in \{0, 1\}$.
 Suppose further we choose to model these data as independent $x_i \sim \operatorname{Bernoulli}(\theta)$ and pick a Beta prior for $\theta$ with hyperparameters $\alpha>0$ and $\beta>0$.
 Implement an ABC scheme to sample (approximately) from the corresponding posterior, $p(\theta \mid \boldsymbol{x})$.
 \item Implement a Metropolis-Hastings scheme to sample from $p(\theta \mid \boldsymbol{x})$ (a minimal random-walk sketch is given after this list);
 \item Compare the results of the previous two items to the exact posterior distribution (we derived this in class): how well does ABC fare for a range of true values of $\theta$?
 Does performance change if one changes the sample size ($n$)?
 \end{enumerate}
 \item Is it possible to employ improper priors with ABC?\footnote{Recall that a prior distribution $\pi(\theta)$ is said to be~\textit{improper} if $\int_{\Theta} \pi(t)\,d\mu(t) = \infty$.}
 \item What is the role of sufficient statistics in ABC?
 \textit{Hint:} Take a look at~\cite{Robert2011}.
\end{enumerate}
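
For item 2(b), one possible random-walk Metropolis implementation is sketched below; the data vector \verb|x| (zeros and ones), the hyperparameters \verb|a| and \verb|b|, the proposal scale and the starting value are all illustrative choices.

\begin{verbatim}
## Minimal sketch: random-walk Metropolis for the Beta-Bernoulli posterior.
mh_beta_bernoulli <- function(x, a, b, n_iter = 5000, sd_prop = 0.1) {
  log_post <- function(th) {
    if (th <= 0 || th >= 1) return(-Inf)      # zero density outside (0, 1)
    sum(dbinom(x, size = 1, prob = th, log = TRUE)) +
      dbeta(th, a, b, log = TRUE)
  }
  theta <- numeric(n_iter)
  theta[1] <- 0.5                             # arbitrary starting value
  for (i in 2:n_iter) {
    prop <- rnorm(1, mean = theta[i - 1], sd = sd_prop)   # symmetric proposal
    log_alpha <- log_post(prop) - log_post(theta[i - 1])
    theta[i] <- if (log(runif(1)) < log_alpha) prop else theta[i - 1]
  }
  theta
}
\end{verbatim}

Because the proposal is symmetric, the acceptance ratio reduces to the ratio of posterior densities; the output can be checked against the exact $\operatorname{Beta}(\alpha + \sum_i x_i, \beta + n - \sum_i x_i)$ posterior from item 2(c).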

\bibliographystyle{apalike}
\bibliography{stat_comp}

\end{document}