diff --git a/homework2/Homework2.ipynb b/homework2/Homework2.ipynb new file mode 100644 index 0000000..1c50bf3 --- /dev/null +++ b/homework2/Homework2.ipynb @@ -0,0 +1,213 @@ +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "name": "Homework2.ipynb", + "provenance": [], + "collapsed_sections": [] + }, + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + } + }, + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "tkKY6us_cCg4", + "colab_type": "text" + }, + "source": [ + "# Homework 2\n", + "\n", + "**For exercises in the week 20-25.11.19**\n", + "\n", + "**Points: 6 + 1b**\n", + "\n", + "Please solve the problems at home and bring to class a [declaration form](http://ii.uni.wroc.pl/~jmi/Dydaktyka/misc/kupony-klasyczne.pdf) to indicate which problems you are willing to present on the blackboard.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "8rdqrcVvNTv0", + "colab_type": "text" + }, + "source": [ + "## Problem 1 [1p]\n", + "\n", + "Let $(x^{(i)},y^{(i)})$ be a data sample with $x^{(i)}\\in\\mathbb{R}^D$, $y^{(i)}\\in\\mathbb{R}$. Let $\\Theta \\in\\mathbb{R}^D$ a parameter vector.\n", + "\n", + "Find the closed form solution $\\Theta^*$ to \n", + "\n", + "$$\n", + "\\min_\\Theta \\left(\\frac{1}{2}\\sum_i (\\Theta^Tx^{(i)} - y^{(i)})^2 + \\frac{\\lambda}{2}\\sum_{d=1}^D \\Theta_d^2\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "4PMPpxChQHnR", + "colab_type": "text" + }, + "source": [ + "## Problem 2 [1p]\n", + "Let $v\\in\\mathbb{R}^D$ be a vector. Define the gradient of $f(v)\\in\\mathbb{R}$ with respect to $v$ to be $\\frac{\\partial f}{\\partial v} = \\left[\\frac{\\partial f(v)}{\\partial v_1}, \\frac{\\partial f(v)}{\\partial v_2}, ..., \\frac{\\partial f(v)}{\\partial v_D}\\right]$\n", + "\n", + "Find the following functions' gradients with respect to vector $[x, y, z]^T$:\n", + "1. $f_1([x, y, z]^T) = x + y$\n", + "2. $f_2([x, y, z]^T) = xy$\n", + "3. $f_3([x, y, z]^T) = x^2y^2$\n", + "4. $f_4([x, y, z]^T) = (x + y)^2$\n", + "5. $f_5([x, y, z]^T) = x^4 + x^2 y z + x y^2 z + z^4$\n", + "6. $f_6([x, y, z]^T) = e^{x + 2y}$\n", + "7. $f_7([x, y, z]^T) = \\frac{1}{x y^2}$\n", + "8. $f_8([x, y, z]^T) = ax + by + c$\n", + "9. $f_9([x, y, z]^T) = \\tanh(ax + by + c)$" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TH9nPvyCMu27", + "colab_type": "text" + }, + "source": [ + "## Problem 3 [0.5p]\n", + "\n", + "Find the following functions' gradients or Jacobians with respect to vector $\\mathbf{x}$, where $\\mathbf{x}, \\mathbf{b} \\in \\mathbb{R}^{n}$, $\\mathbf{W} \\in \\mathbb{R}^{n \\times n}$:\n", + "\n", + "1. $\\mathbf{W} \\mathbf{x} + \\mathbf{b}$\n", + "2. $\\mathbf{x}^T \\mathbf{W} \\mathbf{x}$," + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "8sYUtEqUQWYC", + "colab_type": "text" + }, + "source": [ + "## Problem 4 [1p]\n", + "\n", + "Find the derivative of\n", + "$-\\log(S(\\mathbf{x})_j)$, where $S$ is the\n", + " softmax function\n", + " (https://en.wikipedia.org/wiki/Softmax_function) and we are\n", + " interested in the derivative over the $j$-th output of the\n", + " Softmax." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "w_Nd4O2PP5mq", + "colab_type": "text" + }, + "source": [ + "## Problem 5 [0.5p]\n", + "\n", + "Consider a dataset with 400 examples of class C1 and 400 of class C2. \n", + "Let tree A have 2 leaves with class distributions:\n", + "\n", + "| Tree A | C1 | C2 |\n", + "|----------|-------|-----|\n", + "| Leaf 1 | 100 | 300 |\n", + "| Leaf 2 | 300 | 100 |\n", + "\n", + "and let tree B have 2 leaves with class distribution:\n", + "\n", + "| Tree B | C1 | C2 |\n", + "|----------|-------|-----|\n", + "| Leaf 1 | 200 | 400 |\n", + "| Leaf 2 | 200 | 0 |\n", + "\n", + "What is the misclassification rate for both trees? Which tree is more pure according to Gini or Infogain?" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "X0BwK5qnaxav", + "colab_type": "text" + }, + "source": [ + "## Problem 6 [1p]\n", + "\n", + "Consider regresion problem, with $M$ predictors $h_m(x)$ trained to aproximate a target $y$. Define the error to be $\\epsilon_m(x) = h_m(x) - y$.\n", + "\n", + "Suppose you train $M$ independent classifiers with average least squares error\n", + "$$\n", + "E_{AV} = \\frac{1}{M}\\sum_{m=1}^M \\mathbb{E}_{x}[\\epsilon_m(x)^2].\n", + "$$\n", + "\n", + "Further assume that the errors have zero mean and are uncorrelated:\n", + "$$\n", + "\\mathbb{E}_{x}[\\epsilon_m(x)] = 0\\qquad\\text{ and }\\qquad\\mathbb{E}_{x}[\\epsilon_m(x)\\epsilon_l(x)] = 0\\text{ for } m \\neq l\n", + "$$\n", + "\n", + "Let the mean predictor be\n", + "$$\n", + "h_M(x) = \\frac{1}{M}h_m(x).\n", + "$$\n", + "\n", + "What is the average error of $h_M(x)$?" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "kVFSEXy_g_fo", + "colab_type": "text" + }, + "source": [ + "## Problem 7 [1p]\n", + "\n", + "Suppose you work on a binary classification problem and train 3 weak classifiers. You combine their prediction by voting. \n", + "\n", + "Can the training error rate of the voting ensemble smaller that the error rate of the individual weak predictors? Can it be larger? Show an example or prove infeasibility." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "8IzGTnZuMxTB", + "colab_type": "text" + }, + "source": [ + "## Problem 8 [1 bonus point]\n", + "\n", + "While on a walk, you notice that a locomotive has the serial number 50. Assuming that all locomotives used by PKP (the Polish railroad operator) are numbered using consecutive natural numbers, what is your estimate of $N$ the total number of locomotives operated by PKP?\n", + "\n", + "Tell why the Maximum Likelihood principle may not yield satisfactory results. \n", + "\n", + "Use the Bayesian approach to find the posterior distribution over\n", + " the number of locomotives. Then compute the expected count of\n", + " locomotives. For the prior use the power law:\n", + " \\begin{equation}\n", + " p(N) = \\frac{1}{N^\\alpha}\\frac{1}{\\zeta(\\alpha,1)},\n", + " \\end{equation}\n", + " where the $\\zeta(s,q)=\\sum_{n=0}^{\\infty}\\frac{1}{(q+n)^s}$ is the\n", + " Hurwitz Zeta function\n", + " (https://en.wikipedia.org/wiki/Hurwitz_zeta_function)\n", + " available in Python as `scipy.special.zeta`. The use of the\n", + " power law is motivated by the observation that the frequency of\n", + " occurrence of a company is inversely proportional to its size (see\n", + " also: R.L. Axtell, Zipf distribution of US firm sizes\n", + " https://www.sciencemag.org/content/293/5536/1818).\n", + " \n", + " How would your estimate change after seeing 5 locomotives, with the\n", + " biggest serial number among them being 50?\n", + "\n", + " **Note**: During the Second World War, a similar problem was\n", + " encountered while trying to estimate the total German tank\n", + " production from the serial numbers of captured machines. The\n", + " statistical estimates were the most precise!" + ] + } + ] +} \ No newline at end of file