Created using Colaboratory

thegunner157 · Oct 17, 2019 · 3b45dd0
1 parent 3609a1f
commit 3b45dd0
Showing 1 changed file with 231 additions and 0 deletions.
diff --git a/homework1/Homework1.ipynb b/homework1/Homework1.ipynb
@@ -0,0 +1,231 @@
+{
+  "nbformat": 4,
+  "nbformat_minor": 0,
+  "metadata": {
+    "colab": {
+      "name": "Homework1.ipynb",
+      "provenance": [],
+      "include_colab_link": true
+    },
+    "kernelspec": {
+      "name": "python3",
+      "display_name": "Python 3"
+    }
+  },
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "view-in-github",
+        "colab_type": "text"
+      },
+      "source": [
+        "<a href=\"https://colab.research.google.com/github/janchorowski/ml_uwr/blob/fall2019/homework1/Homework1.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "tkKY6us_cCg4",
+        "colab_type": "text"
+      },
+      "source": [
+        "# Homework 1\n",
+        "\n",
+        "**For exercises in the week 22-28.10.19**\n",
+        "\n",
+        "**Points: 7 + 2 bonus points**\n",
+        "\n",
+        "Please solve the problems at home and bring to class a [declaration form](http://ii.uni.wroc.pl/~jmi/Dydaktyka/misc/kupony-klasyczne.pdf) to indicate which problems you are willing to present on the blackboard.\n",
+        "\n"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "iYFL1cWQbv1D",
+        "colab_type": "text"
+      },
+      "source": [
+        "## Problem 1 (MacKay 4.1) [1p]\n",
+        "\n",
+        "You are given a set of 12 balls in which:\n",
+        "- 11 balls have equal weight\n",
+        "- 1 ball is different (either heavier or lighter).\n",
+        "\n",
+        "You have a two-pan balance. How many weighings must you use to identify the odd ball?\n",
+        "\n",
+        "*Hint:* A weighing can be seen as a random event. You can design the weighings to carry the most information, i.e. to maximize the entropy of their outcome."
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "-kxgvhPBb7hN",
+        "colab_type": "text"
+      },
+      "source": [
+        "## Problem 2 [1p]\n",
+        "\n",
+        "Bayes' theorem allows us to reason about conditional probabilities of causes and their effects:\n",
+        "\n",
+        "\\begin{equation}\n",
+        "p(A,B)=p(A|B)p(B)=p(B|A)p(A)\n",
+        "\\end{equation}\n",
+        "\n",
+        "\\begin{equation}\n",
+        "p(A|B) = \\frac{p(B|A)p(A)}{p(B)}\n",
+        "\\end{equation}\n",
+        "\n",
+        "Bayes' theorem allows us to reason about the probabilities of causes when\n",
+        "we observe their effects.  Instead of directly answering the hard\n",
+        "question $p(\\text{cause}|\\text{effect})$ we can separately\n",
+        "work out the marginal probabilities of causes $p(\\text{cause})$ and\n",
+        "carefully study their effects $p(\\text{effect}|\\text{cause})$.\n",
+        "\n",
+        "Solve the following using Bayes' theorem.\n",
+        "\n",
+        "1. There are two boxes on the table: box \\#1 holds 2\n",
+        "  black balls and 8 red ones, box \\#2 holds 5 black ones and\n",
+        "  5 red ones. We pick a box at random (with equal probabilities),\n",
+        "  and then a ball from that box.\n",
+        "  1. What is the probability that the\n",
+        "  ball came from box \\#1, given that we happened to pick a red ball?\n",
+        "  \n",
+        "1. The government has started a preventive program of\n",
+        "  mandatory tests for the Ebola virus. The mass testing method is\n",
+        "  imprecise, yielding 1% of false positives (healthy, but the test\n",
+        "  indicates the virus) and 1% of false negatives (having the\n",
+        "  virus, but healthy according to the test results).\n",
+        "  As Ebola is rather infrequent, let's assume that it occurs in\n",
+        "  one in a million people in Europe.\n",
+        "  1. What is the probability\n",
+        "  that a random European who has tested positive for the Ebola\n",
+        "  virus is indeed a carrier?\n",
+        "  2. Suppose we have the additional information that the person has just\n",
+        "  arrived from a country where one in a thousand people is a carrier.\n",
+        "  By how much does the probability increase?\n",
+        "  3. How accurate must the test be for a positive result in a European\n",
+        "  to be a true positive with 80% probability?"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "72HwrFKBb9Mn",
+        "colab_type": "text"
+      },
+      "source": [
+        "## Problem 3 [1.5p]\n",
+        "\n",
+        "Given observations $x_1,\\ldots,x_n$\n",
+        "  coming from a certain distribution,\n",
+        "  prove that the MLE of the indicated parameter of that distribution is equal to the sample mean $\\frac{1}{n}\\sum_{i=1}^n x_i$:\n",
+        "1. Bernoulli distribution with success probability $p$ and MLE $\\hat{p}$,\n",
+        "2. Gaussian distribution $\\mathcal{N}(\\mu,\\sigma)$ and MLE $\\hat{\\mu}$,\n",
+        "3. Poisson distribution $\\mathit{Pois}(\\lambda)$ and MLE $\\hat{\\lambda}$."
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "oh5OFthnb-4f",
+        "colab_type": "text"
+      },
+      "source": [
+        "## Problem 4 [1.5p]\n",
+        "\n",
+        "1D Gaussian manipulation for Kalman filters.\n",
+        "\n",
+        "A [1D Kalman filter](https://en.wikipedia.org/wiki/Kalman_filter) tracks the location of an object given imprecise measurements of its location. At its core it performs an update of the form:\n",
+        "\n",
+        "$$\n",
+        "      p(x|m) = \\frac{p(m|x)p(x)}{p(m)} = \\frac{p(m|x)p(x)}{Z},\n",
+        "$$\n",
+        "\n",
+        "where:\n",
+        "- $p(x|m)$ is the updated belief about the location,\n",
+        "- $p(x) = \\mathcal{N}(\\mu=\\mu_x, \\sigma=\\sigma_x)$ is the prior belief about the location,\n",
+        "- $p(m|x) = \\mathcal{N}(\\mu=x, \\sigma=\\sigma_m)$ is the noisy measurement, centered on the location of the object,\n",
+        "- $Z = p(m) =\\int p(m|x)p(x) dx$ is a normalization constant not dependent on $x$.\n",
+        "\n",
+        "Compute $p(x|m)$.\n",
+        "\n",
+        "*Hint:* The product $\\mathcal{N}(x;\\mu_1, \\sigma_1)\\mathcal{N}(x;\\mu_2, \\sigma_2)$ resembles an unnormalized probability distribution. Which one? Can you normalize it by computing its mean and standard deviation and matching it to a known PDF?"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "00IQ8eslcHVI",
+        "colab_type": "text"
+      },
+      "source": [
+        "## Problem 5 (Murphy, 2.17) [1p]\n",
+        "\n",
+        "Expected value of the minimum.\n",
+        "\n",
+        "Let $X, Y$ be sampled independently and uniformly on the interval $[0,1]$. What is the expected value of $\\min(X,Y)$?"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "bEETFqdrqBoj",
+        "colab_type": "text"
+      },
+      "source": [
+        "## Problem 6 (Kohavi) [1p]\n",
+        "\n",
+        "The failure of leave-one-out evaluation. \n",
+        "\n",
+        "Consider a binary classification dataset in which the labels are assigned completely at random, with 50% probability given to either class. Assume you have collected a dataset with 100 records in which exactly 50 belong to class 0 and 50 to class 1. \n",
+        "\n",
+        "What will be the leave-one-out accuracy of the majority voting classifier?\n",
+        "\n",
+        "NB: sometimes it is useful to keep the class proportions equal in each fold of cross-validation, e.g. using the [StratifiedKFold](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.StratifiedKFold.html) implementation from scikit-learn."
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "yrnFTXpfcN1R",
+        "colab_type": "text"
+      },
+      "source": [
+        "## Problem 7 [1bp]\n",
+        "Do Problem 7a from [Assignment 1](https://github.com/janchorowski/ml_uwr/blob/fall2019/assignment1/Assignment1.ipynb)."
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "3PO62-Ffy6z9",
+        "colab_type": "text"
+      },
+      "source": [
+        "## Problem 8 [1bp]\n",
+        "\n",
+        "Many websites ([Reddit](https://reddit.com), [Wykop](https://wykop.pl), [StackOverflow](https://stackoverflow.com)) sort comments based on user votes. Discuss the implications of sorting by:\n",
+        "- difference between up- and down-votes\n",
+        "- mean score\n",
+        "- lower or upper confidence bound of the score\n",
+        "\n",
+        "At least for Reddit, the sorting algorithm can be found online. What is it?"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "JHOYUVyoLdN6",
+        "colab_type": "code",
+        "colab": {}
+      },
+      "source": [
+        ""
+      ],
+      "execution_count": 0,
+      "outputs": []
+    }
+  ]
+}