diff --git a/homework1/Homework1.ipynb b/homework1/Homework1.ipynb index 69717e7..a1c516b 100644 --- a/homework1/Homework1.ipynb +++ b/homework1/Homework1.ipynb @@ -1,231 +1,232 @@ { - "nbformat": 4, - "nbformat_minor": 0, - "metadata": { - "colab": { - "name": "Homework1.ipynb", - "provenance": [], - "include_colab_link": true - }, - "kernelspec": { - "name": "python3", - "display_name": "Python 3" - } + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text", + "id": "view-in-github" + }, + "source": [ + "\"Open" + ] }, - "cells": [ - { - "cell_type": "markdown", - "metadata": { - "id": "view-in-github", - "colab_type": "text" - }, - "source": [ - "\"Open" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "tkKY6us_cCg4", - "colab_type": "text" - }, - "source": [ - "# Homework 1\n", - "\n", - "**For exercises in the week 22-28.10.19**\n", - "\n", - "**Points: 7 + 2 bonus point**\n", - "\n", - "Please solve the problems at home and bring to class a [declaration form](http://ii.uni.wroc.pl/~jmi/Dydaktyka/misc/kupony-klasyczne.pdf) to indicate which problems you are willing to present on the backboard.\n", - "\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "iYFL1cWQbv1D", - "colab_type": "text" - }, - "source": [ - "## Problem 1 (McKay 4.1) [1p]\n", - "\n", - "You are given a set of 12 balls in which:\n", - "- 11 balls are equal\n", - "- 1 ball is different (either heavier or lighter).\n", - "\n", - "You have a two-pan balance. How many weightings you must use to detect toe odd ball?\n", - "\n", - "*Hint:* A weighting can be seen as a random event. You can design them to maximize carry the most information, i.e. to maximize the entropy of their outcome." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "-kxgvhPBb7hN", - "colab_type": "text" - }, - "source": [ - "## Problem 2 [1p]\n", - "\n", - "Bayes' theorem allows to reason about conditional probabilities of causes and their effects:\n", - "\n", - "\\begin{equation}\n", - "p(A,B)=p(A|B)p(B)=p(B|A)p(A)\n", - "\\end{equation}\n", - "\n", - "\\begin{equation}\n", - "p(A|B) = \\frac{p(B|A)p(A)}{p(B)}\n", - "\\end{equation}\n", - "\n", - "Bayes' theorem allows us to reason about probabilities of causes, when\n", - "we observe their results. Instead of directly answering the hard\n", - "question $p(\\text{cause}|\\text{result})$ we can instead separately\n", - "work out the marginal probabilities of causes $p(\\text{cause})$ and\n", - "carefully study their effects $p(\\text{effect}|\\text{cause})$.\n", - "\n", - "Solve the following using Bayes' theorem.\n", - "\n", - "1. There are two boxes on the table: box \\#1 holds two\n", - " black balls and eight red ones, box \\#2 holds 5 black ones and\n", - " 5 red ones. We pick a box at random (with equal probabilities),\n", - " and then a ball from that box.\n", - " 1. What is the probability, that the\n", - " ball came from box \\#1 if we happened to pick a red ball?\n", - " \n", - "1. The government has started a preventive program of\n", - " mandatory tests for the Ebola virus. Mass testing method is\n", - " imprecise, yielding 1% of false positives (healthy, but the test\n", - " indicates the virus) and 1% of false negatives (\n", - " having the virus but healthy according to test results).\n", - " As Ebola is rather infrequent, lets assume that it occurs in\n", - " one in a million people in Europe.\n", - " 1. 
What is the probability,\n", - " that a random European, who has been tested positive for Ebola\n", - " virus, is indeed a carrier?\n", - " 2. Suppose we have an additional information, that the person has just\n", - " arrived from a country where one in a thousand people is a carrier.\n", - " How much will be the increase in probability?\n", - " 3. How accurate should be the test, for a 80% probability of true\n", - " positive in a European?" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "72HwrFKBb9Mn", - "colab_type": "text" - }, - "source": [ - "## Problem 3 [1.5p]\n", - "\n", - "Given observations $x_1,\\ldots,x_n$\n", - " coming from a certain distribution,\n", - " prove that MLE of a particular parameter of that distribution is equal to the sample mean $\\frac{1}{n}\\sum_{i=1}^n x_i$:\n", - "1. Bernoulli distribution with success probability $p$ and MLE $\\hat{p}$,\n", - "2. Gaussian distribution $\\mathcal{N}(\\mu,\\sigma)$ and MLE $\\hat{\\mu}$,\n", - "3. Poisson distribution $\\mathit{Pois}(\\lambda)$ and MLE $\\hat{\\lambda}$." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "oh5OFthnb-4f", - "colab_type": "text" - }, - "source": [ - "## Problem 4 [1.5p]\n", - "\n", - "1D Gaussian manipulatoin for Kalman filters.\n", - "\n", - "A [1D Kalman filter](https://en.wikipedia.org/wiki/Kalman_filter) tracks the location of an object given imprecise measurements of its location. At its core it performs an update of the form:\n", - "\n", - "$$\n", - " p(x|m) = \\frac{p(m|x)p(x)}{p(m)} = \\frac{p(m|x)p(x)}{Z},\n", - "$$\n", - "\n", - "where:\n", - "- $p(x|m)$ is the updated belief about the location,\n", - "- $p(x) = \\mathcal{N}(\\mu=\\mu_x, \\sigma=\\sigma_x)$ is the belief about the location,\n", - "- $p(m|x) = \\mathcal{N}(\\mu=x, \\sigma=\\sigma_m)$ is the noisy measurement, centered on the location of the object,\n", - "- $Z = p(m) =\\int p(m|x)p(x) dx$ is a normalization constant not dependent on $x$.\n", - "\n", - "Compute $p(x|m)$.\n", - "\n", - "*Hint:* The product $\\mathcal{N}(x;\\mu_1, \\sigma_1)\\mathcal{N}(x;\\mu_2, \\sigma_2)$ ressembles an unnormalized probability distribution, which one? Can you normalize it by computing the mean and standard deviation and fitting it to a knoen PDF?" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "00IQ8eslcHVI", - "colab_type": "text" - }, - "source": [ - "## Problem 5 (Murphy, 2.17) [1p]\n", - "\n", - "Expected value of the minimum.\n", - "\n", - "Let $X, Y$ be sampled uniformily on the interval $[0,1]$. What is the expected value of $\\min(X,Y)$?" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "bEETFqdrqBoj", - "colab_type": "text" - }, - "source": [ - "## Problem 6 (Kohavi) [1p]\n", - "\n", - "The failure of leave-one-out evaluation. \n", - "\n", - "Consider a binary classification dataset in which the labels are assigned completely at random, with 50% probability given to either class. Assume you have a collected a dataset with 100 records in which exactly 50 of them belong to class 0 and 50 to class 1. \n", - "\n", - "What will be the leave-one-out accuracy of the majority voting classifier?\n", - "\n", - "NB: sometimes it is useful to equalize the number of classes in each fold of cross-validation, e.g. using the [StratifiedKFold](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.StratifiedKFold.html) implementation from SKlearn." 
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "yrnFTXpfcN1R",
- "colab_type": "text"
- },
- "source": [
- "## Problem 7 [1pb]\n",
- "Do Problem 7a from [Assignment 1](https://github.com/janchorowski/ml_uwr/blob/fall2019/assignment1/Assignment1.ipynb)."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "3PO62-Ffy6z9",
- "colab_type": "text"
- },
- "source": [
- "## Problem 8 [1bp]\n",
- "\n",
- "Many websites ([Reddit](reddit.com), [Wykop](wykop.pl), [StackOverflow](stackoverflow.com)) provide sorting of comments based on user votes. Discuss what are the implications when sorting by:\n",
- "- difference between up- and down-votes\n",
- "- mean score\n",
- "- lower or upper confidence bound of the score\n",
- "\n",
- "At least for Reddit the sorting algorithm can be found online, what is it?"
- ]
- },
- {
- "cell_type": "code",
- "metadata": {
- "id": "JHOYUVyoLdN6",
- "colab_type": "code",
- "colab": {}
- },
- "source": [
- ""
- ],
- "execution_count": 0,
- "outputs": []
- }
- ]
-}
\ No newline at end of file
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "tkKY6us_cCg4"
+ },
+ "source": [
+ "# Homework 1\n",
+ "\n",
+ "**For exercises in the week of 22-28.10.19**\n",
+ "\n",
+ "**Points: 7 + 2 bonus points**\n",
+ "\n",
+ "Please solve the problems at home and bring to class a [declaration form](http://ii.uni.wroc.pl/~jmi/Dydaktyka/misc/kupony-klasyczne.pdf) to indicate which problems you are willing to present on the blackboard.\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "iYFL1cWQbv1D"
+ },
+ "source": [
+ "## Problem 1 (MacKay 4.1) [1p]\n",
+ "\n",
+ "You are given a set of 12 balls in which:\n",
+ "- 11 balls are equal\n",
+ "- 1 ball is different (either heavier or lighter).\n",
+ "\n",
+ "You have a two-pan balance. How many weighings must you use to detect the odd ball?\n",
+ "\n",
+ "*Hint:* A weighing can be seen as a random event. You can design the weighings to carry the most information, i.e. to maximize the entropy of their outcomes."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "-kxgvhPBb7hN"
+ },
+ "source": [
+ "## Problem 2 [1p]\n",
+ "\n",
+ "Bayes' theorem allows us to reason about conditional probabilities of causes and their effects:\n",
+ "\n",
+ "\\begin{equation}\n",
+ "p(A,B)=p(A|B)p(B)=p(B|A)p(A)\n",
+ "\\end{equation}\n",
+ "\n",
+ "\\begin{equation}\n",
+ "p(A|B) = \\frac{p(B|A)p(A)}{p(B)}\n",
+ "\\end{equation}\n",
+ "\n",
+ "Bayes' theorem allows us to reason about probabilities of causes when\n",
+ "we observe their results. Instead of directly answering the hard\n",
+ "question $p(\\text{cause}|\\text{result})$, we can separately\n",
+ "work out the marginal probabilities of causes $p(\\text{cause})$ and\n",
+ "carefully study their effects $p(\\text{effect}|\\text{cause})$.\n",
+ "\n",
+ "Solve the following using Bayes' theorem.\n",
+ "\n",
+ "1. There are two boxes on the table: box \\#1 holds two\n",
+ " black balls and eight red ones, box \\#2 holds 5 black ones and\n",
+ " 5 red ones. We pick a box at random (with equal probabilities),\n",
+ " and then a ball from that box.\n",
+ " 1. What is the probability that the\n",
+ " ball came from box \\#1 if we happened to pick a red ball?\n",
+ " \n",
+ "1. The government has started a preventive program of\n",
+ " mandatory tests for the Ebola virus. The mass testing method is\n",
+ " imprecise, yielding 1% of false positives (healthy, but the test\n",
+ " indicates the virus) and 1% of false negatives\n",
+ " (having the virus but healthy according to test results).\n",
+ " As Ebola is rather infrequent, let's assume that it occurs in\n",
+ " one in a million people in Europe.\n",
+ " 1. What is the probability\n",
+ " that a random European who has tested positive for the Ebola\n",
+ " virus is indeed a carrier?\n",
+ " 2. Suppose we have additional information that the person has just\n",
+ " arrived from a country where one in a thousand people is a carrier.\n",
+ " How big will the increase in probability be?\n",
+ " 3. How accurate should the test be for an 80% probability of a true\n",
+ " positive in a European?"
+ ]
+ },
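+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# A small optional sanity-check helper (a sketch; the function and argument\n",
+ "# names are only illustrative) that spells out the Bayes' rule update above\n",
+ "# for a binary cause:\n",
+ "#   p(cause|effect) = p(effect|cause) p(cause) / p(effect), where\n",
+ "#   p(effect) = p(effect|cause) p(cause) + p(effect|not cause) (1 - p(cause)).\n",
+ "def posterior(prior, p_effect_given_cause, p_effect_given_not_cause):\n",
+ "    evidence = (p_effect_given_cause * prior\n",
+ "                + p_effect_given_not_cause * (1 - prior))\n",
+ "    return p_effect_given_cause * prior / evidence\n"
+ ]
+ },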
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "72HwrFKBb9Mn"
+ },
+ "source": [
+ "## Problem 3 [1.5p]\n",
+ "\n",
+ "Given observations $x_1,\\ldots,x_n$\n",
+ " coming from a certain distribution,\n",
+ " prove that the MLE of a particular parameter of that distribution is equal to the sample mean $\\frac{1}{n}\\sum_{i=1}^n x_i$:\n",
+ "1. Bernoulli distribution with success probability $p$ and MLE $\\hat{p}$,\n",
+ "2. Gaussian distribution $\\mathcal{N}(\\mu,\\sigma)$ and MLE $\\hat{\\mu}$,\n",
+ "3. Poisson distribution $\\mathit{Pois}(\\lambda)$ and MLE $\\hat{\\lambda}$."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "oh5OFthnb-4f"
+ },
+ "source": [
+ "## Problem 4 [1.5p]\n",
+ "\n",
+ "1D Gaussian manipulation for Kalman filters.\n",
+ "\n",
+ "A [1D Kalman filter](https://en.wikipedia.org/wiki/Kalman_filter) tracks the location of an object given imprecise measurements of its location. At its core it performs an update of the form:\n",
+ "\n",
+ "$$\n",
+ " p(x|m) = \\frac{p(m|x)p(x)}{p(m)} = \\frac{p(m|x)p(x)}{Z},\n",
+ "$$\n",
+ "\n",
+ "where:\n",
+ "\n",
+ "- $p(x|m)$ is the updated belief about the location,\n",
+ "- $p(x) = \\mathcal{N}(\\mu=\\mu_x, \\sigma=\\sigma_x)$ is the belief about the location,\n",
+ "- $p(m|x) = \\mathcal{N}(\\mu=x, \\sigma=\\sigma_m)$ is the noisy measurement, centered on the location of the object,\n",
+ "- $Z = p(m) = \\int p(m|x)p(x) dx$ is a normalization constant not dependent on $x$.\n",
+ "\n",
+ "Compute $p(x|m)$.\n",
+ "\n",
+ "*Hint:* The product $\\mathcal{N}(x;\\mu_1, \\sigma_1)\\mathcal{N}(x;\\mu_2, \\sigma_2)$ resembles an unnormalized probability distribution; which one? Can you normalize it by computing the mean and standard deviation and fitting it to a known PDF?"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "00IQ8eslcHVI"
+ },
+ "source": [
+ "## Problem 5 (Murphy, 2.17) [1p]\n",
+ "\n",
+ "Expected value of the minimum.\n",
+ "\n",
+ "Let $X, Y$ be sampled uniformly on the interval $[0,1]$. What is the expected value of $\\min(X,Y)$?"
+ ]
+ },
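+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# An optional Monte Carlo companion to Problem 5 (a rough numeric check, not\n",
+ "# a substitute for the exact derivation): estimate E[min(X, Y)] for X, Y\n",
+ "# drawn uniformly from [0, 1].\n",
+ "import numpy as np\n",
+ "\n",
+ "samples = np.random.rand(100000, 2)\n",
+ "print(samples.min(axis=1).mean())\n"
+ ]
+ },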
\n", + "\n", + "What will be the leave-one-out accuracy of the majority voting classifier?\n", + "\n", + "NB: To avoid similar effects, sometimes it is useful to equalize the number of classes in each fold of cross-validation, e.g. using the [StratifiedKFold](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.StratifiedKFold.html) implementation from SKlearn. Somewhat obviously, this will not work with leave-one-out validation – but it will with leave-two-out already, at least in the binary case." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text", + "id": "yrnFTXpfcN1R" + }, + "source": [ + "## Problem 7 [1pb]\n", + "Do Problem 7a from [Assignment 1](https://github.com/janchorowski/ml_uwr/blob/fall2019/assignment1/Assignment1.ipynb)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text", + "id": "3PO62-Ffy6z9" + }, + "source": [ + "## Problem 8 [1bp]\n", + "\n", + "Many websites ([Reddit](https://reddit.com), [Wykop](https://wykop.pl), [StackOverflow](https://stackoverflow.com)) provide sorting of comments based on user votes. Discuss what are the implications when sorting by:\n", + "- difference between up- and down-votes\n", + "- mean score\n", + "- lower or upper confidence bound of the score\n", + "\n", + "At least for Reddit the sorting algorithm can be found online, what is it?" + ] + } + ], + "metadata": { + "colab": { + "include_colab_link": true, + "name": "Homework1.ipynb", + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 1 +}