diff --git a/homework1/Homework1.ipynb b/homework1/Homework1.ipynb
index 69717e7..a1c516b 100644
--- a/homework1/Homework1.ipynb
+++ b/homework1/Homework1.ipynb
@@ -1,231 +1,232 @@
{
- "nbformat": 4,
- "nbformat_minor": 0,
- "metadata": {
- "colab": {
- "name": "Homework1.ipynb",
- "provenance": [],
- "include_colab_link": true
- },
- "kernelspec": {
- "name": "python3",
- "display_name": "Python 3"
- }
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "view-in-github"
+ },
+ "source": [
+ ""
+ ]
},
- "cells": [
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "view-in-github",
- "colab_type": "text"
- },
- "source": [
- ""
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "tkKY6us_cCg4",
- "colab_type": "text"
- },
- "source": [
- "# Homework 1\n",
- "\n",
- "**For exercises in the week 22-28.10.19**\n",
- "\n",
- "**Points: 7 + 2 bonus point**\n",
- "\n",
- "Please solve the problems at home and bring to class a [declaration form](http://ii.uni.wroc.pl/~jmi/Dydaktyka/misc/kupony-klasyczne.pdf) to indicate which problems you are willing to present on the backboard.\n",
- "\n"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "iYFL1cWQbv1D",
- "colab_type": "text"
- },
- "source": [
- "## Problem 1 (McKay 4.1) [1p]\n",
- "\n",
- "You are given a set of 12 balls in which:\n",
- "- 11 balls are equal\n",
- "- 1 ball is different (either heavier or lighter).\n",
- "\n",
- "You have a two-pan balance. How many weightings you must use to detect toe odd ball?\n",
- "\n",
- "*Hint:* A weighting can be seen as a random event. You can design them to maximize carry the most information, i.e. to maximize the entropy of their outcome."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "-kxgvhPBb7hN",
- "colab_type": "text"
- },
- "source": [
- "## Problem 2 [1p]\n",
- "\n",
- "Bayes' theorem allows to reason about conditional probabilities of causes and their effects:\n",
- "\n",
- "\\begin{equation}\n",
- "p(A,B)=p(A|B)p(B)=p(B|A)p(A)\n",
- "\\end{equation}\n",
- "\n",
- "\\begin{equation}\n",
- "p(A|B) = \\frac{p(B|A)p(A)}{p(B)}\n",
- "\\end{equation}\n",
- "\n",
- "Bayes' theorem allows us to reason about probabilities of causes, when\n",
- "we observe their results. Instead of directly answering the hard\n",
- "question $p(\\text{cause}|\\text{result})$ we can instead separately\n",
- "work out the marginal probabilities of causes $p(\\text{cause})$ and\n",
- "carefully study their effects $p(\\text{effect}|\\text{cause})$.\n",
- "\n",
- "Solve the following using Bayes' theorem.\n",
- "\n",
- "1. There are two boxes on the table: box \\#1 holds two\n",
- " black balls and eight red ones, box \\#2 holds 5 black ones and\n",
- " 5 red ones. We pick a box at random (with equal probabilities),\n",
- " and then a ball from that box.\n",
- " 1. What is the probability, that the\n",
- " ball came from box \\#1 if we happened to pick a red ball?\n",
- " \n",
- "1. The government has started a preventive program of\n",
- " mandatory tests for the Ebola virus. Mass testing method is\n",
- " imprecise, yielding 1% of false positives (healthy, but the test\n",
- " indicates the virus) and 1% of false negatives (\n",
- " having the virus but healthy according to test results).\n",
- " As Ebola is rather infrequent, lets assume that it occurs in\n",
- " one in a million people in Europe.\n",
- " 1. What is the probability,\n",
- " that a random European, who has been tested positive for Ebola\n",
- " virus, is indeed a carrier?\n",
- " 2. Suppose we have an additional information, that the person has just\n",
- " arrived from a country where one in a thousand people is a carrier.\n",
- " How much will be the increase in probability?\n",
- " 3. How accurate should be the test, for a 80% probability of true\n",
- " positive in a European?"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "72HwrFKBb9Mn",
- "colab_type": "text"
- },
- "source": [
- "## Problem 3 [1.5p]\n",
- "\n",
- "Given observations $x_1,\\ldots,x_n$\n",
- " coming from a certain distribution,\n",
- " prove that MLE of a particular parameter of that distribution is equal to the sample mean $\\frac{1}{n}\\sum_{i=1}^n x_i$:\n",
- "1. Bernoulli distribution with success probability $p$ and MLE $\\hat{p}$,\n",
- "2. Gaussian distribution $\\mathcal{N}(\\mu,\\sigma)$ and MLE $\\hat{\\mu}$,\n",
- "3. Poisson distribution $\\mathit{Pois}(\\lambda)$ and MLE $\\hat{\\lambda}$."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "oh5OFthnb-4f",
- "colab_type": "text"
- },
- "source": [
- "## Problem 4 [1.5p]\n",
- "\n",
- "1D Gaussian manipulatoin for Kalman filters.\n",
- "\n",
- "A [1D Kalman filter](https://en.wikipedia.org/wiki/Kalman_filter) tracks the location of an object given imprecise measurements of its location. At its core it performs an update of the form:\n",
- "\n",
- "$$\n",
- " p(x|m) = \\frac{p(m|x)p(x)}{p(m)} = \\frac{p(m|x)p(x)}{Z},\n",
- "$$\n",
- "\n",
- "where:\n",
- "- $p(x|m)$ is the updated belief about the location,\n",
- "- $p(x) = \\mathcal{N}(\\mu=\\mu_x, \\sigma=\\sigma_x)$ is the belief about the location,\n",
- "- $p(m|x) = \\mathcal{N}(\\mu=x, \\sigma=\\sigma_m)$ is the noisy measurement, centered on the location of the object,\n",
- "- $Z = p(m) =\\int p(m|x)p(x) dx$ is a normalization constant not dependent on $x$.\n",
- "\n",
- "Compute $p(x|m)$.\n",
- "\n",
- "*Hint:* The product $\\mathcal{N}(x;\\mu_1, \\sigma_1)\\mathcal{N}(x;\\mu_2, \\sigma_2)$ ressembles an unnormalized probability distribution, which one? Can you normalize it by computing the mean and standard deviation and fitting it to a knoen PDF?"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "00IQ8eslcHVI",
- "colab_type": "text"
- },
- "source": [
- "## Problem 5 (Murphy, 2.17) [1p]\n",
- "\n",
- "Expected value of the minimum.\n",
- "\n",
- "Let $X, Y$ be sampled uniformily on the interval $[0,1]$. What is the expected value of $\\min(X,Y)$?"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "bEETFqdrqBoj",
- "colab_type": "text"
- },
- "source": [
- "## Problem 6 (Kohavi) [1p]\n",
- "\n",
- "The failure of leave-one-out evaluation. \n",
- "\n",
- "Consider a binary classification dataset in which the labels are assigned completely at random, with 50% probability given to either class. Assume you have a collected a dataset with 100 records in which exactly 50 of them belong to class 0 and 50 to class 1. \n",
- "\n",
- "What will be the leave-one-out accuracy of the majority voting classifier?\n",
- "\n",
- "NB: sometimes it is useful to equalize the number of classes in each fold of cross-validation, e.g. using the [StratifiedKFold](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.StratifiedKFold.html) implementation from SKlearn."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "yrnFTXpfcN1R",
- "colab_type": "text"
- },
- "source": [
- "## Problem 7 [1pb]\n",
- "Do Problem 7a from [Assignment 1](https://github.com/janchorowski/ml_uwr/blob/fall2019/assignment1/Assignment1.ipynb)."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "3PO62-Ffy6z9",
- "colab_type": "text"
- },
- "source": [
- "## Problem 8 [1bp]\n",
- "\n",
- "Many websites ([Reddit](reddit.com), [Wykop](wykop.pl), [StackOverflow](stackoverflow.com)) provide sorting of comments based on user votes. Discuss what are the implications when sorting by:\n",
- "- difference between up- and down-votes\n",
- "- mean score\n",
- "- lower or upper confidence bound of the score\n",
- "\n",
- "At least for Reddit the sorting algorithm can be found online, what is it?"
- ]
- },
- {
- "cell_type": "code",
- "metadata": {
- "id": "JHOYUVyoLdN6",
- "colab_type": "code",
- "colab": {}
- },
- "source": [
- ""
- ],
- "execution_count": 0,
- "outputs": []
- }
- ]
-}
\ No newline at end of file
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "tkKY6us_cCg4"
+ },
+ "source": [
+ "# Homework 1\n",
+ "\n",
+ "**For exercises in the week 22-28.10.19**\n",
+ "\n",
+ "**Points: 7 + 2 bonus point**\n",
+ "\n",
+ "Please solve the problems at home and bring to class a [declaration form](http://ii.uni.wroc.pl/~jmi/Dydaktyka/misc/kupony-klasyczne.pdf) to indicate which problems you are willing to present on the blackboard.\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "iYFL1cWQbv1D"
+ },
+ "source": [
+ "## Problem 1 (McKay 4.1) [1p]\n",
+ "\n",
+ "You are given a set of 12 balls in which:\n",
+ "- 11 balls are equal\n",
+ "- 1 ball is different (either heavier or lighter).\n",
+ "\n",
+ "You have a two-pan balance. How many weightings you must use to detect the odd ball?\n",
+ "\n",
+ "*Hint:* A weighting can be seen as a random event. You can design them to carry the most information, i.e. to maximize the entropy of their outcome."
+ ]
+ },
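The hint's counting argument can be sanity-checked numerically. This is only a sketch of the entropy bound, not the required weighing strategy:

```python
import math

# Entropy bound for the 12-ball puzzle: the odd ball is one of 12 and can be
# heavier or lighter, so there are 24 equally likely hypotheses. Each weighing
# has 3 outcomes (left pan down, right pan down, balance), so it can carry at
# most log2(3) bits of information.
hypotheses = 12 * 2                  # 24 (ball, heavier/lighter) pairs
bits_needed = math.log2(hypotheses)  # ≈ 4.58 bits
bits_per_weighing = math.log2(3)     # ≈ 1.58 bits
weighings = math.ceil(bits_needed / bits_per_weighing)
print(weighings)  # 3
```

The bound only shows this many weighings are necessary; a concrete scheme is still needed to show they suffice.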
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "-kxgvhPBb7hN"
+ },
+ "source": [
+ "## Problem 2 [1p]\n",
+ "\n",
+ "Bayes' theorem allows to reason about conditional probabilities of causes and their effects:\n",
+ "\n",
+ "\\begin{equation}\n",
+ "p(A,B)=p(A|B)p(B)=p(B|A)p(A)\n",
+ "\\end{equation}\n",
+ "\n",
+ "\\begin{equation}\n",
+ "p(A|B) = \\frac{p(B|A)p(A)}{p(B)}\n",
+ "\\end{equation}\n",
+ "\n",
+ "Bayes' theorem allows us to reason about probabilities of causes, when\n",
+ "we observe their results. Instead of directly answering the hard\n",
+ "question $p(\\text{cause}|\\text{result})$ we can instead separately\n",
+ "work out the marginal probabilities of causes $p(\\text{cause})$ and\n",
+ "carefully study their effects $p(\\text{effect}|\\text{cause})$.\n",
+ "\n",
+ "Solve the following using Bayes' theorem.\n",
+ "\n",
+ "1. There are two boxes on the table: box \\#1 holds two\n",
+ " black balls and eight red ones, box \\#2 holds 5 black ones and\n",
+ " 5 red ones. We pick a box at random (with equal probabilities),\n",
+ " and then a ball from that box.\n",
+ " 1. What is the probability, that the\n",
+ " ball came from box \\#1 if we happened to pick a red ball?\n",
+ " \n",
+ "1. The government has started a preventive program of\n",
+ " mandatory tests for the Ebola virus. Mass testing method is\n",
+ " imprecise, yielding 1% of false positives (healthy, but the test\n",
+ " indicates the virus) and 1% of false negatives\n",
+ " (having the virus but healthy according to test results).\n",
+ " As Ebola is rather infrequent, lets assume that it occurs in\n",
+ " one in a million people in Europe.\n",
+ " 1. What is the probability,\n",
+ " that a random European, who has been tested positive for Ebola\n",
+ " virus, is indeed a carrier?\n",
+ " 2. Suppose we have an additional information, that the person has just\n",
+ " arrived from a country where one in a thousand people is a carrier.\n",
+ " How big will be the increase in probability?\n",
+ " 3. How accurate should be the test, for a 80% probability of true\n",
+ " positive in a European?"
+ ]
+ },
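Bayes' rule for a binary cause can be packaged as a small helper for checking answers numerically. The numbers below are made up for illustration and are not one of the sub-problems:

```python
def posterior(prior, likelihood, likelihood_compl):
    """p(A|B) = p(B|A)p(A) / [p(B|A)p(A) + p(B|~A)p(~A)]."""
    evidence = likelihood * prior + likelihood_compl * (1 - prior)
    return likelihood * prior / evidence

# Illustrative numbers: a cause with prior 0.3 produces the effect with
# probability 0.9, while the alternative produces it with probability 0.2.
p = posterior(prior=0.3, likelihood=0.9, likelihood_compl=0.2)
print(round(p, 4))  # 0.6585
```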
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "72HwrFKBb9Mn"
+ },
+ "source": [
+ "## Problem 3 [1.5p]\n",
+ "\n",
+ "Given observations $x_1,\\ldots,x_n$\n",
+ " coming from a certain distribution,\n",
+ " prove that MLE of a particular parameter of that distribution is equal to the sample mean $\\frac{1}{n}\\sum_{i=1}^n x_i$:\n",
+ "1. Bernoulli distribution with success probability $p$ and MLE $\\hat{p}$,\n",
+ "2. Gaussian distribution $\\mathcal{N}(\\mu,\\sigma)$ and MLE $\\hat{\\mu}$,\n",
+ "3. Poisson distribution $\\mathit{Pois}(\\lambda)$ and MLE $\\hat{\\lambda}$."
+ ]
+ },
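Before proving the claims, they can be spot-checked numerically; below is a sketch for the Poisson case (a check, not a proof):

```python
import math

# Poisson log-likelihood: sum over x of [x*log(lam) - lam - log(x!)].
def poisson_loglik(lam, xs):
    return sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in xs)

data = [2, 3, 1, 4, 0, 2, 3]  # arbitrary small sample
mean = sum(data) / len(data)
best = poisson_loglik(mean, data)

# The sample mean should beat nearby parameter values.
print(all(poisson_loglik(mean + d, data) < best
          for d in (-0.5, -0.1, 0.1, 0.5)))  # True
```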
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "oh5OFthnb-4f"
+ },
+ "source": [
+ "## Problem 4 [1.5p]\n",
+ "\n",
+ "1D Gaussian manipulation for Kalman filters.\n",
+ "\n",
+ "A [1D Kalman filter](https://en.wikipedia.org/wiki/Kalman_filter) tracks the location of an object given imprecise measurements of its location. At its core it performs an update of the form:\n",
+ "\n",
+ "$$\n",
+ " p(x|m) = \\frac{p(m|x)p(x)}{p(m)} = \\frac{p(m|x)p(x)}{Z},\n",
+ "$$\n",
+ "\n",
+ "where:\n",
+ "\n",
+ "- $p(x|m)$ is the updated belief about the location,\n",
+ "- $p(x) = \\mathcal{N}(\\mu=\\mu_x, \\sigma=\\sigma_x)$ is the belief about the location,\n",
+ "- $p(m|x) = \\mathcal{N}(\\mu=x, \\sigma=\\sigma_m)$ is the noisy measurement, centered on the location of the object,\n",
+ "- $Z = p(m) =\\int p(m|x)p(x) dx$ is a normalization constant not dependent on $x$.\n",
+ "\n",
+ "Compute $p(x|m)$.\n",
+ "\n",
+ "*Hint:* The product $\\mathcal{N}(x;\\mu_1, \\sigma_1)\\mathcal{N}(x;\\mu_2, \\sigma_2)$ resembles an unnormalized probability distribution, which one? Can you normalize it by computing the mean and standard deviation and fitting it to a known PDF?"
+ ]
+ },
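The hint can be explored numerically. The sketch below *assumes* the usual Gaussian-product identity (precision-weighted mean, summed precisions) and checks that the product of the two densities is pointwise proportional to a single Gaussian; deriving the identity is the actual exercise:

```python
import math

def npdf(x, mu, sigma):
    """Density of N(mu, sigma) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

mu_x, s_x = 0.0, 2.0  # prior belief p(x)
m, s_m = 1.0, 1.0     # measurement p(m|x), viewed as a function of x

# Assumed closed form: precisions add, the mean is precision-weighted.
var = 1.0 / (1.0 / s_x**2 + 1.0 / s_m**2)
mu = var * (mu_x / s_x**2 + m / s_m**2)

# If the identity holds, product / N(x; mu, sqrt(var)) is the constant Z
# for every x.
xs = [-3.0, -1.0, 0.0, 0.5, 2.0]
ratios = [npdf(x, mu_x, s_x) * npdf(x, m, s_m) / npdf(x, mu, math.sqrt(var))
          for x in xs]
print(all(abs(r - ratios[0]) < 1e-9 for r in ratios))  # True
```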
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "00IQ8eslcHVI"
+ },
+ "source": [
+ "## Problem 5 (Murphy, 2.17) [1p]\n",
+ "\n",
+ "Expected value of the minimum.\n",
+ "\n",
+ "Let $X, Y$ be sampled uniformly on the interval $[0,1]$. What is the expected value of $\\min(X,Y)$?"
+ ]
+ },
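A quick Monte Carlo estimate is a useful sanity check for the pen-and-paper answer (a check, not a proof):

```python
import random

# Estimate E[min(X, Y)] for X, Y independent and uniform on [0, 1].
random.seed(0)
n = 200_000
est = sum(min(random.random(), random.random()) for _ in range(n)) / n
print(round(est, 2))  # should be close to the analytic answer
```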
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "bEETFqdrqBoj"
+ },
+ "source": [
+ "## Problem 6 (Kohavi) [1p]\n",
+ "\n",
+ "The failure of leave-one-out evaluation. \n",
+ "\n",
+ "Consider a binary classification dataset in which the labels are assigned completely at random, with 50% probability given to either class. Assume you have a collected a dataset with 100 records in which exactly 50 of them belong to class 0 and 50 to class 1. \n",
+ "\n",
+ "What will be the leave-one-out accuracy of the majority voting classifier?\n",
+ "\n",
+ "NB: To avoid similar effects, sometimes it is useful to equalize the number of classes in each fold of cross-validation, e.g. using the [StratifiedKFold](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.StratifiedKFold.html) implementation from SKlearn. Somewhat obviously, this will not work with leave-one-out validation – but it will with leave-two-out already, at least in the binary case."
+ ]
+ },
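Leave-one-out evaluation of a majority-vote classifier is short enough to simulate directly, so the claimed failure can be checked empirically (running this reveals the answer, so try the reasoning first):

```python
from collections import Counter

def loo_accuracy(ys):
    """Leave-one-out accuracy of a classifier that always predicts the
    majority class of the training fold."""
    correct = 0
    for i, y in enumerate(ys):
        rest = ys[:i] + ys[i + 1:]
        majority = Counter(rest).most_common(1)[0][0]
        correct += (majority == y)
    return correct / len(ys)

labels = [0] * 50 + [1] * 50  # the balanced dataset from the problem
print(loo_accuracy(labels))
```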
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "yrnFTXpfcN1R"
+ },
+ "source": [
+ "## Problem 7 [1pb]\n",
+ "Do Problem 7a from [Assignment 1](https://github.com/janchorowski/ml_uwr/blob/fall2019/assignment1/Assignment1.ipynb)."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "3PO62-Ffy6z9"
+ },
+ "source": [
+ "## Problem 8 [1bp]\n",
+ "\n",
+ "Many websites ([Reddit](https://reddit.com), [Wykop](https://wykop.pl), [StackOverflow](https://stackoverflow.com)) provide sorting of comments based on user votes. Discuss what are the implications when sorting by:\n",
+ "- difference between up- and down-votes\n",
+ "- mean score\n",
+ "- lower or upper confidence bound of the score\n",
+ "\n",
+ "At least for Reddit the sorting algorithm can be found online, what is it?"
+ ]
+ }
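To see why a lower confidence bound ranks differently from the raw mean, one common choice is the Wilson score interval, sketched below as an illustration (not necessarily the algorithm any particular site uses):

```python
import math

def wilson_lower(ups, downs, z=1.96):
    """Lower bound of the Wilson score interval for the up-vote fraction
    (z = 1.96 gives a 95% interval)."""
    n = ups + downs
    if n == 0:
        return 0.0
    p = ups / n
    denom = 1 + z * z / n
    centre = p + z * z / (2 * n)
    spread = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - spread) / denom

# 2 up / 0 down has a perfect mean score but little evidence;
# 90 up / 10 down wins on the lower bound.
print(wilson_lower(2, 0) < wilson_lower(90, 10))  # True
```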
+ ],
+ "metadata": {
+ "colab": {
+ "include_colab_link": true,
+ "name": "Homework1.ipynb",
+ "provenance": []
+ },
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.6.8"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 1
+}