From 3b45dd0946acf5acd286fbf776759d5ab6585df4 Mon Sep 17 00:00:00 2001 From: Jan Chorowski Date: Thu, 17 Oct 2019 16:48:46 +0200 Subject: [PATCH] Created using Colaboratory --- homework1/Homework1.ipynb | 231 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 231 insertions(+) create mode 100644 homework1/Homework1.ipynb diff --git a/homework1/Homework1.ipynb b/homework1/Homework1.ipynb new file mode 100644 index 0000000..69717e7 --- /dev/null +++ b/homework1/Homework1.ipynb @@ -0,0 +1,231 @@ +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "name": "Homework1.ipynb", + "provenance": [], + "include_colab_link": true + }, + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + } + }, + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "view-in-github", + "colab_type": "text" + }, + "source": [ + "\"Open" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tkKY6us_cCg4", + "colab_type": "text" + }, + "source": [ + "# Homework 1\n", + "\n", + "**For exercises in the week 22-28.10.19**\n", + "\n", + "**Points: 7 + 2 bonus points**\n", + "\n", + "Please solve the problems at home and bring to class a [declaration form](http://ii.uni.wroc.pl/~jmi/Dydaktyka/misc/kupony-klasyczne.pdf) to indicate which problems you are willing to present on the blackboard.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "iYFL1cWQbv1D", + "colab_type": "text" + }, + "source": [ + "## Problem 1 (MacKay 4.1) [1p]\n", + "\n", + "You are given a set of 12 balls in which:\n", + "- 11 balls are equal\n", + "- 1 ball is different (either heavier or lighter).\n", + "\n", + "You have a two-pan balance. How many weighings must you use to detect the odd ball?\n", + "\n", + "*Hint:* A weighing can be seen as a random event. You can design the weighings to carry the most information, i.e. to maximize the entropy of their outcome." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-kxgvhPBb7hN", + "colab_type": "text" + }, + "source": [ + "## Problem 2 [1p]\n", + "\n", + "Bayes' theorem allows us to reason about conditional probabilities of causes and their effects:\n", + "\n", + "\\begin{equation}\n", + "p(A,B)=p(A|B)p(B)=p(B|A)p(A)\n", + "\\end{equation}\n", + "\n", + "\\begin{equation}\n", + "p(A|B) = \\frac{p(B|A)p(A)}{p(B)}\n", + "\\end{equation}\n", + "\n", + "Bayes' theorem allows us to reason about probabilities of causes when\n", + "we observe their effects. Instead of directly answering the hard\n", + "question $p(\\text{cause}|\\text{effect})$ we can separately\n", + "work out the marginal probabilities of causes $p(\\text{cause})$ and\n", + "carefully study their effects $p(\\text{effect}|\\text{cause})$.\n", + "\n", + "Solve the following using Bayes' theorem.\n", + "\n", + "1. There are two boxes on the table: box \\#1 holds two\n", + " black balls and eight red ones, box \\#2 holds 5 black ones and\n", + " 5 red ones. We pick a box at random (with equal probabilities),\n", + " and then a ball from that box.\n", + " 1. What is the probability that the\n", + " ball came from box \\#1 if we happened to pick a red ball?\n", + " \n", + "1. The government has started a preventive program of\n", + " mandatory tests for the Ebola virus. The mass testing method is\n", + " imprecise, yielding 1% of false positives (healthy, but the test\n", + " indicates the virus) and 1% of false negatives\n", + " (having the virus but healthy according to test results).\n", + " As Ebola is rather infrequent, let's assume that it occurs in\n", + " one in a million people in Europe.\n", + " 1. What is the probability\n", + " that a random European who has tested positive for the Ebola\n", + " virus is indeed a carrier?\n", + " 2. 
Suppose we have additional information that the person has just\n", + " arrived from a country where one in a thousand people is a carrier.\n", + " By how much does the probability increase?\n", + " 3. How accurate should the test be for an 80% probability of a true\n", + " positive in a European?" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "72HwrFKBb9Mn", + "colab_type": "text" + }, + "source": [ + "## Problem 3 [1.5p]\n", + "\n", + "Given observations $x_1,\\ldots,x_n$\n", + " coming from a certain distribution,\n", + " prove that the MLE of the given parameter of that distribution is equal to the sample mean $\\frac{1}{n}\\sum_{i=1}^n x_i$:\n", + "1. Bernoulli distribution with success probability $p$ and MLE $\\hat{p}$,\n", + "2. Gaussian distribution $\\mathcal{N}(\\mu,\\sigma)$ and MLE $\\hat{\\mu}$,\n", + "3. Poisson distribution $\\mathit{Pois}(\\lambda)$ and MLE $\\hat{\\lambda}$." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "oh5OFthnb-4f", + "colab_type": "text" + }, + "source": [ + "## Problem 4 [1.5p]\n", + "\n", + "1D Gaussian manipulation for Kalman filters.\n", + "\n", + "A [1D Kalman filter](https://en.wikipedia.org/wiki/Kalman_filter) tracks the location of an object given imprecise measurements of its location. 
At its core it performs an update of the form:\n", + "\n", + "$$\n", + " p(x|m) = \\frac{p(m|x)p(x)}{p(m)} = \\frac{p(m|x)p(x)}{Z},\n", + "$$\n", + "\n", + "where:\n", + "- $p(x|m)$ is the updated belief about the location,\n", + "- $p(x) = \\mathcal{N}(\\mu=\\mu_x, \\sigma=\\sigma_x)$ is the prior belief about the location,\n", + "- $p(m|x) = \\mathcal{N}(\\mu=x, \\sigma=\\sigma_m)$ is the noisy measurement, centered on the location of the object,\n", + "- $Z = p(m) =\\int p(m|x)p(x) dx$ is a normalization constant not dependent on $x$.\n", + "\n", + "Compute $p(x|m)$.\n", + "\n", + "*Hint:* The product $\\mathcal{N}(x;\\mu_1, \\sigma_1)\\mathcal{N}(x;\\mu_2, \\sigma_2)$ resembles an unnormalized probability distribution. Which one? Can you normalize it by computing the mean and standard deviation and fitting it to a known PDF?" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "00IQ8eslcHVI", + "colab_type": "text" + }, + "source": [ + "## Problem 5 (Murphy, 2.17) [1p]\n", + "\n", + "Expected value of the minimum.\n", + "\n", + "Let $X, Y$ be sampled independently and uniformly on the interval $[0,1]$. What is the expected value of $\\min(X,Y)$?" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "bEETFqdrqBoj", + "colab_type": "text" + }, + "source": [ + "## Problem 6 (Kohavi) [1p]\n", + "\n", + "The failure of leave-one-out evaluation. \n", + "\n", + "Consider a binary classification dataset in which the labels are assigned completely at random, with 50% probability given to either class. Assume you have collected a dataset with 100 records in which exactly 50 of them belong to class 0 and 50 to class 1. \n", + "\n", + "What will be the leave-one-out accuracy of the majority voting classifier?\n", + "\n", + "NB: sometimes it is useful to equalize the number of classes in each fold of cross-validation, e.g. 
using the [StratifiedKFold](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.StratifiedKFold.html) implementation from scikit-learn." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "yrnFTXpfcN1R", + "colab_type": "text" + }, + "source": [ + "## Problem 7 [1bp]\n", + "Do Problem 7a from [Assignment 1](https://github.com/janchorowski/ml_uwr/blob/fall2019/assignment1/Assignment1.ipynb)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "3PO62-Ffy6z9", + "colab_type": "text" + }, + "source": [ + "## Problem 8 [1bp]\n", + "\n", + "Many websites ([Reddit](reddit.com), [Wykop](wykop.pl), [StackOverflow](stackoverflow.com)) provide sorting of comments based on user votes. Discuss the implications of sorting by:\n", + "- difference between up- and down-votes\n", + "- mean score\n", + "- lower or upper confidence bound of the score\n", + "\n", + "At least for Reddit, the sorting algorithm can be found online. What is it?" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "JHOYUVyoLdN6", + "colab_type": "code", + "colab": {} + }, + "source": [ + "" + ], + "execution_count": 0, + "outputs": [] + } + ] +} \ No newline at end of file