
Commit

Created using Colaboratory
janchorowski committed Oct 17, 2019
1 parent 3609a1f commit 3b45dd0
Showing 1 changed file with 231 additions and 0 deletions.
231 changes: 231 additions & 0 deletions homework1/Homework1.ipynb
@@ -0,0 +1,231 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "Homework1.ipynb",
"provenance": [],
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/janchorowski/ml_uwr/blob/fall2019/homework1/Homework1.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "tkKY6us_cCg4",
"colab_type": "text"
},
"source": [
"# Homework 1\n",
"\n",
"**For exercises in the week 22-28.10.19**\n",
"\n",
"**Points: 7 + 2 bonus point**\n",
"\n",
"Please solve the problems at home and bring to class a [declaration form](http://ii.uni.wroc.pl/~jmi/Dydaktyka/misc/kupony-klasyczne.pdf) to indicate which problems you are willing to present on the backboard.\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "iYFL1cWQbv1D",
"colab_type": "text"
},
"source": [
"## Problem 1 (McKay 4.1) [1p]\n",
"\n",
"You are given a set of 12 balls in which:\n",
"- 11 balls are equal\n",
"- 1 ball is different (either heavier or lighter).\n",
"\n",
"You have a two-pan balance. How many weightings you must use to detect toe odd ball?\n",
"\n",
"*Hint:* A weighting can be seen as a random event. You can design them to maximize carry the most information, i.e. to maximize the entropy of their outcome."
]
},
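{
"cell_type": "markdown",
"metadata": {
"colab_type": "text"
},
"source": [
"The hint in Problem 1 can be made concrete with a small sketch. It assumes that all 24 hypotheses (which ball is odd, and whether it is heavier or lighter) are equally likely before the first weighing, and it computes the entropy of the outcome when `k` balls are placed on each pan. The name `first_weighing_entropy` is hypothetical, and the sketch covers only the first weighing, not a full strategy.\n",
"\n",
"```python\n",
"import math\n",
"\n",
"def first_weighing_entropy(k, n_balls=12):\n",
"    # Entropy (in bits) of the outcome of weighing k balls against k balls,\n",
"    # assuming each of the 2 * n_balls hypotheses is equally likely.\n",
"    n_hyp = 2 * n_balls\n",
"    # The left pan goes down if the odd ball is heavy and on the left,\n",
"    # or light and on the right: k + k hypotheses.\n",
"    p_left = 2 * k / n_hyp\n",
"    p_right = p_left  # by symmetry\n",
"    p_balanced = 1 - p_left - p_right\n",
"    probs = [p_left, p_right, p_balanced]\n",
"    # Outcomes with zero probability contribute nothing to the entropy.\n",
"    return -sum(p * math.log2(p) for p in probs if p > 0)\n",
"\n",
"for k in range(1, 7):\n",
"    print(k, round(first_weighing_entropy(k), 3))\n",
"```\n",
"\n",
"Since a weighing has three possible outcomes, its entropy cannot exceed $\\log_2 3 \\approx 1.585$ bits."
]
},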
{
"cell_type": "markdown",
"metadata": {
"id": "-kxgvhPBb7hN",
"colab_type": "text"
},
"source": [
"## Problem 2 [1p]\n",
"\n",
"Bayes' theorem allows to reason about conditional probabilities of causes and their effects:\n",
"\n",
"\\begin{equation}\n",
"p(A,B)=p(A|B)p(B)=p(B|A)p(A)\n",
"\\end{equation}\n",
"\n",
"\\begin{equation}\n",
"p(A|B) = \\frac{p(B|A)p(A)}{p(B)}\n",
"\\end{equation}\n",
"\n",
"Bayes' theorem allows us to reason about probabilities of causes, when\n",
"we observe their results. Instead of directly answering the hard\n",
"question $p(\\text{cause}|\\text{result})$ we can instead separately\n",
"work out the marginal probabilities of causes $p(\\text{cause})$ and\n",
"carefully study their effects $p(\\text{effect}|\\text{cause})$.\n",
"\n",
"Solve the following using Bayes' theorem.\n",
"\n",
"1. There are two boxes on the table: box \\#1 holds two\n",
" black balls and eight red ones, box \\#2 holds 5 black ones and\n",
" 5 red ones. We pick a box at random (with equal probabilities),\n",
" and then a ball from that box.\n",
" 1. What is the probability, that the\n",
" ball came from box \\#1 if we happened to pick a red ball?\n",
" \n",
"1. The government has started a preventive program of\n",
" mandatory tests for the Ebola virus. Mass testing method is\n",
" imprecise, yielding 1% of false positives (healthy, but the test\n",
" indicates the virus) and 1% of false negatives (\n",
" having the virus but healthy according to test results).\n",
" As Ebola is rather infrequent, lets assume that it occurs in\n",
" one in a million people in Europe.\n",
" 1. What is the probability,\n",
" that a random European, who has been tested positive for Ebola\n",
" virus, is indeed a carrier?\n",
" 2. Suppose we have an additional information, that the person has just\n",
" arrived from a country where one in a thousand people is a carrier.\n",
" How much will be the increase in probability?\n",
" 3. How accurate should be the test, for a 80% probability of true\n",
" positive in a European?"
]
},
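{
"cell_type": "markdown",
"metadata": {
"colab_type": "text"
},
"source": [
"The update rule in Problem 2 can be sketched for a binary cause. The numbers below are hypothetical and deliberately differ from the exercise, and `posterior` is an illustrative helper rather than part of the course materials.\n",
"\n",
"```python\n",
"def posterior(prior, p_effect_given_cause, p_effect_given_no_cause):\n",
"    # Bayes' theorem for a binary cause:\n",
"    # p(cause | effect) = p(effect | cause) p(cause) / p(effect)\n",
"    evidence = (p_effect_given_cause * prior\n",
"                + p_effect_given_no_cause * (1 - prior))\n",
"    return p_effect_given_cause * prior / evidence\n",
"\n",
"# Hypothetical test: 5% prevalence, 90% sensitivity, 10% false positives.\n",
"print(posterior(prior=0.05,\n",
"                p_effect_given_cause=0.9,\n",
"                p_effect_given_no_cause=0.1))\n",
"```\n",
"\n",
"The denominator is the total probability of the effect, summed over the two cases: cause present and cause absent."
]
},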
{
"cell_type": "markdown",
"metadata": {
"id": "72HwrFKBb9Mn",
"colab_type": "text"
},
"source": [
"## Problem 3 [1.5p]\n",
"\n",
"Given observations $x_1,\\ldots,x_n$\n",
" coming from a certain distribution,\n",
" prove that MLE of a particular parameter of that distribution is equal to the sample mean $\\frac{1}{n}\\sum_{i=1}^n x_i$:\n",
"1. Bernoulli distribution with success probability $p$ and MLE $\\hat{p}$,\n",
"2. Gaussian distribution $\\mathcal{N}(\\mu,\\sigma)$ and MLE $\\hat{\\mu}$,\n",
"3. Poisson distribution $\\mathit{Pois}(\\lambda)$ and MLE $\\hat{\\lambda}$."
]
},
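{
"cell_type": "markdown",
"metadata": {
"colab_type": "text"
},
"source": [
"A numerical sanity check can accompany the proofs in Problem 3, though it does not replace them. The sketch below uses a hypothetical sample and grid: it maximizes the Bernoulli log-likelihood over candidate values of $p$ and compares the maximizer with the sample mean.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"rng = np.random.default_rng(0)\n",
"x = rng.binomial(1, 0.3, size=1000)  # hypothetical Bernoulli sample\n",
"\n",
"# Candidate values of p, excluding 0 and 1 to keep the logarithms finite.\n",
"grid = np.linspace(0.001, 0.999, 999)\n",
"k = x.sum()   # number of successes\n",
"n = x.size    # number of observations\n",
"log_lik = k * np.log(grid) + (n - k) * np.log(1 - grid)\n",
"\n",
"p_hat = grid[np.argmax(log_lik)]\n",
"print(p_hat, x.mean())  # the two should agree up to the grid resolution\n",
"```\n",
"\n",
"The same check works for the Gaussian and Poisson cases after swapping in the corresponding log-likelihood."
]
},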
{
"cell_type": "markdown",
"metadata": {
"id": "oh5OFthnb-4f",
"colab_type": "text"
},
"source": [
"## Problem 4 [1.5p]\n",
"\n",
"1D Gaussian manipulatoin for Kalman filters.\n",
"\n",
"A [1D Kalman filter](https://en.wikipedia.org/wiki/Kalman_filter) tracks the location of an object given imprecise measurements of its location. At its core it performs an update of the form:\n",
"\n",
"$$\n",
" p(x|m) = \\frac{p(m|x)p(x)}{p(m)} = \\frac{p(m|x)p(x)}{Z},\n",
"$$\n",
"\n",
"where:\n",
"- $p(x|m)$ is the updated belief about the location,\n",
"- $p(x) = \\mathcal{N}(\\mu=\\mu_x, \\sigma=\\sigma_x)$ is the belief about the location,\n",
"- $p(m|x) = \\mathcal{N}(\\mu=x, \\sigma=\\sigma_m)$ is the noisy measurement, centered on the location of the object,\n",
"- $Z = p(m) =\\int p(m|x)p(x) dx$ is a normalization constant not dependent on $x$.\n",
"\n",
"Compute $p(x|m)$.\n",
"\n",
"*Hint:* The product $\\mathcal{N}(x;\\mu_1, \\sigma_1)\\mathcal{N}(x;\\mu_2, \\sigma_2)$ ressembles an unnormalized probability distribution, which one? Can you normalize it by computing the mean and standard deviation and fitting it to a knoen PDF?"
]
},
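{
"cell_type": "markdown",
"metadata": {
"colab_type": "text"
},
"source": [
"A closed-form answer to Problem 4 can be checked numerically. The sketch below uses hypothetical values for the prior and the measurement. It evaluates the unnormalized product $p(m|x)p(x)$ on a grid, normalizes it with a Riemann sum, and reports the mean and standard deviation of the result for comparison with a derived formula.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def gauss_pdf(x, mu, sigma):\n",
"    # Density of N(mu, sigma), with sigma the standard deviation.\n",
"    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))\n",
"\n",
"# Hypothetical values, chosen only for illustration.\n",
"mu_x, sigma_x = 0.0, 2.0   # prior belief about the location\n",
"m, sigma_m = 1.5, 1.0      # observed measurement and its noise level\n",
"\n",
"x = np.linspace(-20.0, 20.0, 40001)\n",
"dx = x[1] - x[0]\n",
"\n",
"# p(m | x) is a Gaussian in m centered on x, evaluated at the observed m.\n",
"unnormalized = gauss_pdf(m, x, sigma_m) * gauss_pdf(x, mu_x, sigma_x)\n",
"Z = unnormalized.sum() * dx   # approximates p(m)\n",
"post = unnormalized / Z       # approximates p(x | m) on the grid\n",
"\n",
"mean = (x * post).sum() * dx\n",
"std = np.sqrt((((x - mean) ** 2) * post).sum() * dx)\n",
"print(mean, std)\n",
"```\n",
"\n",
"The grid must be wide enough to cover both the prior and the measurement, otherwise the Riemann sums lose probability mass."
]
},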
{
"cell_type": "markdown",
"metadata": {
"id": "00IQ8eslcHVI",
"colab_type": "text"
},
"source": [
"## Problem 5 (Murphy, 2.17) [1p]\n",
"\n",
"Expected value of the minimum.\n",
"\n",
"Let $X, Y$ be sampled uniformily on the interval $[0,1]$. What is the expected value of $\\min(X,Y)$?"
]
},
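{
"cell_type": "markdown",
"metadata": {
"colab_type": "text"
},
"source": [
"An analytic result for Problem 5 can be compared against a Monte Carlo estimate. The sketch below assumes that $X$ and $Y$ are independent; the sample size and seed are arbitrary choices.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"rng = np.random.default_rng(0)\n",
"n_samples = 1_000_000\n",
"\n",
"x = rng.uniform(0.0, 1.0, size=n_samples)\n",
"y = rng.uniform(0.0, 1.0, size=n_samples)\n",
"\n",
"# Sample mean of min(X, Y) and its standard error.\n",
"mins = np.minimum(x, y)\n",
"estimate = mins.mean()\n",
"std_err = mins.std(ddof=1) / np.sqrt(n_samples)\n",
"print(estimate, std_err)\n",
"```\n",
"\n",
"The standard error indicates how many digits of the estimate can be trusted when comparing it with the derived value."
]
},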
{
"cell_type": "markdown",
"metadata": {
"id": "bEETFqdrqBoj",
"colab_type": "text"
},
"source": [
"## Problem 6 (Kohavi) [1p]\n",
"\n",
"The failure of leave-one-out evaluation. \n",
"\n",
"Consider a binary classification dataset in which the labels are assigned completely at random, with 50% probability given to either class. Assume you have a collected a dataset with 100 records in which exactly 50 of them belong to class 0 and 50 to class 1. \n",
"\n",
"What will be the leave-one-out accuracy of the majority voting classifier?\n",
"\n",
"NB: sometimes it is useful to equalize the number of classes in each fold of cross-validation, e.g. using the [StratifiedKFold](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.StratifiedKFold.html) implementation from SKlearn."
]
},
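{
"cell_type": "markdown",
"metadata": {
"colab_type": "text"
},
"source": [
"The stratified splitting mentioned in Problem 6 can be sketched as follows. The features are hypothetical random numbers unrelated to the labels, and the number of folds is an arbitrary choice.\n",
"\n",
"```python\n",
"import numpy as np\n",
"from sklearn.model_selection import StratifiedKFold\n",
"\n",
"rng = np.random.default_rng(0)\n",
"X = rng.normal(size=(100, 2))       # hypothetical features, unrelated to y\n",
"y = np.array([0] * 50 + [1] * 50)   # 50 records per class\n",
"\n",
"skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)\n",
"for train_idx, test_idx in skf.split(X, y):\n",
"    # Print the per-class counts in the training and test parts of each fold.\n",
"    print(np.bincount(y[train_idx]), np.bincount(y[test_idx]))\n",
"```\n",
"\n",
"Stratification keeps the class proportions of every fold close to those of the full dataset, which plain leave-one-out cannot do."
]
},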
{
"cell_type": "markdown",
"metadata": {
"id": "yrnFTXpfcN1R",
"colab_type": "text"
},
"source": [
"## Problem 7 [1pb]\n",
"Do Problem 7a from [Assignment 1](https://github.com/janchorowski/ml_uwr/blob/fall2019/assignment1/Assignment1.ipynb)."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "3PO62-Ffy6z9",
"colab_type": "text"
},
"source": [
"## Problem 8 [1bp]\n",
"\n",
"Many websites ([Reddit](reddit.com), [Wykop](wykop.pl), [StackOverflow](stackoverflow.com)) provide sorting of comments based on user votes. Discuss what are the implications when sorting by:\n",
"- difference between up- and down-votes\n",
"- mean score\n",
"- lower or upper confidence bound of the score\n",
"\n",
"At least for Reddit the sorting algorithm can be found online, what is it?"
]
},
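{
"cell_type": "markdown",
"metadata": {
"colab_type": "text"
},
"source": [
"The three scoring rules in Problem 8 can be compared on a few hypothetical vote counts. The sketch below uses the lower end of the Wilson score interval as one possible confidence bound. That choice is an assumption made for illustration, and the function names are hypothetical.\n",
"\n",
"```python\n",
"import math\n",
"\n",
"def score_difference(ups, downs):\n",
"    return ups - downs\n",
"\n",
"def score_mean(ups, downs):\n",
"    n = ups + downs\n",
"    return ups / n if n > 0 else 0.0\n",
"\n",
"def wilson_lower_bound(ups, downs, z=1.96):\n",
"    # Lower end of the Wilson score interval for the fraction of up-votes;\n",
"    # z = 1.96 corresponds to roughly 95% confidence.\n",
"    n = ups + downs\n",
"    if n == 0:\n",
"        return 0.0\n",
"    p = ups / n\n",
"    centre = p + z * z / (2 * n)\n",
"    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))\n",
"    return (centre - margin) / (1 + z * z / n)\n",
"\n",
"# Hypothetical comments as (up-votes, down-votes).\n",
"comments = [(1, 0), (60, 40), (600, 400), (9, 1)]\n",
"for ups, downs in comments:\n",
"    print(ups, downs,\n",
"          score_difference(ups, downs),\n",
"          round(score_mean(ups, downs), 3),\n",
"          round(wilson_lower_bound(ups, downs), 3))\n",
"```\n",
"\n",
"Comparing the three columns shows how each rule treats comments with few votes versus comments with many."
]
},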
{
"cell_type": "code",
"metadata": {
"id": "JHOYUVyoLdN6",
"colab_type": "code",
"colab": {}
},
"source": [
""
],
"execution_count": 0,
"outputs": []
}
]
}
