{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "Homework1.ipynb",
"provenance": [],
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/janchorowski/ml_uwr/blob/fall2019/homework1/Homework1.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "tkKY6us_cCg4",
"colab_type": "text"
},
"source": [
"# Homework 1\n",
"\n",
"**For exercises in the week 22-28.10.19**\n",
"\n",
"**Points: 7 + 2 bonus points**\n",
"\n",
"Please solve the problems at home and bring to class a [declaration form](http://ii.uni.wroc.pl/~jmi/Dydaktyka/misc/kupony-klasyczne.pdf) indicating which problems you are willing to present on the blackboard.\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "iYFL1cWQbv1D",
"colab_type": "text"
},
"source": [
"## Problem 1 (MacKay 4.1) [1p]\n",
"\n",
"You are given a set of 12 balls in which:\n",
"- 11 balls have equal weight,\n",
"- 1 ball is different (either heavier or lighter).\n",
"\n",
"You have a two-pan balance. How many weighings must you use to identify the odd ball?\n",
"\n",
"*Hint:* A weighing can be seen as a random event. You can design the weighings to carry as much information as possible, i.e., to maximize the entropy of their outcome."
]
},
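{
"cell_type": "markdown",
"metadata": {},
"source": [
"A small sketch of the hint, not a full solution. It assumes the 24 hypotheses (any of the 12 balls, either heavier or lighter) are equally likely and computes the entropy of the three possible outcomes of a first weighing with $k$ balls on each pan. The helper `first_weighing_entropy` is a hypothetical name introduced only for this illustration."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"import math\n",
"\n",
"def first_weighing_entropy(k, n=12):\n",
"    # Hypotheses: any of the n balls may be heavier or lighter, giving 2n equally likely cases.\n",
"    total = 2 * n\n",
"    # The left pan sinks if a left ball is heavy or a right ball is light (2k cases);\n",
"    # the right pan sinks in 2k symmetric cases; the pans balance otherwise.\n",
"    counts = [2 * k, 2 * k, total - 4 * k]\n",
"    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)\n",
"\n",
"# Entropy (in bits) of the outcome when k balls are placed on each pan.\n",
"for k in range(1, 7):\n",
"    print(k, round(first_weighing_entropy(k), 3))"
],
"execution_count": null,
"outputs": []
},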
{
"cell_type": "markdown",
"metadata": {
"id": "-kxgvhPBb7hN",
"colab_type": "text"
},
"source": [
"## Problem 2 [1p]\n",
"\n",
"Bayes' theorem relates the conditional probabilities of causes and their effects. Writing the joint probability in two ways gives:\n",
"\n",
"\\begin{equation}\n",
"p(A,B)=p(A|B)p(B)=p(B|A)p(A)\n",
"\\end{equation}\n",
"\n",
"\\begin{equation}\n",
"p(A|B) = \\frac{p(B|A)p(A)}{p(B)}\n",
"\\end{equation}\n",
"\n",
"Bayes' theorem allows us to reason about the probabilities of causes when\n",
"we observe their effects. Instead of directly answering the hard\n",
"question $p(\\text{cause}|\\text{effect})$, we can separately\n",
"work out the marginal probabilities of causes $p(\\text{cause})$ and\n",
"carefully study their effects $p(\\text{effect}|\\text{cause})$.\n",
"\n",
"Solve the following using Bayes' theorem.\n",
"\n",
"1. There are two boxes on the table: box \\#1 holds 2\n",
"   black balls and 8 red ones, and box \\#2 holds 5 black ones and\n",
"   5 red ones. We pick a box at random (with equal probabilities),\n",
"   and then a ball from that box.\n",
"   1. What is the probability that the\n",
"      ball came from box \\#1 if we happened to pick a red ball?\n",
"\n",
"1. The government has started a preventive program of\n",
"   mandatory tests for the Ebola virus. The mass testing method is\n",
"   imprecise, yielding 1% false positives (healthy, but the test\n",
"   indicates the virus) and 1% false negatives (having the virus,\n",
"   but healthy according to the test results).\n",
"   As Ebola is rather infrequent, let's assume that it occurs in\n",
"   one in a million people in Europe.\n",
"   1. What is the probability\n",
"      that a random European who has tested positive for the Ebola\n",
"      virus is indeed a carrier?\n",
"   2. Suppose we have the additional information that the person has just\n",
"      arrived from a country where one in a thousand people is a carrier.\n",
"      By how much will the probability increase?\n",
"   3. How accurate would the test need to be for a positive result in a\n",
"      European to be a true positive with 80% probability?"
]
},
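{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of Bayes' theorem for a finite set of mutually exclusive causes. Here $p(\\text{effect})$ is computed by summing $p(\\text{effect}|\\text{cause})p(\\text{cause})$ over all causes. The function `posterior` is a hypothetical helper, and the coin example is a toy case unrelated to the exercises above."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"def posterior(priors, likelihoods):\n",
"    # priors[c] = p(cause c); likelihoods[c] = p(observed effect | cause c)\n",
"    joint = {c: likelihoods[c] * priors[c] for c in priors}\n",
"    # p(effect) by the law of total probability over the causes\n",
"    evidence = sum(joint.values())\n",
"    return {c: j / evidence for c, j in joint.items()}\n",
"\n",
"# Toy example: a fair coin vs. a two-headed coin, picked with equal probability; we observe heads.\n",
"print(posterior({'fair': 0.5, 'two-headed': 0.5}, {'fair': 0.5, 'two-headed': 1.0}))"
],
"execution_count": null,
"outputs": []
},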
{
"cell_type": "markdown",
"metadata": {
"id": "72HwrFKBb9Mn",
"colab_type": "text"
},
"source": [
"## Problem 3 [1.5p]\n",
"\n",
"Given observations $x_1,\\ldots,x_n$\n",
"coming from a certain distribution,\n",
"prove that the maximum likelihood estimate (MLE) of the given parameter of that distribution equals the sample mean $\\frac{1}{n}\\sum_{i=1}^n x_i$ for:\n",
"1. the Bernoulli distribution with success probability $p$ and MLE $\\hat{p}$,\n",
"2. the Gaussian distribution $\\mathcal{N}(\\mu,\\sigma)$ and MLE $\\hat{\\mu}$,\n",
"3. the Poisson distribution $\\mathit{Pois}(\\lambda)$ and MLE $\\hat{\\lambda}$."
]
},
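{
"cell_type": "markdown",
"metadata": {},
"source": [
"A numerical sanity check that can be run before writing the proofs. It is not a proof. The sketch assumes NumPy is available and uses an arbitrary simulated sample. It maximizes the Bernoulli log-likelihood over a grid of $p$ values and compares the maximizer with the sample mean. The two should agree up to the grid resolution."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"import numpy as np\n",
"\n",
"rng = np.random.default_rng(0)\n",
"x = rng.binomial(1, 0.3, size=50)  # simulated Bernoulli sample with an arbitrary p = 0.3\n",
"\n",
"p_grid = np.linspace(0.001, 0.999, 999)\n",
"# Bernoulli log-likelihood: sum_i [x_i log p + (1 - x_i) log(1 - p)]\n",
"log_lik = x.sum() * np.log(p_grid) + (len(x) - x.sum()) * np.log(1 - p_grid)\n",
"\n",
"print('grid argmax:', p_grid[np.argmax(log_lik)])\n",
"print('sample mean:', x.mean())"
],
"execution_count": null,
"outputs": []
},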
{
"cell_type": "markdown",
"metadata": {
"id": "oh5OFthnb-4f",
"colab_type": "text"
},
"source": [
"## Problem 4 [1.5p]\n",
"\n",
"1D Gaussian manipulation for Kalman filters.\n",
"\n",
"A [1D Kalman filter](https://en.wikipedia.org/wiki/Kalman_filter) tracks the location of an object given imprecise measurements of its location. At its core, it performs an update of the form:\n",
"\n",
"$$\n",
"  p(x|m) = \\frac{p(m|x)p(x)}{p(m)} = \\frac{p(m|x)p(x)}{Z},\n",
"$$\n",
"\n",
"where:\n",
"- $p(x|m)$ is the updated belief about the location,\n",
"- $p(x) = \\mathcal{N}(\\mu=\\mu_x, \\sigma=\\sigma_x)$ is the prior belief about the location,\n",
"- $p(m|x) = \\mathcal{N}(\\mu=x, \\sigma=\\sigma_m)$ is the model of the noisy measurement $m$, centered on the location of the object,\n",
"- $Z = p(m) =\\int p(m|x)p(x) dx$ is a normalization constant that does not depend on $x$.\n",
"\n",
"Compute $p(x|m)$.\n",
"\n",
"*Hint:* The product $\\mathcal{N}(x;\\mu_1, \\sigma_1)\\mathcal{N}(x;\\mu_2, \\sigma_2)$ resembles an unnormalized probability distribution. Which one? Can you normalize it by computing its mean and standard deviation and matching it to a known PDF?"
]
},
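{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to check a closed-form answer is to normalize $p(m|x)p(x)$ numerically on a grid and read off its mean and standard deviation. The values of $\\mu_x$, $\\sigma_x$, $\\sigma_m$ and $m$ below are arbitrary illustrative choices. The sketch assumes NumPy and SciPy are available."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"import numpy as np\n",
"from scipy.stats import norm\n",
"\n",
"mu_x, sigma_x = 0.0, 2.0  # prior belief p(x) (illustrative values)\n",
"sigma_m, m = 1.0, 3.0     # measurement noise and observed measurement (illustrative values)\n",
"\n",
"x = np.linspace(-20.0, 20.0, 40001)\n",
"dx = x[1] - x[0]\n",
"# Unnormalized posterior p(m|x)p(x), evaluated at every grid point x.\n",
"unnorm = norm.pdf(m, loc=x, scale=sigma_m) * norm.pdf(x, loc=mu_x, scale=sigma_x)\n",
"Z = unnorm.sum() * dx  # numerical approximation of p(m)\n",
"post = unnorm / Z      # p(x|m) on the grid\n",
"\n",
"mean = (x * post).sum() * dx\n",
"std = np.sqrt(((x - mean) ** 2 * post).sum() * dx)\n",
"print('posterior mean:', mean)\n",
"print('posterior std:', std)"
],
"execution_count": null,
"outputs": []
},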
{
"cell_type": "markdown",
"metadata": {
"id": "00IQ8eslcHVI",
"colab_type": "text"
},
"source": [
"## Problem 5 (Murphy, 2.17) [1p]\n",
"\n",
"Expected value of the minimum.\n",
"\n",
"Let $X, Y$ be sampled independently and uniformly from the interval $[0,1]$. What is the expected value of $\\min(X,Y)$?"
]
},
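{
"cell_type": "markdown",
"metadata": {},
"source": [
"A Monte Carlo estimate can be used to check an analytical answer. This sketch assumes NumPy and that $X$ and $Y$ are drawn independently. The sample size and the random seed are arbitrary."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"import numpy as np\n",
"\n",
"rng = np.random.default_rng(0)\n",
"n = 1_000_000\n",
"x = rng.uniform(0.0, 1.0, size=n)  # independent draws of X\n",
"y = rng.uniform(0.0, 1.0, size=n)  # independent draws of Y\n",
"print('Monte Carlo estimate of E[min(X, Y)]:', np.minimum(x, y).mean())"
],
"execution_count": null,
"outputs": []
},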
{
"cell_type": "markdown",
"metadata": {
"id": "bEETFqdrqBoj",
"colab_type": "text"
},
"source": [
"## Problem 6 (Kohavi) [1p]\n",
"\n",
"The failure of leave-one-out evaluation.\n",
"\n",
"Consider a binary classification dataset in which the labels are assigned completely at random, with 50% probability given to either class. Assume you have collected a dataset with 100 records in which exactly 50 belong to class 0 and 50 to class 1.\n",
"\n",
"What will be the leave-one-out accuracy of the majority-vote classifier, which always predicts the most frequent class in its training set?\n",
"\n",
"NB: it is sometimes useful to keep the class proportions the same in every fold of cross-validation, e.g. using the [StratifiedKFold](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.StratifiedKFold.html) implementation from scikit-learn."
]
},
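{
"cell_type": "markdown",
"metadata": {},
"source": [
"The answer can also be checked empirically. This sketch assumes scikit-learn is installed. It uses `DummyClassifier(strategy='most_frequent')` as the majority-vote classifier, together with `LeaveOneOut` and `cross_val_score`. The feature matrix is random because the majority-vote classifier ignores features."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"import numpy as np\n",
"from sklearn.dummy import DummyClassifier\n",
"from sklearn.model_selection import LeaveOneOut, cross_val_score\n",
"\n",
"rng = np.random.default_rng(0)\n",
"X = rng.normal(size=(100, 1))      # features are irrelevant: labels are random\n",
"y = np.array([0] * 50 + [1] * 50)  # exactly 50 records per class\n",
"\n",
"clf = DummyClassifier(strategy='most_frequent')  # majority-vote classifier\n",
"scores = cross_val_score(clf, X, y, cv=LeaveOneOut())\n",
"print('leave-one-out accuracy:', scores.mean())"
],
"execution_count": null,
"outputs": []
},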
{
"cell_type": "markdown",
"metadata": {
"id": "yrnFTXpfcN1R",
"colab_type": "text"
},
"source": [
"## Problem 7 [1bp]\n",
"Do Problem 7a from [Assignment 1](https://github.com/janchorowski/ml_uwr/blob/fall2019/assignment1/Assignment1.ipynb)."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "3PO62-Ffy6z9",
"colab_type": "text"
},
"source": [
"## Problem 8 [1bp]\n",
"\n",
"Many websites ([Reddit](https://www.reddit.com), [Wykop](https://www.wykop.pl), [StackOverflow](https://stackoverflow.com)) sort comments based on user votes. Discuss the implications of sorting by:\n",
"- the difference between up- and down-votes,\n",
"- the mean score,\n",
"- the lower or upper confidence bound of the score.\n",
"\n",
"At least for Reddit, the sorting algorithm can be found online. What is it?"
]
},
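{
"cell_type": "markdown",
"metadata": {},
"source": [
"A sketch for comparing the three rules on a few made-up comments. The comments are ranked by vote difference, by the fraction of up-votes (a monotone transform of the mean score when votes count as +1/-1), and by the lower bound of the Wilson score interval. The Wilson interval is one standard binomial confidence interval. Whether a particular site uses exactly this formula is left for you to verify. The vote counts and the helper `wilson_lower_bound` are illustrative assumptions."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"import math\n",
"\n",
"def wilson_lower_bound(up, down, z=1.96):\n",
"    # Lower end of the Wilson score interval for the fraction of up-votes\n",
"    # (z = 1.96 corresponds to a 95% two-sided interval).\n",
"    n = up + down\n",
"    if n == 0:\n",
"        return 0.0\n",
"    p = up / n\n",
"    centre = p + z * z / (2 * n)\n",
"    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))\n",
"    return (centre - margin) / (1 + z * z / n)\n",
"\n",
"# Made-up vote counts: (up-votes, down-votes).\n",
"comments = {'A': (1, 0), 'B': (60, 20), 'C': (600, 400)}\n",
"scorers = {\n",
"    'difference': lambda v: v[0] - v[1],\n",
"    'fraction of up-votes': lambda v: v[0] / (v[0] + v[1]),\n",
"    'Wilson lower bound': lambda v: wilson_lower_bound(*v),\n",
"}\n",
"for name, score in scorers.items():\n",
"    ranking = sorted(comments, key=lambda c: score(comments[c]), reverse=True)\n",
"    print(name, ranking)"
],
"execution_count": null,
"outputs": []
},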
{
"cell_type": "code",
"metadata": {
"id": "JHOYUVyoLdN6",
"colab_type": "code",
"colab": {}
},
"source": [
""
],
"execution_count": 0,
"outputs": []
}
]
}