From d21c2d1fd26756840599bb1aa516c2d4983d75de Mon Sep 17 00:00:00 2001 From: Jan Chorowski Date: Sat, 18 Jan 2020 23:34:22 +0100 Subject: [PATCH] HW4 --- ml_uwr/homework4/Homework4.ipynb | 116 +++++++++++++++++++++++++++++++ 1 file changed, 116 insertions(+) create mode 100644 ml_uwr/homework4/Homework4.ipynb diff --git a/ml_uwr/homework4/Homework4.ipynb b/ml_uwr/homework4/Homework4.ipynb new file mode 100644 index 0000000..82fa901 --- /dev/null +++ b/ml_uwr/homework4/Homework4.ipynb @@ -0,0 +1,116 @@ +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "name": "Homework4.ipynb", + "provenance": [], + "collapsed_sections": [], + "toc_visible": true + }, + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + } + }, + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "Z3gmko6zus4J", + "colab_type": "text" + }, + "source": [ + "# Homework 4\n", + "\n", + "**For exercises between 22-27.01.2020**\n", + "**The bonus problem can be submitted on paper until the last day of semester**\n", + "\n", + "**Points: 4 + 2b**\n", + "\n", + "Please solve the problems at home and bring to class a [declaration form](http://ii.uni.wroc.pl/~jmi/Dydaktyka/misc/kupony-klasyczne.pdf) to indicate which problems you are willing to present on the blackboard.\n", + "\n", + "$\\def\\R{{\\mathbb R}} \\def\\i{^{(i)}} \\def\\sjt{\\mathrm{s.t. }\\ }$" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "YfOp2jR-l6Ro", + "colab_type": "text" + }, + "source": [ + "# Problem 1 (Bishop) [1.5p]\n", + "\n", + "Consider a $K$-element mixture of $D$-dimensional binary vectors. 
Each component of the mixture uses a different Bernoulli distribution for each dimension of the vector:\n", + "\n", + "$$\n", + "\\begin{split}\n", + "p(z=k) &= \\pi_k \\quad \\text{with } 0 \\leq \\pi_k \\leq 1 \\text{ and } \\sum_k\\pi_k = 1\\\\\n", + "p(x | z=k) &= \\prod_{d=1}^{D} \\mu_{kd}^{x_d}(1-\\mu_{kd})^{(1-x_d)}\n", + "\\end{split}\n", + "$$\n", + "\n", + "where $x\\in\\{0,1\\}^D$ is a random binary vector. The $k$-th mixture component is parameterized by $D$ different probabilities $\\mu_{kd}$ of $x_d$ being 1.\n", + "\n", + "Do the following:\n", + "- Write an expression for the likelihood ($p(x;\\pi,\\mu)$).\n", + "- Compute the expected value of $x$.\n", + "- Compute the covariance of $x$." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Q4TIhWQGl5p8", + "colab_type": "text" + }, + "source": [ + "# Problem 2 [2bp]\n", + "\n", + "Derive an E-M scheme for fitting a mixture of Bernoulli distributions as defined in Problem 1." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2TNgwkXtvJLT", + "colab_type": "text" + }, + "source": [ + "# Problem 3 [1.5p]\n", + "\n", + "Let $X\\in \\R^{D\\times N}$ be a data matrix containing $N$ $D$-dimensional points. Furthermore, assume $X$ is centered, i.e. \n", + "$$\n", + "\\sum_{n=1}^N X_{d,n} = 0 \\quad \\forall d.\n", + "$$\n", + "\n", + "Read about the SVD matrix decomposition (https://en.wikipedia.org/wiki/Singular_value_decomposition). \n", + "\n", + "Show:\n", + "- **P3.1** [0.5p] how the singular vectors of $X$ relate to eigenvectors of $XX^T$\n", + "- **P3.2** [1p] that PCA can be interpreted as a matrix factorization method, which finds a linear projection of the data that retains the most information about $X$ (in the least squares sense)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "g6rmavsV2ifp", + "colab_type": "text" + }, + "source": [ + "# Problem 4 [1p]\n", + "\n", + "Consider orthonormal matrices whose entries are non-negative. 
What can they express?\n", + "\n", + "What would be the limitations of learning an NMF factorization when the columns of the $W$ matrix (defined below) are orthogonal?\n", + "\n", + "NMF definition:\n", + "$$\n", + "X \\approx W\\cdot H\n", + "$$\n", + "in which $X\\in \\mathbb{R^+}^{D \\times N}$ is the data matrix containing $N$ examples in $D$ dimensions, $W\\in \\mathbb{R^+}^{D \\times K}$ is the dictionary of NMF features and $H \\in \\mathbb{R^+}^{K \\times N}$ gives the encoding of each data sample, and $\\mathbb{R^+}$ is the set of non-negative real numbers." + ] + } + ] +} \ No newline at end of file