Commit facc5c4

add GD example

1 parent 252621f commit facc5c4

File tree: 2 files changed, +301 −2 lines changed

gradient_descent2.ipynb

+298
@@ -0,0 +1,298 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "c3cfdffe",
"metadata": {},
"source": [
"Sascha Spors,\n",
"Professorship Signal Theory and Digital Signal Processing,\n",
"Institute of Communications Engineering (INT),\n",
"Faculty of Computer Science and Electrical Engineering (IEF),\n",
"University of Rostock,\n",
"Germany\n",
"\n",
"# Data Driven Audio Signal Processing - A Tutorial with Computational Examples\n",
"\n",
"Master Course #24512\n",
"\n",
"- lecture: https://github.com/spatialaudio/data-driven-audio-signal-processing-lecture\n",
"- tutorial: https://github.com/spatialaudio/data-driven-audio-signal-processing-exercise\n",
"\n",
"Feel free to contact lecturer [email protected]"
]
},
{
"cell_type": "markdown",
"id": "c4810223",
"metadata": {},
"source": [
"# Gradient Descent\n",
"\n",
"- a nice 2D loss surface is discussed with Fig. 4.4(b) in the highly recommended textbook https://doi.org/10.1007/978-3-030-40344-7 (page 150)\n",
"- this loss function has one global minimum, three local minima, one local maximum and four saddle points\n",
"- while this is still a toy example spanning a comparably simple surface, different gradient descent runs can be studied by varying\n",
"    - the starting point\n",
"    - the learning rate\n",
"    - the stop criterion"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8f4ccf84",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"from matplotlib import cm"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "33001986",
"metadata": {},
"outputs": [],
"source": [
"matplotlib_widget_flag = True"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4c806f80",
"metadata": {},
"outputs": [],
"source": [
"if matplotlib_widget_flag:\n",
"    %matplotlib widget"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ee2030e1",
"metadata": {},
"outputs": [],
"source": [
"w1 = np.linspace(-2, 3, 1000, endpoint=False)\n",
"w2 = np.linspace(-2, 3, 1000, endpoint=False)\n",
"W1, W2 = np.meshgrid(w1, w2, indexing='xy')\n",
"# cf. Fig. 4.4(b) from https://doi.org/10.1007/978-3-030-40344-7\n",
"J = (W1**4 + W2**4) / 4 - (W1**3 + W2**3) / 3 - W1**2 - W2**2 + 4"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "13c6a43d",
"metadata": {},
"outputs": [],
"source": [
"# local maximum at (0,0) -> J(0,0) = 4\n",
"J[W1==0][w2==0]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "572e2a63",
"metadata": {},
"outputs": [],
"source": [
"# local minimum at (2,-1) -> J(2,-1) = 11/12 = 0.91666667\n",
"J[W1==2][w2==-1]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "10bba308",
"metadata": {},
"outputs": [],
"source": [
"# local minimum at (-1,-1) -> J(-1,-1) = 19/6 = 3.16666667\n",
"J[W1==-1][w2==-1]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3d54a722",
"metadata": {},
"outputs": [],
"source": [
"# local minimum at (-1,2) -> J(-1,2) = 11/12 = 0.91666667\n",
"J[W1==-1][w2==2]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "87138cef",
"metadata": {},
"outputs": [],
"source": [
"# global minimum at (2,2) -> J(2,2) = -4/3 = -1.33333333\n",
"np.min(J), J[W1==2][w2==2], W1[np.min(J) == J], W2[np.min(J) == J]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "003e8406",
"metadata": {},
"outputs": [],
"source": [
"# saddle points at\n",
"# (2,0); (0,-1); (-1,0); (0,2)\n",
"# J = 4/3; 43/12; 43/12; 4/3\n",
"J[W1==2][w2==0], J[W1==0][w2==-1], J[W1==-1][w2==0], J[W1==0][w2==2]"
]
},
{
"cell_type": "markdown",
"id": "c756302c",
"metadata": {},
"source": [
"## Loss Surface"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c57eb277",
"metadata": {},
"outputs": [],
"source": [
"fig, ax = plt.subplots(subplot_kw={\"projection\": \"3d\"})\n",
"surf = ax.plot_surface(W1, W2, J,\n",
"                       cmap=cm.magma_r,\n",
"                       rstride=10, cstride=10,\n",
"                       linewidth=0, antialiased=False)\n",
"ax.plot([2], [2], [-4/3], 'o')\n",
"ax.set_zlim(-2, 10)\n",
"ax.set_xlabel(r'$w_1$')\n",
"ax.set_ylabel(r'$w_2$')\n",
"ax.set_zlabel(r'$J(w_1,w_2)$')\n",
"ax.view_init(elev=65, azim=-135, roll=0)\n",
"fig.colorbar(surf, shrink=0.67, aspect=20)"
]
},
{
"cell_type": "markdown",
"id": "aa797ec2",
"metadata": {},
"source": [
"## Gradient Descent\n",
"\n",
"With the chosen parameters\n",
"- `w_act = np.array([[3], [0+1e-3]])`\n",
"- `step_size = 1e-2`\n",
"- `N = 2**10`\n",
"\n",
"the gradient descent shows a delicate behavior: at first it approaches a saddle point comparatively fast; since the start is slightly offset with $w_2 = 10^{-3}$, GD does not get stuck at the saddle point, but rather (comparatively slowly) continues towards the global minimum, making a sharp turn close to the saddle point.\n",
"\n",
"1. Set initial values such that GD ends up in a saddle point.\n",
"2. Which choices of the initial $w_2$ let the GD path arrive at the local minimum $(2,-1)$?\n",
"3. With the given starting parameters and the plain gradient descent algorithm, is there any chance that the GD path finds its way to the local minima $(-1,-1)$ or $(-1,2)$?"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0026ad20",
"metadata": {},
"outputs": [],
"source": [
"w_act = np.array([[3], [0+1e-3]])\n",
"step_size = 1e-2\n",
"N = 2**10\n",
"\n",
"# gradient descent\n",
"w1w2J = np.zeros([3, N])\n",
"for i in range(N):\n",
"    # calc gradient\n",
"    grad_J_to_w = np.array([[w_act[0, 0]**3 - w_act[0, 0]**2 - 2*w_act[0, 0]],\n",
"                            [w_act[1, 0]**3 - w_act[1, 0]**2 - 2*w_act[1, 0]]])\n",
"    # GD update\n",
"    w_act = w_act - step_size * grad_J_to_w\n",
"    # calc cost with current weights\n",
"    J_tmp = (w_act[0, 0]**4+w_act[1, 0]**4)/4 -\\\n",
"        (w_act[0, 0]**3 + w_act[1, 0]**3)/3 -\\\n",
"        w_act[0, 0]**2 - w_act[1, 0]**2 + 4\n",
"    # store the path for plotting\n",
"    w1w2J[0:2, i] = np.squeeze(w_act)\n",
"    w1w2J[2, i] = J_tmp"
]
},
{
"cell_type": "markdown",
"id": "24217cd3",
"metadata": {},
"source": [
"## Plot Loss Surface and Gradient Descent Path"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "602a07d7",
"metadata": {},
"outputs": [],
"source": [
"fig, ax = plt.subplots(subplot_kw={\"projection\": \"3d\"})\n",
"surf = ax.plot_surface(W1, W2, J,\n",
"                       cmap=cm.magma_r,\n",
"                       rstride=10, cstride=10,\n",
"                       linewidth=0, antialiased=False)\n",
"ax.plot(w1w2J[0, :], w1w2J[1, :], w1w2J[2, :],\n",
"        'C0x-', ms=1, zorder=3)\n",
"ax.set_zlim(-2, 10)\n",
"ax.set_xlabel(r'$w_1$')\n",
"ax.set_ylabel(r'$w_2$')\n",
"ax.set_zlabel(r'$J(w_1,w_2)$')\n",
"ax.view_init(elev=65, azim=-135, roll=0)\n",
"fig.colorbar(surf, shrink=0.67, aspect=20)\n",
"\n",
"w1w2J[:, -1]"
]
},
{
"cell_type": "markdown",
"id": "d4021d96",
"metadata": {},
"source": [
"## Copyright\n",
"\n",
"- the notebooks are provided as [Open Educational Resources](https://en.wikipedia.org/wiki/Open_educational_resources)\n",
"- feel free to use the notebooks for your own purposes\n",
"- the text is licensed under [Creative Commons Attribution 4.0](https://creativecommons.org/licenses/by/4.0/)\n",
"- the code of the IPython examples is licensed under the [MIT license](https://opensource.org/licenses/MIT)\n",
"- please attribute the work as follows: *Frank Schultz, Data Driven Audio Signal Processing - A Tutorial Featuring Computational Examples, University of Rostock*, ideally with relevant file(s), github URL https://github.com/spatialaudio/data-driven-audio-signal-processing-exercise, commit number and/or version tag, year."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "myddasp",
"language": "python",
"name": "myddasp"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
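The notebook's descent loop can be condensed into a standalone script that checks the behavior described in its markdown cell: a fast initial approach towards the saddle point $(2,0)$, then a slow escape along $w_2$ towards the global minimum $(2,2)$. This is a sketch, not part of the commit; the names `loss`, `grad` and `path` are introduced here, while the start value, `step_size` and iteration count match the notebook.

```python
import numpy as np

def loss(w):
    # J from Fig. 4.4(b); separable, i.e. a sum of identical 1D terms
    return np.sum(w**4 / 4 - w**3 / 3 - w**2) + 4

def grad(w):
    # analytic gradient, per coordinate: w^3 - w^2 - 2 w
    return w**3 - w**2 - 2 * w

w = np.array([3.0, 1e-3])  # start slightly off the w2 = 0 axis
step_size = 1e-2
path = [w.copy()]
for _ in range(2**10):
    w = w - step_size * grad(w)
    path.append(w.copy())
path = np.array(path)

# early iterations head towards the saddle point (2, 0) ...
print(path[100])   # w1 is already near 2, w2 is still tiny
# ... but the small w2 offset keeps growing, so GD escapes the saddle
print(w, loss(w))  # close to [2, 2] with J close to -4/3
```

Setting the start exactly on the axis, `np.array([3.0, 0.0])`, makes the $w_2$ update vanish identically, so the same loop then terminates on the saddle point, which is one way to answer task 1 of the notebook.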

index.ipynb

+3 −2

@@ -74,8 +74,9 @@
 "- [Bias-Variance Trade-Off vs. Regularization](bias_variance_ridge_regression.ipynb)\n",
 "\n",
 "\n",
-"## Exercise: Gradient Descent\n",
-"- [Gradient Descent](gradient_descent.ipynb)\n",
+"## Exercise: Gradient Descent along a 2D Surface\n",
+"- [Gradient Descent 1](gradient_descent.ipynb) with one saddle point\n",
+"- [Gradient Descent 2](gradient_descent2.ipynb) with saddle points, local maximum, local minima and a global minimum\n",
 "- [Gradient Descent with Momentum](gradient_descent_momentum.ipynb)\n",
 "- [Stochastic Gradient Descent for Least Squares Error](gradient_descent_on_least_squares.ipynb)\n",
 "\n",
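The critical points that `gradient_descent2.ipynb` above reads off the sampled grid can also be classified analytically: since $J(w_1,w_2) = g(w_1) + g(w_2) + 4$ with $g(w) = w^4/4 - w^3/3 - w^2$, the critical points are all pairs of roots of $g'(w) = w(w-2)(w+1)$, and the diagonal Hessian sign pattern decides their type. A sketch, not part of the commit; `g`, `g2` and `roots` are helper names introduced here.

```python
import itertools
import numpy as np

def g(w):
    # 1D term of the separable loss J(w1, w2) = g(w1) + g(w2) + 4
    return w**4 / 4 - w**3 / 3 - w**2

def g2(w):
    # second derivative of g; the Hessian of J is diag(g2(w1), g2(w2))
    return 3 * w**2 - 2 * w - 2

roots = np.array([0.0, 2.0, -1.0])  # roots of g'(w) = w (w - 2) (w + 1)
for w1, w2 in itertools.product(roots, repeat=2):
    s = int(np.sign(g2(w1)) + np.sign(g2(w2)))
    kind = {2: 'minimum', 0: 'saddle point', -2: 'maximum'}[s]
    print(f'({w1:+.0f},{w2:+.0f}): J = {g(w1) + g(w2) + 4:+.4f}  {kind}')
```

The nine combinations reproduce the notebook's tally: four minima (the global one at $(2,2)$ plus three local ones), four saddle points, and one maximum at $(0,0)$.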