Commit facc5c4

add GD example

1 parent 252621f commit facc5c4

File tree: 2 files changed, +301 −2 lines changed

gradient_descent2.ipynb

+298
@@ -0,0 +1,298 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "c3cfdffe",
"metadata": {},
"source": [
"Sascha Spors,\n",
"Professorship Signal Theory and Digital Signal Processing,\n",
"Institute of Communications Engineering (INT),\n",
"Faculty of Computer Science and Electrical Engineering (IEF),\n",
"University of Rostock,\n",
"Germany\n",
"\n",
"# Data Driven Audio Signal Processing - A Tutorial with Computational Examples\n",
"\n",
"Master Course #24512\n",
"\n",
"- lecture: https://github.com/spatialaudio/data-driven-audio-signal-processing-lecture\n",
"- tutorial: https://github.com/spatialaudio/data-driven-audio-signal-processing-exercise\n",
"\n",
"Feel free to contact lecturer [email protected]"
]
},
{
"cell_type": "markdown",
"id": "c4810223",
"metadata": {},
"source": [
"# Gradient Descent\n",
"\n",
"- a nice 2D loss surface is discussed with Fig. 4.4(b) in the highly recommended textbook https://doi.org/10.1007/978-3-030-40344-7 (page 150)\n",
"- this loss function has one global minimum, three local minima, one local maximum and four saddle points\n",
"- while this is still a toy example spanning a comparably simple surface, different gradient descent runs can be studied by varying\n",
"    - the starting point\n",
"    - the learning rate\n",
"    - the stop criterion"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8f4ccf84",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"from matplotlib import cm"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "33001986",
"metadata": {},
"outputs": [],
"source": [
"matplotlib_widget_flag = True"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4c806f80",
"metadata": {},
"outputs": [],
"source": [
"if matplotlib_widget_flag:\n",
"    %matplotlib widget"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ee2030e1",
"metadata": {},
"outputs": [],
"source": [
"w1 = np.linspace(-2, 3, 1000, endpoint=False)\n",
"w2 = np.linspace(-2, 3, 1000, endpoint=False)\n",
"W1, W2 = np.meshgrid(w1, w2, indexing='xy')\n",
"# cf. Fig. 4.4(b) from https://doi.org/10.1007/978-3-030-40344-7\n",
"J = (W1**4 + W2**4) / 4 - (W1**3 + W2**3) / 3 - W1**2 - W2**2 + 4"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "13c6a43d",
"metadata": {},
"outputs": [],
"source": [
"# local maximum at (0,0) -> J(0,0) = 4\n",
"J[W1==0][w2==0]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "572e2a63",
"metadata": {},
"outputs": [],
"source": [
"# local minimum at (2,-1) -> J(2,-1) = 11/12 = 0.91666667\n",
"J[W1==2][w2==-1]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "10bba308",
"metadata": {},
"outputs": [],
"source": [
"# local minimum at (-1,-1) -> J(-1,-1) = 19/6 = 3.16666667\n",
"J[W1==-1][w2==-1]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3d54a722",
"metadata": {},
"outputs": [],
"source": [
"# local minimum at (-1,2) -> J(-1,2) = 11/12 = 0.91666667\n",
"J[W1==-1][w2==2]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "87138cef",
"metadata": {},
"outputs": [],
"source": [
"# global minimum at (2,2) -> J(2,2) = -4/3 = -1.33333333\n",
"np.min(J), J[W1==2][w2==2], W1[np.min(J) == J], W2[np.min(J) == J]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "003e8406",
"metadata": {},
"outputs": [],
"source": [
"# saddle points at\n",
"# (2,0); (0,-1); (-1,0); (0,2)\n",
"# J = 4/3; 43/12; 43/12; 4/3\n",
"J[W1==2][w2==0], J[W1==0][w2==-1], J[W1==-1][w2==0], J[W1==0][w2==2]"
]
},
{
"cell_type": "markdown",
"id": "c756302c",
"metadata": {},
"source": [
"## Loss Surface"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c57eb277",
"metadata": {},
"outputs": [],
"source": [
"fig, ax = plt.subplots(subplot_kw={\"projection\": \"3d\"})\n",
"surf = ax.plot_surface(W1, W2, J,\n",
"                       cmap=cm.magma_r,\n",
"                       rstride=10, cstride=10,\n",
"                       linewidth=0, antialiased=False)\n",
"ax.plot([2], [2], [-4/3], 'o')\n",
"ax.set_zlim(-2, 10)\n",
"ax.set_xlabel(r'$w_1$')\n",
"ax.set_ylabel(r'$w_2$')\n",
"ax.set_zlabel(r'$J(w_1,w_2)$')\n",
"ax.view_init(elev=65, azim=-135, roll=0)\n",
"fig.colorbar(surf, shrink=0.67, aspect=20)"
]
},
{
"cell_type": "markdown",
"id": "aa797ec2",
"metadata": {},
"source": [
"## Gradient Descent\n",
"\n",
"With the chosen parameters\n",
"- `w_act = np.array([[3], [0+1e-3]])`\n",
"- `step_size = 1e-2`\n",
"- `N = 2**10`\n",
"\n",
"the gradient descent shows a delicate behavior: at first it approaches a saddle point comparatively fast; since the start is slightly offset with $w_2 = 10^{-3}$, GD does not get stuck at the saddle point, but rather (comparatively slowly) continues towards the global minimum, making a sharp turn close to the saddle point.\n",
"\n",
"1. Set initial values such that GD ends up in a saddle point.\n",
"2. Which choices of the initial $w_2$ let the GD path arrive at the local minimum $(2,-1)$?\n",
"3. With the given starting parameters and the plain gradient descent algorithm, is there any chance that the GD path finds its way to the local minima $(-1,-1)$ or $(-1,2)$?"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0026ad20",
"metadata": {},
"outputs": [],
"source": [
"w_act = np.array([[3], [0+1e-3]])\n",
"step_size = 1e-2\n",
"N = 2**10\n",
"\n",
"# gradient descent\n",
"w1w2J = np.zeros([3, N])\n",
"for i in range(N):\n",
"    # calc gradient\n",
"    grad_J_to_w = np.array([[w_act[0, 0]**3 - w_act[0, 0]**2 - 2*w_act[0, 0]],\n",
"                            [w_act[1, 0]**3 - w_act[1, 0]**2 - 2*w_act[1, 0]]])\n",
"    # GD update\n",
"    w_act = w_act - step_size * grad_J_to_w\n",
"    # calc cost with current weights\n",
"    J_tmp = (w_act[0, 0]**4+w_act[1, 0]**4)/4 -\\\n",
"        (w_act[0, 0]**3 + w_act[1, 0]**3)/3 -\\\n",
"        w_act[0, 0]**2 - w_act[1, 0]**2 + 4\n",
"    # store the path for plotting\n",
"    w1w2J[0:2, i] = np.squeeze(w_act)\n",
"    w1w2J[2, i] = J_tmp"
]
},
{
"cell_type": "markdown",
"id": "24217cd3",
"metadata": {},
"source": [
"## Plot Loss Surface and Gradient Descent Path"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "602a07d7",
"metadata": {},
"outputs": [],
"source": [
"fig, ax = plt.subplots(subplot_kw={\"projection\": \"3d\"})\n",
"surf = ax.plot_surface(W1, W2, J,\n",
"                       cmap=cm.magma_r,\n",
"                       rstride=10, cstride=10,\n",
"                       linewidth=0, antialiased=False)\n",
"ax.plot(w1w2J[0, :], w1w2J[1, :], w1w2J[2, :],\n",
"        'C0x-', ms=1, zorder=3)\n",
"ax.set_zlim(-2, 10)\n",
"ax.set_xlabel(r'$w_1$')\n",
"ax.set_ylabel(r'$w_2$')\n",
"ax.set_zlabel(r'$J(w_1,w_2)$')\n",
"ax.view_init(elev=65, azim=-135, roll=0)\n",
"fig.colorbar(surf, shrink=0.67, aspect=20)\n",
"\n",
"w1w2J[:, -1]"
]
},
{
"cell_type": "markdown",
"id": "d4021d96",
"metadata": {},
"source": [
"## Copyright\n",
"\n",
"- the notebooks are provided as [Open Educational Resources](https://en.wikipedia.org/wiki/Open_educational_resources)\n",
"- feel free to use the notebooks for your own purposes\n",
"- the text is licensed under [Creative Commons Attribution 4.0](https://creativecommons.org/licenses/by/4.0/)\n",
"- the code of the IPython examples is licensed under the [MIT license](https://opensource.org/licenses/MIT)\n",
"- please attribute the work as follows: *Frank Schultz, Data Driven Audio Signal Processing - A Tutorial Featuring Computational Examples, University of Rostock*, ideally with relevant file(s), github URL https://github.com/spatialaudio/data-driven-audio-signal-processing-exercise, commit number and/or version tag, year."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "myddasp",
"language": "python",
"name": "myddasp"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
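The notebook's descent loop can be condensed into a standalone script that checks the behavior described in its markdown cell: a fast initial approach towards the saddle point $(2,0)$, then a slow escape along $w_2$ towards the global minimum $(2,2)$. This is a sketch, not part of the commit; the names `loss`, `grad` and `path` are introduced here, while the start value, `step_size` and iteration count match the notebook.

```python
import numpy as np

def loss(w):
    # J from Fig. 4.4(b); separable, i.e. a sum of identical 1D terms
    return np.sum(w**4 / 4 - w**3 / 3 - w**2) + 4

def grad(w):
    # analytic gradient, per coordinate: w^3 - w^2 - 2 w
    return w**3 - w**2 - 2 * w

w = np.array([3.0, 1e-3])  # start slightly off the w2 = 0 axis
step_size = 1e-2
path = [w.copy()]
for _ in range(2**10):
    w = w - step_size * grad(w)
    path.append(w.copy())
path = np.array(path)

# early iterations head towards the saddle point (2, 0) ...
print(path[100])   # w1 is already near 2, w2 is still tiny
# ... but the small w2 offset keeps growing, so GD escapes the saddle
print(w, loss(w))  # close to [2, 2] with J close to -4/3
```

Setting the start exactly on the axis, `np.array([3.0, 0.0])`, makes the $w_2$ update vanish identically, so the same loop then terminates on the saddle point, which is one way to answer task 1 of the notebook.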

index.ipynb

+3 −2

@@ -74,8 +74,9 @@
 "- [Bias-Variance Trade-Off vs. Regularization](bias_variance_ridge_regression.ipynb)\n",
 "\n",
 "\n",
-"## Exercise: Gradient Descent\n",
-"- [Gradient Descent](gradient_descent.ipynb)\n",
+"## Exercise: Gradient Descent along a 2D Surface\n",
+"- [Gradient Descent 1](gradient_descent.ipynb) with one saddle point\n",
+"- [Gradient Descent 2](gradient_descent2.ipynb) with saddle points, local maximum, local minima and a global minimum\n",
 "- [Gradient Descent with Momentum](gradient_descent_momentum.ipynb)\n",
 "- [Stochastic Gradient Descent for Least Squares Error](gradient_descent_on_least_squares.ipynb)\n",
 "\n",
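The critical points that `gradient_descent2.ipynb` above reads off the sampled grid can also be classified analytically: since $J(w_1,w_2) = g(w_1) + g(w_2) + 4$ with $g(w) = w^4/4 - w^3/3 - w^2$, the critical points are all pairs of roots of $g'(w) = w(w-2)(w+1)$, and the diagonal Hessian sign pattern decides their type. A sketch, not part of the commit; `g`, `g2` and `roots` are helper names introduced here.

```python
import itertools
import numpy as np

def g(w):
    # 1D term of the separable loss J(w1, w2) = g(w1) + g(w2) + 4
    return w**4 / 4 - w**3 / 3 - w**2

def g2(w):
    # second derivative of g; the Hessian of J is diag(g2(w1), g2(w2))
    return 3 * w**2 - 2 * w - 2

roots = np.array([0.0, 2.0, -1.0])  # roots of g'(w) = w (w - 2) (w + 1)
for w1, w2 in itertools.product(roots, repeat=2):
    s = int(np.sign(g2(w1)) + np.sign(g2(w2)))
    kind = {2: 'minimum', 0: 'saddle point', -2: 'maximum'}[s]
    print(f'({w1:+.0f},{w2:+.0f}): J = {g(w1) + g(w2) + 4:+.4f}  {kind}')
```

The nine combinations reproduce the notebook's tally: four minima (the global one at $(2,2)$ plus three local ones), four saddle points, and one maximum at $(0,0)$.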