
Commit 5a5b477

mods wrt OLS, homework
- projection consistently in SVD
- new books added
- OLS link in homework, improved comments
- typo
1 parent 46d7cca commit 5a5b477

File tree: 3 files changed (+62, -25 lines)

homework/homework.ipynb  (+16, -7)

@@ -23,7 +23,7 @@
     "\n",
     "# Homework Template\n",
     "\n",
-    "Make sure that you copy the template and the groudn truth data into mnt/home/... to have persistent storage. All data in the virtual machine is lost, once the virtual machine is deleted."
+    "Make sure that you copy the template and the ground truth data into mnt/home/... to have persistent storage. All data in the virtual machine is lost once the virtual machine is deleted."
    ]
   },
   {
@@ -135,6 +135,14 @@
     "plt.tight_layout()"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "ee27e362",
+   "metadata": {},
+   "source": [
+    "We might get the idea to use a linear model, i.e. **linear regression** / ordinary least squares, which we dealt with in exercise 5 [Line Fit with Linear Regression](../line_fit_linear_regression.ipynb). The code below does this, but the model performance is not convincing, as the data in fact originates from a non-linear model. Hence, we need to go for a non-linear model...and thus we should solve the homework task :)"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 6,
@@ -152,17 +160,18 @@
    "source": [
     "# simple linear model, train with full data set\n",
     "x_left_inverse = np.linalg.inv(x.T @ x) @ x.T\n",
-    "w = x_left_inverse @ y # get weights via left inverse = train the model\n",
-    "y_predict = x @ w # predict = forward propagation\n",
+    "w = x_left_inverse @ y # get model weights via left inverse = fit / train the model\n",
+    "y_predict = x @ w # predict = forward propagation, often also denoted y_hat\n",
     "\n",
-    "# get residual e, loss L & empirical risk ER\n",
+    "# get error / residual e, get loss L & get empirical risk ER\n",
     "e = y - y_predict\n",
     "L = e.T @ e\n",
     "ER = L / N\n",
     "print('empirical risk:', ER[0, 0]) # 0.003427800540873722 = 3.427800540873722e-3\n",
-    "# non-linear model in homeweork task has ER = 5.7457115873891e-5\n",
-    "# so it might explain the data better than the simple non-linear model\n",
-    "# in fact the ground truth data y=f(x) originates from a non-linear function f\n",
+    "# non-linear model in homework task has ER = 5.7457115873891e-5\n",
+    "# so it might explain the data better than the simple linear model above\n",
+    "#\n",
+    "# in fact the ground truth data y=f(x) originates from a non-linear function f,\n",
     "# hence a linear model must somehow fail to do a good prediction job"
    ]
   },
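For quick reference outside the notebook, the left-inverse training and empirical-risk computation added in this hunk can be reproduced on synthetic data. The following is only a minimal sketch, not the homework's ground-truth data: the feature matrix `x`, the targets `y` and the sample count `N = 100` are made up here, while the variable names (`x_left_inverse`, `w`, `y_predict`, `e`, `L`, `ER`) follow the notebook cell.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100                                      # made-up number of samples
x = np.hstack((np.ones((N, 1)),              # bias column
               rng.uniform(0, 1, (N, 1))))   # one toy feature
y = 2 + 3 * x[:, [1]] + 0.1 * rng.standard_normal((N, 1))  # synthetic targets

# left inverse exists because x has full column rank, as in the notebook cell
x_left_inverse = np.linalg.inv(x.T @ x) @ x.T
w = x_left_inverse @ y   # fit / train: weights of the linear model
y_predict = x @ w        # predict = forward propagation (y_hat)

e = y - y_predict        # error / residual
L = e.T @ e              # sum of squared errors (loss)
ER = L / N               # empirical risk = mean squared error
print('empirical risk:', ER[0, 0])

# sanity check: explicit left inverse agrees with NumPy's least squares solver
w_lstsq, *_ = np.linalg.lstsq(x, y, rcond=None)
print(np.allclose(w, w_lstsq))
```

The closing `np.linalg.lstsq` comparison is just a consistency check on a well-conditioned problem; on the homework data the same left-inverse fit yields the empirical risk quoted in the comments above.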

index.ipynb  (+9, -7)

@@ -158,23 +158,25 @@
     "## Textbook Recommendations\n",
     "Machine Learning (ML) using linear / non-linear models is a vivid topic and dozens of textbooks will be released each year.\n",
     "The following textbook recommendations are very often referenced in the field and brilliant to learn with. \n",
+    "- Kevin P. **Murphy**: *Probabilistic Machine Learning: An Introduction*, MIT Press, 1st ed., [open source book and current draft as free pdf](https://probml.github.io/pml-book/book1.html)\n",
+    "- J.A. **Fessler**, R.R. **Nadakuditi**: *Linear Algebra for Data Science, Machine Learning, and Signal Processing*, Cambridge University Press, 2024, 1st ed., [online ebook](https://ebookcentral.proquest.com/lib/ubrostock-ebooks/detail.action?docID=31691281)\n",
     "- Sebastian **Raschka**, Yuxi Liu, Vahid Mirjalili: *Machine Learning with PyTorch and Scikit-Learn*, Packt, 2022, 1st ed.\n",
     "- Gilbert **Strang**: *Linear Algebra and Learning from Data*, Wellesley, 2019, consider to buy your own copy of this brilliant book\n",
     "- Gareth **James**, Daniela Witten, Trevor Hastie, Rob Tibshirani: *An Introduction to Statistical Learning* with Applications in R, Springer, 2nd ed., 2021, [free pdf e-book](https://www.statlearning.com/)\n",
     "- Trevor **Hastie**, Robert Tibshirani, Jerome Friedman: *The Elements of Statistical Learning: Data Mining, Inference, and Prediction*, Springer, 2nd ed., 2009, [free pdf e-book](https://hastie.su.domains/ElemStatLearn/)\n",
     "- Sergios **Theodoridis**: *Machine Learning*, Academic Press, 2nd ed., 2020, check your university library service for free pdf e-book\n",
-    "- Kevin P. **Murphy**: *Probabilistic Machine Learning: An Introduction*, MIT Press, 1st. ed. [open source book and current draft as free pdf](https://probml.github.io/pml-book/book1.html)\n",
     "- Ian **Goodfellow**, Yoshua Bengio, Aaron Courville: *Deep Learning*, MIT Press, 2016\n",
     "- Marc Peter **Deisenroth**, A. Aldo Faisal, Cheng Soon Ong: *Mathematics for Machine Learning*, Cambridge University Press, 2020, [free pdf e-book](https://mml-book.github.io/)\n",
     "- Steven L. **Brunton**, J. Nathan Kutz: *Data Driven Science & Engineering - Machine Learning, Dynamical Systems, and Control*, Cambridge University Press, 2020, [free pdf of draft](http://www.databookuw.com/databook.pdf), see also the [video lectures](http://www.databookuw.com/) and [Python tutorials](https://github.com/dylewsky/Data_Driven_Science_Python_Demos)\n",
     "- Aurélien **Géron**: *Hands-on machine learning with Scikit-Learn, Keras and TensorFlow*. O’Reilly, 2nd ed., 2019, [Python tutorials](https://github.com/ageron/handson-ml2)\n",
     "\n",
-    "ML deals with stuff that is actually known for decades (at least the linear modeling part of it), so if we are really serious about to learn ML deeply, we should think over concepts on statistical signal processing, maximum-likelihood, Bayesian vs. frequentist statistics, generalized linear models, hierarchical models...For these topics we could check these respected textbooks\n",
-    "- L. **Fahrmeir**, A. Hamerle, and G. Tutz, Multivariate statistische Verfahren, 2nd ed. de Gruyter, 1996.\n",
-    "- L. **Fahrmeir**, T. Kneib, S. Lang, and B. D. Marx, Regression, 2nd ed. Springer, 2021.\n",
-    "- A. J. **Dobson** and A. G. Barnett, An Introduction to Generalized Linear Models, 4th ed. CRC Press, 2018.\n",
-    "- H. **Madsen**, P. Thyregod, Introduction to General and Generalized Linear Models, CRC Press, 2011.\n",
-    "- A. **Agresti**, Foundations of Linear and Generalized Models, Wiley, 2015"
+    "ML deals with concepts that have actually been known for decades (at least the linear modeling part of it), so if we are really serious about learning ML deeply, we should study concepts from statistical signal processing, maximum likelihood, Bayesian vs. frequentist statistics, generalized linear models, hierarchical models... For these topics we could check these respected textbooks\n",
+    "- L. **Fahrmeir**, A. Hamerle, and G. Tutz, *Multivariate statistische Verfahren*, 2nd ed., de Gruyter, 1996.\n",
+    "- L. **Fahrmeir**, T. Kneib, S. Lang, and B. D. Marx, *Regression*, 2nd ed., Springer, 2021.\n",
+    "- A. J. **Dobson** and A. G. Barnett, *An Introduction to Generalized Linear Models*, 4th ed., CRC Press, 2018.\n",
+    "- J. F. **Monahan**, *A Primer on Linear Models*, CRC Press, 2008.\n",
+    "- H. **Madsen**, P. Thyregod, *Introduction to General and Generalized Linear Models*, CRC Press, 2011.\n",
+    "- A. **Agresti**, *Foundations of Linear and Generalized Models*, Wiley, 2015"
    ]
   },
  {

line_fit_linear_regression.ipynb  (+37, -11)

@@ -47,7 +47,11 @@
     "import matplotlib.pyplot as plt\n",
     "import numpy as np\n",
     "from scipy.linalg import svd, diagsvd, inv, pinv, norm\n",
-    "from numpy.linalg import matrix_rank"
+    "from numpy.linalg import matrix_rank\n",
+    "\n",
+    "np.set_printoptions(precision=3,\n",
+    "                    floatmode='maxprec',\n",
+    "                    suppress=True)"
    ]
   },
   {
@@ -193,7 +197,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "e = y - y_hat # e == y_lns\n",
+    "e = y - y_hat # e == y_left_null\n",
     "e, e.T @ e"
    ]
   },
@@ -203,7 +207,10 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "y_col.T @ e # column space is ortho to left null space"
+    "# recap: y_hat = y_col, e = y_left_null\n",
+    "# y = y_col + y_left_null = y_hat + e\n",
+    "# hence\n",
+    "y_hat.T @ e # column space is ortho to left null space"
    ]
   },
@@ -212,8 +219,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# projection matrices\n",
-    "\n",
+    "# projection matrices:\n",
     "P_col = X @ Xli\n",
     "P_col, P_col @ y, y_col"
    ]
@@ -224,8 +230,8 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# check projection in terms of SVD\n",
-    "S @ Sli, np.allclose(U @ S @ Sli @ U.T, P_col)"
+    "# check P_col projection in terms of SVD\n",
+    "S @ Sli, np.allclose(U @ (S @ Sli) @ U.T, P_col)"
    ]
   },
   {
@@ -238,6 +244,16 @@
     "P_left_null, P_left_null @ y, e"
    ]
   },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# check P_left_null projection in terms of SVD\n",
+    "np.eye(M) - S @ Sli, np.allclose(U @ (np.eye(M) - S @ Sli) @ U.T, P_left_null)"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
@@ -254,8 +270,8 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# check projection in terms of SVD\n",
-    "Sli @ S, np.allclose(V @ Sli @ S @ V.T, P_row)"
+    "# check P_row projection in terms of SVD\n",
+    "Sli @ S, np.allclose(V @ (Sli @ S) @ V.T, P_row)"
    ]
   },
   {
@@ -268,6 +284,16 @@
     "P_null # null space is spanned only by zero vector"
    ]
   },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# check P_null projection in terms of SVD\n",
+    "np.allclose(V @ (np.eye(N) - Sli @ S) @ V.T, P_null)"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
@@ -285,9 +311,9 @@
     "         ms=10, mew=3,\n",
     "         label='data')\n",
     "# fitted line\n",
-    "plt.plot(X[:,1], theta_hat[0] * X[:,0] + theta_hat[1] * X[:,1], 'k', label='LS fit (interpolation)')\n",
+    "plt.plot(X[:,1], theta_hat[0] * X[:,0] + theta_hat[1] * X[:,1], 'k', label='least squares fit (interpolation)')\n",
     "x = np.linspace(0, 1, 10)\n",
-    "plt.plot(x, theta_hat[0] + theta_hat[1] * x, 'C7:', label='LS fit (extrapolation)')\n",
+    "plt.plot(x, theta_hat[0] + theta_hat[1] * x, 'C7:', label='least squares fit (extrapolation)')\n",
     "x = np.linspace(4, 5, 10)\n",
     "plt.plot(x, theta_hat[0] + theta_hat[1] * x, 'C7:')\n",
     "\n",
