
Commit 719f7ce

updated readme of exercise repo of day2
1 parent 12533db commit 719f7ce


README.md

Lines changed: 182 additions & 51 deletions
Use:

```
nox -s test
```

While coding, use `nox -s lint` and `nox -s typing` to check your code.
Autoformatting help is available via `nox -s format`.

Feel free to read more about nox at https://nox.thea.codes/en/stable/ .

Today's goal is to get more familiar with three important concepts: linear regression, fitting a polynomial of higher order, and regularization.
You will be given some data in the form of pairs of (a, b)-values.
So for each a-value there is one b-value.
The main idea is to find a simple function (i.e. a polynomial) that best explains our data.
That means that if we plug in an a-value, the result should be close enough to the corresponding b-value.

A general polynomial $f$ of order $n$ is given by:

$$ b = f(a) = c_1 + c_2 \cdot a^1 + c_3 \cdot a^2 + \dots + c_n \cdot a^{n-1} $$

The $c$-values are the coefficients of the polynomial - these numbers are to be estimated!
The $a$- and $b$-values are already given.
If we plug all the given values into the general form of the polynomial, we get a system of linear equations, which can be reformulated as a matrix multiplication:

$$
\begin{pmatrix}
1 & a_1^1 & a_1^2 & \dots & a_1^{n-1} \\\\
1 & a_2^1 & a_2^2 & \dots & a_2^{n-1} \\\\
1 & a_3^1 & a_3^2 & \dots & a_3^{n-1} \\\\
\vdots & \vdots & \vdots & \ddots & \vdots \\\\
1 & a_m^1 & a_m^2 & \dots & a_m^{n-1} \\\\
\end{pmatrix} \cdot
\begin{pmatrix}
c_1 \\\\
\vdots \\\\
c_n \\\\
\end{pmatrix} = \begin{pmatrix}
b_1 \\\\
b_2 \\\\
b_3 \\\\
\vdots \\\\
b_m \\\\
\end{pmatrix}
$$

Or in short:

$$\mathbf{A}_n\mathbf{c} = \mathbf{b}$$

The optimal $\mathbf{c}$ is given by

$$\mathbf{A}_n^{\dagger}\mathbf{b} = \mathbf{c} ,$$

where $\mathbf{A}_n^{\dagger}$ is the pseudoinverse.

For linear regression, the polynomial will be of order n=2 and will look like this:

$$ f(a) = c_1 + c_2 \cdot a $$

This is just a straight line, and there are only two coefficients $c_1$ and $c_2$ that have to be estimated.
Very often, this line is too simple to explain the data sufficiently.
That is why we want to fit polynomials of higher order, so $n > 2$.
Unfortunately, the more complex the model gets (i.e. the higher the order of the polynomial), the more noise it will track. Here we can make use of regularization techniques.
In the first part, you will be given artificial data, and in the second part you will work with real data!

### Part 1: Proof of concept
The line `b = pandas.read_csv('./data/noisy_signal.tab')` is used to load a noisy signal.
The line `x_axis = np.linspace(0, 1, num=len(b_noise))` will provide you with the corresponding x-values (these are the a-values from above and from the lecture).
This is artificial data that serves as a means to try out the concepts you have learned about in the lecture.
The first part is concerned with modeling this signal using polynomials.

#### ⊙ Task 1.1: Regression
Linear regression is usually a good first step.

1. Start by implementing the function `set_up_point_matrix` from the `src/regularization.py` module (a code sketch follows after this task).
The function should produce polynomial-coordinate matrices $\mathbf{A}_n$ of the form:
$$
\mathbf{A}_n =
\begin{pmatrix}
1 & a_1^1 & a_1^2 & \dots & a_1^{n-1} \\\\
1 & a_2^1 & a_2^2 & \dots & a_2^{n-1} \\\\
1 & a_3^1 & a_3^2 & \dots & a_3^{n-1} \\\\
\vdots & \vdots & \vdots & \ddots & \vdots \\\\
1 & a_m^1 & a_m^2 & \dots & a_m^{n-1} \\\\
\end{pmatrix}
$$

2. Go to the main function and use the function you just implemented to create the point matrix $\mathbf{A}$ for n=2.

3. Now,
$$\mathbf{A}_2^{\dagger}\mathbf{b} = \mathbf{c} = \begin{pmatrix}
c_1 \\\\
c_2 \\\\
\end{pmatrix} $$
will produce the coefficients for a straight line.

4. Evaluate your first-degree polynomial via $c_1 + c_2 \cdot x$ and plot the result as well as the original data using `matplotlib.pyplot`'s `plot` function.

Solution:

![regression](./figures/regression.png)
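For reference, here is a minimal sketch of these four steps. It is not the official solution; it assumes the data layout described above (one column of noisy b-values in `noisy_signal.tab`), and everything except the name `set_up_point_matrix` is illustrative:

```python
# Sketch for Task 1.1: point matrix plus straight-line fit via the pseudoinverse.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def set_up_point_matrix(a: np.ndarray, n: int) -> np.ndarray:
    """Return the m x n matrix with rows (1, a_i, a_i^2, ..., a_i^(n-1))."""
    # Equivalent to np.vander(a, n, increasing=True).
    return np.stack([a**power for power in range(n)], axis=-1)


# Assumed layout: a single column of noisy measurements.
b_noise = pd.read_csv("./data/noisy_signal.tab").to_numpy().ravel()
x_axis = np.linspace(0, 1, num=len(b_noise))

A_2 = set_up_point_matrix(x_axis, 2)      # step 2: point matrix for n=2
c = np.linalg.pinv(A_2) @ b_noise         # step 3: c = A_2^+ b

plt.plot(x_axis, b_noise, ".", label="noisy data")          # step 4
plt.plot(x_axis, c[0] + c[1] * x_axis, label="linear fit")
plt.legend()
plt.show()
```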

#### ⊙ Task 1.2: Fitting a Polynomial to a function
The straight line above is insufficient to model the data.
So perform the very same steps as above, but change the degree of the polynomial to n=300 (to set up a square matrix, since we have 300 data points):
1. Set up the point matrix by setting n=300.
2. Estimate the coefficients via $$\mathbf{A}^{\dagger}\mathbf{b} = \mathbf{x}_{\text{fit}}.$$
3. Having estimated the coefficients, $$\mathbf{A} \mathbf{x}_{\text{fit}}$$ computes the function values.
Plot the original points and the function values using matplotlib.
What do you see?

Solution:

![regression](./figures/polyfit.png)
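Continuing the Task 1.1 sketch (reusing `set_up_point_matrix`, `x_axis`, and `b_noise`, and assuming the signal really has 300 samples), Task 1.2 could look like this:

```python
# Sketch for Task 1.2: degree-300 fit with a square point matrix.
A_300 = set_up_point_matrix(x_axis, 300)   # 300 x 300 if len(b_noise) == 300
x_fit = np.linalg.pinv(A_300) @ b_noise    # coefficients: x_fit = A^+ b
values = A_300 @ x_fit                     # function values at the sample points

plt.plot(x_axis, b_noise, ".", label="noisy data")
plt.plot(x_axis, values, label="degree-300 fit")
plt.legend()
plt.show()
```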

#### ⊙ Task 1.3: Regularization
Unfortunately, the fit is not ideal. The polynomial is too complex and tracks the noise.
The singular value decomposition (SVD) can help!
Recall that the SVD turns a matrix

$$\mathbf{A} \in \mathbb{R}^{m,n}$$

into the form:

$$\mathbf{A} = \mathbf{U} \Sigma \mathbf{V}^T$$

In the SVD form, computing the pseudoinverse is simple! Swap $\mathbf{U}$ and $\mathbf{V}$ and replace each of the m singular values with its inverse

$$1/\sigma_i .$$

This results in the matrix

$$
\Sigma^\dagger = \begin{pmatrix}
\sigma_1^{-1} & & & \\\\
& \ddots & \\\\
& & \sigma_m^{-1} \\\\ \hline
& 0 &
\end{pmatrix}
$$
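As a quick numerical check of this recipe, here is a small numpy sketch (with a random example matrix, purely illustrative):

```python
# Sketch: assemble the pseudoinverse from the SVD, A^+ = V Sigma^+ U^T.
import numpy as np

A = np.random.randn(5, 3)                       # any m x n example matrix
u, sigma, vt = np.linalg.svd(A)                 # sigma holds the singular values

sigma_inv = np.zeros((A.shape[1], A.shape[0]))  # transposed shape: n x m
sigma_inv[: len(sigma), : len(sigma)] = np.diag(1.0 / sigma)

A_pinv = vt.T @ sigma_inv @ u.T                 # swap U and V, invert the singular values
print(np.allclose(A_pinv, np.linalg.pinv(A)))   # matches numpy's built-in pseudoinverse
```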

A solution to the overfitting problem is to filter the singular values.
The idea is that small singular values often correspond to directions in the data where noise dominates. So we want to create a filter matrix that gets rid of singular values if they are too small (i.e. if they fall below a threshold $\epsilon$).
Compute a diagonal for a filter matrix by evaluating:

$$f_i = \sigma_i^2 / (\sigma_i^2 + \epsilon)$$

Roughly speaking, multiplication by $f_i$ will filter a singular value when

$$\sigma_i \lt \epsilon ,$$

since in this case $f_i$ will be close to $0$.
If, however,

$$\sigma_i \geq \epsilon ,$$

$f_i$ will be closer to $1$ and the respective singular value will not be filtered out.

Apply the regularization by computing:

$$
\mathbf{x}_r= \mathbf{V} \mathbf{F} \begin{pmatrix}
\sigma_1^{-1} & & & \\\\
& \ddots & \\\\
& & \sigma_n^{-1} \\\\ \hline
& 0 &
\end{pmatrix}
\mathbf{U}^T \mathbf{b} = \mathbf{V} \mathbf{F} \mathbf{\Sigma}^\dagger
\mathbf{U}^T \mathbf{b}
$$

with

$$\mathbf{V} \in \mathbb{R}^{n,n}, \mathbf{F} \in \mathbb{R}^{n,n}, \Sigma^{\dagger} \in \mathbb{R}^{n,m}, \mathbf{U} \in \mathbb{R}^{m,m} \text{ and } \mathbf{b} \in \mathbb{R}^{m,1}.$$

Setting n=300 turns A into a square matrix. In this case, the zero block in the sigma matrix disappears, and you don't have to worry about transposing $\Sigma$ when computing the pseudoinverse.

To sum it up, your tasks are (a code sketch follows after the solution figure):
1. Compute the SVD of A.

Perform the following steps 2. - 4. for epsilon equal to 0.1, 1e-6, and 1e-12.
2. Compute the diagonal for the filter matrix and turn it into a matrix.
3. Estimate the regularized coefficients by applying the formula above.
4. Plot the result.

Solution:

![regression](./figures/regularized_fit.png)
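A minimal sketch of these steps, continuing the Task 1.2 snippet (reusing `A_300`, `x_axis`, and `b_noise`); treat the variable names as illustrative:

```python
# Sketch for Task 1.3: filter the singular values before inverting.
u, sigma, vt = np.linalg.svd(A_300)                # step 1: SVD of A

plt.plot(x_axis, b_noise, ".", label="noisy data")
for eps in [0.1, 1e-6, 1e-12]:
    f = sigma**2 / (sigma**2 + eps)                # step 2: filter factors f_i
    # step 3: x_r = V F Sigma^+ U^T b (A_300 is square, so there is no zero block)
    f_sigma_inv = np.divide(f, sigma, out=np.zeros_like(sigma), where=sigma > 0)
    x_r = vt.T @ np.diag(f_sigma_inv) @ u.T @ b_noise
    plt.plot(x_axis, A_300 @ x_r, label=f"eps={eps}")   # step 4
plt.legend()
plt.show()
```
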
#### ✪ Task 1.4: Model Complexity (Optional)
Another solution to the overfitting problem is reducing the complexity of the model.
To assess the quality of the polynomial fit to the data, compute and plot the Mean Squared Error (the MSE measures how close the regression line is to the data points) for every degree of polynomial up to 20.
So as before:
1. Set up the point matrix for the current degree from 1 to 20.
2. Estimate the coefficients.
3. Compute the predictions.
4. Calculate the MSE.
MSE can be calculated using the following equation, where $N$ is the number of samples, $y_i$ is the original point and $\hat{y_i}$ is the predicted output.
$$MSE=\frac{1}{N} \sum_{i=1}^{N} (y_i-\hat{y_i})^2$$
5. Plot the MSE against the degree.
6. Are the degree of the polynomial and the MSE linked?
From the plot, estimate the optimal degree of the polynomial and fit the polynomial with this specific degree.

Solution:

![model_complexity](./figures/model_complexity_mse.png)

From the plot we observe that after degree 7, the mean squared error doesn't decrease substantially.

![model_complexity](./figures/model_complexity_fit.png)
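A possible sketch of this loop, again reusing `set_up_point_matrix`, `x_axis`, and `b_noise` from the Part 1 snippets:

```python
# Sketch for Task 1.4: MSE of the polynomial fit for degrees 1 to 20.
degrees = list(range(1, 21))
mse_values = []
for degree in degrees:
    A_deg = set_up_point_matrix(x_axis, degree)               # step 1
    coeffs = np.linalg.pinv(A_deg) @ b_noise                  # step 2
    prediction = A_deg @ coeffs                               # step 3
    mse_values.append(np.mean((b_noise - prediction) ** 2))   # step 4

plt.plot(degrees, mse_values, marker="o")                     # step 5
plt.xlabel("polynomial degree")
plt.ylabel("MSE")
plt.show()
```
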
### Part 2: Rhine water level analysis
Now we are ready to deal with real data! Feel free to use your favorite time series data or work with the Rhine level data we provide.
The file `./data/pegel.tab` contains the Rhine water levels measured in Bonn over the last 100 years.
Data source: https://pegel.bonn.de.

#### ⊙ Task 2.1: Regression
The `src/pegel_bonn.py` file already contains code to pre-load the data for you.
The Rhine level measurements will be your new vector $\mathbf{b}$ from before.
Now we want to do the same as in Part 1 and start with linear regression!
1. Generate a matrix A with n=2, using the timestamps of the data set as your x-values.
2. Compute $$\mathbf{A}^{\dagger}\mathbf{b}$$ to estimate the coefficients of the line.
3. Evaluate your polynomial and plot the result.
4. Compute the zero crossing of the regression line. When do the regression line and the x-axis intersect?
Or in other words: On which day will the Rhine water level be at 0 cm?
> **Hint:** Plug in $y=0$ into the equation of your line with the estimated coefficients and solve for the date $x$.

Solution:

![regression](./figures/rhine_regression.png)
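A minimal sketch of Task 2.1. How the timestamps and levels are loaded depends on `src/pegel_bonn.py`, so the loading below (tab-separated file, date column first, level column second, plain day index as x-value) is an assumption, not the repository's actual API:

```python
# Sketch for Task 2.1: straight-line fit to the Rhine levels and its zero crossing.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Assumed loading; in the exercise, use the pre-loaded data from src/pegel_bonn.py.
df = pd.read_csv("./data/pegel.tab", sep="\t")
water_levels = df.iloc[:, 1].to_numpy(dtype=float)
datetime_stamps = np.arange(len(water_levels), dtype=float)  # day index as a stand-in

A_2 = np.vander(datetime_stamps, 2, increasing=True)  # step 1 (same as set_up_point_matrix)
c = np.linalg.pinv(A_2) @ water_levels                # step 2: intercept c[0], slope c[1]

plt.plot(datetime_stamps, water_levels, ",", label="measurements")              # step 3
plt.plot(datetime_stamps, c[0] + c[1] * datetime_stamps, label="regression line")
plt.legend()
plt.show()

zero_crossing = -c[0] / c[1]        # step 4: solve 0 = c_1 + c_2 * x for x
print("The line reaches 0 cm at x =", zero_crossing)
```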

#### ⊙ Task 2.2: Fitting a higher-order Polynomial

Re-using the code you wrote for the proof-of-concept task, fit a polynomial of degree 20 to the data. Before plotting, have a closer look at `datetime_stamps` and its values, and scale the axis appropriately.

1. Scale the x-axis.
2. Set up the point matrix for the scaled x-axis with n=20.
3. Compute the coefficients.
4. Evaluate the polynomial.
5. Plot the result.

Solution:

![rhine_polyfit](./figures/rhine_polyfit.png)
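A possible sketch, continuing the Task 2.1 snippet; scaling the timestamps to $[0, 1]$ is one reasonable choice, not the only one:

```python
# Sketch for Task 2.2: scale the time axis, then fit a degree-20 polynomial.
a_scaled = (datetime_stamps - datetime_stamps.min()) / (
    datetime_stamps.max() - datetime_stamps.min()
)                                                   # step 1: map the x-values to [0, 1]
A_20 = np.vander(a_scaled, 20, increasing=True)     # step 2: point matrix with n=20
coeffs = np.linalg.pinv(A_20) @ water_levels        # step 3
fit = A_20 @ coeffs                                 # step 4

plt.plot(a_scaled, water_levels, ",", label="measurements")   # step 5
plt.plot(a_scaled, fit, label="degree-20 fit")
plt.legend()
plt.show()
```
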
#### ⊙ Task 2.3: Regularization
Focus on the data from the year 2000 onward and filter the singular values.
We will again use a degree of 20.
Matrix A is not square in this case, because the degree is smaller than the number of data points! Consequently, a zero block must appear in your singular value matrix, and when computing the pseudoinverse from the SVD, $\Sigma$ has to be transposed!
Like in Part 1 (a minimal sketch follows after the solution figure):
1. Compute the SVD of the point matrix from the previous task.

Perform the following steps 2. - 4. for epsilon equal to 0.1, 1e-3, and 1e-9.
2. Compute the filter matrix.
3. Estimate the regularized coefficients by applying the formula from before.
> **Hint:** Remember the zero block! You need degree-many rows and number-of-datapoints-many columns!
4. Evaluate the regularized polynomial and plot the results.

Solution:

![rhine_reg_fit](./figures/rhine_regularized_fit.png)
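A minimal sketch of these steps, continuing the Part 2 snippets. Because the stand-in x-values above are plain day indices, "year 2000 onward" is approximated by taking roughly the last 20 years of samples; with real dates you would filter on the year instead:

```python
# Sketch for Task 2.3: regularized fit on recent data, with a non-square point matrix.
recent = datetime_stamps >= datetime_stamps.max() - 20 * 365   # rough stand-in for "year 2000 onward"
a_recent = a_scaled[recent]
b_recent = water_levels[recent]

A = np.vander(a_recent, 20, increasing=True)    # m x 20 with m >> 20
u, sigma, vt = np.linalg.svd(A)                 # step 1 (full matrices, so the zero block appears)

plt.plot(a_recent, b_recent, ",", label="measurements")
for eps in [0.1, 1e-3, 1e-9]:
    f = sigma**2 / (sigma**2 + eps)             # step 2: filter factors
    # step 3: F Sigma^+ has degree-many (20) rows and m columns, hence the zero block.
    f_sigma_inv = np.zeros((A.shape[1], A.shape[0]))
    f_sigma_inv[: len(sigma), : len(sigma)] = np.diag(f / sigma)
    x_r = vt.T @ f_sigma_inv @ u.T @ b_recent
    plt.plot(a_recent, A @ x_r, label=f"eps={eps}")   # step 4
plt.legend()
plt.show()
```

For very long time series the full $\mathbf{U}$ matrix gets large; `np.linalg.svd(A, full_matrices=False)` avoids that, at the price of hiding the zero block the hint talks about.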
