Commit 667ce12

nn post

1 parent 9540d3e commit 667ce12

4 files changed: +69 -1 lines changed

.gitignore +2 -1

@@ -6,4 +6,5 @@ _site
 # general
 .DS_Store
 Thumbs.db
-ehthumbs.db
+ehthumbs.db
+Gemfile.lock
+67
@@ -0,0 +1,67 @@
---
layout: post
title: "Preprocessing for Neural Networks: Normalization Techniques"
subtitle: Scaling, standardization, and so on
tags: [neural networks, preprocessing, standardization, normalization]
image: /img/brain-neural.png
bigimg: /img/brain-background.jpg
---
In my last post I wrote about a critical preprocessing tool for [Lasso](https://alfurka.github.io/2018-11-06-preprocessing-for-lasso/). Today I will write about preprocessing for neural networks.

Today it took me three hours to understand why my neural network model sometimes diverges and sometimes fits well. A model like that is impossible to use in practice: it has to be retrained whenever new data arrives, and each time it starts from scratch. I normally normalize my data before training a neural network, but today I forgot this very important preprocessing step.
Neural network models contain many weights. If the inputs are on very different scales, your model may diverge, as mine did. Even if it does not diverge, it can overestimate, underestimate, or ignore some parameters, which reduces the efficiency of your estimation.

Therefore, among other preprocessing steps, normalization is one of the most important for neural networks. So how do we normalize the data?

#### Min-Max Scaling

One of the most commonly used techniques is min-max scaling. It is very straightforward:

$$
X_{i} ^{S} = \dfrac{X_i - X_{min}}{X_{max} - X_{min}}
$$

Note that min-max scaling is very sensitive to outliers.
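
As a quick illustration, here is a minimal NumPy sketch of min-max scaling applied column-wise; the toy matrix `X` is my own example, not from the post.

```python
import numpy as np

# toy feature matrix: rows are observations, columns are variables
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 1000.0]])

# min-max scaling per column: (x - min) / (max - min), values end up in [0, 1]
X_min = X.min(axis=0)
X_max = X.max(axis=0)
X_scaled = (X - X_min) / (X_max - X_min)
```

A single extreme value stretches the denominator and squeezes every other observation toward zero, which is exactly why the technique is outlier-sensitive.
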
#### Decimal Scaling

Your data may contain a variable with very extreme values, such as `house prices`. Its weights are likely to diverge during stochastic gradient descent. If such values are not frequent, you can simply apply decimal scaling by dividing the variable by, say, $10^4$.
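
A minimal sketch of decimal scaling; the `house_prices` vector and the divisor $10^4$ are just the example from the paragraph above, not real data.

```python
import numpy as np

house_prices = np.array([250_000.0, 480_000.0, 3_750_000.0])  # hypothetical raw prices

# decimal scaling: divide by a power of ten so magnitudes stay small
house_prices_scaled = house_prices / 1e4
```
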
#### Eliminating Outliers

Eliminating outliers, with or without other normalization techniques, can be very effective. I mean real `outliers`, though; do not simply drop the $1\%$ quantiles from the start.
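
One possible rule for dropping only genuinely extreme points is sketched below; the 5-standard-deviation threshold is my own assumption, not the author's.

```python
import numpy as np

x = np.random.randn(1000)
x[::200] = 50.0  # inject a few extreme values

# keep only observations within 5 standard deviations of the mean
z = (x - x.mean()) / x.std()
x_clean = x[np.abs(z) < 5]
```
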
#### Z-score Normalization or Standardization

This is one of the most common standardization techniques. You compute the z-score of each variable with respect to its own distribution:

$$
X_i ^S = \dfrac{X_i - mean(X_i)}{std(X_i)}
$$

However, it works well only if your data is roughly Gaussian, and it is also sensitive to outliers.
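
A minimal z-score sketch, standardizing each column of the same toy matrix used above (population standard deviation, NumPy's default, is an assumption on my part).

```python
import numpy as np

X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 1000.0]])

# z-score: subtract the column mean, divide by the column standard deviation
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
```
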
#### Mean / Median Absolute Deviation

It is insensitive to outliers, but it does not preserve the variance of the raw inputs. This also means it can be used as a form of [data augmentation](https://www.google.com.au/search?q=data+augmentation&oq=data+augmentation&aqs=chrome..69i57j0l5.2309j0j7&sourceid=chrome&ie=UTF-8): it does not increase the number of observations, only the number of variables.

$$
X_i ^S = \dfrac{X_i - mean(X_i)}{MAD(X_i)} \qquad \text{or} \qquad X_i ^S = \dfrac{X_i - median(X_i)}{MAD(X_i)}
$$

where $MAD(X_i)$ is the mean (or median) absolute deviation of $X_i$.

In that sense it also resembles using [polynomial feature](https://alfurka.github.io/2018-11-06-preprocessing-for-lasso/) parameters.
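
A sketch of the median version of this scaling, assuming the formula above; the toy vector with one obvious outlier is my own example.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])  # one obvious outlier

# median absolute deviation: robust to the extreme value
med = np.median(x)
mad = np.median(np.abs(x - med))
x_scaled = (x - med) / mad
```
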
#### (Modified) Tanh Estimator

Tanh estimators are considered a more efficient and robust normalization technique. They are not sensitive to outliers and converge faster than z-score normalization. The estimator yields values between 0 and 1 ($X_i ^S \in (0, 1)$):

$$
X_i ^S = 0.5 * \Big[ tanh \Big( 0.01 * \dfrac{X_i - mean(X_i)}{std(X_i)} \Big) + 1 \Big]
$$
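
A sketch of the tanh estimator as written above, using the plain mean and standard deviation; a fully robust variant would plug in robust location and scale estimates instead, which this sketch does not do.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])

# tanh estimator: squashes standardized values into (0, 1)
x_scaled = 0.5 * (np.tanh(0.01 * (x - x.mean()) / x.std()) + 1.0)
```
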
#### Max-Scaling

It resembles min-max scaling but is more efficient:

$$
X_i ^S = X_i / max(X_i)
$$
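
And the max-scaling one-liner, dividing each column by its maximum (this sketch assumes positive-valued features).

```python
import numpy as np

X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 1000.0]])

# max-scaling: divide each column by its maximum value
X_scaled = X / X.max(axis=0)
```
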
*Which one should we use?*

The answer really depends on your data and model, but in general the last two are more appropriate than the others. You should also consider eliminating outliers before training; [see further discussion here](https://research.ijcaonline.org/volume32/number10/pxc3875530.pdf).

img/brain-background.jpg (483 KB)

img/brain-neural.png (13.4 KB)
