Commit e03eaa2

Submit changes before installing new os
1 parent cd66817 commit e03eaa2

File tree

6 files changed: +163 −52 lines


documents/cv-curriculum-vitae/cv-curriculum-vitae.tex

+17 −17
@@ -163,22 +163,6 @@ \section{Work Experience}
 and a big, but algorithmically not challenging project. To be honest,
 I only fixed some Java bugs.}\\
 
-%----------------------------------------------------------------------------------------
-% WORK EXPERIENCE -2-
-
-{\raggedleft\textsc{2011}\par}
-
-{\raggedright\large Student research assistant at \textsc{ Institute of Toxicology and Genetics}, KIT\\
-\textit{participating in a university research project}\\[5pt]}
-
-\normalsize{In summer 2011 I worked for over a month for a
-research project at KIT. I have written bash scripts for file
-conversions, fixed some bugs and re-written a slow Mathematica script
-in a much faster Python version. But it quickly turned out that
-this project had a lot of C++ source which was rarely commented or
-documented. I realized, that I wouldn't have time for this project
-after beginning my studies at university.}\\
-
 %----------------------------------------------------------------------------------------
 % WORK EXPERIENCE -4-

@@ -208,7 +192,7 @@ \section{Work Experience}
 
 \colorbox{shade}{\textcolor{text1}{
 \begin{tabular}{c|p{7cm}}
-\raisebox{-4pt}{\textifsymbol{18}} & Parkstraße 17, 76131 Karlsruhe \\ % Address
+\raisebox{-4pt}{\textifsymbol{18}} & Alte Allee 107, 81245 Munich \\ % Address
 \raisebox{-3pt}{\Mobilefone} & +49 $($1636$)$ 28 04 91 \\ % Phone number
 \raisebox{-1pt}{\Letter} & \href{mailto:[email protected]}{[email protected]} \\ % Email address
 \Keyboard & \href{http://martin-thoma.com}{martin-thoma.com} \\ % Website
@@ -331,6 +315,22 @@ \section{Language Skills}
 %----------------------------------------------------------------------------------------
 
 \section{Work Experience}
+%----------------------------------------------------------------------------------------
+% WORK EXPERIENCE -2-
+
+{\raggedleft\textsc{2011}\par}
+
+{\raggedright\large Student research assistant at \textsc{ Institute of Toxicology and Genetics}, KIT\\
+\textit{participating in a university research project}\\[5pt]}
+
+\normalsize{In summer 2011 I worked for over a month for a
+research project at KIT. I have written bash scripts for file
+conversions, fixed some bugs and re-written a slow Mathematica script
+in a much faster Python version. But it quickly turned out that
+this project had a lot of C++ source which was rarely commented or
+documented. I realized, that I wouldn't have time for this project
+after beginning my studies at university.}\\
+
 %----------------------------------------------------------------------------------------
 % WORK EXPERIENCE -3-

@@ -1,7 +1,8 @@
 \begin{abstract}
 This paper reviews the most common activation functions for convolution neural
-networks. They are evaluated on TODO dataset and possible reasons for the
-differences in their performance are given.
+networks. They are evaluated on the Asirra, GTSRB, HASYv2, STL-10, CIFAR-10,
+CIFAR-100 and MNIST dataset. Possible reasons for the differences in their
+performance are given.
 
-New state of the art results are achieved for TODO.
+New state of the art results are achieved for Asirra, GTSRB, HASYv2 and STL-10.
 \end{abstract}

publications/activation-functions/appendix.tex

+99 −16
@@ -7,17 +7,17 @@ \section*{Overview}
 \centering
 \hspace*{-1cm}\begin{tabular}{lllll}
 \toprule
-Name & Function $\varphi(x)$ & Range of Values & $\varphi'(x)$ \\\midrule % & Used by
-Sign function$^\dagger$ & $\begin{cases}+1 &\text{if } x \geq 0\\-1 &\text{if } x < 0\end{cases}$ & $\Set{-1,1}$ & $0$ \\%& \cite{971754} \\
-\parbox[t]{2.6cm}{Heaviside\\step function$^\dagger$} & $\begin{cases}+1 &\text{if } x > 0\\0 &\text{if } x < 0\end{cases}$ & $\Set{0, 1}$ & $0$ \\%& \cite{mcculloch1943logical}\\
-Logistic function & $\frac{1}{1+e^{-x}}$ & $[0, 1]$ & $\frac{e^x}{(e^x +1)^2}$ \\%& \cite{duch1999survey} \\
-Tanh & $\frac{e^x - e^{-x}}{e^x + e^{-x}} = \tanh(x)$ & $[-1, 1]$ & $\sech^2(x)$ \\%& \cite{LeNet-5,Thoma:2014}\\
-\gls{ReLU}$^\dagger$ & $\max(0, x)$ & $[0, +\infty)$ & $\begin{cases}1 &\text{if } x > 0\\0 &\text{if } x < 0\end{cases}$ \\%& \cite{AlexNet-2012}\\
-\parbox[t]{2.6cm}{\gls{LReLU}$^\dagger$\footnotemark\\(\gls{PReLU})} & $\varphi(x) = \max(\alpha x, x)$ & $(-\infty, +\infty)$ & $\begin{cases}1 &\text{if } x > 0\\\alpha &\text{if } x < 0\end{cases}$ \\%& \cite{maas2013rectifier,he2015delving} \\
-Softplus & $\log(e^x + 1)$ & $(0, +\infty)$ & $\frac{e^x}{e^x + 1}$ \\%& \cite{dugas2001incorporating,glorot2011deep} \\
-\gls{ELU} & $\begin{cases}x &\text{if } x > 0\\\alpha (e^x - 1) &\text{if } x \leq 0\end{cases}$ & $(-\infty, +\infty)$ & $\begin{cases}1 &\text{if } x > 0\\\alpha e^x &\text{otherwise}\end{cases}$ \\%& \cite{clevert2015fast} \\
-Softmax$^\ddagger$ & $o(\mathbf{x})_j = \frac{e^{x_j}}{\sum_{k=1}^K e^{x_k}}$ & $[0, 1]^K$ & $o(\mathbf{x})_j \cdot \frac{\sum_{k=1}^K e^{x_k} - e^{x_j}}{\sum_{k=1}^K e^{x_k}}$ \\%& \cite{AlexNet-2012,Thoma:2014}\\
-Maxout$^\ddagger$ & $o(\mathbf{x}) = \max_{x \in \mathbf{x}} x$ & $(-\infty, +\infty)$ & $\begin{cases}1 &\text{if } x_i = \max \mathbf{x}\\0 &\text{otherwise}\end{cases}$ \\%& \cite{goodfellow2013maxout} \\
+Name & Function $\varphi(x)$ & Range of Values & $\varphi'(x)$ & Used by \\\midrule %
+Sign function$^\dagger$ & $\begin{cases}+1 &\text{if } x \geq 0\\-1 &\text{if } x < 0\end{cases}$ & $\Set{-1,1}$ & $0$ & \cite{971754} \\
+\parbox[t]{2.6cm}{Heaviside\\step function$^\dagger$} & $\begin{cases}+1 &\text{if } x > 0\\0 &\text{if } x < 0\end{cases}$ & $\Set{0, 1}$ & $0$ & \cite{mcculloch1943logical}\\
+Logistic function & $\frac{1}{1+e^{-x}}$ & $[0, 1]$ & $\frac{e^x}{(e^x +1)^2}$ & \cite{duch1999survey} \\
+Tanh & $\frac{e^x - e^{-x}}{e^x + e^{-x}} = \tanh(x)$ & $[-1, 1]$ & $\sech^2(x)$ & \cite{LeNet-5,Thoma:2014}\\
+\gls{ReLU}$^\dagger$ & $\max(0, x)$ & $[0, +\infty)$ & $\begin{cases}1 &\text{if } x > 0\\0 &\text{if } x < 0\end{cases}$ & \cite{AlexNet-2012}\\
+\parbox[t]{2.6cm}{\gls{LReLU}$^\dagger$\footnotemark\\(\gls{PReLU})} & $\varphi(x) = \max(\alpha x, x)$ & $(-\infty, +\infty)$ & $\begin{cases}1 &\text{if } x > 0\\\alpha &\text{if } x < 0\end{cases}$ & \cite{maas2013rectifier,he2015delving} \\
+Softplus & $\log(e^x + 1)$ & $(0, +\infty)$ & $\frac{e^x}{e^x + 1}$ & \cite{dugas2001incorporating,glorot2011deep} \\
+\gls{ELU} & $\begin{cases}x &\text{if } x > 0\\\alpha (e^x - 1) &\text{if } x \leq 0\end{cases}$ & $(-\infty, +\infty)$ & $\begin{cases}1 &\text{if } x > 0\\\alpha e^x &\text{otherwise}\end{cases}$ & \cite{clevert2015fast} \\
+Softmax$^\ddagger$ & $o(\mathbf{x})_j = \frac{e^{x_j}}{\sum_{k=1}^K e^{x_k}}$ & $[0, 1]^K$ & $o(\mathbf{x})_j \cdot \frac{\sum_{k=1}^K e^{x_k} - e^{x_j}}{\sum_{k=1}^K e^{x_k}}$ & \cite{AlexNet-2012,Thoma:2014}\\
+Maxout$^\ddagger$ & $o(\mathbf{x}) = \max_{x \in \mathbf{x}} x$ & $(-\infty, +\infty)$ & $\begin{cases}1 &\text{if } x_i = \max \mathbf{x}\\0 &\text{otherwise}\end{cases}$ & \cite{goodfellow2013maxout} \\
 \bottomrule
 \end{tabular}
 \caption[Activation functions]{Overview of activation functions. Functions
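
The $\varphi(x)$ and $\varphi'(x)$ columns of the overview table above can be sanity-checked numerically. The following Python sketch is an editorial illustration only (not part of the committed sources; the names phi and dphi are made up for this example): it compares each listed scalar derivative against a central finite difference.

    # Sketch: numerically verify the derivative column of the activation
    # function overview table (logistic, tanh, ReLU, softplus, ELU).
    import math

    ALPHA = 1.0  # ELU hyperparameter, assumed to be 1 here

    functions = {
        "logistic": (lambda x: 1 / (1 + math.exp(-x)),
                     lambda x: math.exp(x) / (math.exp(x) + 1) ** 2),
        "tanh":     (math.tanh,
                     lambda x: 1 / math.cosh(x) ** 2),          # sech^2(x)
        "relu":     (lambda x: max(0.0, x),
                     lambda x: 1.0 if x > 0 else 0.0),
        "softplus": (lambda x: math.log(math.exp(x) + 1),
                     lambda x: math.exp(x) / (math.exp(x) + 1)),
        "elu":      (lambda x: x if x > 0 else ALPHA * (math.exp(x) - 1),
                     lambda x: 1.0 if x > 0 else ALPHA * math.exp(x)),
    }

    h = 1e-6
    for name, (phi, dphi) in functions.items():
        for x in (-1.5, -0.3, 0.7, 2.0):  # points away from the kink at 0
            numeric = (phi(x + h) - phi(x - h)) / (2 * h)  # central difference
            assert abs(numeric - dphi(x)) < 1e-4, (name, x)
    print("derivative column matches finite differences")
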
@@ -63,13 +63,11 @@ \section*{Evaluation Results}
 \end{tabular}
 \caption[Activation function evaluation results on CIFAR-100]{Training and
 test accuracy of adjusted baseline models trained with different
-activation functions on CIFAR-100. For LReLU, $\alpha = 0.3$ was
+activation functions on CIFAR-100. For \gls{LReLU}, $\alpha = 0.3$ was
 chosen.}
 \label{table:CIFAR-100-accuracies-activation-functions}
 \end{table}
 
-\glsreset{LReLU}
-
 \begin{table}[H]
 \centering
 \setlength\tabcolsep{1.5pt}

@@ -91,7 +89,7 @@ \section*{Evaluation Results}
 \end{tabular}
 \caption[Activation function evaluation results on HASYv2]{Test accuracy of
 adjusted baseline models trained with different activation
-functions on HASYv2. For LReLU, $\alpha = 0.3$ was chosen.}
+functions on HASYv2. For \gls{LReLU}, $\alpha = 0.3$ was chosen.}
 \label{table:HASYv2-accuracies-activation-functions}
 \end{table}

@@ -116,8 +114,93 @@ \section*{Evaluation Results}
 \end{tabular}
 \caption[Activation function evaluation results on STL-10]{Test accuracy of
 adjusted baseline models trained with different activation
-functions on STL-10. For LReLU, $\alpha = 0.3$ was chosen.}
+functions on STL-10. For \gls{LReLU}, $\alpha = 0.3$ was chosen.}
 \label{table:STL-10-accuracies-activation-functions}
 \end{table}
 
+\begin{table}[H]
+\centering
+\hspace*{-1cm}\begin{tabular}{lllll}
+\toprule
+Name & Function $\varphi(x)$ & Range of Values & $\varphi'(x)$ \\\midrule % & Used by
+Sign function$^\dagger$ & $\begin{cases}+1 &\text{if } x \geq 0\\-1 &\text{if } x < 0\end{cases}$ & $\Set{-1,1}$ & $0$ \\%& \cite{971754} \\
+\parbox[t]{2.6cm}{Heaviside\\step function$^\dagger$} & $\begin{cases}+1 &\text{if } x > 0\\0 &\text{if } x < 0\end{cases}$ & $\Set{0, 1}$ & $0$ \\%& \cite{mcculloch1943logical}\\
+Logistic function & $\frac{1}{1+e^{-x}}$ & $[0, 1]$ & $\frac{e^x}{(e^x +1)^2}$ \\%& \cite{duch1999survey} \\
+Tanh & $\frac{e^x - e^{-x}}{e^x + e^{-x}} = \tanh(x)$ & $[-1, 1]$ & $\sech^2(x)$ \\%& \cite{LeNet-5,Thoma:2014}\\
+\gls{ReLU}$^\dagger$ & $\max(0, x)$ & $[0, +\infty)$ & $\begin{cases}1 &\text{if } x > 0\\0 &\text{if } x < 0\end{cases}$ \\%& \cite{AlexNet-2012}\\
+\parbox[t]{2.6cm}{\gls{LReLU}$^\dagger$\footnotemark\\(\gls{PReLU})} & $\varphi(x) = \max(\alpha x, x)$ & $(-\infty, +\infty)$ & $\begin{cases}1 &\text{if } x > 0\\\alpha &\text{if } x < 0\end{cases}$ \\%& \cite{maas2013rectifier,he2015delving} \\
+Softplus & $\log(e^x + 1)$ & $(0, +\infty)$ & $\frac{e^x}{e^x + 1}$ \\%& \cite{dugas2001incorporating,glorot2011deep} \\
+\gls{ELU} & $\begin{cases}x &\text{if } x > 0\\\alpha (e^x - 1) &\text{if } x \leq 0\end{cases}$ & $(-\infty, +\infty)$ & $\begin{cases}1 &\text{if } x > 0\\\alpha e^x &\text{otherwise}\end{cases}$ \\%& \cite{clevert2015fast} \\
+Softmax$^\ddagger$ & $o(\mathbf{x})_j = \frac{e^{x_j}}{\sum_{k=1}^K e^{x_k}}$ & $[0, 1]^K$ & $o(\mathbf{x})_j \cdot \frac{\sum_{k=1}^K e^{x_k} - e^{x_j}}{\sum_{k=1}^K e^{x_k}}$ \\%& \cite{AlexNet-2012,Thoma:2014}\\
+Maxout$^\ddagger$ & $o(\mathbf{x}) = \max_{x \in \mathbf{x}} x$ & $(-\infty, +\infty)$ & $\begin{cases}1 &\text{if } x_i = \max \mathbf{x}\\0 &\text{otherwise}\end{cases}$ \\%& \cite{goodfellow2013maxout} \\
+\bottomrule
+\end{tabular}
+\caption[Activation functions]{Overview of activation functions. Functions
+marked with $\dagger$ are not differentiable at 0 and functions
+marked with $\ddagger$ operate on all elements of a layer
+simultaneously. The hyperparameters $\alpha \in (0, 1)$ of Leaky
+ReLU and ELU are typically $\alpha = 0.01$. Other activation
+function like randomized leaky ReLUs exist~\cite{xu2015empirical},
+but are far less commonly used.\\
+Some functions are smoothed versions of others, like the logistic
+function for the Heaviside step function, tanh for the sign
+function, softplus for ReLU.\\
+Softmax is the standard activation function for the last layer of
+a classification network as it produces a probability
+distribution. See \Cref{fig:activation-functions-plot} for a plot
+of some of them.}
+\label{table:activation-functions-overview}
+\end{table}
+\footnotetext{$\alpha$ is a hyperparameter in leaky ReLU, but a learnable parameter in the parametric ReLU function.}
+
+\begin{figure}[ht]
+\centering
+\begin{tikzpicture}
+\definecolor{color1}{HTML}{E66101}
+\definecolor{color2}{HTML}{FDB863}
+\definecolor{color3}{HTML}{B2ABD2}
+\definecolor{color4}{HTML}{5E3C99}
+\begin{axis}[
+legend pos=north west,
+legend cell align={left},
+axis x line=middle,
+axis y line=middle,
+x tick label style={/pgf/number format/fixed,
+/pgf/number format/fixed zerofill,
+/pgf/number format/precision=1},
+y tick label style={/pgf/number format/fixed,
+/pgf/number format/fixed zerofill,
+/pgf/number format/precision=1},
+grid = major,
+width=16cm,
+height=8cm,
+grid style={dashed, gray!30},
+xmin=-2, % start the diagram at this x-coordinate
+xmax= 2, % end the diagram at this x-coordinate
+ymin=-1, % start the diagram at this y-coordinate
+ymax= 2, % end the diagram at this y-coordinate
+xlabel=x,
+ylabel=y,
+tick align=outside,
+enlargelimits=false]
+\addplot[domain=-2:2, color1, ultra thick,samples=500] {1/(1+exp(-x))};
+\addplot[domain=-2:2, color2, ultra thick,samples=500] {tanh(x)};
+\addplot[domain=-2:2, color4, ultra thick,samples=500] {max(0, x)};
+\addplot[domain=-2:2, color4, ultra thick,samples=500, dashed] {ln(exp(x) + 1)};
+\addplot[domain=-2:2, color3, ultra thick,samples=500, dotted] {max(x, exp(x) - 1)};
+\addlegendentry{$\varphi_1(x)=\frac{1}{1+e^{-x}}$}
+\addlegendentry{$\varphi_2(x)=\tanh(x)$}
+\addlegendentry{$\varphi_3(x)=\max(0, x)$}
+\addlegendentry{$\varphi_4(x)=\log(e^x + 1)$}
+\addlegendentry{$\varphi_5(x)=\max(x, e^x - 1)$}
+\end{axis}
+\end{tikzpicture}
+\caption[Activation functions]{Activation functions plotted in $[-2, +2]$.
+$\tanh$ and ELU are able to produce negative numbers. The image of
+ELU, ReLU and Softplus is not bound on the positive side, whereas
+$\tanh$ and the logistic function are always below~1.}
+\label{fig:activation-functions-plot}
+\end{figure}
+
+\glsreset{LReLU}
 \twocolumn
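
The pgfplots figure added above plots the logistic function, tanh, ReLU, softplus and the curve $\max(x, e^x - 1)$ on $[-2, 2]$. Purely as an illustration (not part of the commit, and assuming numpy and matplotlib are available), a rough matplotlib equivalent could look like this:

    # Sketch: matplotlib counterpart of the pgfplots figure above.
    import numpy as np
    import matplotlib.pyplot as plt

    x = np.linspace(-2, 2, 500)
    curves = {
        r"$1/(1+e^{-x})$":     1 / (1 + np.exp(-x)),
        r"$\tanh(x)$":         np.tanh(x),
        r"$\max(0, x)$":       np.maximum(0, x),
        r"$\log(e^x + 1)$":    np.log(np.exp(x) + 1),         # softplus
        # Same expression as the tikz code; note max(x, e^x - 1) = e^x - 1
        # for all x, so it coincides with ELU (alpha = 1) only for x <= 0.
        # The exact ELU would be np.where(x > 0, x, np.exp(x) - 1).
        r"$\max(x, e^x - 1)$": np.maximum(x, np.exp(x) - 1),
    }

    for label, y in curves.items():
        plt.plot(x, y, label=label)
    plt.ylim(-1, 2)
    plt.grid(linestyle="--", alpha=0.3)
    plt.legend(loc="upper left")
    plt.xlabel("x")
    plt.ylabel("y")
    plt.savefig("activation-functions.png", dpi=150)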

publications/activation-functions/content.tex

+33 −15
@@ -1,24 +1,42 @@
 %!TEX root = main.tex
 \section{Introduction}
-TODO\cite{Thoma:2014}
-
-\section{Terminology}
-TODO
+Artificial neural networks have dozends of hyperparameters which influence
+their behaviour during training and evaluation time. One parameter is the
+choice of activation functions. While in principle every neuron could have a
+different activation function, in practice networks only use two activation
+functions: The softmax function for the output layer in order to obtain a
+probability distribution over the possible classes and one activation function
+for all other neurons.
 
+Activation functions should have the following properties:
+\begin{itemize}
+\item \textbf{Non-linearity}: A linear activation function in a simple feed
+forward network leads to a linear function. This means no matter how
+many layers the network uses, there is an equivalent network with
+only the input and the output layer. Please note that \glspl{CNN} are
+different. Padding and pooling are also non-linear operations.
+\item \textbf{Differentiability}: Activation functions need to be
+differentiable in order to be able to apply gradient descent. It is
+not necessary that they are differentiable at any point. In practice,
+the gradient at non-differentiable points can simply be set to zero
+in order to prevent weight updates at this point.
+\item \textbf{Non-zero gradient}: The sign function is not suitable for
+gradient descent based optimizers as its gradient is zero at all
+differentiable points. An activation function should have infinitely
+many points with non-zero gradient.
+\end{itemize}
 
-\section{Activation Functions}
-Nonlinear, differentiable activation functions are important for neural
-networks to allow them to learn nonlinear decision boundaries. One of the
-simplest and most widely used activation functions for \glspl{CNN} is
-\gls{ReLU}~\cite{AlexNet-2012}, but others such as
+One of the simplest and most widely used activation functions for \glspl{CNN}
+is \gls{ReLU}~\cite{AlexNet-2012}, but others such as
 \gls{ELU}~\cite{clevert2015fast}, \gls{PReLU}~\cite{he2015delving}, softplus~\cite{7280459}
-and softsign~\cite{bergstra2009quadratic} have been proposed. The baseline uses
-\gls{ELU}.
+and softsign~\cite{bergstra2009quadratic} have been proposed.
 
 Activation functions differ in the range of values and the derivative. The
 definitions and other comparisons of eleven activation functions are given
 in~\cref{table:activation-functions-overview}.
 
+
+\section{Important Differences of Proposed Activation Functions}
 Theoretical explanations why one activation function is preferable to another
 in some scenarios are the following:
 \begin{itemize}
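
The non-linearity bullet in the introduction added above argues that a linear activation collapses a stack of fully connected layers into a single linear map. A small Python sketch (illustrative only; the weight shapes and names are invented for this example) makes that collapse explicit:

    # Sketch: with a linear "activation", two stacked fully connected layers
    # are equivalent to a single linear layer; ReLU breaks the equivalence.
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=(5, 4))                       # 5 samples, 4 features
    W1, b1 = rng.normal(size=(4, 8)), rng.normal(size=8)
    W2, b2 = rng.normal(size=(8, 3)), rng.normal(size=3)

    # Two layers with identity activation ...
    two_layers = (x @ W1 + b1) @ W2 + b2
    # ... collapse into one layer with W = W1 W2 and b = b1 W2 + b2.
    one_layer = x @ (W1 @ W2) + (b1 @ W2 + b2)
    assert np.allclose(two_layers, one_layer)

    # With a non-linear activation such as ReLU the collapse no longer holds.
    relu = lambda z: np.maximum(0, z)
    assert not np.allclose(relu(x @ W1 + b1) @ W2 + b2, one_layer)
    print("linear layers collapse; ReLU prevents the collapse")
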
@@ -96,6 +114,7 @@ \section{Activation Functions}
 logistic function has a much shorter training time and a noticeably lower test
 accuracy.
 
+\glsunset{LReLU}
 \begin{table}[H]
 \centering
 \begin{tabular}{lccc}

@@ -111,7 +130,7 @@ \section{Activation Functions}
 ReLU & \cellcolor{yellow!25}Yes\footnotemark & \cellcolor{red!25} No & \cellcolor{yellow!25}Half-sided \\
 Softplus & \cellcolor{green!25}No & \cellcolor{red!25} No & \cellcolor{yellow!25}Half-sided \\
 S2ReLU & \cellcolor{green!25}No & \cellcolor{green!25}Yes & \cellcolor{green!25} No \\
-LReLU/PReLU & \cellcolor{green!25}No & \cellcolor{green!25}Yes & \cellcolor{green!25} No \\
+\gls{LReLU}/PReLU & \cellcolor{green!25}No & \cellcolor{green!25}Yes & \cellcolor{green!25} No \\
 ELU & \cellcolor{green!25}No & \cellcolor{green!25}Yes & \cellcolor{green!25} No \\
 \bottomrule
 \end{tabular}

@@ -120,8 +139,6 @@ \section{Activation Functions}
 \end{table}
 \footnotetext{The dying ReLU problem is similar to the vanishing gradient problem.}
 
-\glsunset{LReLU}
-
 \begin{table}[H]
 \centering
 \begin{tabular}{lccclllll}

@@ -173,4 +190,5 @@ \section{Activation Functions}
 functions on MNIST.}
 \label{table:MNIST-accuracies-activation-functions}
 \end{table}
-\glsreset{LReLU}
+\glsreset{LReLU}
+
