Commit 4636d53

add quick link to softmax stability
1 parent 9e03f7c

1 file changed

linear-classify.md

Lines changed: 13 additions & 9 deletions
@@ -5,15 +5,17 @@ permalink: /linear-classify/
 
 Table of Contents:
 
-- [Intro to Linear classification](#intro)
-- [Linear score function](#score)
-- [Interpreting a linear classifier](#interpret)
-- [Loss function](#loss)
-- [Multiclass SVM](#svm)
-- [Softmax classifier](#softmax)
-- [SVM vs Softmax](#svmvssoftmax)
-- [Interactive Web Demo of Linear Classification](#webdemo)
-- [Summary](#summary)
+- [Linear Classification](#linear-classification)
+- [Parameterized mapping from images to label scores](#parameterized-mapping-from-images-to-label-scores)
+- [Interpreting a linear classifier](#interpreting-a-linear-classifier)
+- [Loss function](#loss-function)
+- [Multiclass Support Vector Machine loss](#multiclass-support-vector-machine-loss)
+- [Practical Considerations](#practical-considerations)
+- [Softmax classifier](#softmax-classifier)
+- [SVM vs. Softmax](#svm-vs-softmax)
+- [Interactive web demo](#interactive-web-demo)
+- [Summary](#summary)
+- [Further Reading](#further-reading)
 
 <a name='intro'></a>
 
@@ -285,6 +287,8 @@
 
 can be interpreted as the (normalized) probability assigned to the correct label \\(y_i\\) given the image \\(x_i\\) and parameterized by \\(W\\). To see this, remember that the Softmax classifier interprets the scores inside the output vector \\(f\\) as the unnormalized log probabilities. Exponentiating these quantities therefore gives the (unnormalized) probabilities, and the division performs the normalization so that the probabilities sum to one. In the probabilistic interpretation, we are therefore minimizing the negative log likelihood of the correct class, which can be interpreted as performing *Maximum Likelihood Estimation* (MLE). A nice feature of this view is that we can now also interpret the regularization term \\(R(W)\\) in the full loss function as coming from a Gaussian prior over the weight matrix \\(W\\), where instead of MLE we are performing the *Maximum a posteriori* (MAP) estimation. We mention these interpretations to help your intuitions, but the full details of this derivation are beyond the scope of this class.
 
+<a name='softmax-stability'></a>
+
 **Practical issues: Numeric stability**. When you're writing code for computing the Softmax function in practice, the intermediate terms \\(e^{f_{y_i}}\\) and \\(\sum_j e^{f_j}\\) may be very large due to the exponentials. Dividing large numbers can be numerically unstable, so it is important to use a normalization trick. Notice that if we multiply the top and bottom of the fraction by a constant \\(C\\) and push it into the sum, we get the following (mathematically equivalent) expression:
 
 $$
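As background for the MLE/MAP remark visible in the diff context (not part of this commit): under a Gaussian prior on \\(W\\), the log-prior is an L2 penalty, which is one way to see where the regularization term \\(R(W)\\) comes from. A brief sketch of that step, under the assumption of an i.i.d. data likelihood and a zero-mean Gaussian prior:

$$
\log P(W \mid \mathcal{D}) = \sum_i \log P(y_i \mid x_i; W) + \log P(W) + \text{const},
\qquad
P(W) \propto e^{-\frac{\lambda}{2} \sum_{k,l} W_{k,l}^2}
\;\Rightarrow\;
\log P(W) = -\frac{\lambda}{2} \sum_{k,l} W_{k,l}^2 + \text{const}.
$$

Maximizing this posterior is therefore the same as minimizing the cross-entropy data loss plus an L2 regularizer of the form \\(R(W) = \sum_{k,l} W_{k,l}^2\\), up to the scaling by \\(\lambda\\).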

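Since the commit adds a quick link to the numeric-stability note, here is a minimal numpy sketch of the shift-by-max trick that note describes (i.e. choosing \\(\log C = -\max_j f_j\\)). The score values in `f` are made up for illustration; the snippet itself is not part of the diff:

```python
import numpy as np

f = np.array([123.0, 456.0, 789.0])   # example unnormalized scores; deliberately large
# p = np.exp(f) / np.sum(np.exp(f))   # unstable: np.exp(789) overflows to inf

f_shifted = f - np.max(f)             # shift so the highest score is 0
p = np.exp(f_shifted) / np.sum(np.exp(f_shifted))  # mathematically identical, numerically safe
```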