**Regularization**. There is one bug with the loss function we presented above. Suppose that we have a dataset and a set of parameters **W** that correctly classify every example (i.e. all scores are such that all the margins are met, and \\(L_i = 0\\) for all i). The issue is that this set of **W** is not necessarily unique: there might be many similar **W** that correctly classify the examples. One easy way to see this is that if some parameters **W** correctly classify all examples (so loss is zero for each example), then any multiple of these parameters \\( \lambda W \\) where \\( \lambda > 1 \\) will also give zero loss, because this transformation uniformly stretches all score magnitudes and hence also their absolute differences. For example, if the difference in scores between a correct class and a nearest incorrect class was 15, then multiplying all elements of **W** by 2 would make the new difference 30.
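To make the scaling argument concrete, here is a minimal sketch with made-up weights and a made-up input (the matrix `W`, vector `x`, and label `y` below are hypothetical, not from any dataset), using the multiclass SVM loss defined earlier. It shows that doubling **W** doubles every score difference, so a set of margins that was already met stays met and the loss stays at zero:

```python
import numpy as np

# Hypothetical weights and input, chosen so that all margins are already met
W = np.array([[ 1.0, -0.5,  2.0],
              [ 0.5,  1.5, -1.0],
              [-2.0,  0.5,  0.5]])   # 3 classes, 3-dimensional input
x = np.array([1.0, 2.0, 3.0])
y = 0                                 # assume class 0 is the correct label

def svm_loss(W, x, y, delta=1.0):
    scores = W.dot(x)
    margins = np.maximum(0, scores - scores[y] + delta)
    margins[y] = 0                    # the correct class does not contribute
    return np.sum(margins)

print(W.dot(x) - W.dot(x)[y])         # score differences under W:  [ 0. -5.5 -5.5]
print(2*W.dot(x) - (2*W).dot(x)[y])   # under 2W every difference doubles: [ 0. -11. -11.]
print(svm_loss(W, x, y), svm_loss(2 * W, x, y))  # both losses are 0.0
```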