
Commit b2214d8

Add bias to documentation of linear classifiers (#3538)
* Add bias to the score formula for binary classifiers
* Fix multi-class linear trainers
* Fix #3471
* Change back unnecessary changes
1 parent 2e36bd9 commit b2214d8

File tree

3 files changed (+7, -4 lines changed)


src/Microsoft.ML.Mkl.Components/SymSgdClassificationTrainer.cs

Lines changed: 2 additions & 1 deletion
@@ -52,7 +52,8 @@ namespace Microsoft.ML.Trainers
 /// ### Training Algorithm Details
 /// The symbolic stochastic gradient descent is an algorithm that makes its predictions by finding a separating hyperplane.
 /// For instance, with feature values $f0, f1,..., f_{D-1}$, the prediction is given by determining what side of the hyperplane the point falls into.
-/// That is the same as the sign of the feature's weighted sum, i.e. $\sum_{i = 0}^{D-1} (w_i * f_i)$, where $w_0, w_1,..., w_{D-1}$ are the weights computed by the algorithm.
+/// That is the same as the sign of the feature's weighted sum, i.e. $\sum_{i = 0}^{D-1} (w_i * f_i) + b$, where $w_0, w_1,..., w_{D-1}$
+/// are the weights computed by the algorithm, and $b$ is the bias computed by the algorithm.
 ///
 /// While most symbolic stochastic gradient descent algorithms are inherently sequential - at each step, the processing of the current example depends on the parameters learned from previous examples.
 /// This algorithm trains local models in separate threads and probabilistic model cobminer that allows the local models to be combined
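The score formula this commit documents, $\operatorname{sign}\left(\sum_{i = 0}^{D-1} (w_i * f_i) + b\right)$, can be sketched in a few lines. This Python snippet is purely illustrative (the weights, features, and bias values are invented, and it is not ML.NET code):

```python
# Illustrative sketch of the documented score formula: the prediction is
# the sign of sum_i(w_i * f_i) + b. Weights, features, and bias below are
# made-up values, not output of any actual trainer.

def predict(weights, features, bias):
    # Weighted sum of the feature values plus the bias term.
    score = sum(w * f for w, f in zip(weights, features)) + bias
    # The predicted class is the side of the hyperplane the point falls on.
    return 1 if score >= 0 else -1

print(predict([0.5, -1.0], [2.0, 1.0], 0.25))   # score = 0.25 -> 1
print(predict([0.5, -1.0], [2.0, 1.0], -0.5))   # score = -0.5 -> -1
```

Note how the bias term can flip the prediction for a point whose weighted sum alone is zero, which is exactly why the doc fix adds $b$ to the formula.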

src/Microsoft.ML.StandardTrainers/Standard/Online/AveragedPerceptron.cs

Lines changed: 3 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -42,8 +42,9 @@ namespace Microsoft.ML.Trainers
4242
///
4343
/// ### Training Algorithm Details
4444
/// The perceptron is a classification algorithm that makes its predictions by finding a separating hyperplane.
45-
/// For instance, with feature values $f0, f1,..., f_{D-1}$, the prediction is given by determining what side of the hyperplane the point falls into.
46-
/// That is the same as the sign of the feautures' weighted sum, i.e. $\sum_{i = 0}^{D-1} (w_i * f_i)$, where $w_0, w_1,..., w_{D-1}$ are the weights computed by the algorithm.
45+
/// For instance, with feature values $f_0, f_1,..., f_{D-1}$, the prediction is given by determining what side of the hyperplane the point falls into.
46+
/// That is the same as the sign of the feautures' weighted sum, i.e. $\sum_{i = 0}^{D-1} (w_i * f_i) + b$, where $w_0, w_1,..., w_{D-1}$
47+
/// are the weights computed by the algorithm, and $b$ is the bias computed by the algorithm.
4748
///
4849
/// The perceptron is an online algorithm, which means it processes the instances in the training set one at a time.
4950
/// It starts with a set of initial weights (zero, random, or initialized from a previous learner). Then, for each example in the training set, the weighted sum of the features is computed.
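The online update loop described in that doc comment can be sketched as follows. This is a hedged, minimal Python sketch of the classic perceptron rule (zero-initialized weights, made-up toy data, a hypothetical `lr` parameter), not the ML.NET implementation:

```python
# Minimal sketch of the online perceptron update: start from zero weights
# and bias, and nudge the hyperplane whenever sign(w.f + b) mispredicts.
# The data and learning rate are invented for illustration.

def perceptron_train(examples, n_features, epochs=10, lr=1.0):
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        for features, label in examples:  # label is +1 or -1
            score = sum(wi * fi for wi, fi in zip(w, features)) + b
            if label * score <= 0:  # misclassified: move toward the example
                w = [wi + lr * label * fi for wi, fi in zip(w, features)]
                b += lr * label
    return w, b

# Toy linearly separable data: positive iff the first feature dominates.
data = [([2.0, 1.0], 1), ([1.0, 2.0], -1), ([3.0, 0.5], 1), ([0.5, 3.0], -1)]
w, b = perceptron_train(data, 2)
print(all(y * (sum(wi * fi for wi, fi in zip(w, f)) + b) > 0
          for f, y in data))  # True: every example ends up on the right side
```

Because the data above is linearly separable, the loop converges to a hyperplane that classifies every training example correctly, matching the convergence behavior the doc comment alludes to.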

src/Microsoft.ML.StandardTrainers/Standard/Online/LinearSvm.cs

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -48,7 +48,8 @@ namespace Microsoft.ML.Trainers
4848
/// Linear [SVM](https://en.wikipedia.org/wiki/Support-vector_machine#Linear_SVM) implements
4949
/// an algorithm that finds a hyperplane in the feature space for binary classification, by solving an [SVM problem](https://en.wikipedia.org/wiki/Support-vector_machine#Computing_the_SVM_classifier).
5050
/// For instance, with feature values $f_0, f_1,..., f_{D-1}$, the prediction is given by determining what side of the hyperplane the point falls into.
51-
/// That is the same as the sign of the feautures' weighted sum, i.e. $\sum_{i = 0}^{D-1} \left(w_i * f_i \right) + b$, where $w_0, w_1,..., w_{D-1}$ and $b$ are the weights and bias computed by the algorithm.
51+
/// That is the same as the sign of the feautures' weighted sum, i.e. $\sum_{i = 0}^{D-1} \left(w_i * f_i \right) + b$, where $w_0, w_1,..., w_{D-1}$
52+
/// are the weights computed by the algorithm, and $b$ is the bias computed by the algorithm.
5253
///
5354
/// This algorithm implemented is the PEGASOS method, which alternates between stochastic gradient descent steps and projection steps,
5455
/// introduced in [this paper](http://ttic.uchicago.edu/~shai/papers/ShalevSiSr07.pdf) by Shalev-Shwartz, Singer and Srebro.
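The "stochastic gradient descent steps and projection steps" that the doc comment attributes to PEGASOS can be sketched as a single update. This is a rough Python sketch under stated assumptions (the regularization constant `lam`, step counter `t`, and all inputs are invented for illustration; the bias is left out of the regularizer, and this is not ML.NET's implementation):

```python
# Rough sketch of one PEGASOS-style step: a stochastic subgradient step on
# the regularized hinge loss lam/2 * ||w||^2 + max(0, 1 - y * (w.f + b)),
# followed by projection of w onto the ball of radius 1/sqrt(lam).
# All hyperparameters and inputs here are illustrative.
import math

def pegasos_step(w, b, features, label, lam, t):
    eta = 1.0 / (lam * t)  # decaying learning rate at step t
    score = sum(wi * fi for wi, fi in zip(w, features)) + b
    if label * score < 1:  # margin violated: hinge subgradient is active
        w = [(1 - eta * lam) * wi + eta * label * fi
             for wi, fi in zip(w, features)]
        b += eta * label
    else:  # margin satisfied: only the regularizer shrinks the weights
        w = [(1 - eta * lam) * wi for wi in w]
    # Projection step: clip w back into the ball ||w|| <= 1/sqrt(lam).
    norm = math.sqrt(sum(wi * wi for wi in w))
    radius = 1.0 / math.sqrt(lam)
    if norm > radius:
        w = [wi * radius / norm for wi in w]
    return w, b

w, b = pegasos_step([0.0, 0.0], 0.0, [1.0, 0.0], 1, lam=0.1, t=1)
print(w, b)  # the gradient step overshoots, so w is clipped to the ball
```

The projection is what distinguishes PEGASOS from plain SGD on the hinge loss: it keeps the iterates inside the ball where the optimum is known to lie, which underpins the method's convergence guarantee.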
