
Commit 972f87a

Add files via upload
0 parents · commit 972f87a

Showing 16 changed files with 58,323 additions and 0 deletions.

DataAugmentation/bird.jpg

52.3 KB

DataAugmentation/featureEngineering.ipynb

+401
Large diffs are not rendered by default.

RF&DT/RF&DT.ipynb

+1
Large diffs are not rendered by default.

RF&DT/randomForest.ipynb

+369
Large diffs are not rendered by default.

RF&DT/summer-products-with-rating-and-performance_2020-08.csv

+1,576
Large diffs are not rendered by default.

SVMkernels/KNN.ipynb

+52
@@ -0,0 +1,52 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Bonus Assignments\n",
    "<ul>\n",
    "<li> What are the disadvantages of the KNN classifier?</li>\n",
    "1. Does not work well with large datasets:\n",
    "In large datasets, the cost of calculating the distance between the new point and every existing point is huge, which degrades the performance of the algorithm.\n",
    "\n",
    "2. Does not work well with high dimensions:\n",
    "The KNN algorithm does not work well with high-dimensional data because, with a large number of dimensions, it becomes difficult for the algorithm to calculate a meaningful distance in each dimension.\n",
    "\n",
    "3. Needs feature scaling:\n",
    "We need to perform feature scaling (standardization or normalization) before applying the KNN algorithm to any dataset. If we don't, KNN may generate wrong predictions.\n",
    "\n",
    "4. Sensitive to noisy data, missing values, and outliers:\n",
    "KNN is sensitive to noise in the dataset. We need to manually impute missing values and remove outliers.\n",
    "<li> How to optimize the KNN algorithm?</li>\n",
    "For a given test sample x:\n",
    "\n",
    " - find the K most similar samples from the training set, according to a similarity measure s\n",
    "\n",
    " - return the majority vote of the classes of that set\n",
    " \n",
    "Consequently, the only thing that defines KNN besides K is the similarity measure s. There is literally nothing else in this algorithm (it is three lines of pseudocode). On the other hand, finding \"the best similarity measure\" is as hard a problem as learning a classifier itself, so there is no general method for doing so; people usually end up either using something simple (Euclidean distance) or adapting s to the problem at hand with domain knowledge.\n",
    "</ul>"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3.9.7 ('base')",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.9.7"
  },
  "orig_nbformat": 4,
  "vscode": {
   "interpreter": {
    "hash": "5179d32cf6ec497baf3f8a3ef987cc77c5d2dc691fdde20a56316522f61a7323"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
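
The markdown cell above makes two practical points that are easy to demonstrate: KNN needs feature scaling, and the similarity measure s (the distance metric) is essentially the only design choice besides K. Below is a minimal sketch of both, assuming scikit-learn is available; the wine dataset and the parameter values are illustrative stand-ins, not part of this commit.

# Minimal KNN sketch: feature scaling matters, and the distance metric
# is the main knob besides K. Dataset and parameters are stand-ins.
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Without scaling, features with large ranges dominate the Euclidean distance.
unscaled = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
unscaled.fit(X_train, y_train)

# With scaling, every feature contributes comparably to the distance.
scaled = make_pipeline(
    StandardScaler(),
    KNeighborsClassifier(n_neighbors=5, metric="euclidean"),
)
scaled.fit(X_train, y_train)

print("unscaled accuracy:", unscaled.score(X_test, y_test))
print("scaled accuracy:", scaled.score(X_test, y_test))

Swapping metric="euclidean" for "manhattan", "cosine", or a custom callable is how the similarity measure s is adapted to a problem in practice.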

SVMkernels/SupervisedLearningModels.ipynb

+385
Large diffs are not rendered by default.

constructFeatureMatrix/data/census/test.csv

+16,282
Large diffs are not rendered by default.

constructFeatureMatrix/data/census/train.csv

+32,562
Large diffs are not rendered by default.

0 commit comments