|
| 1 | +{ |
| 2 | + "cells": [ |
| 3 | + { |
| 4 | + "cell_type": "markdown", |
| 5 | + "metadata": {}, |
| 6 | + "source": [ |
| 7 | + "## Bonus Assignments\n", |
| 8 | + "<ul>\n", |
| 9 | + "<li> What are the disadvantages of the KNN classifier?</li>\n", |
| 10 | + "1. Does not work well with large datasets:\n", |
| 11 | + "KNN stores the whole training set and, for every new point, computes the distance to each existing point. On large datasets this makes prediction slow and memory-hungry.\n", |
| 12 | + "\n", |
| 13 | + "2. Does not work well with high dimensions:\n", |
| 14 | + "KNN doesn't work well with high-dimensional data. As the number of dimensions grows, the distances between points become increasingly similar, so the nearest neighbors are barely closer than the farthest ones and the distance carries little information (the curse of dimensionality).\n", |
| 15 | + "\n", |
| 16 | + "3. Needs feature scaling:\n", |
| 17 | + "We need to scale the features (standardization or normalization) before applying KNN to a dataset. Otherwise, features with larger numeric ranges dominate the distance calculation and KNN may generate wrong predictions.\n", |
| 18 | + "\n", |
| 19 | + "4. Sensitive to noisy data, missing values and outliers:\n", |
| 20 | + "KNN is sensitive to noise in the dataset because every prediction depends directly on a few nearby training points. Missing values have to be imputed and outliers removed before the algorithm is applied.\n", |
| 21 | + "<li> How to optimize the KNN algorithm?</li>\n", |
| 22 | + "For a given test sample x:\n", |
| 23 | + "\n", |
| 24 | + " - find K most similar samples from training set, according to similarity measure s\n", |
| 25 | + "\n", |
| 26 | + " - return the majority vote of the class from the above set\n", |
| 27 | + " \n", |
| 28 | + "Consequently, the only things that define KNN are K and the similarity measure s, so optimizing KNN comes down to choosing these two. There is literally nothing else in this algorithm (it has 3 lines of pseudocode). However, finding \"the best similarity measure\" is as hard a problem as learning a classifier itself, so there is no general method for doing so. People usually either use something simple (Euclidean distance) or use their domain knowledge to adapt s to the problem at hand.\n", |
| 29 | + "</ul>" |
| 30 | + ] |
| 31 | + } |
| 32 | + ], |
| 33 | + "metadata": { |
| 34 | + "kernelspec": { |
| 35 | + "display_name": "Python 3.9.7 ('base')", |
| 36 | + "language": "python", |
| 37 | + "name": "python3" |
| 38 | + }, |
| 39 | + "language_info": { |
| 40 | + "name": "python", |
| 41 | + "version": "3.9.7" |
| 42 | + }, |
| 43 | + "orig_nbformat": 4, |
| 44 | + "vscode": { |
| 45 | + "interpreter": { |
| 46 | + "hash": "5179d32cf6ec497baf3f8a3ef987cc77c5d2dc691fdde20a56316522f61a7323" |
| 47 | + } |
| 48 | + } |
| 49 | + }, |
| 50 | + "nbformat": 4, |
| 51 | + "nbformat_minor": 2 |
| 52 | +} |
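
The pseudocode and the feature-scaling point in the markdown cell can be illustrated with a small pure-Python sketch. This is not code from the notebook: the helper names `standardize` and `knn_predict`, the toy data, and the choice of Euclidean distance as the similarity measure s are all assumptions made for illustration. The second feature has a much larger range than the first, so it dominates the unscaled distance.

```python
import math
from collections import Counter


def standardize(rows):
    """Rescale each column to zero mean and unit variance (z-score)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 so a constant column does not cause division by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    scaled = [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]
    return scaled, means, stds


def knn_predict(train_x, train_y, x, k=3):
    """Return the majority label among the k training points closest to x."""
    # Euclidean distance plays the role of the similarity measure s (assumed).
    dists = sorted((math.dist(p, x), label) for p, label in zip(train_x, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: feature 1 separates the classes, feature 2 is
# uninformative but spans thousands of units.
train_x = [[0.10, 1000], [0.20, 5000], [0.15, 9000],
           [0.90, 1200], [0.80, 5200], [0.85, 8800]]
train_y = ["a", "a", "a", "b", "b", "b"]
query = [0.12, 5150]

# Without scaling, the large-range feature decides the neighbors.
raw_prediction = knn_predict(train_x, train_y, query, k=3)

# With scaling, the query must be transformed with the training statistics.
scaled_x, means, stds = standardize(train_x)
scaled_query = [(v - m) / s for v, m, s in zip(query, means, stds)]
scaled_prediction = knn_predict(scaled_x, train_y, scaled_query, k=3)

print(raw_prediction, scaled_prediction)
```

On this toy data the two predictions differ, which is the effect described under "Needs feature scaling". In practice K is usually picked by comparing validation accuracy over a few candidate values rather than fixed in advance.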