K-nearest neighbors (KNN) is one of the simplest supervised machine learning algorithms. It rests on the assumption that similar things lie near each other: KNN measures the similarity between elements by computing the distance between points, typically the Euclidean distance. The algorithm can be used for both classification and regression problems.
In kNN, the distance from a new point to every labeled point is measured; the k closest points (its nearest neighbors) are selected, and their classes determine the class of the new point, typically by majority vote.
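As a rough sketch (with made-up data purely for illustration), this procedure takes only a few lines of Python:

```python
# A from-scratch sketch of kNN: measure the distance from a query point to
# every labeled point, keep the k closest, and let them vote on the class.
import math
from collections import Counter

def knn_predict(points, labels, query, k=3):
    # Euclidean distance from the query to every labeled point
    distances = [(math.dist(p, query), label) for p, label in zip(points, labels)]
    # Keep the k nearest neighbors and return the majority class among them
    nearest = sorted(distances)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Illustrative data: (height, weight) pairs with class labels
points = [(170, 55), (173, 57), (167, 51), (182, 62)]
labels = ["Normal", "Normal", "Underweight", "Normal"]
print(knn_predict(points, labels, (171, 57)))  # -> 'Normal'
```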
To understand how the KNN algorithm works, consider the example below.
Here we need to find the class for a person with height = 171 cm and weight = 57 kg.
Using the KNN algorithm, we find the nearest neighbors with the Euclidean distance formula, i.e., the distance between two points x and y: d(x, y) = √((x₁ − y₁)² + (x₂ − y₂)²). For instance, the distance from the query point (171, 57) to a training point at (170, 55) would be √((171 − 170)² + (57 − 55)²) = √5 ≈ 2.24.
Using KNN, we are able to find the appropriate class, i.e., Normal, for the person with height = 171 cm and weight = 57 kg.
Python Code
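Below is a minimal sketch of the worked example using scikit-learn's KNeighborsClassifier. The training data is illustrative, standing in for the example's (height, weight) table:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Illustrative training data: [height (cm), weight (kg)] pairs
X = np.array([
    [167, 51], [182, 62], [176, 69], [173, 64], [172, 65],
    [174, 56], [169, 58], [173, 57], [170, 55],
])
y = np.array([
    "Underweight", "Normal", "Normal", "Normal", "Normal",
    "Underweight", "Normal", "Normal", "Normal",
])

# Fit a 3-nearest-neighbor classifier; Euclidean distance is the default metric
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)

# Predict the class for the query point from the example
print(knn.predict([[171, 57]]))  # -> ['Normal']
```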
Applications of kNN
- Text Mining
- Recommendation Systems
- Agriculture
- Finance, and many more…
Assumptions of kNN
- No multicollinearity: highly correlated features are effectively double-counted in the distance calculation
- Non-parametric in nature
- Lazy learning: computation is deferred from training time to prediction time
- Requires feature scaling (see the sketch after this list)
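Because the Euclidean distance is dominated by features with large numeric ranges, features should be brought to a comparable scale before fitting. A minimal sketch using scikit-learn's StandardScaler in a pipeline (reusing the X and y from the earlier example):

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# Standardize each feature to zero mean and unit variance before kNN, so that
# height (cm) and weight (kg) contribute comparably to the distance
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
model.fit(X, y)  # X, y as defined in the earlier sketch
print(model.predict([[171, 57]]))
```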
Advantages
- Easy to understand and simple to explain, yet effective in practice
- Commonly used in collaborative filtering for recommender systems
- Non-parametric, making no assumptions about the underlying data distribution
Disadvantages
- Searches for nearest neighbors across the whole training set at prediction time, making prediction slow on large datasets
- Memory intensive, since the entire training set must be stored
- Sensitive to outliers
- Sensitive to features measured on different scales, hence the need for scaling