K-nearest neighbors (KNN) is one of the simplest supervised machine learning algorithms. It rests on the assumption that similar things lie near each other: KNN measures the similarity between elements by computing the distance between points, typically the Euclidean distance. The algorithm can be used for both classification and regression problems.
In kNN, the distance from a new point to every labeled point is measured; the k closest points (its nearest neighbors) are selected, and their classes determine the class of the new point, typically by majority vote.
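As a rough sketch (with made-up data purely for illustration), this procedure takes only a few lines of Python:

```python
# A from-scratch sketch of kNN: measure the distance from a query point to
# every labeled point, keep the k closest, and let them vote on the class.
import math
from collections import Counter

def knn_predict(points, labels, query, k=3):
    # Euclidean distance from the query to every labeled point
    distances = [(math.dist(p, query), label) for p, label in zip(points, labels)]
    # Keep the k nearest neighbors and return the majority class among them
    nearest = sorted(distances)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Illustrative data: (height, weight) pairs with class labels
points = [(170, 55), (173, 57), (167, 51), (182, 62)]
labels = ["Normal", "Normal", "Underweight", "Normal"]
print(knn_predict(points, labels, (171, 57)))  # -> 'Normal'
```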
To understand how the KNN algorithm works, consider the example below.
Here we need to find the class for a person with height = 171 cm and weight = 57 kg.
Using the KNN algorithm, we find the nearest neighbors with the Euclidean distance formula, i.e., the distance between two points x and y: d(x, y) = √((x₁ − y₁)² + (x₂ − y₂)²). For instance, the distance from the query point (171, 57) to a training point at (170, 55) would be √((171 − 170)² + (57 − 55)²) = √5 ≈ 2.24.
Using KNN, we are able to find the appropriate class, i.e., Normal, for the person with height = 171 cm and weight = 57 kg.
Python Code
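Below is a minimal sketch of the worked example using scikit-learn's KNeighborsClassifier. The training data is illustrative, standing in for the example's (height, weight) table:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Illustrative training data: [height (cm), weight (kg)] pairs
X = np.array([
    [167, 51], [182, 62], [176, 69], [173, 64], [172, 65],
    [174, 56], [169, 58], [173, 57], [170, 55],
])
y = np.array([
    "Underweight", "Normal", "Normal", "Normal", "Normal",
    "Underweight", "Normal", "Normal", "Normal",
])

# Fit a 3-nearest-neighbor classifier; Euclidean distance is the default metric
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)

# Predict the class for the query point from the example
print(knn.predict([[171, 57]]))  # -> ['Normal']
```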
Applications of kNN
- Text Mining
- Recommendation Systems
- Agriculture
- Finance, and many more…
Assumptions of kNN
- No multicollinearity: highly correlated features are effectively double-counted in the distance calculation
- Non-parametric in nature
- Lazy learning: computation is deferred from training time to prediction time
- Requires feature scaling (see the sketch after this list)
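Because the Euclidean distance is dominated by features with large numeric ranges, features should be brought to a comparable scale before fitting. A minimal sketch using scikit-learn's StandardScaler in a pipeline (reusing the X and y from the earlier example):

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# Standardize each feature to zero mean and unit variance before kNN, so that
# height (cm) and weight (kg) contribute comparably to the distance
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
model.fit(X, y)  # X, y as defined in the earlier sketch
print(model.predict([[171, 57]]))
```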
Advantages
- Easy to understand and simple to explain, yet effective in practice
- Commonly used in collaborative filtering for recommender systems
- Non-parametric, making no assumptions about the underlying data distribution
Disadvantages
- Searches for nearest neighbors across the whole training set at prediction time, making prediction slow on large datasets
- Memory intensive, since the entire training set must be stored
- Sensitive to outliers
- Sensitive to features measured on different scales, hence the need for scaling