AUC-ROC in Machine Learning
Anuj Singh
Student, Data Science
NMIMS, Indore
AUC-ROC stands for Area Under the Receiver Operating Characteristic Curve. The technique was first used during World War II to decide whether a received radar signal was a true signal or just noise. It measures the performance of binary classifiers. The ROC curve is the trade-off between the TPR and the FPR, where TPR stands for True Positive Rate and FPR for False Positive Rate.
TPR is the fraction of positive instances that are correctly classified (also called Sensitivity).
FPR is the fraction of negative instances that are incorrectly classified as positive (also called Fall-out).
Critical points on the curve (illustrated in the sketch below):
- TPR = FPR = 0 : the classifier predicts every point as negative
- TPR = FPR = 1 : the classifier predicts every point as positive
- TPR = 1, FPR = 0 : the ideal classifier
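A minimal sketch of these critical points, assuming toy labels and a small helper function (both illustrative, not from the original): a classifier that predicts everything as negative lands at (TPR = 0, FPR = 0), one that predicts everything as positive at (TPR = 1, FPR = 1), and a perfect classifier at (TPR = 1, FPR = 0).

import numpy as np

def tpr_fpr(y_true, y_pred):
    # Count the four confusion-matrix cells for binary labels.
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    return tp / (tp + fn), fp / (fp + tn)

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])        # toy labels (assumed)

print(tpr_fpr(y_true, np.zeros_like(y_true)))      # (0.0, 0.0): all negative
print(tpr_fpr(y_true, np.ones_like(y_true)))       # (1.0, 1.0): all positive
print(tpr_fpr(y_true, y_true.copy()))              # (1.0, 0.0): ideal classifier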
The AUC-ROC curve helps us visualize how well our machine learning classifier is performing, although it applies directly only to binary classification problems.
Sensitivity tells us what proportion of the positive class got correctly classified. A simple example is determining what proportion of actually sick people was correctly detected by the model.
Sensitivity = TP / (TP + FN)
False Negative Rate (FNR) tells us what proportion of the positive class got incorrectly classified by the classifier. A higher TPR and a lower FNR are desirable, since we want to correctly classify the positive class.
FNR = FN / (TP + FN)
Specificity tells us what proportion of the negative class was correctly classified, for example, the proportion of healthy people correctly identified by the model.
Specificity = TN / (TN + FP)
False Positive Rate (FPR) tells us what proportion of the negative class got incorrectly classified by the classifier. A higher TNR and a lower FPR are desirable, since we want to correctly classify the negative class. Out of these metrics, Sensitivity and Specificity are perhaps the most important, and we will see below how they are used to build an evaluation metric. Note that the ROC curve is built from predicted probabilities rather than hard class predictions, which is why predicting a probability is more informative than predicting the target class directly. All four rates are computed in the sketch below.
FPR = FP / (TN + FP) = 1 - Specificity
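The four rates above can be read straight off a confusion matrix. Here is a short sketch, with toy labels and predictions that are assumptions for illustration; scikit-learn's confusion_matrix returns the cells in the order TN, FP, FN, TP when flattened.

from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # toy ground-truth labels (assumed)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # toy hard predictions (assumed)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

sensitivity = tp / (tp + fn)   # TPR: positives correctly classified
fnr         = fn / (tp + fn)   # 1 - Sensitivity
specificity = tn / (tn + fp)   # TNR: negatives correctly classified
fpr         = fp / (tn + fp)   # 1 - Specificity

print(sensitivity, fnr, specificity, fpr)   # 0.75 0.25 0.75 0.25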
What is the AUC-ROC curve?
The Receiver Operating Characteristic (ROC) curve is an evaluation metric for binary classification problems. It is a probability curve that plots the TPR against the FPR at various threshold values, essentially separating the 'signal' from the 'noise'. The Area Under the Curve (AUC) measures the ability of a classifier to distinguish between classes and is used as a summary of the ROC curve.
The higher the AUC, the better the performance of the model at distinguishing between the positive and negative classes.
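A minimal sketch of drawing a ROC curve and computing AUC with scikit-learn; the synthetic dataset and the logistic-regression model are assumptions chosen purely for illustration.

import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic binary-classification data (assumed for the example).
X, y = make_classification(n_samples=1000, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = LogisticRegression().fit(X_train, y_train)
probs = model.predict_proba(X_test)[:, 1]   # probability of the positive class

fpr, tpr, thresholds = roc_curve(y_test, probs)
print("AUC:", roc_auc_score(y_test, probs))

plt.plot(fpr, tpr, label="classifier")
plt.plot([0, 1], [0, 1], linestyle="--", label="chance (AUC = 0.5)")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.legend()
plt.show()

Note that roc_curve consumes predicted probabilities, not hard class labels: each threshold on the probability yields one (FPR, TPR) point, and sweeping the threshold traces out the curve.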
When AUC = 1, the classifier distinguishes perfectly between all Positive and Negative class points. When AUC = 0, the classifier predicts all Negatives as Positives and all Positives as Negatives; its ranking is perfectly inverted.
When 0.5 < AUC < 1, there is a high chance that the classifier will be able to distinguish the positive class values from the negative class values, because it detects more True Positives and True Negatives than False Negatives and False Positives.
When AUC = 0.5, the classifier is not able to distinguish between Positive and Negative class points, meaning it is effectively predicting either a random class or a constant class for all data points.
So, the higher the AUC value for a classifier, the better its ability to distinguish between positive and negative classes.
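A quick sketch of the three regimes using hand-made scores (all values assumed): a perfectly ranked set of scores gives AUC = 1, a perfectly inverted ranking gives AUC = 0, and a constant score, which carries no information, gives AUC = 0.5.

from sklearn.metrics import roc_auc_score

y_true = [0, 0, 0, 1, 1, 1]   # toy labels (assumed)

print(roc_auc_score(y_true, [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]))  # 1.0: perfect ranking
print(roc_auc_score(y_true, [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]))  # 0.0: inverted ranking
print(roc_auc_score(y_true, [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]))  # 0.5: no information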
How Does the AUC-ROC Curve Work?
In a ROC curve, a point further along the X-axis indicates a higher False Positive Rate, i.e., more negatives misclassified as positive, while a point higher on the Y-axis indicates a higher True Positive Rate, i.e., more positives correctly recovered. So the choice of the threshold depends on how we want to balance False Positives against False Negatives; one common heuristic for picking it is sketched below.
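One common way to pick an operating threshold is Youden's J statistic (J = TPR - FPR), which favors the point on the curve farthest above the diagonal. The toy labels and scores below are assumptions, and Youden's J is only one of several reasonable heuristics.

import numpy as np
from sklearn.metrics import roc_curve

y_true = [0, 0, 1, 0, 1, 1, 0, 1]                     # toy labels (assumed)
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9, 0.5, 0.6]    # toy probabilities (assumed)

fpr, tpr, thresholds = roc_curve(y_true, scores)
j = tpr - fpr                     # Youden's J at each candidate threshold
best = np.argmax(j)
print("threshold:", thresholds[best], "TPR:", tpr[best], "FPR:", fpr[best])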