The following three error metrics are typically used to evaluate and report the performance of a regression model:
- Mean Squared Error
- Root Mean Squared Error
- Mean Absolute Error
Mean Squared Error (MSE)
A regression line’s MSE indicates how close it is to a set of points. It is computed by taking the distances from the points to the best-fit (regression) line (these distances are the “errors”), squaring them, and averaging the result. Squaring removes negative signs and also gives larger errors more weight. The better the forecast, the lower the MSE.
MSE = (1/n) * Σ(yi − ŷi)²
where:
n = number of samples
yi = actual value
ŷi = predicted value
Because the errors are squared, MSE is biased toward large errors and is expressed in squared units, which is why RMSE is often reported instead.
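To make the formula concrete, here is a minimal sketch in Python using NumPy; the y_actual and y_predicted arrays are made-up example values, not data from this article:

```python
import numpy as np

# Made-up example values for illustration only
y_actual = np.array([3.0, -0.5, 2.0, 7.0])    # yi: actual values
y_predicted = np.array([2.5, 0.0, 2.0, 8.0])  # ŷi: predicted values

# MSE = (1/n) * Σ(yi - ŷi)²
mse = np.mean((y_actual - y_predicted) ** 2)
print(mse)  # 0.375
```

If scikit-learn is available, the same result can be obtained with mean_squared_error(y_actual, y_predicted) from sklearn.metrics.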
Root Mean Square Error (RMSE)
In supervised learning, the RMSE is frequently used. It determines the accuracy of the forecast as well as the dispersion of data along the regression line.
Assume that a model predicts a student’s exam grade as 50 (out of 100) and that the RMSE is 10.
Then we can reasonably expect the student’s real grade to fall within the interval [40, 60].
RMSE therefore indicates both the model’s predictive power and how far outcomes are likely to deviate from the predicted value.
RMSE = √MSE = √((1/n) * Σ(yi − ŷi)²)
where:
n = number of samples
yi = actual value
ŷi = predicted value
The lower the RMSE, the better the model.
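As a quick sketch, RMSE can be computed by taking the square root of the MSE from the previous example (again, the arrays are made-up illustrative values):

```python
import numpy as np

# Same made-up example arrays as in the MSE sketch above
y_actual = np.array([3.0, -0.5, 2.0, 7.0])
y_predicted = np.array([2.5, 0.0, 2.0, 8.0])

mse = np.mean((y_actual - y_predicted) ** 2)
rmse = np.sqrt(mse)  # RMSE = √MSE
print(rmse)          # ≈ 0.612
```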
Mean Absolute Error
The mean absolute error (MAE) of a model with respect to a test set is the average of the absolute values of the individual prediction errors over all instances in that test set.
MAE = (1/n) * Σ|xi − x|
where:
n = number of errors (samples)
Σ = summation over all samples (“add them all up”)
|xi − x| = the absolute error for each sample (the absolute difference between the predicted and actual value)
The formula may appear complicated, but the steps are simple (see the sketch after this list):
- Find all of your absolute errors, |xi − x|.
- Add them all together.
- Divide by the number of errors.
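Here is a minimal sketch of those three steps in Python with NumPy, using the same kind of made-up example arrays as before (the variable names y_actual and y_predicted are just illustrative):

```python
import numpy as np

# Made-up example values for illustration only
y_actual = np.array([3.0, -0.5, 2.0, 7.0])
y_predicted = np.array([2.5, 0.0, 2.0, 8.0])

abs_errors = np.abs(y_actual - y_predicted)  # step 1: absolute errors
total = abs_errors.sum()                     # step 2: add them all together
mae = total / len(abs_errors)                # step 3: divide by the number of errors
print(mae)  # 0.5
```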