Bias and variance are the two components of a machine learning model's error, and they are best understood through the concepts of underfitting and overfitting. First, let's understand what underfitting and overfitting mean. Both terms describe how poorly the algorithm matches the given data, and they tell us how good or bad the model's accuracy will be. In underfitting, the model is too simple to capture the relationship between the features and the target variable, so it performs poorly even on the training data. In overfitting, the model follows the training data (including its noise) so closely that it fails to generalize to new data. We can understand this concept better by plotting graphs for both cases.
This picture shows what underfitting and overfitting look like.
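As a rough illustration (this is a hypothetical sketch, not taken from the figure above: it uses NumPy and a made-up noisy sine dataset), fitting polynomials of very low and fairly high degree to the same points shows the two failure modes. The low-degree fit has a high error even on the training data (underfitting), while the high-degree fit has a very low training error but a noticeably larger error on new data (overfitting).

```python
import numpy as np

rng = np.random.default_rng(42)

def make_data(n):
    """Toy data: a noisy sine curve (an assumed 'true' relationship)."""
    x = np.sort(rng.uniform(0, 1, n))
    return x, np.sin(2 * np.pi * x) + rng.normal(0, 0.2, n)

x_train, y_train = make_data(30)
x_test, y_test = make_data(30)

for degree in (1, 4, 9):  # underfit, reasonable fit, overfit
    coefs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```

The exact numbers will vary with the random seed, but the gap between training error and test error is the telltale sign of overfitting, while a high error on both is the sign of underfitting.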
Error is one of the major concerns for every model that is trained: it measures how inaccurately the model predicts values. This error can be broken down into two parts, reducible error and irreducible error. Reducible error is the part that can be brought down by improving the model, while irreducible error comes from noise in the data itself and cannot be removed no matter which model we use. Bias and variance are the two components of the reducible error, so the expected error can be written as bias² + variance + irreducible error.
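A minimal sketch of how this decomposition can be checked empirically, again on made-up noisy sine data (so the numbers are illustrative, not from any real dataset): the same simple model is refit on many resampled training sets, and bias² and variance are measured from its average prediction and the spread around it.

```python
import numpy as np

rng = np.random.default_rng(0)
true_f = lambda x: np.sin(2 * np.pi * x)  # assumed "true" relationship
noise_sd = 0.2                            # source of the irreducible error
x_eval = np.linspace(0, 1, 50)

# Refit a straight line on many resampled training sets and
# collect its predictions at fixed evaluation points.
preds = []
for _ in range(200):
    x = rng.uniform(0, 1, 30)
    y = true_f(x) + rng.normal(0, noise_sd, 30)
    preds.append(np.polyval(np.polyfit(x, y, 1), x_eval))
preds = np.array(preds)

bias_sq = np.mean((preds.mean(axis=0) - true_f(x_eval)) ** 2)  # (average prediction - truth)^2
variance = np.mean(preds.var(axis=0))                          # spread across training sets
print(f"bias^2 ~ {bias_sq:.3f}, variance ~ {variance:.3f}, irreducible ~ {noise_sd ** 2:.3f}")
```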
Bias:
Bias is the inability of a machine learning algorithm to capture the actual relationship among the data points. It is defined as the difference between the model's average prediction and the actual value. The algorithm misses parts of the pattern because of the assumptions the model makes while regressing, classifying, or clustering: the stronger the assumptions, the more of the underlying relationship it can miss. For example, linear regression and logistic regression tend to have high bias, meaning they can fail to capture important patterns in the dataset, while models like decision trees, KNN, etc. are known as low-bias models because they make fewer assumptions about the target variable.
Now the question remains how to reduce high bias. To do so, we can add more features to the model, decrease the regularization term, or use higher-degree (more complex) models, as sketched below.
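A small sketch of these remedies, assuming scikit-learn is available (the data is again a made-up noisy sine curve): adding polynomial features and relaxing the regularization strength lets an otherwise high-bias linear model follow the curve, which shows up as a falling training error.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, (200, 1))
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(0, 0.2, 200)

# A plain linear model underfits (high bias); adding features and
# decreasing the regularization term reduces that bias.
for degree, alpha in [(1, 10.0), (5, 10.0), (5, 0.01)]:
    model = make_pipeline(PolynomialFeatures(degree), Ridge(alpha=alpha))
    model.fit(X, y)
    err = mean_squared_error(y, model.predict(X))
    print(f"degree={degree}, alpha={alpha:>5}: training MSE {err:.3f}")
```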
Variance:
Variance is the spread of the model's predictions around the actual value of the target variable, i.e. how much the predictions change when the model is trained on different samples of the data. In other words, it can be taken as how much extra variation there is in the predicted values compared to the actual values. Variance can be high or low: with high variance the predictions swing widely from one training set to another, while with low variance the predicted values stay close to the actual values.
Algorithms like decision trees, KNN, etc. have high variance, while models like logistic regression and linear regression have low variance.
To solve the variance issue, we can reduce the number of input features, increase the amount of training data, or add regularization, as the sketch below illustrates.
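A sketch of these variance-reducing remedies under the same assumptions as before (scikit-learn and toy sine data): the spread of a flexible model's predictions across resampled training sets shrinks when we give it more data or stronger regularization.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge

rng = np.random.default_rng(2)
x_eval = np.linspace(0, 1, 50).reshape(-1, 1)

def prediction_spread(n_train, alpha, n_repeats=100):
    """Average variance of a degree-9 model's predictions over resampled training sets."""
    preds = []
    for _ in range(n_repeats):
        X = rng.uniform(0, 1, (n_train, 1))
        y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(0, 0.2, n_train)
        model = make_pipeline(PolynomialFeatures(9), Ridge(alpha=alpha))
        model.fit(X, y)
        preds.append(model.predict(x_eval))
    return np.array(preds).var(axis=0).mean()

print("small data, weak regularization:", prediction_spread(20, 1e-6))
print("more training data             :", prediction_spread(200, 1e-6))
print("stronger regularization        :", prediction_spread(20, 1.0))
```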
Bias-Variance Trade-off:
In this graph we can see that variance decreases as model complexity decreases, while bias decreases as model complexity increases; the total error is lowest near the point where the two curves intersect. So, from here we can conclude that:
If we reduce bias by making the model more complex, variance will increase, while if we reduce variance by making the model simpler, bias will increase. The aim is to find the balance point where their combined error is as small as possible, as the sketch below shows.
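Repeating the earlier empirical bias/variance estimate at several polynomial degrees (again on the made-up sine data, so the exact numbers are only illustrative) makes the trade-off concrete: bias² falls and variance rises as the degree grows, and their sum is smallest somewhere in between.

```python
import numpy as np

rng = np.random.default_rng(3)
true_f = lambda x: np.sin(2 * np.pi * x)
x_eval = np.linspace(0.05, 0.95, 40)

for degree in (1, 3, 6, 9):
    preds = []
    for _ in range(200):
        x = rng.uniform(0, 1, 40)
        y = true_f(x) + rng.normal(0, 0.2, 40)
        preds.append(np.polyval(np.polyfit(x, y, degree), x_eval))
    preds = np.array(preds)
    bias_sq = np.mean((preds.mean(axis=0) - true_f(x_eval)) ** 2)
    variance = np.mean(preds.var(axis=0))
    print(f"degree {degree}: bias^2 {bias_sq:.3f}  variance {variance:.3f}  sum {bias_sq + variance:.3f}")
```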