Introduction:
Gradient descent is a popular optimization approach for determining the weights or coefficients of machine learning algorithms like artificial neural networks and logistic regression.
It works by having the model make predictions on the training data and then using the prediction error to update the model so that the error is reduced.
The algorithm’s purpose is to select model parameters (e.g., coefficients or weights) that minimise the model’s error on the training dataset. It accomplishes this by modifying the model so that it moves down a gradient, or slope, of error toward a minimum error value. As a result, the algorithm is known as “gradient descent.”
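In symbols (the notation here is introduced only for illustration and is not taken from the source), a single gradient descent step can be written as

θ ← θ − η ∇θ J(θ)

where θ denotes the model parameters (weights or coefficients), η is the learning rate, and ∇θ J(θ) is the gradient of the error J on the training dataset with respect to the parameters.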
Mini-Batch Gradient Descent:
The mini-batch gradient descent approach divides the training dataset into small batches that are used to calculate model error and update model coefficients.
Implementations may choose to sum or average the gradient over the mini-batch, which further reduces the variance of the gradient estimate.
Mini-batch gradient descent attempts to strike a balance between the robustness of stochastic gradient descent and the efficiency of batch gradient descent. It is the most common implementation of gradient descent used in deep learning.
Source: https://stats.stackexchange.com/questions/488017/understanding-mini-batch-gradient-descent
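Concretely, keeping the notation introduced above (again illustrative, not taken from the sources), the update computed from one mini-batch B of training examples is

θ ← θ − (η / |B|) Σ_{(x, y) ∈ B} ∇θ L(θ; x, y)

that is, the per-example gradients over the batch are averaged (or summed) before a single parameter update is applied.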
Algorithm:
Source: https://www.geeksforgeeks.org/ml-mini-batch-gradient-descent-with-python/
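The algorithm listing from the linked article is not reproduced here; what follows is a minimal sketch of mini-batch gradient descent applied to linear regression with a mean-squared-error loss. The function names (create_mini_batches, mini_batch_gradient_descent) and the hyperparameters (learning_rate, batch_size, n_epochs) are illustrative choices, not necessarily those used in the source.

import numpy as np

def create_mini_batches(X, y, batch_size):
    # Shuffle the examples, then cut them into consecutive mini-batches.
    indices = np.random.permutation(X.shape[0])
    X, y = X[indices], y[indices]
    for start in range(0, X.shape[0], batch_size):
        yield X[start:start + batch_size], y[start:start + batch_size]

def mini_batch_gradient_descent(X, y, learning_rate=0.1, batch_size=32, n_epochs=200):
    # Fit linear-regression coefficients by minimising mean squared error.
    X = np.c_[np.ones(X.shape[0]), X]                # prepend a bias (intercept) column
    theta = np.zeros(X.shape[1])                     # initialise all coefficients to zero
    for _ in range(n_epochs):
        for X_batch, y_batch in create_mini_batches(X, y, batch_size):
            error = X_batch @ theta - y_batch        # prediction error on this mini-batch
            grad = X_batch.T @ error / len(y_batch)  # average gradient of the MSE loss
            theta -= learning_rate * grad            # step against the gradient
    return theta

# Example usage on synthetic data generated from y ≈ 3 + 2x plus noise.
X = np.random.rand(200, 1)
y = 3 + 2 * X[:, 0] + 0.1 * np.random.randn(200)
print(mini_batch_gradient_descent(X, y))             # should print values near [3, 2]

Each epoch shuffles the data before batching, so successive epochs see the examples in a different order; averaging the gradient inside each batch keeps the update magnitude independent of the chosen batch size.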
Advantages:
- The model update frequency is greater than that of batch gradient descent, allowing for more robust convergence and the avoidance of local minima.
- Batched updates are a more efficient computational approach than stochastic gradient descent.
- Batching means that not all of the training data has to be held in memory at once, which makes both the implementation and the computation more efficient.
Disadvantages:
- Mini-batch requires the learning algorithm to be configured with an additional “mini-batch size” hyperparameter.
- As with batch gradient descent, error information must be accumulated across mini-batches of training samples.
Conclusion:
Mini-batch gradient descent is one of the three main variants of gradient descent, alongside batch and stochastic gradient descent. No single variant is universally preferred over the others; which one is employed depends on the situation and the context of the problem.