Gradient Descent in Machine Learning

Aayushma Pant
4 min read · Jun 13, 2021

Life is a gradient descent. We set goals and gradually move toward them from our current position, and many everyday situations follow the same pattern: walking the shortest route from home to school, choosing the quickest path down to base camp, or trading stocks so as to minimize loss. We want all of these things to happen with precision and accuracy. So let's look into the popular optimization technique called the Gradient Descent algorithm, which helps minimize the loss and makes a machine learning model predict more accurately.

What is Gradient Descent?

Gradient Descent is one of the most popular optimization algorithms for training a machine learning model. Its purpose is to find the parameter values that minimize the loss and thereby maximize accuracy.

Before looking at how it works, we must understand the cost function.

Cost function

In gradient descent for linear regression, the objective is to find the line that best fits the given inputs and produces the appropriate outputs. With a known set of inputs and their corresponding outputs, a machine learning model makes predictions for a new set of inputs. We find the best fit by changing the weight and bias so that the predicted values move closer to the actual output values.

Suppose we fit our linear regression model to five data points. Each point lies at some vertical distance d1, d2, d3, d4, d5 from the fitted line; each of these distances is an error, the difference between the predicted output and the actual value. Thus we get,

Error = y′ (predicted) − y (actual)

The loss function is the error of a single training example, whereas the cost function is the mean of the losses over all training examples, i.e. cost (MSE) = (d1² + d2² + d3² + d4² + d5²)/n, where n = 5.
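As a concrete illustration, here is a minimal NumPy sketch of this cost; the data points and the candidate weight and bias are made-up values for the example:

```python
import numpy as np

# Five training examples (hypothetical values for illustration)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])   # actual outputs

# A candidate line: y' = w*x + b
w, b = 1.5, 0.5
y_pred = w * x + b

# Errors d1..d5 and the MSE cost
d = y_pred - y
cost = np.mean(d ** 2)                      # (d1² + ... + d5²) / n
print(cost)
```

Changing w and b changes the errors, and therefore the cost; gradient descent is the procedure for changing them in the right direction.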

Thus,

The goal of a machine learning algorithm is to minimize this cost function.

How does it work, and how is the goal achieved?

A gradient is the slope of a function: it measures how much one variable changes in response to a change in another variable at each iteration. It is obtained by taking the partial derivative of the (convex) cost function with respect to each input parameter. The update can be given by:

θ_new = θ_old − α · ∂J(θ)/∂θ

This shows that in every iteration our new position is obtained by moving the old position against the partial derivative (gradient), multiplied by the learning rate (alpha). Every step should reduce the cost function and bring us closer to the global minimum.
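In code, this update for the linear model above might look like the following sketch; the learning rate, starting parameters, and iteration count are arbitrary choices for illustration:

```python
import numpy as np

# Same made-up toy data as in the cost example above
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

def gradients(w, b, x, y):
    """Partial derivatives of the MSE cost with respect to w and b."""
    error = (w * x + b) - y
    dw = 2 * np.mean(error * x)   # ∂cost/∂w
    db = 2 * np.mean(error)       # ∂cost/∂b
    return dw, db

alpha = 0.01                      # learning rate
w, b = 0.0, 0.0                   # arbitrary starting position
for _ in range(1000):             # repeat the update many times
    dw, db = gradients(w, b, x, y)
    w = w - alpha * dw            # new position = old position − alpha × gradient
    b = b - alpha * db
print(w, b)                       # parameters of the (approximate) best-fit line
```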

But the question is: how do we reach the minimum?

The Gradient Descent algorithm helps us make these decisions efficiently and effectively by using the derivative, which is the slope of the graph at a particular position. The slope is described by drawing a tangent line at that point, and following the tangent downhill directs us toward the local minimum.

Now, how can we reach the global minimum precisely? How big should each step be? This is controlled by a parameter called the learning rate (alpha). A larger learning rate covers more distance per step but tends to overshoot the minimum, whereas a smaller learning rate takes small, steady steps that eventually reach the lowest point. We can see this in the example below.

learning rate 0.10: reached the minimum in 84 steps
learning rate 1.0: reached the minimum in 6 steps
learning rate 2.80: reached the minimum in 17 steps
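You can reproduce this effect with a short experiment. The sketch below runs gradient descent on the simple convex function f(x) = x² (with derivative 2x) and counts the steps until it gets close to the minimum at x = 0; the exact step counts depend on the function, starting point, and tolerance, so they differ from the numbers quoted above:

```python
def steps_to_minimum(alpha, x0=5.0, tol=1e-6, max_steps=10_000):
    """Gradient descent on f(x) = x**2; return the number of steps taken."""
    x = x0
    for step in range(1, max_steps + 1):
        x = x - alpha * 2 * x     # x_new = x_old − alpha × f'(x)
        if abs(x) < tol:          # close enough to the minimum at x = 0
            return step
    return None                   # never converged (learning rate too large)

for alpha in (0.10, 0.5, 1.0):
    print(alpha, steps_to_minimum(alpha))
```

With alpha = 0.10 the steps shrink steadily toward 0; with alpha = 0.5 a single step lands exactly on the minimum; with alpha = 1.0 the iterate bounces between 5 and −5 forever and never converges.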

Hence, the example above shows that we must choose the learning rate carefully. A small learning rate tends to reach the minimum reliably, while a learning rate that is too large overshoots and may diverge instead of converging to the global minimum, giving a false result.

To sum up, gradient descent has become the go-to optimization technique for the linear regression model, and it is also used in many other machine learning techniques.
