Gradient Descent in Machine Learning

Life is a gradient descent: we set up goals and gradually move toward them from our current position. Many everyday problems follow the same pattern, such as walking the shortest route from home to school, choosing the shortest path down to base camp, or trading in the stock market so as to minimize loss. We want to solve all of these with precision and accuracy, so let's look at the popular optimization technique called the Gradient Descent algorithm, which helps minimize loss and makes a Machine Learning model predict more accurately.

What is Gradient Descent?

Gradient Descent is one of the most popular optimization algorithms for training Machine Learning models. Its purpose is to find the model parameters that minimize the loss and thereby maximize accuracy.

Before looking at how this works, we must understand the cost function.

Cost function

In the case of linear regression, the objective of gradient descent is to find the line that best fits the given inputs and produces the appropriate outputs. With a known set of inputs and their corresponding outputs, a machine learning model makes predictions for new inputs. We find the best fit by changing the weight and bias so that the predicted values move closer to the actual output values.
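The linear model described above can be sketched minimally as follows; the weight and bias values here are illustrative assumptions, not fitted parameters.

```python
# A minimal sketch of the linear model: the prediction depends on a
# weight w and a bias b, which gradient descent will adjust to fit the data.
def predict(x, w, b):
    """Predicted output y' = w * x + b for a single input x."""
    return w * x + b

# Illustrative (assumed) parameter values:
print(predict(2.0, w=1.5, b=0.5))  # 3.5
```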

From the predicted result shown in the figure, obtained by fitting our linear regression model, we get errors d1, d2, d3, d4 and d5: the differences between the predicted outputs and the actual values. Thus we get,

ERROR = Y'(PREDICTED) - Y(ACTUAL)

The loss function is the error of a single training example, whereas the cost function is the mean of the squared losses over all examples, i.e. cost (MSE) = (d1² + d2² + d3² + d4² + d5²)/n, where n = 5.
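The cost (MSE) computation above can be sketched as a few lines of Python; the predicted and actual values here are made-up illustrative numbers.

```python
# Mean squared error: the mean of the squared residuals d_i = y'_i - y_i.
def mse(y_pred, y_actual):
    n = len(y_pred)
    return sum((yp - ya) ** 2 for yp, ya in zip(y_pred, y_actual)) / n

# Illustrative (assumed) values for five training examples:
y_actual = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.2, 1.9, 3.3, 3.8, 5.1]
print(mse(y_pred, y_actual))  # 0.038
```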

Thus,

The goal of the Machine Learning algorithm is to minimize this cost function.

How does it work, and how is the goal achieved?

A gradient is the slope of a function: it measures the degree of change of one variable in response to changes in another in each iteration. For a convex cost function, it is obtained by taking the partial derivatives with respect to the input parameters. The update rule can be written as:

new position = current position − learning rate (alpha) × gradient

This shows that in every iteration our position converges toward the minimum by the partial derivative (gradient) multiplied by the learning rate (alpha). We take every step so as to decrease the cost function and reach the global minimum.
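The update rule above can be sketched as a short loop. As a hypothetical example, we minimize f(x) = x², whose derivative is 2x, so the true minimum is at x = 0.

```python
# Sketch of the update rule: x_new = x_old - alpha * gradient(x_old).
def gradient_descent(grad, start, alpha=0.1, steps=100):
    x = start
    for _ in range(steps):
        x = x - alpha * grad(x)  # step against the gradient
    return x

# Minimize f(x) = x**2, whose gradient is 2*x:
x_min = gradient_descent(lambda x: 2 * x, start=5.0)
print(x_min)  # very close to 0, the global minimum
```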

But the question is: how do we reach the minimum?

The Gradient Descent algorithm helps us make these decisions efficiently and effectively using derivatives: the slope of the graph at a particular position. The slope is described by the tangent at that point, which directs us toward the local minimum.

Now, how can we reach the global minimum so precisely? How big a step should we take? This is controlled by a parameter called the learning rate (alpha). A larger learning rate covers more distance per step but tends to overshoot the minimum, whereas a smaller learning rate takes small, steady steps toward the lowest point. We can see this in the example below.

learning rate 0.10: reached the minimum in 84 steps
learning rate 1.0: reached the minimum in 6 steps
learning rate 2.80: reached the minimum in 17 steps

Hence, from the above example, we see that the learning rate must be chosen carefully. A small learning rate tends to reach the minimum reliably, though slowly, whereas a learning rate that is too large overshoots and diverges instead of converging to the global minimum, giving a false result.
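This learning-rate trade-off can be demonstrated with a small sketch, again on f(x) = x² (gradient 2x). The alpha values and step counts here are illustrative, not the figures cited above.

```python
# Count how many steps gradient descent on f(x) = x**2 needs to get
# within tol of the minimum at 0, for a given learning rate alpha.
def steps_to_converge(alpha, start=5.0, tol=1e-3, max_steps=1000):
    x = start
    for step in range(1, max_steps + 1):
        x = x - alpha * 2 * x  # gradient of x**2 is 2*x
        if abs(x) < tol:
            return step
    return None  # overshot and diverged (or too slow): never converged

print(steps_to_converge(0.1))   # small alpha: converges, but in many steps
print(steps_to_converge(0.45))  # larger alpha: converges in fewer steps
print(steps_to_converge(1.1))   # too large: diverges, returns None
```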

To sum up, gradient descent is one of the most widely used optimization techniques for the linear regression model, and it also underlies many other Machine Learning techniques.

Aayushma Pant
AI enthusiast
