Friday, April 6, 2018

OPTIMISING GRADIENT DESCENT

MACHINE LEARNING - DAY 4




If you missed the earlier tutorials, you can catch up on the basics by clicking the following links, listed in day order:




Continuing our machine learning series, let's move on to today's topic.

IMPROVING THE PERFORMANCE OF GRADIENT DESCENT

Technique 1: Feature Scaling

Feature Scaling means dividing the input values of a feature by some fixed quantity, for example the maximum input value, so that the scaled values lie within a range of roughly 1.

Gradient Descent works faster if each of the features lies roughly within the same range. This is because Θ descends quickly along dimensions with small ranges and slowly along dimensions with large ranges, so it oscillates inefficiently down to the optimum when the variables have very uneven ranges.

Feature 1: 2, 3, 4, 5
Feature 2: 180, 200, 170, 120
Feature scaling for feature 1: 2/5, 3/5, 4/5, 5/5
Feature scaling for feature 2: 180/200, 200/200, 170/200, 120/200

That makes both features lie in the range
0 ≤ x ≤ 1
which speeds up gradient descent.

Ideally the scaled values should lie roughly within:
-1 ≤ x ≤ 1  or  -0.5 ≤ x ≤ 0.5

These are the most effective ranges.

Ranges can vary up to about
0 ≤ x ≤ 3

Up to this range the results are still acceptable.
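As a rough sketch, the same scaling can be done with NumPy (the array names below are made up for this example):

import numpy as np

# A minimal sketch of feature scaling by the maximum value.
# The numbers are the example features from this post.
feature_1 = np.array([2.0, 3.0, 4.0, 5.0])
feature_2 = np.array([180.0, 200.0, 170.0, 120.0])

# Divide each feature by its own maximum so both end up in 0 <= x <= 1.
scaled_1 = feature_1 / feature_1.max()   # [0.4, 0.6, 0.8, 1.0]
scaled_2 = feature_2 / feature_2.max()   # [0.9, 1.0, 0.85, 0.6]

print(scaled_1)
print(scaled_2)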

Technique 2: Mean Normalization

Mean Normalization means subtracting the average value of a feature from each input value and then dividing the result by the standard deviation:

xi := (xi − μi) / si

where μi is the average value of feature i and si is its standard deviation.
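A minimal NumPy sketch of mean normalization, reusing the second example feature from above:

import numpy as np

# A minimal sketch of mean normalization: xi := (xi - mu_i) / s_i.
feature_2 = np.array([180.0, 200.0, 170.0, 120.0])

mu = feature_2.mean()       # average value of the feature
s = feature_2.std()         # standard deviation of the feature

normalized_2 = (feature_2 - mu) / s
print(normalized_2)         # values are now centred around 0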

Techniques to check whether Gradient Descent is working properly:

Method 1: 

Plot the cost function J(Θ) against the number of iterations.



In the above graph, after about 350 iterations there is no considerable change in the value of the cost function, i.e., gradient descent took roughly 350 iterations to converge to accurate Θ values.

After each iteration the value of J(Θ) should decrease, and the curve should eventually flatten out (converge).
If the y-value does not decrease, there is some problem with the gradient descent calculation, most commonly a learning rate that is too large.
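As a rough illustration, the following sketch runs gradient descent for linear regression on a tiny made-up dataset, records J(Θ) at every iteration, and plots it (the dataset, α, and the number of iterations are assumptions chosen only for demonstration):

import numpy as np
import matplotlib.pyplot as plt

# A minimal sketch of plotting J(theta) against the number of iterations.
# X is a column of ones plus one scaled feature; all values are made up.
X = np.array([[1.0, 0.4], [1.0, 0.6], [1.0, 0.8], [1.0, 1.0]])
y = np.array([2.0, 3.0, 4.0, 5.0])
alpha, num_iters = 0.1, 400
m = len(y)

theta = np.zeros(X.shape[1])
cost_history = []

for _ in range(num_iters):
    error = X @ theta - y
    cost_history.append((error @ error) / (2 * m))   # J(theta) before this update
    theta -= (alpha / m) * (X.T @ error)             # gradient descent update

plt.plot(cost_history)
plt.xlabel("Number of iterations")
plt.ylabel("Cost J(theta)")
plt.show()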



Method 2:

Declare convergence using a threshold value:

Choose some small threshold value (for example 10⁻³), and if the decrease in J(Θ) in one iteration is less than that threshold, declare that gradient descent has converged to the optimum.
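A minimal sketch of this idea, again on a made-up dataset, where the threshold ε is an assumed value:

import numpy as np

# A minimal sketch of declaring convergence with a threshold.
# The dataset, alpha and epsilon are assumptions for illustration only.
X = np.array([[1.0, 0.4], [1.0, 0.6], [1.0, 0.8], [1.0, 1.0]])
y = np.array([2.0, 3.0, 4.0, 5.0])
alpha, epsilon = 0.1, 1e-6
m = len(y)

theta = np.zeros(X.shape[1])
prev_cost = float("inf")

for i in range(100000):
    error = X @ theta - y
    cost = (error @ error) / (2 * m)
    # Stop once the decrease in J(theta) falls below the threshold.
    if prev_cost - cost < epsilon:
        print(f"Converged after {i} iterations, J(theta) = {cost:.6f}")
        break
    prev_cost = cost
    theta -= (alpha / m) * (X.T @ error)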

NOTE: Method 1 is better than Method 2, since choosing a suitable threshold value in Method 2 is very difficult.


LEARNING RATE:

·        It can be mathematically proven that if α is sufficiently small, J(Θ) decreases on every iteration.
·        If α is too small, gradient descent takes tiny steps, so convergence is very slow and finding the optimum value takes too much time.
·        If α is too large, J(Θ) may not decrease on every iteration and may even diverge.

CHOOSING THE LEARNING RATE(α):

Good starting values for α are 0.001, 0.01 and 0.1.
To try values in between, multiply by roughly 3, i.e., also try 0.003, 0.03 and 0.3, and pick the largest α for which J(Θ) still decreases steadily on every iteration.
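As a rough sketch, the following loop tries several candidate values of α on the same made-up dataset as above and plots J(Θ) for each, so the largest α that still decreases steadily can be picked by eye:

import numpy as np
import matplotlib.pyplot as plt

# A minimal sketch of comparing candidate learning rates spaced roughly 3x apart.
X = np.array([[1.0, 0.4], [1.0, 0.6], [1.0, 0.8], [1.0, 1.0]])
y = np.array([2.0, 3.0, 4.0, 5.0])
m = len(y)

for alpha in [0.001, 0.003, 0.01, 0.03, 0.1, 0.3]:
    theta = np.zeros(X.shape[1])
    history = []
    for _ in range(100):
        error = X @ theta - y
        history.append((error @ error) / (2 * m))    # J(theta) for this alpha
        theta -= (alpha / m) * (X.T @ error)
    plt.plot(history, label=f"alpha = {alpha}")

plt.xlabel("Number of iterations")
plt.ylabel("Cost J(theta)")
plt.legend()
plt.show()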

That’s all for Day 4. Next, in Day 5, we will learn about Normal Equations; it will be uploaded soon.

If you feel this article helped you in any way, do not forget to share it, and if you have any thoughts or doubts, do write them in the comment section.

Till then Happy Learning!!



