MACHINE LEARNING - DAY 4
If you missed the earlier tutorials, you can catch up on the basics by clicking the following links, in day order:
Day 1: What is Machine Learning and its Types
Day 2: Linear Regression with a Single Variable
Day 3: Multiple Linear Regression
Continuing our machine learning journey, let's move on to today's topic.
IMPROVING THE PERFORMANCE OF GRADIENT DESCENT
Technique 1: Feature Scaling
Feature scaling means dividing the input values by some fixed quantity, e.g. the maximum input value, so that the scaled values all lie within a range of roughly 1.
Gradient Descent works faster
if each of the features lies roughly within the same range. This is because Θ
will descend quickly on small ranges and slowly on large ranges, and so will
oscillate inefficiently down to the optimum when the variables are very uneven.
Feature 1: 2, 3, 4, 5
Feature 2: 180, 200, 170, 120
Feature scaling for Feature 1 (dividing by its maximum, 5):
2/5, 3/5, 4/5, 5/5
Feature scaling for Feature 2 (dividing by its maximum, 200):
180/200, 200/200, 170/200, 120/200
Scaling both features this way makes them lie within the range
0 ≤ x ≤ 1,
which speeds up gradient descent.
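To make this concrete, here is a minimal NumPy sketch of the max-value scaling above (the variable names are just for illustration):

```python
import numpy as np

# Two features with very different ranges, as in the example above.
feature_1 = np.array([2.0, 3.0, 4.0, 5.0])
feature_2 = np.array([180.0, 200.0, 170.0, 120.0])

# Divide each feature by its own maximum so both lie within 0 <= x <= 1.
scaled_1 = feature_1 / feature_1.max()   # [0.4, 0.6, 0.8, 1.0]
scaled_2 = feature_2 / feature_2.max()   # [0.9, 1.0, 0.85, 0.6]

print(scaled_1)
print(scaled_2)
```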
Ideally, each scaled feature should lie roughly within
-1 ≤ x ≤ 1 or -0.5 ≤ x ≤ 0.5,
which is the most effective range. These are not hard limits, though; a range such as
0 ≤ x ≤ 3
still gives acceptable results.
Technique 2: Mean Normalization
Mean normalization means subtracting the average value from each input variable and then dividing by the standard deviation:
xᵢ := (xᵢ − μᵢ) / sᵢ
where μᵢ is the average of feature i and sᵢ is its standard deviation.
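Here is a minimal NumPy sketch of mean normalization, assuming sᵢ is the standard deviation as in the formula above (some courses use the range, max − min, instead):

```python
import numpy as np

def mean_normalize(x):
    """Subtract the mean and divide by the standard deviation."""
    mu = x.mean()   # average value of the feature
    s = x.std()     # standard deviation (the range max - min also works)
    return (x - mu) / s

feature_2 = np.array([180.0, 200.0, 170.0, 120.0])
print(mean_normalize(feature_2))  # values are now centred around 0
```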
Techniques to check whether gradient descent is working properly:
Method 1:
Plot the cost function against the number of iterations.
In the graph above, after 350 iterations there is no longer any considerable change in the value of the cost function, i.e., it took about 350 iterations to find accurate Θ values.
After each iteration the graph should decrease, converging toward a minimum.
If the graph (the y-value) does not decrease, there is some problem with the gradient descent calculation, generally the learning rate.
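As an illustration of Method 1, here is a minimal, self-contained sketch using NumPy and matplotlib; the toy data, iteration count, and α value are assumptions made up for this example, not taken from the article:

```python
import numpy as np
import matplotlib.pyplot as plt

# Toy data for single-variable linear regression (illustrative only).
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 6.0, 8.0, 10.0])

theta0, theta1 = 0.0, 0.0
alpha = 0.01
m = len(X)

cost_history = []
for _ in range(500):
    error = theta0 + theta1 * X - y
    # Record the cost J(theta) at every iteration.
    cost_history.append((error ** 2).sum() / (2 * m))
    # Simultaneous update of both parameters.
    theta0 -= alpha * error.sum() / m
    theta1 -= alpha * (error * X).sum() / m

# A healthy run shows J(theta) decreasing and then flattening out.
plt.plot(cost_history)
plt.xlabel("Number of iterations")
plt.ylabel("Cost J(theta)")
plt.show()
```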
Method 2:
Declare convergence at a threshold:
Choose some small threshold value; if the decrease in the cost J(Θ) between iterations falls below it, we have reached the optimum or the required value.
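A minimal sketch of this stopping rule; the cost values and the 1e-3 threshold below are illustrative assumptions, not values from the article:

```python
def has_converged(cost_history, epsilon=1e-3):
    """Declare convergence once J(theta) drops by less than epsilon
    in a single iteration (epsilon is an illustrative choice)."""
    if len(cost_history) < 2:
        return False
    return cost_history[-2] - cost_history[-1] < epsilon

# Example: the drop from 0.4512 to 0.4509 is below the 1e-3 threshold.
print(has_converged([0.52, 0.4512, 0.4509]))  # True
```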
NOTE: Method 1 is better than Method 2, since deciding a good threshold value for Method 2 is very difficult.
LEARNING RATE:
· It is mathematically proven that if α is sufficiently small, J(Θ) will decrease at every iteration.
· If α is too small, gradient descent takes very small steps and becomes very slow, taking too much time to find the optimum value.
· Conversely, if α is too large, J(Θ) may fail to decrease on every iteration and may even diverge.
CHOOSING THE LEARNING RATE (α):
Good starting values for α are 0.001, 0.01, and 0.1.
To explore values in between, multiply each by 3, i.e., 0.003, 0.03, and 0.3, and pick the value that makes J(Θ) decrease fastest while still converging.
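In practice, one common approach is to run gradient descent briefly with each candidate α and keep the largest value for which J(Θ) still decreases. A hedged sketch, reusing the toy data from the Method 1 example (all values here are assumptions for illustration):

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
m = len(X)

def final_cost(alpha, iterations=100):
    """Run a short gradient descent and return the final cost J(theta)."""
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        error = theta0 + theta1 * X - y
        theta0 -= alpha * error.sum() / m
        theta1 -= alpha * (error * X).sum() / m
    return ((theta0 + theta1 * X - y) ** 2).sum() / (2 * m)

# Candidate values spaced roughly 3x apart, as suggested above.
# On this data, alpha = 0.3 is too large and makes the cost blow up,
# so it is left out of the sweep.
for alpha in [0.001, 0.003, 0.01, 0.03, 0.1]:
    print(alpha, final_cost(alpha))
```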
That’s all for our Day 4. Next
we will learn about Normal Equations in Day 5 which will be uploaded soon.
If you feel this article helped you in any way, do not forget to share it, and if you have any thoughts or doubts, do write them in the comment section.