MACHINE LEARNING - DAY 5
NORMAL EQUATIONS
NORMAL EQUATIONS ROLE AND USE
The normal equation provides a faster way of computing the parameters than gradient descent (depending on the number of features), since no iterations are required at all.
Formula for computing the feature coefficients:
Θ = (XᵀX)⁻¹ XᵀY
where,
X: the features matrix
Y: the output vector of real values
Xᵀ: transpose of the features matrix
(XᵀX)⁻¹: inverse of the matrix product XᵀX
Let's consider an example: x0, x1, x2, x3, and x4 are the features (with x0 = 1 as the usual intercept term), X is the matrix of all the features, and Y is the vector of size m × 1 with the real outputs. With these matrices, the value of Θ for an optimized solution is calculated in one shot, and we get a nice predicting model.
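Since the original image with the example matrices isn't reproduced here, below is a minimal NumPy sketch of the computation, with made-up housing-style numbers (the data values are purely illustrative):

```python
import numpy as np

# Hypothetical training set: m = 4 examples.
# The first column of X is x0 = 1 (the intercept term).
X = np.array([
    [1.0, 2104.0, 5.0],
    [1.0, 1416.0, 3.0],
    [1.0, 1534.0, 3.0],
    [1.0,  852.0, 2.0],
])
Y = np.array([460.0, 232.0, 315.0, 178.0])  # real outputs, size m x 1

# Normal equation: theta = (X^T X)^(-1) X^T Y
theta = np.linalg.inv(X.T @ X) @ X.T @ Y

# Numerically, solving the linear system is preferred over an explicit inverse:
theta = np.linalg.solve(X.T @ X, X.T @ Y)

print(theta)  # one coefficient per column of X, computed with zero iterations
```

Note that np.linalg.solve avoids forming the inverse explicitly, which is both faster and more numerically stable than np.linalg.inv.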
NOTE: There is no need for feature scaling with the normal equation, i.e., feature ranges like
0 < x1 < 1000
0 < x2 < 0.00005
don't matter here.
Question: What should we choose then, Gradient Descent or the Normal Equation? Which is better?
Answer:
GRADIENT DESCENT                       | NORMAL EQUATION
Need to choose the learning rate α     | No need to choose α
Needs many iterations                  | No iterations; Θ is computed in one shot
Works well even when n is very large   | Slow if n is very large, since inverting XᵀX takes roughly O(n³) time
So, now we know when to use Gradient Descent and when to use the Normal Equation method.
Generally, when the number of features reaches around n = 10,000, the performance of the normal equation starts to degrade, since computing the inverse of such a large matrix (XᵀX is (n+1) × (n+1), here 10001 × 10001) becomes time-consuming.
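To get a feel for that cubic cost, here is a rough timing sketch (the sizes are arbitrary, and exact timings will vary by machine and BLAS build):

```python
import time
import numpy as np

# Time the inverse of an (n+1) x (n+1) matrix as n grows.
# Doubling n should increase the time roughly 8x, reflecting O(n^3) cost.
for n in (500, 1000, 2000, 4000):
    A = np.random.rand(n + 1, n + 1)
    A = A.T @ A + np.eye(n + 1)  # make it symmetric positive definite (invertible)
    start = time.perf_counter()
    np.linalg.inv(A)
    print(f"n = {n:5d}: {time.perf_counter() - start:.3f} s")
```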
Question: Since we need to take an inverse while computing Θ with the normal equation, how do we deal with the non-invertibility of a matrix? That is, if the matrix is singular and doesn't have an inverse, what do we do?
Answer:
NORMAL EQUATIONS NON-INVERTIBILITY
If a matrix is singular (also called degenerate or non-invertible), then its inverse does not exist.
There are only a few cases where this happens:
1. Redundant features (linear dependency):
When the features are linearly related to each other, e.g.,
x1 = 2.5*x2
In this case, delete one of the redundant features (see the sketch after this list).
2. Too many features (m <= n):
Eg. m = 10, n = 100.
This creates a problem, since we are trying to fit 100 + 1 = 101 parameters from just 10 records.
Delete some features (use only the important ones) or use regularization.
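In practice, a common workaround for a singular XᵀX is the Moore-Penrose pseudo-inverse (NumPy's np.linalg.pinv), which still returns a usable Θ even with redundant features. A small sketch of the x1 = 2.5*x2 case above (the data values are made up):

```python
import numpy as np

# x1 is exactly 2.5 * x2, so the columns of X are linearly dependent
x2 = np.array([1.0, 2.0, 3.0, 4.0])
X = np.column_stack([np.ones(4), 2.5 * x2, x2])  # columns: x0, x1, x2
Y = np.array([3.0, 5.0, 7.0, 9.0])

A = X.T @ X  # singular: np.linalg.inv(A) would fail or give garbage here
theta = np.linalg.pinv(A) @ X.T @ Y  # pseudo-inverse still yields a solution
print(theta)
```

Even so, removing the redundant feature (or regularizing) is the cleaner fix.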
That’s all for Day 5. Next, we will learn about CLASSIFICATION AND REPRESENTATION along with LOGISTIC REGRESSION.
If you think this article helped you learn something new or refresh your knowledge, then do share it with your friends and everyone. If you have any thoughts on this article, do write them in the comment section, and hope we learn new things every day.
Till then Happy Learning!!!