Thursday, April 19, 2018


MACHINE LEARNING - DAY 5

NORMAL EQUATIONS


For previous blogs and notes, you can click on the following links:

DAY 1   DAY 2   DAY 3   DAY 4


NORMAL EQUATIONS ROLE AND USE

The normal equation provides a faster way to compute the parameters than gradient descent (when the number of features is not too large), since no iterations are required.

Formula for computing the feature coefficients:

Θ = (XᵀX)⁻¹ XᵀY

where,

X: the features matrix (one row per training example)

Y: the output vector

Xᵀ: transpose of the features matrix

(XᵀX)⁻¹: inverse of the product XᵀX

Let's consider an example: suppose x₀ (= 1), x₁, x₂, x₃, and x₄ are the features, X is the matrix of all the features, and Y is the vector of size m × 1 with the real outputs.

With these matrices, the optimal value of Θ is calculated in a single step, and we get a nice predictive model.
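As a concrete illustration, here is a minimal NumPy sketch (the data is synthetic and the Θ values are made up just so we can check the recovery):

import numpy as np

rng = np.random.default_rng(42)
m, n = 20, 4                       # 20 training examples, 4 features (x1..x4)

# Features matrix X: first column is x0 = 1 (the intercept), then x1..x4.
X = np.column_stack([np.ones(m), rng.uniform(0, 10, size=(m, n))])

# Outputs Y generated from a known theta so we can verify the result.
true_theta = np.array([3.0, 1.5, -2.0, 0.5, 4.0])
Y = X @ true_theta                 # real outputs, vector of size m x 1

# Normal equation: theta = (X^T X)^-1 X^T Y
theta = np.linalg.inv(X.T @ X) @ (X.T @ Y)
print(theta)                       # recovers [ 3.   1.5  -2.   0.5  4. ]

# Numerically, solving the linear system is preferable to forming the inverse:
theta = np.linalg.solve(X.T @ X, X.T @ Y)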

NOTE: There is no need for feature scaling with the normal equation, i.e., ranges like
0 < x₁ < 1000
0 < x₂ < 0.00005
don't matter here.
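As a quick sanity check, here is a small sketch (synthetic data; x₂'s range is milder than the 0.00005 above only to keep XᵀX numerically well-conditioned for the demo) showing that rescaling the features changes the coefficients but not the fitted model:

import numpy as np

rng = np.random.default_rng(0)
m = 50
x1 = rng.uniform(0, 1000, m)           # large-range feature
x2 = rng.uniform(0, 0.1, m)            # small-range feature
X = np.column_stack([np.ones(m), x1, x2])
Y = 2.0 + 0.05 * x1 + 30.0 * x2 + rng.normal(0, 0.1, m)

def normal_equation(X, Y):
    return np.linalg.solve(X.T @ X, X.T @ Y)

theta_raw = normal_equation(X, Y)

# Rescale each feature column to [0, 1] and refit.
X_scaled = X.copy()
X_scaled[:, 1] /= x1.max()
X_scaled[:, 2] /= x2.max()
theta_scaled = normal_equation(X_scaled, Y)

# The coefficients differ by the scale factors, but the fitted values
# agree up to floating-point round-off:
print(np.max(np.abs(X @ theta_raw - X_scaled @ theta_scaled)))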

Question: What should we choose then, gradient descent or the normal equation? Which is better?

Answer:

GRADIENT DESCENT
  • An alpha (learning rate) value needs to be decided.
  • Needs many iterations.
  • Complexity: O(kn²).
  • Works well even when n, the number of features, is large.

NORMAL EQUATION
  • No need for alpha.
  • No iterations required.
  • Complexity: O(n³), since the inverse of XᵀX needs to be calculated.
  • Slows down when the number of features increases.
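To make the comparison concrete, here is a rough side-by-side sketch (synthetic data; alpha = 0.1 and the iteration count were picked by hand for this example):

import numpy as np

rng = np.random.default_rng(0)
m, n = 100, 2
X = np.column_stack([np.ones(m), rng.uniform(0, 1, (m, n))])
Y = X @ np.array([1.0, 2.0, 3.0]) + rng.normal(0, 0.01, m)

# Normal equation: one linear solve; no alpha, no iterations.
theta_ne = np.linalg.solve(X.T @ X, X.T @ Y)

# Batch gradient descent: alpha must be chosen, and many iterations are needed.
alpha = 0.1
theta_gd = np.zeros(n + 1)
for _ in range(5000):
    gradient = (X.T @ (X @ theta_gd - Y)) / m
    theta_gd -= alpha * gradient

print(np.allclose(theta_ne, theta_gd, atol=1e-3))   # True: same solution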


So, now we know when to use gradient descent and when to use the normal equation.

Generally, once the number of features reaches around n = 10,000, the performance of the normal equation starts to degrade, since computing the inverse of such a large matrix (XᵀX is (n+1) × (n+1), i.e., 10001 × 10001) becomes time-consuming.

Question: Since computing Θ with the normal equation requires taking an inverse, how do we deal with the non-invertibility of a matrix? That is, if the matrix is singular and doesn't have an inverse, what do we do?

Answer:

NORMAL EQUATIONS NON-INVERTIBILITY

If a matrix is singular (also called degenerate or non-invertible), then its inverse does not exist.

There are only a few cases where this happens:

1. Redundant features (linear dependency):

When the features are linearly related to each other, for example

x₁ = 2.5 * x₂

A sketch of this case follows the list below.

2. Too many features (m ≤ n):
       Delete some features or use regularization; keep only the important features.

       E.g., m = 10, n = 100.
       This creates a problem since we are trying to fit 100 + 1 parameters from just 10 records.
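Here is a small sketch of the redundant-feature case (made-up numbers; np.linalg.pinv, the pseudo-inverse, is shown as one common workaround, though simply dropping the redundant feature works just as well):

import numpy as np

rng = np.random.default_rng(1)
m = 10
x2 = rng.uniform(0, 5, m)
x1 = 2.5 * x2                            # redundant: linearly dependent on x2
X = np.column_stack([np.ones(m), x1, x2])
Y = 4.0 + 3.0 * x2

# X^T X is singular: rank 2 instead of 3, so its inverse does not exist.
print(np.linalg.matrix_rank(X.T @ X))    # 2

# np.linalg.inv(X.T @ X) would fail (or return garbage) here; the
# pseudo-inverse still yields a valid least-squares solution:
theta = np.linalg.pinv(X.T @ X) @ (X.T @ Y)
print(np.allclose(X @ theta, Y))         # True: predictions still fit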

That’s all for Day 5. Next, we will learn about CLASSIFICATION AND
REPRESENTATION along with LOGISTIC REGRESSION.


If you think this article helped you learn something new or refresh your knowledge, then do share it with your friends and everyone else. If you have any thoughts on this article, write them in the comments section, and hopefully we will learn new things every day.


Till then Happy Learning!!! 
