Friday, May 4, 2018

MACHINE LEARNING | CLASSIFICATION AND LOGISTIC REGRESSION

MACHINE LEARNING - DAY 6

CLASSIFICATION AND LOGISTIC REGRESSION MODEL



Before moving to today's topic, if you haven't checked the earlier posts, which cover the basics, visit the following links:





CLASSIFICATION:

E-mail: spam / not spam
Online transactions: fraudulent (yes/no)

These two are examples of where classification algorithms are used.

y belongs to {0,1}

0: negative class (absence of something or some value, e.g., a non-spam email).
1: positive class (presence of something or some value, e.g., a spam email).

0 and 1 are just class labels or indicators; it doesn't matter which outcome is assigned to which label. There are no hard rules.


LOGISTIC REGRESSION MODEL:

A classification algorithm that outputs the probability of the occurrence (or non-occurrence) of an event.
The hypothesis outputs a value between 0 and 1, which is then thresholded to predict class 1 or 0.

HYPOTHESIS:

In linear regression the hypothesis was:

hΘ(x) = Θᵀx

In logistic regression the hypothesis is:

hΘ(x) = g(Θᵀx)

where,

g(z) = 1 / (1 + e^(-z))

z = Θᵀx

1 / (1 + e^(-z)) is called the sigmoid function or the logistic function.



Therefore, the hypothesis for the logistic regression is:

hΘ(x) = 1 / (1 + e^(-Θᵀx))
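The hypothesis above can be sketched in a few lines of Python. This is a minimal illustration of the formulas, not a full implementation; the function names `sigmoid` and `hypothesis` are just ours.

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, x):
    """Logistic regression hypothesis: h_theta(x) = g(theta^T x)."""
    return sigmoid(np.dot(theta, x))

# g(0) = 0.5; large positive z pushes g(z) toward 1,
# large negative z pushes it toward 0.
print(sigmoid(0.0))   # 0.5
```

Note that g(z) never actually reaches 0 or 1; it only approaches them, which is exactly what lets us read the output as a probability.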

INTERPRETATION OF HYPOTHESIS OUTPUT

hΘ(x) = estimated probability that y = 1 for input x.

E.g., if

x = [x0, x1] = [1, tumorSize]

and hΘ(x) = 0.7, it means there is a 70% chance that the patient's tumor is malignant (y = 1).

It can be written in probability as

hΘ(x) = P(y = 1 | x; Θ )

P(y = 0 | x; Θ) = 1 - P(y = 1 | x; Θ )

It is read as 'the probability that y = 1, given x, parameterized by Θ'.
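The tumor example above can be worked through numerically. The parameter values and tumor size below are made up purely for illustration; the point is that the two probabilities always sum to 1.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical parameters and feature vector (illustrative values only):
# x = [x0, x1] = [1, tumorSize]
theta = np.array([-6.0, 0.15])   # made-up parameter vector
x = np.array([1.0, 45.0])        # x0 = 1, tumor size = 45

p_y1 = sigmoid(theta @ x)        # P(y = 1 | x; theta)
p_y0 = 1.0 - p_y1                # P(y = 0 | x; theta)

print(p_y1, p_y0)                # the two probabilities sum to 1
```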

LINEAR REGRESSION VS LOGISTIC REGRESSION

For now, let’s discuss classification algorithm for 2 possible outcomes i.e., 0 and 1. We’ll see classification for multiple outcomes i.e., 0,1,2,3 etc. in the upcoming blogs.

Example for classification problem:



We can threshold the classifier output hΘ(x) at 0.5:

if hΘ(x) >= 0.5, predict y=1,

if hΘ(x) < 0.5, predict y=0.
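The thresholding rule above is a one-liner in code. This is a sketch with made-up parameter values; the helper name `predict` is ours.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(theta, x, threshold=0.5):
    """Predict y = 1 if h_theta(x) >= threshold, else y = 0."""
    return 1 if sigmoid(np.dot(theta, x)) >= threshold else 0

# Illustrative parameters: a large input pushes h above 0.5, a small one below.
theta = np.array([-6.0, 0.15])
print(predict(theta, np.array([1.0, 45.0])))   # h ~ 0.68 -> predicts 1
print(predict(theta, np.array([1.0, 10.0])))   # h ~ 0.01 -> predicts 0
```

The 0.5 cutoff is the conventional default; in practice the threshold can be moved (e.g., lowered when false negatives are costly).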

In the above example the linear regression model fits nicely, but what if there is an additional point far out along the positive x-axis, as shown in the figure below?




The fit changes and is no longer good. In the previous example, it was only by chance that the linear regression model fit the data well.

NOTE:

1. In linear regression:

y = 0 or 1

but

hΘ(x) can be greater than 1 (hΘ(x) > 1) or less than 0 (hΘ(x) < 0).

2. In logistic regression:

y = 0 or 1

and

hΘ(x) is also between 0 and 1, i.e., 0 <= hΘ(x) <= 1.
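The two notes above can be demonstrated directly: a linear hypothesis Θᵀx is unbounded, while the sigmoid squashes the same quantity into (0, 1). The parameter and feature values are made up for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

theta = np.array([0.5, 0.3])    # made-up parameters
x = np.array([1.0, 100.0])      # a far-out example point

linear_h = np.dot(theta, x)     # linear hypothesis: can leave [0, 1]
logistic_h = sigmoid(linear_h)  # logistic hypothesis: always in (0, 1)

print(linear_h)    # 30.5 -- not a valid probability
print(logistic_h)  # very close to 1, but still bounded
```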

So, to conclude: for classification problems, linear regression is not a good choice; logistic regression should be preferred.


SUMMARY:

• What is a Classification Algorithm?
• Logistic Regression
• Hypothesis of Logistic Regression
• Interpretation of output in Logistic Regression
• Linear Regression vs Logistic Regression

That's all for DAY 6. In Day 7 we will be learning about the working of logistic regression and multi-class classification.

If you like this blog or think it can help someone gain some knowledge in machine learning, do share it with them. If you have any doubts, mention them in the comment section.

Till then Happy Learning!!!

