
Sunday, May 6, 2018

MACHINE LEARNING | MULTICLASS CLASSIFICATION

MACHINE LEARNING - DAY 9

MULTI-CLASS CLASSIFICATION: ONE-VS-ALL


For the basics and the terms used in this article, you can check the earlier articles in this series.

Continuing our machine learning series, today we'll learn about multi-class classification in logistic regression, also known as one-vs-all.

Till now we have discussed classification with only 2 possible outcomes, i.e., 1 or 0. Now, let's see what happens when there are more possibilities.

For example:

- Weather: sunny, rainy, pleasant, windy (the outcome or categorical value can be 0, 1, 2, 3)

- Health: ill, dizzy, well (the outcome or categorical value can be 0, 1, 2)

The numbering doesn't matter; it can be 1, 2, 3, 4 or 0, 1, 2, 3. These are just labels that categorize the given data or output into different classes.

y ∈ {0, 1, 2, …, n}

hΘ^(0)(x) = P(y = 0 | x; Θ)
hΘ^(1)(x) = P(y = 1 | x; Θ)
⋮
hΘ^(n)(x) = P(y = n | x; Θ)

prediction: y = argmax over i of hΘ^(i)(x), i.e., choose the class whose classifier reports the highest probability
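
As a quick illustration, the prediction step is just an argmax over the per-class probabilities. A minimal sketch in Python (the probability values are made up):

import numpy as np

# Hypothetical per-class probabilities hΘ^(i)(x) for classes 0..3
# (e.g., the weather example: sunny, rainy, pleasant, windy)
probs = np.array([0.10, 0.65, 0.20, 0.05])

print(np.argmax(probs))  # prints 1: class 1 has the highest probability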

STEPS OF COMPUTATION:

1. Plot the data.

2. Take the classes one at a time: treat the chosen class as the positive class and lump the rest of the classes together as a single negative class. Fit a binary logistic regression classifier that gives the probability of the chosen class versus the rest, and repeat this for every class.

CONCLUSION:

Train one logistic regression classifier hΘ^(i)(x) for each class i to predict the probability that y = i.

To make a prediction on a new input x, pick the class i that maximizes hΘ^(i)(x); that class is the output.
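
Here is a minimal sketch of the whole one-vs-all procedure in Python, using scikit-learn's LogisticRegression as the per-class binary classifier; the toy data and labels are made up for illustration:

import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy training set: 2 features, 3 classes labelled 0, 1, 2 (made-up data)
X = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0],
              [6.0, 9.0], [9.0, 1.0], [8.5, 0.5]])
y = np.array([0, 0, 1, 1, 2, 2])

classes = np.unique(y)
classifiers = []
for c in classes:
    # Relabel: the chosen class becomes 1, all remaining classes become 0
    y_binary = (y == c).astype(int)
    classifiers.append(LogisticRegression().fit(X, y_binary))

# Predict a new example: ask every classifier for its probability,
# then pick the class with the highest one
x_new = np.array([[4.0, 7.0]])
probs = [clf.predict_proba(x_new)[0, 1] for clf in classifiers]
print(classes[np.argmax(probs)])  # the predicted class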


That’s all for day 9. Today we learned about the multi-class classification and how to compute it.

In day 10, we will be learning about Overfitting, an issue that arises from over-training the model, and its solution, Regularization, which we'll also cover in the next article.

If you think this article helped you learn something new or can help someone else, then do share it with your peers.

Till then Happy Learning!!!





MACHINE LEARNING | LOGISTIC REGRESSION COST FUNCTION

MACHINE LEARNING - DAY 8

LOGISTIC REGRESSION COST FUNCTION AND ANALYSIS


For the basics and the terms used in this article, you can check the earlier articles in this series.

Continuing with logistic regression, today we'll learn about its cost function and how to make it suitable for finding proper parameter values for the model.

COST FUNCTION:

As in linear regression, the cost function is used to compute the parameter values Θi automatically, choosing the ones that give the best fit for a given model.
The cost function J(Θ) should be convex in Θ so that gradient descent reaches the global minimum instead of getting stuck in a local one.

Training set: {(x^(1), y^(1)), (x^(2), y^(2)), …, (x^(m), y^(m))}  (number of examples: m)

x = [x0; x1; x2; …; xn],   x0 = 1,   y ∈ {0, 1}

x is an (n+1) × 1 vector.

hΘ(x) = 1/(1 + e^(-Θ^T x))

How to choose the parameter values Θi:

In linear regression the cost function was:

J(Θ) = (1/m) Σ_{i=1}^{m} (1/2) (hΘ(x^(i)) - y^(i))²

which can be rewritten as:

J(Θ) = (1/m) Σ_{i=1}^{m} cost(hΘ(x^(i)), y^(i))

where

cost(hΘ(x^(i)), y^(i)) = (1/2) (hΘ(x^(i)) - y^(i))²

The only difference between the cost functions of linear regression and logistic regression is the hypothesis. The hypothesis in logistic regression is:

hΘ(x) = 1/(1 + e^(-Θ^T x))


Plugging this hypothesis into the squared-error cost gives a non-convex curve with multiple local optima.

To avoid the non-convex cost and get a convex one with a single optimum, i.e., the global optimum, we make some alterations to the cost function for logistic regression.

cost(hΘ(x), y) = - log(hΘ(x)), if y = 1, and - log(1 - hΘ(x)), if y = 0

For simplification, the cost function can be written in a single line as:

cost(hΘ(x), y) = - [y * log(hΘ(x)) + (1 - y) * log(1 - hΘ(x))]

for y = 0,

cost(hΘ(x), y) = - [0 * log(hΘ(x)) + (1 - 0) * log(1 - hΘ(x))]

cost(hΘ(x), y) = - [0 + (1) * log(1 - hΘ(x))] = - log(1 - hΘ(x))

for y = 1,

cost(hΘ(x), y) = - [1 * log(hΘ(x)) + (1 - 1) * log(1 - hΘ(x))]

cost(hΘ(x), y) = - [log(hΘ(x)) + 0] = - log(hΘ(x))

Hence, the single-line form gives back exactly the two equations we saw earlier.
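
As a sketch, the single-line cost translates almost directly into Python with numpy (this assumes X already has the x0 = 1 column prepended):

import numpy as np

def sigmoid(z):
    # the logistic hypothesis: 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

def compute_cost(theta, X, y):
    # J(theta) = -(1/m) * sum( y*log(h) + (1-y)*log(1-h) )
    m = len(y)
    h = sigmoid(X @ theta)  # hΘ(x) for every training example
    return -np.sum(y * np.log(h) + (1 - y) * np.log(1 - h)) / m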

GRADIENT DESCENT:

Gradient descent iteratively updates the parameters until the global minimum is reached.

Compute, repeating until convergence (updating all Θj simultaneously):

Θj := Θj - α * (1/m) Σ_{i=1}^{m} (hΘ(x^(i)) - y^(i)) * xj^(i)

Notice that the gradient descent update is identical in form to the one used in linear regression. Here too, the only difference is the hypothesis, which in logistic regression is:

hΘ(x) = 1/(1 + e^(-Θ^T x))
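
A minimal sketch of the update loop in Python (batch gradient descent; the learning rate and iteration count are arbitrary choices for illustration):

import numpy as np

def gradient_descent(X, y, alpha=0.1, iterations=1000):
    # Repeat: theta_j := theta_j - alpha * (1/m) * sum( (h - y) * x_j )
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iterations):
        h = 1.0 / (1.0 + np.exp(-(X @ theta)))  # logistic hypothesis
        theta -= alpha * (X.T @ (h - y)) / m    # simultaneous update of all theta_j
    return theta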


That's all for day 8. Today we learned about the cost function in logistic regression and how to make it convex so that gradient descent can attain the global minimum and find the parameter values that best fit the model. We also learned the gradient descent update for logistic regression.

In day 9, we will be learning about the Multi-Class Classification problem which includes more possible outcomes than 0 and 1.

If you think this article helped you learn something new or can help someone else, then do share it with others.

Till then Happy Learning!!!




  

Friday, May 4, 2018

MACHINE LEARNING | CLASSIFICATION AND LOGISTIC REGRESSION

MACHINE LEARNING - DAY 6

CLASSIFICATION AND LOGISTIC REGRESSION MODEL



Before moving to today's topic, if you haven't checked the earlier topics, which cover the basics, then visit the earlier articles in this series.

CLASSIFICATION:

E-mail: spam / not spam
Online transactions: fraudulent (yes / no)

These are two examples of where classification algorithms are used.

y ∈ {0, 1}

0: negative class (absence of something, e.g., a non-spam e-mail).
1: positive class (presence of something, e.g., a spam e-mail).

0 and 1 are just class labels or indicators; which outcome gets which label is a convention, not a hard rule.


LOGISTIC REGRESSION MODEL:

Logistic regression is a classification algorithm that estimates the probability of the occurrence (or non-occurrence) of an event.
Thresholding that probability yields the final output of 1 or 0.

HYPOTHESIS:

In linear regression the hypothesis was:

hΘ(x) = Θ^T x

In logistic regression the hypothesis is:

hΘ(x) = g(Θ^T x)

where,

g(z) = 1/(1 + e^(-z))

z = Θ^T x

1/(1 + e^(-z)) is called the sigmoid function or the logistic function.



Therefore, the hypothesis for logistic regression is:

hΘ(x) = 1/(1 + e^(-Θ^T x))
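
A quick sketch in Python showing that the sigmoid squashes any real input into the interval (0, 1):

import numpy as np

def g(z):
    # sigmoid / logistic function
    return 1.0 / (1.0 + np.exp(-z))

print(g(0))    # 0.5  -> right on the decision boundary
print(g(10))   # ~1.0 -> large positive z pushes the output towards 1
print(g(-10))  # ~0.0 -> large negative z pushes the output towards 0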

INTERPRETATION OF HYPOTHESIS OUTPUT

hΘ(x) = estimated probability that y = 1 on input x.

E.g., if

x = [x0; x1] = [1; tumorSize]

and hΘ(x) = 0.7, this means there is a 70% chance that the patient's tumor is malignant (y = 1).

In probability notation:

hΘ(x) = P(y = 1 | x; Θ)

P(y = 0 | x; Θ) = 1 - P(y = 1 | x; Θ)

This is read as 'the probability that y = 1, given x, parameterized by Θ'.

LINEAR REGRESSION VS LOGISTIC REGRESSION

For now, let's discuss classification algorithms with 2 possible outcomes, i.e., 0 and 1. We'll see classification with multiple outcomes, i.e., 0, 1, 2, 3, etc., in the upcoming blogs.

Example of a classification problem: suppose we fit a straight line to the data with linear regression and threshold the classifier output hΘ(x) at 0.5:

if hΘ(x) >= 0.5, predict y=1,

if hΘ(x) < 0.5, predict y=0.
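
In code, this thresholding rule is a one-liner; a small sketch (the probability value is made up):

def predict(h):
    # threshold the classifier output hΘ(x) at 0.5
    return 1 if h >= 0.5 else 0

print(predict(0.7))  # 1, e.g., predict a malignant tumor
print(predict(0.3))  # 0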

In the above example the linear regression model fits the data nicely, but if there is an additional point much farther along the positive x-axis, the fitted line shifts and the fit is no longer good. In the previous example it was merely by chance that the linear regression model fit the data well.

NOTE:

1. In linear regression:

y = 0 or 1

but

hΘ(x) can be greater than 1 (hΘ(x) > 1) or less than 0 (hΘ(x) < 0).

2. In logistic regression:

y = 0 or 1

and

hΘ(x) always lies between 0 and 1, i.e., 0 <= hΘ(x) <= 1.
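
To see the difference concretely, here is a small sketch with made-up parameter values: the linear hypothesis Θ^T x can leave [0, 1], while the logistic hypothesis cannot:

import numpy as np

theta = np.array([-0.5, 0.8])  # made-up parameters
x = np.array([1.0, 5.0])       # x0 = 1, x1 = 5

linear_h = theta @ x                          # linear regression hypothesis
logistic_h = 1.0 / (1.0 + np.exp(-linear_h))  # logistic regression hypothesis

print(linear_h)    # 3.5  -> outside the [0, 1] range
print(logistic_h)  # ~0.97 -> always inside (0, 1)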

So, to conclude: in classification problems, linear regression is not a good choice; logistic regression should be preferred.


SUMMARY:

- What is a Classification Algorithm?
- Logistic Regression
- Hypothesis of Logistic Regression
- Interpretation of the output in Logistic Regression
- Linear Regression vs Logistic Regression

That's all for DAY 6. In Day 7 we will be learning about the working of logistic regression and multi-class logistic regression.

If you like this blog or think it can help someone gain some knowledge in machine learning, then do share it with them. If you have any doubts, mention them in the comment section.

Till then Happy Learning!!!