MACHINE LEARNING - DAY 8
LOGISTIC REGRESSION COST FUNCTION AND ANALYSIS
For the basics and the terms used in this article, you can check the earlier articles in this series.
Continuing our study of logistic regression, today we'll learn about the cost function in logistic regression and how to make it efficient so that we can find proper parameter values for the model.
COST FUNCTION:
As in linear regression, the cost function is used to compute the parameter values Θi automatically so that they give the best fit for a given model.
The graph of the cost J(Θ) against the parameters Θ should be a convex curve so that gradient descent can reach the global minimum.
Training set: {(x(1), y(1)), (x(2), y(2)), …, (x(m), y(m))} (number of examples: m)
x = [x0; x1; x2; …; xn], where x0 = 1 and y ∈ {0, 1}
The dimension of the feature vector x is (n+1) × 1.
hΘ(x) = 1 / (1 + e^(−Θᵀx))
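As a quick illustration, here is a minimal sketch of this hypothesis in Python with NumPy (the function names sigmoid and hypothesis are my own, not from the original article):

import numpy as np

def sigmoid(z):
    # Logistic function: maps any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, x):
    # hΘ(x) = 1 / (1 + e^(−Θᵀx)); theta and x are (n+1)-vectors with x[0] = 1
    return sigmoid(np.dot(theta, x))

# Example: two features plus the intercept term x0 = 1
theta = np.array([0.5, -1.0, 2.0])
x = np.array([1.0, 3.0, 1.5])
print(hypothesis(theta, x))  # a value strictly between 0 and 1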
How to choose the parameter values Θi:
In linear regression, the cost function was:
J(Θ) = (1/m) ∑_{i=1}^{m} (1/2) (hΘ(x(i)) − y(i))²
This can be rewritten as:
J(Θ) = (1/m) ∑_{i=1}^{m} cost(hΘ(x(i)), y(i))
where
cost(hΘ(x(i)), y(i)) = (1/2) (hΘ(x(i)) − y(i))²
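For concreteness, a minimal NumPy sketch of this squared-error cost (the helper name squared_error_cost is my own; X is assumed to be an m × (n+1) matrix with a leading column of ones):

import numpy as np

def squared_error_cost(theta, X, y):
    # J(Θ) = (1/m) * Σ (1/2) * (hΘ(x(i)) − y(i))², with the linear hypothesis hΘ(x) = Θᵀx
    m = len(y)
    h = X.dot(theta)  # linear hypothesis for every example at once
    return (1.0 / m) * np.sum(0.5 * (h - y) ** 2)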
The only difference between the cost functions of linear regression and logistic regression is the hypothesis. The hypothesis in logistic regression is:
hΘ(x) = 1 / (1 + e^(−Θᵀx))
Plugging this hypothesis into the squared-error cost gives a non-convex cost curve with multiple local optima.
To prevent the non-convex output and get a convex output with a single optimum, i.e., the global minimum, we make some alterations in the cost function for logistic regression.
cost(hΘ(x), y) = −log(hΘ(x))       if y = 1
cost(hΘ(x), y) = −log(1 − hΘ(x))   if y = 0
For simplification, the cost function can be written in a single line as:
cost(hΘ(x), y) = −[y · log(hΘ(x)) + (1 − y) · log(1 − hΘ(x))]
For y = 0:
cost(hΘ(x), y) = −[0 · log(hΘ(x)) + (1 − 0) · log(1 − hΘ(x))]
cost(hΘ(x), y) = −[0 + 1 · log(1 − hΘ(x))] = −log(1 − hΘ(x))
For y = 1:
cost(hΘ(x), y) = −[1 · log(hΘ(x)) + (1 − 1) · log(1 − hΘ(x))]
cost(hΘ(x), y) = −[log(hΘ(x)) + 0] = −log(hΘ(x))
Hence, the single-line form gives exactly the same two equations as the cases above.
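Here is a minimal NumPy sketch of this single-line cost (the helper name logistic_cost and the small eps clamp against log(0) are my own additions, not part of the derivation):

import numpy as np

def logistic_cost(theta, X, y):
    # J(Θ) = (1/m) * Σ −[y·log(h) + (1−y)·log(1−h)], with the sigmoid hypothesis
    m = len(y)
    h = 1.0 / (1.0 + np.exp(-X.dot(theta)))  # hΘ(x(i)) for every example at once
    eps = 1e-12  # clamp to avoid log(0); an implementation detail, not part of the math
    h = np.clip(h, eps, 1.0 - eps)
    return (1.0 / m) * np.sum(-y * np.log(h) - (1.0 - y) * np.log(1.0 - h))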
GRADIENT DESCENT:
Gradient descent iteratively updates the parameters until the global minimum is reached.
Compute, repeating until convergence (updating all Θj simultaneously):
Θj := Θj − α (1/m) ∑_{i=1}^{m} (hΘ(x(i)) − y(i)) xj(i)
Notice that this gradient descent update is also similar to the one used in linear regression; in fact it looks identical. The only difference, again, is the hypothesis, which in logistic regression is:
hΘ(x) = 1 / (1 + e^(−Θᵀx))
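Putting the pieces together, a rough NumPy sketch of batch gradient descent for logistic regression (the learning rate alpha, iteration count, and example data are arbitrary illustrative choices):

import numpy as np

def gradient_descent(X, y, alpha=0.1, num_iters=1000):
    # Batch gradient descent: Θj := Θj − α * (1/m) * Σ (hΘ(x(i)) − y(i)) * xj(i)
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(num_iters):
        h = 1.0 / (1.0 + np.exp(-X.dot(theta)))  # sigmoid hypothesis
        gradient = (1.0 / m) * X.T.dot(h - y)    # partial derivatives of J(Θ)
        theta -= alpha * gradient                # simultaneous update of all Θj
    return theta

# Tiny example: one feature plus the intercept column x0 = 1
X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 3.0], [1.0, 4.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
print(gradient_descent(X, y))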
That's all for day 8. Today we learned about the cost function in logistic regression and how to make it convex so that we can attain the global minimum and obtain the parameter values that best fit the model. We also learned gradient descent for logistic regression.
In day 9, we will learn about the multi-class classification problem, where there are more possible outcomes than just 0 and 1.
If this article helped you learn something new or could help someone else, do share it with others.
Till then Happy Learning!!!