Saturday, March 31, 2018

MACHINE LEARNING | LINEAR REGRESSION

MACHINE LEARNING - DAY 2



LINEAR REGRESSION

Notations:

m: number of training examples

x: input variable/feature

y: output variable/target variable


(x(i), y(i)): the ith training example

Linear regression is a supervised learning algorithm. Given a training set, our aim is to learn a function, or hypothesis, h: x -> y, so that h(x) is a “good” predictor for the corresponding value of y.





Hypothesis for linear regression:

hΘ(x) = Θ0 + Θ1x
Or

h(x) = Θ0 + Θ1x

Where,

h(x): hypothesis for the problem

Θ0: constant or the intercept

Θ1: the slope of the line
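
As a quick sketch (not part of the original notes; the parameter values below are arbitrary placeholders), the hypothesis can be written as a one-line Python function:

```python
# Hypothesis for univariate linear regression: h(x) = theta0 + theta1 * x
def hypothesis(theta0, theta1, x):
    """Return the prediction h(x) for a single input x."""
    return theta0 + theta1 * x

# Arbitrary placeholder parameters: intercept 2.0, slope 0.5
print(hypothesis(2.0, 0.5, 10))  # prediction for x = 10 -> 7.0
```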

COST FUNCTION

The accuracy of the hypothesis can be measured using a cost function. It takes the average of the squared differences between the hypothesis's predictions for the inputs x and the actual outputs y.

Given below is the required cost function:

J(Θ0, Θ1) = (1/2m) Σ (hΘ(x(i)) - y(i))²,  where the sum runs over i = 1 to m

This function is called the Squared Error Function.

Here, hΘ(x(i)) - y(i) is the difference between the predicted value for the input x(i) and the actual output y(i). Summing the squares of these differences over all training examples gives the total error between the predictions and the real outputs.

The factor of 1/2 is included to simplify the derivative calculations, as we will see in gradient descent.
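
Here is a minimal Python sketch of this cost function (the variable names and toy data are my own, not from the notes):

```python
def cost(theta0, theta1, xs, ys):
    """Squared error cost J(theta0, theta1) over m training examples."""
    m = len(xs)
    total = sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys))
    return total / (2 * m)

# Toy data lying exactly on the line y = 1 + 2x
xs = [1, 2, 3]
ys = [3, 5, 7]
print(cost(1.0, 2.0, xs, ys))  # 0.0 -- a perfect fit
print(cost(0.0, 0.0, xs, ys))  # larger value for a worse fit
```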


AIM: To minimize the cost function J(Θ0, Θ1)

Hypothesis : hΘ(x) = Θ0 + Θ1x

Cost function:

J(Θ0, Θ1) = (1/2m) Σ (hΘ(x(i)) - y(i))²,  summed over i = 1 to m


To minimize the cost function, we want the best-fit line; ideally it would pass through every point in the training set. In that ideal case

J(Θ0, Θ1) = 0

because the difference between each predicted value and the actual value is zero. In practice the data rarely lies exactly on a line, so we look for the (Θ0, Θ1) that make J(Θ0, Θ1) as small as possible.

Note: J(Θ0, Θ1) depends on two parameters, so it forms a 3D surface; contour plots are used to show all three values (Θ0, Θ1 and J) in 2D. The innermost (smallest) contour surrounds the global minimum, which corresponds to the best fit for the given hypothesis.
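
For illustration, here is a small matplotlib sketch (the toy data and grid ranges are my own choices) that evaluates J over a grid of (Θ0, Θ1) values and draws the contour plot described in the note:

```python
import numpy as np
import matplotlib.pyplot as plt

# Toy training data (made up for illustration), roughly y = 1 + 2x
xs = np.array([1.0, 2.0, 3.0, 4.0])
ys = np.array([3.1, 4.9, 7.2, 8.8])
m = len(xs)

# Evaluate J(theta0, theta1) over a grid of parameter values
theta0_vals = np.linspace(-2, 4, 100)
theta1_vals = np.linspace(0, 4, 100)
T0, T1 = np.meshgrid(theta0_vals, theta1_vals)
# Broadcasting: prediction error for every grid point and every example
errors = T0[..., None] + T1[..., None] * xs - ys
J = np.sum(errors ** 2, axis=-1) / (2 * m)

# The innermost contour surrounds the global minimum, i.e. the best fit
plt.contour(T0, T1, J, levels=30)
plt.xlabel("theta0")
plt.ylabel("theta1")
plt.title("Contours of the cost function J(theta0, theta1)")
plt.show()
```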

Now the question arises: how do we find the right (Θ0, Θ1) values?

The answer is a technique called Gradient Descent, which is our next topic.

GRADIENT DESCENT

Gradient Descent is used to find the values Θi for i = 0, 1, …, n that minimize the cost function J(Θ0, Θ1, …, Θn).

Formula of Gradient Descent:

repeat until convergence:  Θj := Θj - α · ∂/∂Θj J(Θ0, Θ1)   (for j = 0 and j = 1, with both parameters updated simultaneously)
 


Functioning of Gradient Descent:




Note: If α is too small, convergence to the global minimum will be very slow.

Note: If α is too large, the algorithm may overshoot the minimum and start diverging instead of converging.

So the choice of α, i.e., the learning rate, is very important.

In each iteration of gradient descent, the partial derivatives of J(Θ0, Θ1) with respect to Θ0 and Θ1 are computed, and both parameter values are then updated simultaneously using those derivatives.
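
For the linear regression hypothesis above, the partial derivatives work out to (a standard calculus step, not shown explicitly in the original notes):

∂J/∂Θ0 = (1/m) Σ (hΘ(x(i)) - y(i))

∂J/∂Θ1 = (1/m) Σ (hΘ(x(i)) - y(i)) · x(i)

where each sum runs over i = 1 to m; the factor of 1/2 in J cancels the 2 that comes from differentiating the square. A minimal Python sketch of the resulting simultaneous update (the learning rate, iteration count, and toy data are placeholders I chose):

```python
def gradient_descent(xs, ys, alpha=0.05, iterations=5000):
    """Batch gradient descent for h(x) = theta0 + theta1 * x."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        # Partial derivatives of J, computed from the current (old) parameters
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        d_theta0 = sum(errors) / m
        d_theta1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Simultaneous update: both derivatives above were computed before
        # either parameter was changed
        theta0 -= alpha * d_theta0
        theta1 -= alpha * d_theta1
    return theta0, theta1

# Toy data on the line y = 1 + 2x; the result should be close to (1, 2)
print(gradient_descent([1, 2, 3, 4], [3, 5, 7, 9]))
```

Note that each iteration uses all m training examples to compute the derivatives, which is exactly the "batch" behaviour discussed at the end of this post.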




Now let's see how the parameter values (Θ0, Θ1) are found.



In the first case, the slope of the cost curve at point A is positive, i.e., the partial derivative of J(Θ0, Θ1) is positive, so the gradient descent formula decreases the parameter values:

Θ0 - α·(positive value) → the value of Θ0 decreases

Θ1 - α·(positive value) → the value of Θ1 decreases

and finally the algorithm reaches point B, which is the global minimum.

Similarly, in the second case, the slope at point A is negative, i.e., the partial derivative of J(Θ0, Θ1) is negative, so the gradient descent formula increases the parameter values:

Θ0 - α·(negative value) → the value of Θ0 increases

Θ1 - α·(negative value) → the value of Θ1 increases

and finally the algorithm again reaches point B, the global minimum.



As the point reaches B, the derivative becomes 0 because the slope of the cost function is zero at the minimum, so the values of (Θ0, Θ1) stop changing; these are the required values for our parameters.

In this way gradient descent figures out which values of (Θ0, Θ1) give the hypothesis its best fit.

This variant is also called Batch Gradient Descent, since every update of (Θ0, Θ1) looks at the entire batch of training data, repeating until it converges to the global minimum.



That's all for day 2. Next we will learn about linear regression with multiple variables in DAY 3.
      
If you feel this article helped you in any way, do not forget to share it, and if you have any thoughts or doubts, do write them in the comment section.

Till then Happy Learning..

Friday, March 30, 2018

Machine Learning | Types of Machine Learning

 MACHINE LEARNING - DAY 1



In today's world, hard coding is something people prefer to avoid. Automation is taking over business, and everyone needs work to be done well. Everyone wants to know the outcome or the results even before executing anything, and this is where machine learning comes in handy. It is a technique of teaching a machine what needs to be done; the machine generates results from which future decisions can be made, preventing loss of reputation as well as income. For example, it can predict what a user wants by learning from the user's past choices; a famous example is Netflix, which uses this technique to recommend different series based upon your preferences.

So let's begin this journey of machine learning by asking the very basic question which is:

What is Machine Learning?


Arthur Samuel, one of the pioneers of machine learning, defined it as "the field of study that gives a computer the ability to learn without being explicitly programmed".

In 1998, Tom Mitchell, another pioneer of machine learning, came up with a more precise definition:

A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.

So, in simpler words, if you need to predict some outcome, it can be done with machine learning. The machine learns over time, gains experience, and accordingly produces an output whose effectiveness is measured by P (performance), all without writing hard code.

Hard coding means explicitly specifying the exact steps to generate the output.

Some of the applications of machine learning are:

  1.  Virtual Personal Assistants like Siri, Alexa
  2.  Video surveillance
  3.  Email spam and malware filtering etc.

Now that we have some idea of what machine learning is, let's learn about the:

Types of Machine Learning


1. Supervised Learning: Supervised learning means feeding the machine pre-defined data with well-defined labels. From a large number of such labelled examples the machine learns the behaviour of the data and, based on that learning, predicts the output for new inputs. Some supervised learning algorithms are linear regression and Naive Bayes. (See the short sketch after this list of types for a comparison with unsupervised learning.)
     
        Characteristics:
  • Predictive model
  • Labelled data
  • Well-defined data
  • Outcomes are based on the data-set provided to the machine. 

2. Unsupervised Learning: Unlike supervised learning, in this category the machine is fed a data-set of unlabelled data; it has to find similarities between the data points on its own and cluster them into different categories. Most real-world data is unlabelled, which is why unsupervised learning algorithms are widely used. Some unsupervised learning algorithms are k-means clustering and association rules. (The sketch after this list of types shows the contrast with supervised learning.)

        Characteristics:
  • Descriptive model
  • Find patterns 
  • Random data with no labels.

3. Semi-Supervised Learning: Unlike the previous two techniques, which use completely labelled or completely unlabelled data, this data-set contains both labelled and unlabelled data, so it sits between supervised and unsupervised learning. Producing labelled data requires skilled human effort, so semi-supervised learning reduces that cost: unlabelled data, when combined with a small amount of labelled data, can increase the accuracy of the model. Semi-supervised learning may refer to either transductive or inductive learning: the aim of transductive learning is to infer labels for the unlabelled data, while inductive learning aims to learn an effective mapping between variables. Many semi-supervised methods are based on classification algorithms.

        Characteristics:
  • Mixed form of data (label + non-label)
  • Classification based algorithms

4. Reinforcement Learning: Reinforcement learning algorithms learn through trial and error, i.e., the algorithm performs a task many times and learns from experience how to react. Using this, the machine is made to make specific decisions: it uses its past experience and captures the best possible knowledge for making business decisions. For example, a machine can learn to play a Mario game by itself, learning from each frame, the obstacles in it, and how to tackle them. It will take several attempts to learn this; the machine will fail each time it encounters a new obstacle, but eventually it learns to avoid each one and can finally play the entire game by itself.
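
To make the contrast between the first two types concrete, here is a tiny sketch using scikit-learn (assuming it is installed; the data is made up for illustration):

```python
from sklearn.linear_model import LinearRegression
from sklearn.cluster import KMeans

# Supervised: labelled data -- every input comes with a known output
X = [[1], [2], [3], [4]]
y = [3, 5, 7, 9]                        # labels, roughly y = 1 + 2x
reg = LinearRegression().fit(X, y)      # learn from the labelled examples
print(reg.predict([[5]]))               # predict the output for a new input

# Unsupervised: no labels -- the algorithm groups similar points on its own
points = [[1.0, 1.0], [1.2, 0.9], [8.0, 8.0], [8.1, 7.9]]
clusters = KMeans(n_clusters=2, n_init=10).fit(points)
print(clusters.labels_)                 # cluster assignment for each point
```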




That's all for day 1. Next we will learn about the hypothesis, the cost function, and linear regression with one variable in day 2: right here
      
If you feel this article helped you in any way, do not forget to share it, and if you have any thoughts or doubts, do write them in the comment section.

Till then Happy Learning..