In my post on Categorising Deep Seas of ML, I introduced you to problems of Classification (a subcategory of Supervised Learning).
But wait.. we are talking about Logistic “Regression”. Blame history for the name, but the only thing Logistic Regression and Regression have in common is the word itself.
Logistic Regression Intuition
Consider a problem where you have to find the probability of a student being a studious one or a goofy one.
What can we do?
Data..key to every solution..yeah
So, we grade our test students on a certain basis. Let’s just say we rate them out of 10 on the following points – study hrs/day (x1), attention in class (x2), interaction in class (x3), behaviour with peers (x4).. well, for the sake of simplicity, let’s just take these 4 features.
Representing this in a matrix
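As a concrete sketch (the ratings below are made up purely for illustration), the feature matrix for three students might look like:

```python
import numpy as np

# Hypothetical ratings (out of 10) for three students; the columns are
# x1 = study hrs/day, x2 = attention, x3 = interaction, x4 = behaviour
X = np.array([
    [8, 9, 7, 8],   # student 1
    [3, 2, 4, 6],   # student 2
    [6, 7, 5, 7],   # student 3
])
print(X.shape)  # (3, 4): one row per student, one column per feature
```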
Now, you might be thinking – Hey, I have seen probability in my mathematics class and I know it always lies in the range [0, 1].
Hold your horses, my friend. We are getting to that very step.
The world of ML has taken many inputs from the field of Mathematics, and this very idea of an activation function is borrowed entirely from the latter.
A function used to transform the activation level of a unit (neuron) into an output signal is called an Activation Function.
Well, we’re drifting a little off topic here. But you can think of an activation function as a function which gives us the probability of our test being positive – in this case, the probability of a student being studious.
The function we’ll be using here is the Sigmoid function.
Let’s just have a look at the graph of the function.
The graph clearly depicts that for any value of z, the sigmoid function will return a value between 0 and 1…. MIND == BLOWN.. (“==” because for programmers “=” != “==”).
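A minimal sketch of the sigmoid, sigma(z) = 1 / (1 + e^(-z)), to see this squashing in action:

```python
import numpy as np

def sigmoid(z):
    """Squash any real-valued z into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0))    # 0.5 -- the midpoint
print(sigmoid(10))   # very close to 1
print(sigmoid(-10))  # very close to 0
```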
So what’s left?
The only thing left for us to do is to define a mapping from our test data to z in sigmoid(z) and “minimize the error” in the mapping to get the best result.
“Minimize the error”..hmm..we have done something similar in Linear Regression too. Gradient Descent is the key.
So what’s our hypothesis? It will be nothing but the sigmoid applied to a linear combination of our features:

h_theta(x) = sigmoid(theta^T x) = 1 / (1 + e^(-theta^T x))
We’ll calculate these coefficients by minimizing our cost function.
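In code, the hypothesis is just the sigmoid applied to a linear combination of the features – a sketch, assuming a bias column of ones has been prepended to X (the numbers are hypothetical):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, X):
    """h_theta(x) = sigmoid(theta^T x), computed for every row of X at once."""
    return sigmoid(X @ theta)

# Tiny example: 2 students, a bias column plus our 4 features
X = np.array([
    [1, 8, 9, 7, 8],
    [1, 3, 2, 4, 6],
], dtype=float)
theta = np.zeros(5)          # untrained coefficients
print(hypothesis(theta, X))  # [0.5 0.5] -- no information yet
```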
The very basic step of Gradient Descent is to find a Cost Function. I know all these functions are getting on your nerves, so let’s just depict these using a flow chart. Stick with me and we’ll make it easy.
Our cost function here will be:

J(theta) = -(1/m) * sum over i of [ y_i * log(h_theta(x_i)) + (1 - y_i) * log(1 - h_theta(x_i)) ]
Don’t worry, you don’t have to memorise it. But let’s just understand how this cost function behaves. Consider our test data: when y = 1, the cost function reduces to -log(h_theta(x)). The graph for the same is:
This shows that as the value of the calculated hypothesis goes from 0 to 1 (the required value), our cost function decreases.
Now, consider when y = 0; then our cost function reduces to -log(1 - h_theta(x)). The graph for the same is:
This shows that as the value of the calculated hypothesis goes from 1 to 0 (the required value), our cost function decreases. Pretty much what’s required.
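Both branches fold into the single cross-entropy expression above. A minimal sketch (function and variable names are my own):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    """Cross-entropy cost: small when predictions match labels, large otherwise."""
    m = len(y)
    h = sigmoid(X @ theta)
    return -(1.0 / m) * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

# Sanity check: an uninformative model (all predictions 0.5) costs log(2)
X = np.array([[1.0, 2.0], [1.0, -1.0]])
y = np.array([1.0, 0.0])
print(cost(np.zeros(2), X, y))  # ~0.693 = log(2)
```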
Now that we are familiar with our cost function, let’s recall how Gradient Descent works.
Hmmmm.. partial differentiation and our apparent game of hide and seek with it.. Let me make your task easy. The partial derivative works out to a surprisingly clean update rule:

theta_j := theta_j - (alpha/m) * sum over i of (h_theta(x_i) - y_i) * x_ij
So, now we know how to get our coefficients tuned and how to run our gradient descent.
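Putting the pieces together, one batch gradient-descent loop might look like this – a sketch only; the learning rate, iteration count and toy data are arbitrary choices of mine, not values from this post:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.1, iterations=1000):
    """Tune theta by repeatedly stepping against the cost gradient."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iterations):
        h = sigmoid(X @ theta)
        gradient = (X.T @ (h - y)) / m   # derivative of the cross-entropy cost
        theta -= alpha * gradient
    return theta

# Toy data: label is 1 exactly when the (single) feature is positive
X = np.array([[1.0, 2.0], [1.0, 1.0], [1.0, -1.0], [1.0, -2.0]])
y = np.array([1.0, 1.0, 0.0, 0.0])
theta = gradient_descent(X, y)
print(sigmoid(X @ theta))  # high for the first two rows, low for the last two
```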
What about making predictions?
Well, that’s easy! A student with a higher probability of being studious is, of course, more studious. But how will I compute it? Deciding a threshold is up to you. For me, a student with a probability greater than or equal to 0.5 works just fine. I am a little lenient, I know. 😉
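With a trained theta in hand, prediction is just a threshold on the sigmoid output (0.5 here, but pick any cut-off you like). The coefficients below are hypothetical, just to show the mechanics:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(theta, X, threshold=0.5):
    """Classify as studious (1) when the predicted probability clears the threshold."""
    return (sigmoid(X @ theta) >= threshold).astype(int)

# Hypothetical trained coefficients; two students (bias column + one feature)
theta = np.array([-1.0, 2.0])
X = np.array([[1.0, 3.0], [1.0, 0.1]])
print(predict(theta, X))  # [1 0]
```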
So, now you have it. Every tiny detail of logistic regression.
Now I have a task for you. I’ll provide you with a dataset, and you have to apply logistic regression on your own. No worries though – my next post will explain my way of doing logistic regression on the same dataset.
Explanation of the dataset: it contains 4 columns, namely ‘admit’, ‘rank’, ‘gpa’ and ‘gre’. ‘admit’ shows whether a person with a given ‘gpa’, ‘gre’ score and college ‘rank’ was admitted to the college (1) or not (0). Your task is to find a mapping from ‘rank’, ‘gre’ and ‘gpa’ to ‘admit’, i.e. to predict whether a person will be admitted to college or not.
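To get you started, here is how you might lay out data of that shape before running gradient descent. The rows below are hypothetical, invented only to mirror the column structure; they are not taken from the actual dataset:

```python
import numpy as np

# Hypothetical rows in the same shape as the dataset;
# columns: admit, gre, gpa, rank
data = np.array([
    [1, 700, 3.8, 1],
    [0, 520, 2.9, 3],
    [1, 660, 3.5, 2],
    [0, 480, 2.7, 4],
])

y = data[:, 0]                            # target: admit (0 or 1)
X = np.column_stack([np.ones(len(data)),  # bias column of ones
                     data[:, 1:]])        # features: gre, gpa, rank
print(X.shape, y.shape)  # (4, 4) (4,)
```

Note that features like ‘gre’ live on a much larger scale than ‘gpa’, so scaling them before gradient descent usually helps convergence.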