<ujdcodr> : Welcome to the first session of the Winter Mentorship Program 2017
<ujdcodr> : I'll wait for 5 mins
<ujdcodr> : until then you can just reply "Connected" so that i know that the channel is working
<Ravali> : Connected
<dharma> : connected
<Swastik> : connected
<Shivani> : Connected
<ujdcodr> : OK
<ujdcodr> : Let's begin
<ujdcodr> : Field of study that gives computers the ability to learn without being explicitly programmed
<ujdcodr> : That's the layman's way of defining Machine Learning
<ujdcodr> : You want an automaton to learn how to solve a task on its own by training it
<ujdcodr> : But how do i phrase this in a more "mathematically aesthetic" way?
<ujdcodr> : Let's look at what a "well posed learning problem" is:
<ujdcodr> : "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."
<ujdcodr> : take a minute to read this and understand it
<ujdcodr> : sounds very poetic doesn't it?
<ujdcodr> : everyone done with reading the definition?
<dharma> : ya
<ujdcodr> : I'll explain this once everyone is done
<Shivani> : yes
<Swastik> : yes
<Ravali> : done
<Prabhjot> : Yes
<ujdcodr> : so over here "T" is the problem you want to solve
<ujdcodr> : "E" is what the learning program has "learnt" after conducting several tests on the given dataset
<ujdcodr> : And your performance measure "P" is like a probability value
<ujdcodr> : this value should improve (as in tend closer to 1 or 0) depending on the type of problem posed
<ujdcodr> : Let me illustrate with an example
<ujdcodr> : Suppose you've made a program that classifies your mail as Spam and Not Spam
<ujdcodr> : and you run this on a dataset of 100000 emails
<ujdcodr> : this number can be arbitrary
<ujdcodr> : T would be the task of "correctly" classifying an email as SPAM or NOT SPAM
<ujdcodr> : you must be very clear, the emails are already labeled (so we know as humans what kind of mail is and is not spam)
<ujdcodr> : but the algorithm doesn't
<ujdcodr> : So you know the right answer
<ujdcodr> : and your algorithm should produce the right answer for a good "fraction" of the test cases
<ujdcodr> : So the experience "E" over here will be the algorithm learning how an email is classified as SPAM or NOT SPAM
<ujdcodr> : like i said, we are feeding the "right answers" from a lot of training examples
<ujdcodr> : and the algo should figure out, "given a new email, should i classify it as spam or not spam"
<ujdcodr> : has everyone followed upto now?
<ujdcodr> : you can raise doubts at any point of time
<ujdcodr> : Is the definition of "a well defined problem" clear to all?
<Ravali> : yeah
<Swastik> : yep
<ujdcodr> : because i'm going to pose a question after this
<Prabhjot> : Yes
<Shivani> : yes
<dharma> : yes, little confusion in feeding the "right answers" from a lot of training examples
<ujdcodr> : Your dataset will contain a bunch of emails
<ujdcodr> : each one will be labeled SPAM or NOT SPAM
<dharma> : ok
<ujdcodr> : there are certain characteristics of an email that determine whether it's spam or not
<ujdcodr> : think about how you normally would classify such an email
<ujdcodr> : maybe a promotional offer from FLIPKART
<ujdcodr> : is spam
<ujdcodr> : an email from a friend isn't spam
<ujdcodr> : both emails have different structure and characteristics
<ujdcodr> : and the algo "learns" these characteristics
<dharma> : ok got it
<ujdcodr> : so the next time it receives an email, it'll do the classification for you
<ujdcodr> : clear?
<dharma> : yes
<ujdcodr> : so now for the question
<ujdcodr> : In the given example can someone tell me what "P" is?
<ujdcodr> : think about it
<Prabhjot> : Whether it is spam or not spam
<Shivani> : It's the probability that the machine will classify the email correctly as spam or not spam?
<Ravali> : the number of correct classifications?
<Swastik> : Probability that it classifies it correctly?
<dharma> : fraction of correct answers by machine
<ujdcodr> : That's correct, looks like you've already done the course. What are you here for then? XD XD
<ujdcodr> : Jokes apart, the performance measure "P" has to improve, right? So the only thing that can improve is the accuracy with which the algo determines whether an email is SPAM or NOT SPAM
<ujdcodr> : and this is reflected in the probability of the algo's answer matching the "right" answer
<ujdcodr> : which is nothing but the fraction of correctly classified emails
<ujdcodr> : So i hope this is clear to everyone?
<Ravali> : yes
<dharma> : yes
<arvind97> : Yeah
<Swastik> : yeah
<Shivani> : yes
<Prabhjot> : Yes
<ujdcodr> : people who were late, logs of the session will be uploaded after the session (url will be shared on the whatsapp group)
<VS> : Yes
<ujdcodr> : So there are 2 types of "Learning" methodologies
<ujdcodr> : One is Supervised Learning and the other is Unsupervised Learning
<ujdcodr> : The difference is very intuitive
<ujdcodr> : The example i just demonstrated was that of Supervised learning
<ujdcodr> : The dataset was already labeled with the "right" and "wrong" answers
<ujdcodr> : so this helped the algorithm learn how to predict easily
<Vikram> : What does the algo do if emails are already labelled, and what does accuracy actually mean then?
<ujdcodr> : The algorithm learns the characteristics of an email classified as SPAM
<ujdcodr> : it works just like a human
<ujdcodr> : a promotional email from flipkart is obviously different in structure than an email from your friend
<ujdcodr> : the algorithm is "trained" by the labeled dataset
<Shivani> : What if I repeatedly report a friend's email as spam? Does that give the algo experience to mark it as spam?
<ujdcodr> : now when i throw a brand new "unlabeled" email at my algorithm, it should be able to tell whether the email is SPAM or NOT SPAM based on its characteristic features (which it learnt from the dataset)
<ujdcodr> : Yes, there are many factors in classifying an email as spam (not just one or two)
<Shivani> : Okay.
<dharma> : what kind of parameters (characteristic features) does the algorithm use to learn in this example?
<ujdcodr> : the algorithm will note that an email received from "[email protected]" is labelled as spam most of the time
<ujdcodr> : so i know that when i receive an email from this email id, it "should" go to the SPAM folder
<dharma> : okey it will only look at the email id, not the contents like images attached ... right??
<ujdcodr> : @dharma i don't make the algorithms, but email address, frequency and a particular style of header could be a few of the many parameters involved
<dharma> : okey got it
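
A minimal Python sketch of the performance measure "P" discussed above, i.e. the fraction of correctly classified emails. The labels and predictions below are hypothetical, purely for illustration:

    # true labels (what humans marked) vs. what the algorithm predicted
    actual    = ["SPAM", "NOT SPAM", "SPAM", "NOT SPAM", "SPAM"]
    predicted = ["SPAM", "NOT SPAM", "NOT SPAM", "NOT SPAM", "SPAM"]

    # P = fraction of emails classified correctly
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    P = correct / len(actual)
    print(P)  # 0.8 for this made-up data; a better classifier pushes P towards 1
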
<ujdcodr> : And that dataset is fed in by humans
<ujdcodr> : so the algo will learn to filter these emails "almost" like us
<ujdcodr> : clear
<ujdcodr> : ?
<dharma> : yes
<ujdcodr> : good
<ujdcodr> : take a look at this image
<ujdcodr> : https://imagebin.ca/v/351HTZbN11P0
<ujdcodr> : This is a dataset for determining whether a patient's cancerous tumor is harmless (benign) or harmful (malignant)
<ujdcodr> : so we say 0 is good (blue circles) and 1 is bad (red crosses)
<ujdcodr> : everyone has seen the image?
<Ravali> : yeah
<Shivani> : yes
<dharma> : yes
<ujdcodr> : This is another supervised learning example
<Prabhjot> : Yes
<Vikram> : Yes
<arvind97> : yes
<VS> : Yes
<ujdcodr> : and the parameter we are concerned with is the tumor size
<ujdcodr> : so here we have the "right" and "wrong" answers
<ujdcodr> : imagine a dataset which has only blue circles
<ujdcodr> : we have simply graphed each data element, not labeled it
<ujdcodr> : the algo has a lot to figure out
<ujdcodr> : on its own
<ujdcodr> : and this is more practical, but obviously harder to achieve
<ujdcodr> : That's Unsupervised learning, and Classification into two classes (0 and 1) will fall under logistic regression
<ujdcodr> : both of which will be taken up in tomorrow's session
<ujdcodr> : But you've all come here for Linear Regression
<ujdcodr> : and that's what we're gonna do
<ujdcodr> : Can someone give a rough intuition about what "linear regression" is?
<ujdcodr> : i mean...just by looking at the two words
<ujdcodr> : don't go full "Andrew Ng" on me XD
<ujdcodr> : Use this link and try to figure it out
<ujdcodr> : https://imagebin.ca/v/35160xtB7Exw
<ujdcodr> : anyone?
<ujdcodr> : You see the image right?
<Vikram> : Price as a function of size! I guess
<Prabhjot> : By taking the price as y and size as x we develop a linear relation as y = mx + c, so for any new x u can find y
<ujdcodr> : Can't get a better intuition than that i guess. You're right @Prabhjot
<Vikram> : But y can be like y^2 = 4ax
<dharma> : linear regression is finding a straight line from which the variance of the data set is minimum, right?
<ujdcodr> : @Vikram that falls under polynomial regression
<ujdcodr> : linear regression => fitting a straight line through the data
<ujdcodr> : or at least trying to
<ujdcodr> : https://imagebin.ca/v/351MTuyXRhkK
<ujdcodr> : the given link is an example dataset
<ujdcodr> : x is a parameter, y is the output
<ujdcodr> : the learning algorithm's aim is to look at this vast dataset and derive a linear relationship between the area of a house and its cost price
<ujdcodr> : that m=47 is the no. of training examples
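
A small Python sketch of the intuition above: once a straight line y = mx + c has been fit through the (area, price) data, any new x gives a predicted y. The slope and intercept values are hypothetical, not taken from the linked dataset:

    # hypothetical parameters; in practice these are learned from the dataset
    m = 134.5     # price increase per extra square foot (slope)
    c = 90000.0   # base price (intercept)

    def predict_price(area_sqft):
        # linear hypothesis: price = m * area + c
        return m * area_sqft + c

    print(predict_price(1500))  # predicted price for a new 1500 sq. ft. house
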
<ujdcodr> : now our job is to generate a hypothesis
<ujdcodr> : what do i mean by that?
<ujdcodr> hθ(x) = θ0 + θ1x
* ad has quit (Remote host closed the connection)
<ujdcodr> surprised?
<ujdcodr> don't be
<Vikram> No, it's just a linear function
<ujdcodr> it's just y = mx + c
<ujdcodr> yep
* ad ([email protected]) has joined #mlgrp4
<ujdcodr> where the thetas are your parameters
<ujdcodr> you can have multiple parameters
<ujdcodr> but for the sake of simplicity let's just start off with 2
<ujdcodr> We initialize θ0 and θ1 to 0
* ad has quit (Remote host closed the connection)
<ujdcodr> and the algorithm tries to learn from the dataset the optimal values for these 2 parameters by slightly modifying their values and seeing whether the resulting line fits the dataset
<ujdcodr> It's just that the way these "adjustments" are made is not as simple as hit and trial
<ujdcodr> We use a technique called Gradient descent
<ujdcodr> Let's look at why we're doing all this
<ujdcodr> you know that we have a "labeled" dataset
<ujdcodr> every house with a given area has an associated price
<ujdcodr> The hypothesis h(x) is trying to be as close as possible to the expected value (y in this case)
<ujdcodr> so essentially you're trying to minimize h(x) - y
<ujdcodr> and this is done by adjusting the parameters θ0 and θ1
<ujdcodr> everyone clear upto this point?
<Ravali> yeah
<Prabhjot> Yes
<Shivani> yes
<VS> Yes
<ujdcodr> good
<Vikram> Yes
<dharma> yes
<ujdcodr> let us define the cost J = h(x) - y
<ujdcodr> so if you still remember your 12th grade calculus, aren't we just trying to solve a minimization problem?
<ujdcodr> Because we know x and y, the only things that can vary are the theta values
<ujdcodr> take a look at this link
<ujdcodr> https://imagebin.ca/v/351Rl7egC0hq
<ujdcodr> This is the actual cost function we are using
<ujdcodr> we're trying to reduce the mean square error over here
<ujdcodr> That's as far as i will go with the math
<ujdcodr> but that's pretty much all you need
<ujdcodr> can anyone tell me what the graph of J vs θ looks like?
<ujdcodr> : just look at the function without the summation
<Vikram> : 3-d parabola because it is a quadratic function
<Vikram> : And 2 input variables
<ujdcodr> : ok assume it was just a one input variable
<ujdcodr> : then?
<ujdcodr> : i want others to answer
<Shivani> : Parabola
<ujdcodr> : more specifically an upward parabola (bowl shaped)
<ujdcodr> : think of the equation y = x^2
<ujdcodr> : y takes the place of J, x is (h(x) - y)
<ujdcodr> : it's important that you guys visualize this
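
The cost function itself is only shown through the image link above; assuming the usual form from the Andrew Ng course, J(θ0, θ1) = (1/2m) * Σ (hθ(x) - y)^2, a minimal Python sketch with hypothetical data would be:

    # mean squared error cost for the hypothesis h(x) = theta0 + theta1 * x
    xs = [1.0, 2.0, 3.0, 4.0]   # hypothetical inputs (e.g. scaled house areas)
    ys = [2.5, 4.5, 6.5, 8.5]   # hypothetical outputs (e.g. scaled prices)

    def cost(theta0, theta1):
        m = len(xs)
        squared_errors = sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys))
        return squared_errors / (2 * m)

    print(cost(0.0, 0.0))  # cost at the initial guess theta0 = theta1 = 0
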
<ujdcodr> : shall i proceed? we should be done in about 10-15 mins
<ujdcodr> : or maybe less
<Shivani> : yes
<Prabhjot> : Yes
<Vikram> : Yes
<Swastik> : Yes
<Ravali> : yes
<ujdcodr> : so we assign random values for θ0 and θ1 at the beginning
<ujdcodr> : this will give a random value for J
<ujdcodr> : somewhere on this parabola
<ujdcodr> : now how do we minimize a function?
<ujdcodr> : anyone?
<Prabhjot> : By gradient descent?
<Vikram> : Differentiate it w.r.t. the variable and equate it to zero in the case of 1 variable, or use partial derivatives in the case of 2 variables
<ujdcodr> : @Vikram is correct
<ujdcodr> : 12th grade calculus, basic minimization problem
<ujdcodr> : so we have to compute partial derivatives of J(θ) w.r.t. θ0 and θ1
<ujdcodr> : and simultaneously update those values
<ujdcodr> : View the image: https://imagebin.ca/v/351Wdf5h44YZ
<ujdcodr> : You're wondering what this "alpha" is doing here
<ujdcodr> : This is gradient descent in action
<ujdcodr> : i am computing the most optimum values for θ0 and θ1
<ujdcodr> : the new values of the parameters "should" give a smaller value for J, right?
<ujdcodr> : everyone following?
<Vikram> : Yes
<Prabhjot> : Yes
<Ravali> : yeah
<Shivani> : yes
<Swastik> : yes
<ujdcodr> : we have only taken one step in our descent and we can definitely do better
<ujdcodr> : hence this sequence of θ0 and θ1 updates is run in a loop
<ujdcodr> : and this loop runs until we converge to the global minimum of this cost function J
<ujdcodr> : which in this case is the bottom of that parabola
<ujdcodr> : In short, run the update until convergence
<ujdcodr> : where alpha is the "size" of the step we'll take during gradient descent
<ujdcodr> : if alpha is very small, your code will take a lot of iterations to converge
<ujdcodr> : if it's too big, you will overshoot your global minimum and may never converge
<ujdcodr> : the beauty of gradient descent is that the derivative term is like the tangent of the function at that point.
<ujdcodr> : with every iteration the slope of the tangent becomes smaller (as you can visualize in the parabola)
<ujdcodr> : since the slope is reducing gradually with every step of gradient descent, you will slowly and steadily converge to the global minimum
<ujdcodr> : this is because the values of θ0 and θ1 are related to the partial derivative values linearly
<ujdcodr> : so as the slope reduces at a slower rate
<ujdcodr> : even the values of the parameters get updated at lower rates
<ujdcodr> : makes sense?
<ujdcodr> : this last image should make everything clear
<ujdcodr> : https://imagebin.ca/v/351Ut0frsukQ
<ujdcodr> : bottom line
<ujdcodr> : Start with initial guesses
<ujdcodr> : Start at 0,0 (or any other value)
<ujdcodr> : Keep changing θ0 and θ1 a little bit to try and reduce J(θ0,θ1)
<ujdcodr> : Each time you change the parameters, you select the gradient which reduces J(θ0,θ1) the most
<ujdcodr> : Repeat
<ujdcodr> : Do so until you converge to a local minimum
<ujdcodr> : And that concludes our discussion on linear regression in one variable using gradient descent
<ujdcodr> : I hope everyone has understood everything explained upto this point
<ujdcodr> : any doubts? I'll wrap up with an announcement
<ujdcodr> : a 3-d view: https://imagebin.ca/v/351bhROgoqvo
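
Putting the update rule from the linked image into code: a minimal Python sketch of gradient descent for one-variable linear regression. The data, learning rate alpha and iteration count are hypothetical; the partial derivatives are the standard ones for the mean squared error cost:

    xs = [1.0, 2.0, 3.0, 4.0]   # hypothetical inputs
    ys = [2.5, 4.5, 6.5, 8.5]   # hypothetical outputs
    m = len(xs)

    theta0, theta1 = 0.0, 0.0   # start at (0, 0)
    alpha = 0.05                # step size: too small -> slow, too big -> may never converge

    for _ in range(2000):       # "repeat until convergence" (fixed iteration count here)
        # partial derivatives of J with respect to theta0 and theta1
        d0 = sum((theta0 + theta1 * x - y) for x, y in zip(xs, ys)) / m
        d1 = sum((theta0 + theta1 * x - y) * x for x, y in zip(xs, ys)) / m
        # simultaneous update of both parameters
        theta0, theta1 = theta0 - alpha * d0, theta1 - alpha * d1

    print(theta0, theta1)       # should approach the best-fit intercept and slope
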
<ujdcodr> : I'm assuming everyone is clear (and probably saturated). So our first session comes to a close
<ujdcodr> : You can look up the Andrew Ng course videos to know more about regression using multiple variables
<ujdcodr> : and also polynomial regression
<ujdcodr> : The announcement is that we're trying to wrap up this program as fast as possible, so we will have sessions every day for the next 5 days
<ujdcodr> : Tomorrow Vadiraja will be taking up Logistic regression
<Shivani> : Time?
<ujdcodr> : This might have been a long session, but i hope you have made your foundations in Machine Learning solid
<ujdcodr> : @Shivani that's up to Vadiraja
<Shivani> : okay.
<ujdcodr> : but it'll mostly be around the same time
<Shivani> : That would be great if it's at the same time.
<ujdcodr> : if you want we can shift the sessions to 7 pm daily (considering how long they're taking)
<ujdcodr> : that can be discussed later during the daytime
<ujdcodr> : For now, thank you for attending your first session on Machine Learning. Hope you enjoyed it
<Ravali> : Thank you
<Shivani> : Thanks
<Swastik> : Thank you
<Prabhjot> : Thank you
<VS> : Thanks
<Vikram> : Thanks