A Friendly Introduction to Machine Learning

Hi, and welcome to the machine learning [unclear] at Udacity. What we’re going to talk about today is: what is machine learning?

Well, this is the world, and in the world we have humans and we’ve got computers. One of the main differences between humans and computers is that humans learn from past experience, whereas computers need to be told what to do. They need to be programmed, so they follow instructions. Now the question is: can we get computers to learn from experience too? The answer is yes, we can, and that’s precisely what machine learning is. Of course, for computers, past experiences have a name: data.

So in the next few minutes I’m going to show you a few examples in which we can teach the computer how to learn from previous data. Most importantly, I’m going to show you that these algorithms are actually pretty easy and that machine learning is really nothing to fear.

So let’s go to the first example. Let’s say we’re studying the housing market, and our task is to predict the price of a house given its size. We have a small house that costs $70,000 and a big house that costs $160,000, and we’d like to estimate the price of this medium-sized house here. So how do we do it?

Well, first we put them in a grid where the x-axis represents the size of the house in square feet and the y-axis represents the price of the house in dollars. To help us out, we have collected some previous data in the form of these blue dots. These are other houses that we’ve looked at, and we’ve recorded their prices with respect to their size. In this graph we can see that the small house is priced at $70,000 and the big house is priced at $160,000.

So now it’s time for a small quiz. What do you think is the best guess for the price of the medium house given this data? Would it be $1,000, $120,000, or $190,000?

Well, to help us out, we can see that these blue points kind of form a line, so we can draw the line that best fits the data. On this line, our best guess for the price of the house is this point over here, which corresponds to $120,000. So if you said $120,000, that is correct. This method is known as linear regression.
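
To make the idea concrete, here is a minimal sketch of fitting a line to size and price data and reading off the estimate for a medium house. The square-footage values and the three in-between houses are made up for illustration (the example only gives the two prices, $70,000 and $160,000), and the helper name `fit_line` is hypothetical. It uses the standard closed-form formulas for a least-squares line rather than any particular library.

```python
# Hypothetical data: sizes in square feet and prices in dollars.
# Only the $70,000 and $160,000 prices come from the example; the rest is invented.
sizes = [1000, 1400, 1800, 2400, 2800]
prices = [70_000, 90_000, 110_000, 140_000, 160_000]


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(sizes, prices)

medium_size = 2000  # assumed size of the medium house
predicted_price = slope * medium_size + intercept
print(f"Predicted price: ${predicted_price:,.0f}")
```

With these invented numbers the points lie exactly on a line, so the estimate for the 2,000-square-foot house comes out at $120,000, matching the quiz answer.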

Now you may ask: how do we find this line? Well, let’s look at a simple example. With these three points, we’re going to try to find the best line that fits through them. Obviously, “best line” is subjective, but we’ll try to find a line that works well.

Since we’re teaching the computer how to do it, and the computer can’t really eyeball the line, we have to get it to draw a random line and then see how bad this line is. In order to see how bad the line is, we calculate the error. To calculate the error, we look at the lengths of the distances from the line to the three points, and we’re simply going to say that the error of this line is the sum of those three red lengths.

Now what we’re going to do is move the line around and see if we can reduce this error. Let’s say we move it in this direction and calculate the error. It’s given by the yellow distances. We add them up and realize that we’ve increased the error, so that’s not a good direction to go. Let’s try moving in the other direction. We move it here and calculate the error, which is now given by the sum of these three green distances, and we see that the error is smaller, so we actually reduced it. Let’s say we take that step. We’re a little closer to our solution. If we continue doing this procedure several times, we will always be decreasing the error, and we’ll finally arrive at a good solution in the form of this line. This general procedure is known as gradient descent.

Now, in real life we don’t want to deal with negative distances corresponding to a point being on one or the other side of the line. So what we do to solve this is add the square of the distance from the point to the line instead, and this procedure is called least squares.
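
The move-and-check procedure described above can be sketched in a few lines. This is only an illustration: the three points, the starting line, and the step size are made up, distances are measured vertically (a common simplification), and the helper names are hypothetical. It only nudges the line up or down and keeps a move when the squared error drops, so it is a plain trial-and-error search rather than a full gradient descent implementation.

```python
# Three made-up points and a made-up starting line y = slope * x + intercept.
points = [(1.0, 2.0), (2.0, 3.0), (3.0, 5.0)]


def squared_error(slope, intercept):
    """Sum of squared vertical distances from the points to the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in points)


def absolute_error(slope, intercept):
    """Sum of plain (unsquared) vertical distances, as in the first description."""
    return sum(abs(y - (slope * x + intercept)) for x, y in points)


slope, intercept = 1.0, -2.0   # arbitrary starting line
step = 0.1                     # how far we nudge the line each time

start_error = squared_error(slope, intercept)

# Nudge the line up or down; keep a move only if it lowers the error.
for _ in range(100):
    current = squared_error(slope, intercept)
    up = squared_error(slope, intercept + step)
    down = squared_error(slope, intercept - step)
    if up < current and up <= down:
        intercept += step
    elif down < current:
        intercept -= step
    else:
        break  # neither direction helps, so stop

final_error = squared_error(slope, intercept)
print(f"squared error: {start_error:.2f} -> {final_error:.2f}")
print(f"sum of plain distances at the end: {absolute_error(slope, intercept):.2f}")
```

Squaring keeps every term positive no matter which side of the line a point falls on, which is the reason given above for using least squares.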

So we’re going to cover gradient descent as trying to descend from a mountain. This is our mountain, Mount Everest. On this mountain, the higher we are, the larger the error is, so descending means reducing the error. So what are we doing in gradient descent? Well, we look at our surroundings and try to figure out which way we can descend more. For example, here we can go in two directions, to the right or to the left. If we go to the left, then we’re going up, so our error is increasing. This is equivalent to moving the line downwards and getting farther from the three points. But if we go to the right instead, then we’re actually descending, which means our error is decreasing. This is equivalent to moving the line upwards and getting closer to the three points. So we decide to take a step towards the right. Then we can start this procedure again and again and again until we successfully descend from the mountain. This is equivalent to reducing the error until we find its minimum value, which gives us the best line fit.
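
As a rough sketch of the descent described above, the code below uses the slope of the error "mountain" (its gradient) to decide which way to step, instead of trying both directions. The three points, the learning rate, and the number of steps are assumptions chosen for illustration, and the error here is the mean of the squared vertical distances.

```python
# Made-up data: three points we want a line y = slope * x + intercept to fit.
points = [(1.0, 2.0), (2.0, 3.0), (3.0, 5.0)]
n = len(points)


def mean_squared_error(slope, intercept):
    """Height on the 'mountain': average squared vertical distance."""
    return sum((slope * x + intercept - y) ** 2 for x, y in points) / n


def gradient(slope, intercept):
    """Partial derivatives of the error with respect to slope and intercept."""
    d_slope = sum(2 * (slope * x + intercept - y) * x for x, y in points) / n
    d_intercept = sum(2 * (slope * x + intercept - y) for x, y in points) / n
    return d_slope, d_intercept


slope, intercept = 0.0, 0.0    # arbitrary starting line
learning_rate = 0.05           # assumed step size
errors = [mean_squared_error(slope, intercept)]

for _ in range(5000):
    d_slope, d_intercept = gradient(slope, intercept)
    # Step downhill: move against the gradient.
    slope -= learning_rate * d_slope
    intercept -= learning_rate * d_intercept
    errors.append(mean_squared_error(slope, intercept))

print(f"slope={slope:.3f}, intercept={intercept:.3f}, error={errors[-1]:.4f}")
```

The step size matters: if it is too large the steps overshoot the bottom of the valley, and if it is too small the descent takes many more iterations.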

So you can think of linear regression as a painter that will look at your data and draw the best-fitting line. Now, this method is actually much stronger. If the data doesn’t form a line, with a very similar method we can draw a circle through it, or a parabola, or even a higher-degree curve. For example, for the data here we can actually fit a cubic polynomial.
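
A cubic fit can be sketched with the same least-squares idea. The data points below are invented (they lie on the curve y = x^3 - 2x + 1), the function name `fit_polynomial` is hypothetical, and solving the normal equations by Gaussian elimination is just one simple way to do it, chosen here to avoid external libraries.

```python
# Made-up data lying on a cubic curve (invented for illustration).
xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [-3.0, 2.0, 1.0, 0.0, 5.0, 22.0]


def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit; returns c with y ~ c[0] + c[1]*x + ..."""
    size = degree + 1
    # Normal equations: (A^T A) c = A^T y, where A[i][j] = xs[i] ** j.
    ata = [[sum(x ** (i + j) for x in xs) for j in range(size)]
           for i in range(size)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(size)]

    # Forward elimination with partial pivoting.
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(ata[r][col]))
        ata[col], ata[pivot] = ata[pivot], ata[col]
        aty[col], aty[pivot] = aty[pivot], aty[col]
        for row in range(col + 1, size):
            factor = ata[row][col] / ata[col][col]
            for k in range(col, size):
                ata[row][k] -= factor * ata[col][k]
            aty[row] -= factor * aty[col]

    # Back substitution.
    coeffs = [0.0] * size
    for row in range(size - 1, -1, -1):
        known = sum(ata[row][k] * coeffs[k] for k in range(row + 1, size))
        coeffs[row] = (aty[row] - known) / ata[row][row]
    return coeffs


coeffs = fit_polynomial(xs, ys, degree=3)
print([round(c, 3) for c in coeffs])
```

Because the invented points sit exactly on a cubic, the recovered coefficients are close to 1, -2, 0 and 1 (constant term first). Real data would be noisy, and the fit would then be the cubic with the smallest sum of squared errors.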

Okay, so let’s move to the next example. In this example we’re going to build an email spam detection classifier, so something that will tell us if an email is spam or not. And how do we do this? We do this by looking at previous data. The previous data is 100 emails that we have looked at already. Out of these 100 emails, we have flagged 25 of them as spam and 75 of them as not spam.

Now let’s try to think of features that spam emails may be likely to display, and analyze these features. One feature could be containing the word “cheap”. It seems reasonable to think that an email containing the word “cheap” is likely to be spam, so let’s analyze this claim. We look for the word “cheap” in all these 100 emails and find that 20 of the spam ones and 5 of the non-spam ones contain that word. So we can forget about all the rest of the emails and focus only on the ones that contain the word “cheap”.

Okay, so time for a quiz. Here’s the question: based on our data, if an email contains the word “cheap”, what is the probability of this email being spam? Is it 40%, 60%, or 80%?

Well, to help us out, we can see that out of the 25 emails with the word “cheap”, 20 of them are spam while 5 of them are not. These form an 80/20 split, so the correct answer is 80%. If you said 80%, you were correct.

So from analyzing the data we can conclude a rule. The rule says: if an email contains the word “cheap”, then we’re going to say the probability of it being spam is 80%. We then associate this feature with the probability 80%, and we’re going to use it to flag future messages as spam or not spam.
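
The 80% figure is just a ratio of counts, which a short sketch makes explicit. The counts come from the example; the variable names are made up for illustration.

```python
# Counts taken from the example: 100 emails, 25 spam and 75 not spam.
spam_with_cheap = 20      # spam emails containing "cheap"
ham_with_cheap = 5        # non-spam emails containing "cheap"

emails_with_cheap = spam_with_cheap + ham_with_cheap

# P(spam | contains "cheap") = spam emails with the word / all emails with the word
p_spam_given_cheap = spam_with_cheap / emails_with_cheap
print(f"P(spam | 'cheap') = {p_spam_given_cheap:.0%}")
```

Note that the other 75 emails never enter the calculation, which is why the example says we can forget about them once we focus on the word “cheap”.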

We can also look at other features and try to find their associated probability. Let’s say we look at emails containing a spelling mistake and realize that the probability of an email containing a spelling mistake being spam is 70%. Or let’s say we look at emails that are missing a title and find the probability of those being spam is 95%, etc. So now, when future emails come, we can combine these features to guess if they’re spam or not. This algorithm is known as the naive Bayes algorithm.
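
One way to combine per-feature probabilities is sketched below. It assumes the features are independent given the class (the "naive" assumption) and uses the 25-in-100 spam rate from the example as the prior. The function names and feature labels are hypothetical, and the talk does not spell out the combination formula, so treat this as an illustration of the idea rather than the exact method used.

```python
PRIOR_SPAM = 25 / 100  # share of spam among the 100 example emails

# P(spam | feature) for each feature, as quoted in the example.
P_SPAM_GIVEN_FEATURE = {
    "contains_cheap": 0.80,
    "spelling_mistake": 0.70,
    "missing_title": 0.95,
}


def odds(p):
    """Convert a probability to odds (p against 1 - p)."""
    return p / (1 - p)


def spam_probability(features):
    """Combine the features present in an email, assuming they are
    conditionally independent given spam / not spam (naive Bayes)."""
    prior_odds = odds(PRIOR_SPAM)
    combined_odds = prior_odds
    for name in features:
        # Each feature multiplies the odds by how much it shifts them
        # relative to the prior.
        combined_odds *= odds(P_SPAM_GIVEN_FEATURE[name]) / prior_odds
    return combined_odds / (1 + combined_odds)


print(spam_probability(["contains_cheap"]))
print(spam_probability(["contains_cheap", "spelling_mistake"]))
```

With a single feature the result is just that feature's own probability (80% for “cheap”). With two features present, the combined estimate is higher than either one alone, because both point towards spam.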

Okay, so now another example. We are the App Store or Google Play, and our goal is to recommend apps to users. To each user, we’re going to try to recommend the app that they are most likely to download. We have gathered a table of data that we’re going to use to make the rules, and the table contains six people. For each one of those six people, we have recorded their gender, their age, and the app they downloaded. For example, the first person is a 15-year-old female, and she downloaded Pokemon Go.

So here’s a small quiz: between gender and age, which one seems like the more decisive feature for predicting what app the users will download?

Well, to help us out, first let’s look at gender. If we split them by gender, then the females downloaded Pokemon Go and WhatsApp, whereas the males downloaded Pokemon Go and Snapchat. So not much of a split here. On the other hand, if we look at age, we realize that everybody who’s under 20 years old downloaded Pokemon Go, whereas everybody who is 20 or older didn’t. That’s a nice split. So the feature that best splits the data is age. Therefore, if you said age, that was correct.

So what we’re going to do is add a question here. The question is: are you younger than 20? If yes, then we’ll recommend Pokemon Go to you. If not, then we’ll see. So what happens if you’re 20 or older? Then we look at the gender. It seems like here, if you’re a female you downloaded WhatsApp, whereas if you’re a male you downloaded Snapchat. So we add another question here. The question is: are you female or male? If you’re female, we recommend WhatsApp, and if you’re male, then we recommend Snapchat.

What we end up with here is a decision tree. The decisions are given by the questions we asked, and this decision tree was built with the data. Now, whenever we have any user, we can put them through the decision tree and recommend whatever app the tree suggests. For example, if you have a young person, you recommend them Pokemon Go. If you have an older person, you check their gender: if it’s a female, you recommend them WhatsApp, and if it’s a male, you recommend them Snapchat.

Obviously, there won’t always be a tree that perfectly fits our data, but in this class we’re going to learn an algorithm which will actually help us find the best-fitting tree to your table of data.
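
The finished tree from this example is small enough to write out by hand. The sketch below hard-codes the two questions rather than learning them from data, and the function name and the string values used for gender are assumptions made for illustration.

```python
def recommend_app(age, gender):
    """Walk the two-question tree from the example and return an app name."""
    # First question: are you younger than 20?
    if age < 20:
        return "Pokemon Go"
    # Second question (only for users 20 or older): female or male?
    if gender == "female":
        return "WhatsApp"
    return "Snapchat"


# Hypothetical users, not rows from the original table
# (apart from the 15-year-old female mentioned in the example).
print(recommend_app(15, "female"))  # Pokemon Go
print(recommend_app(32, "female"))  # WhatsApp
print(recommend_app(25, "male"))    # Snapchat
```

A tree-learning algorithm would choose the questions and the age threshold automatically by checking which split separates the downloaded apps best, the same reasoning used in the quiz above.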

fitting tree to your table of data okay

fitting tree to your table of data okay so let’s go to the next example

so let’s go to the next example

so let’s go to the next example now let’s say we’re the admissions

now let’s say we’re the admissions

now let’s say we’re the admissions office at a university and we’re trying

office at a university and we’re trying

office at a university and we’re trying to figure out which students to admit

to figure out which students to admit

to figure out which students to admit we’re going to admit them or reject them

we’re going to admit them or reject them

we’re going to admit them or reject them based on two pieces of information one

based on two pieces of information one

based on two pieces of information one is an entrance exam that we provide them

is an entrance exam that we provide them

is an entrance exam that we provide them the test and the other one is their

the test and the other one is their

the test and the other one is their grades from school so for example here

grades from school so for example here

grades from school so for example here we have student 1 with scores of 9 out

we have student 1 with scores of 9 out

we have student 1 with scores of 9 out of 10 in the test and 8 out of 10 and

of 10 in the test and 8 out of 10 and

of 10 in the test and 8 out of 10 and the grades and that student got accepted

the grades and that student got accepted

the grades and that student got accepted we also have student 2 with scores a 3

we also have student 2 with scores a 3

we also have student 2 with scores a 3 in the test and 4 in the grades and that

in the test and 4 in the grades and that

in the test and 4 in the grades and that student did not get accepted and then a

student did not get accepted and then a

student did not get accepted and then a new student comes in student 3 this

new student comes in student 3 this

new student comes in student 3 this person has a son has scores of 7 and 6

person has a son has scores of 7 and 6

person has a son has scores of 7 and 6 and the question is should we accept

and the question is should we accept

and the question is should we accept them or not

them or not

So let’s first put them in a grid, where the x-axis represents their score on the test and the y-axis represents their grades. Here we can see that student 1 would lie over here, at the point with coordinates (9, 8), since their scores were 9 and 8, and student 2 would lie right here, at the point with coordinates (3, 4), since their scores were 3 and 4. So in order to see if we should accept or reject student 3, we should try to find a trend in the data. So we look at the previous data, in the form of all the students we’ve already accepted or rejected, and it turns out that the previous data looks like this: the green dots represent students that we’ve previously accepted, and the red dots represent students that we’ve previously rejected. So, time for a quiz.

Based on the previous data, do we think student 3 gets accepted, yes or no? To answer this question, let’s look closely at the data. The red and green dots seem to be nicely separated by a line. Here’s the line: most of the points over it are green and most of the points under it are red, with some exceptions. This makes sense, since the students who got high scores are over the line and they got accepted, and the students who got low scores are under the line and they didn’t get accepted. So we’re going to say that that line is going to be our model. Now every time we get a new student, we check their scores and plot them in this graph. If they end up over the line, we predict that they’ll get accepted, and if they end up below the line, we predict that they’ll get rejected. Since student 3 has scores of 7 and 6, that person will end up here at the point (7, 6), which is over the line, so we conclude that this student gets accepted. So if you said yes, that’s the correct answer. This method is known as logistic regression.
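
The rule "over the line means accept, under the line means reject" can be sketched in a few lines of Python. The talk never gives the equation of the line on the slide, so the weights and threshold below are made-up values chosen only so that the three students land on the sides described. This is just the decision rule, not a full logistic regression, which would also attach a probability to each prediction.

```python
# Hypothetical boundary line: 2*test + grades = 18.
# The weights and the threshold are made up for illustration; the talk
# never gives the equation of the line drawn on the slide.
WEIGHT_TEST = 2.0
WEIGHT_GRADES = 1.0
BIAS = -18.0


def score(test, grades):
    """Positive when the point is over the line, negative when under it."""
    return WEIGHT_TEST * test + WEIGHT_GRADES * grades + BIAS


def predict(test, grades):
    """Predict 'accept' for points over the line and 'reject' otherwise."""
    return "accept" if score(test, grades) > 0 else "reject"


students = {
    "student 1": (9, 8),  # accepted in the talk
    "student 2": (3, 4),  # rejected in the talk
    "student 3": (7, 6),  # the new applicant
}

for name, (test, grades) in students.items():
    print(name, predict(test, grades))
```

With a different hypothetical line the predictions could change, which is why the next question in the talk is how to choose the line.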

Another question is: how do you find this line that best cuts the data in two? Let’s look at a simple example with 6 points, 3 green and 3 red, and we’re going to try to draw a line that best separates the green points from the red points. Again, a computer can’t really eyeball the line, so it can just start by drawing a random line like this one. Given this line, let’s just randomly say that we label the region over the line as green and the region under the line as red. Just like with linear regression, we’re going to try to see how bad this first line is, and the measure of how bad the line is will be how many points we are misclassifying. We’re going to call that number of misclassified points the error. This line, for example, misclassifies two points, one red and one green, so we’ll say that it has two errors.

So again, like with linear regression, what we’ll do is move the line around and try to minimize the number of errors using gradient descent. If we move the line a bit in this direction, we can see that we start correctly classifying one of the points, bringing the number of errors down to one, and if we move it a little more, we correctly classify the other point, bringing the number of errors down to zero. In reality, since we use calculus for the gradient descent method, it turns out that the number of errors is not what we need to minimize, but instead something that captures the number of errors, called the log loss function. The idea behind the log loss function is that it’s a function that assigns a large value to the misclassified points and a small value to the correctly classified points.
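
To make the difference between counting errors and the log loss concrete, here is a minimal sketch in plain Python. The six points, the starting line, the learning rate, and the number of steps are all assumptions chosen for illustration, not the values from the slide. The log loss used here is the standard one: the average of minus the logarithm of the probability the model assigns to the true label.

```python
import math

# Six hypothetical points: label 1 = green, label 0 = red.
POINTS = [
    ((1.0, 2.0), 1), ((2.0, 1.0), 1), ((2.0, 2.0), 1),
    ((-1.0, -2.0), 0), ((-2.0, -1.0), 0), ((-2.0, -2.0), 0),
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def probability_green(w, b, point):
    """Model's probability that a point is green, for the line w.x + b = 0."""
    return sigmoid(w[0] * point[0] + w[1] * point[1] + b)


def count_errors(w, b):
    """Number of misclassified points (green is predicted when w.x + b > 0)."""
    errors = 0
    for point, label in POINTS:
        predicted = 1 if w[0] * point[0] + w[1] * point[1] + b > 0 else 0
        errors += predicted != label
    return errors


def log_loss(w, b):
    """Average of -log(probability assigned to the true label)."""
    total = 0.0
    for point, label in POINTS:
        p = probability_green(w, b, point)
        total += -math.log(p if label == 1 else 1.0 - p)
    return total / len(POINTS)


def gradient_step(w, b, learning_rate=0.1):
    """One gradient descent step on the log loss."""
    grad_w0 = grad_w1 = grad_b = 0.0
    for point, label in POINTS:
        diff = probability_green(w, b, point) - label
        grad_w0 += diff * point[0]
        grad_w1 += diff * point[1]
        grad_b += diff
    n = len(POINTS)
    new_w = (w[0] - learning_rate * grad_w0 / n,
             w[1] - learning_rate * grad_w1 / n)
    return new_w, b - learning_rate * grad_b / n


w, b = (1.0, -1.0), 0.0  # an arbitrary starting line
start_errors, start_loss = count_errors(w, b), log_loss(w, b)
for _ in range(200):
    w, b = gradient_step(w, b)
end_errors, end_loss = count_errors(w, b), log_loss(w, b)
print(start_errors, round(start_loss, 3), "->", end_errors, round(end_loss, 3))
```

The error count only changes in whole jumps as the line crosses a point, while the log loss changes smoothly with every small move of the line, which is what lets gradient descent follow it downhill.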

OK, so let’s look more carefully at this model for accepting or rejecting students. Let’s say we have a student 4 who got 9 in the test and 1 in the grades. The student gets accepted according to our model, since they are over here on top of the line. But that seems wrong, since a student who got very low grades shouldn’t get accepted no matter what their test score was. So maybe it’s simplistic to think this data can be separated by just one line, right? Maybe the real data should look more like this, where these students over here who got a low test score or low grades don’t get accepted. So now it seems like a line won’t cut the data in two. So what’s the next thing after a line? Maybe a circle; a circle could work. Maybe two lines; that could work too. Actually, it looks like that works better, so let’s go with that. Now the question is: how do we find these two lines? Again, we can do it using gradient descent to minimize a similar log loss function. This is called a neural network.

Now, why is it called a neural network? Well, let’s see. We have this green area here, bounded by two lines. This area can be constructed as an intersection, namely the intersection between the green area on top of one of the lines and the green area to the right of the other line. So we’re going to graph it like this: we have two nodes, each node is a line that separates the plane into two regions, and from the two nodes we get the intersection, which is the desired area.
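
The picture of two line nodes feeding an intersection can be sketched as follows. The two lines are hypothetical (the talk does not give their equations), and the nodes use a hard on/off step for readability. A network trained with gradient descent would use a smooth function in place of the step.

```python
def step(z):
    """A node 'fires' (returns 1) when its input is positive."""
    return 1 if z > 0 else 0


# Two hypothetical lines; the talk does not give their equations.
def node_grades(test, grades):
    """Fires for points on top of the horizontal line grades = 4."""
    return step(grades - 4)


def node_test(test, grades):
    """Fires for points to the right of the vertical line test = 4."""
    return step(test - 4)


def accept(test, grades):
    """Output node: fires only when both line nodes fire (their intersection)."""
    return step(node_grades(test, grades) + node_test(test, grades) - 1.5)


for name, scores in [("student 1", (9, 8)), ("student 2", (3, 4)),
                     ("student 3", (7, 6)), ("student 4", (9, 1))]:
    print(name, "accept" if accept(*scores) else "reject")
```

Under these assumed lines, student 4, with a high test score but very low grades, falls outside the intersection, which is the behavior a single line could not produce.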

The reason why this is called a neural network is that it mimics the behavior of the brain. In the brain we have neurons, which connect to each other and either fire electricity or not. They resemble the nodes in our graph, which split the plane into regions and fire if a given point belongs to one of those regions and don’t fire if it doesn’t. So we can think of logistic regression as a ninja who will look at your data and cut it in half based on the labels, and we can think of a neural network as a team of ninjas who will look at your data and cut it into regions based on the labels.

OK, so let’s dive a bit deeper into the art of splitting data in two. We can look at these points, three green and three red, and there seem to be many lines that can split them. For example, there is this yellow line and there is this purple line. So, quiz: which of these two lines do you think cuts the data better, the purple one or the yellow one? Well, if we look at the yellow line, it seems that it’s close to failing. It’s too close to two of the points, so if we were to wiggle it a little bit, we would misclassify some of the points. The purple one, on the other hand, seems to be nicely spaced and as far as it can be from all the points. So it seems like the best line is the purple one.

Now the question is: how do we find the purple line? Well, the first observation is that we don’t really need to worry about these points, because they’re too far from the boundary. So we can forget about them and only focus on the points that are close. And now what we’re going to use is not gradient descent; instead, we’re going to use linear optimization to find the line that maximizes the distance from the boundary points. This method is called a support vector machine.

So you can think of a support vector machine as a surgeon who will see your data and cut it, but before that she will carefully look at what’s the best way to separate the data in two, and then make the cut.
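
The preference for the line that stays farthest from its nearest points can be illustrated by measuring that distance directly. The sketch below only compares two candidate lines on made-up data; the points and both line equations are assumptions, and it does not implement the optimization that a support vector machine performs to search over all possible lines.

```python
import math

# Hypothetical data: three green and three red points.
GREEN = [(1, 2), (2, 1), (2, 2)]
RED = [(-1, -2), (-2, -1), (-2, -2)]

# Each line is (a, b, c), meaning a*x + b*y + c = 0. Both are made up.
PURPLE = (1, 1, 0)     # passes midway between the two groups
YELLOW = (1, 1, -2.5)  # also separates them, but hugs the green points


def signed_distance(line, point):
    """Distance from point to line, positive on the side where a*x + b*y + c > 0."""
    a, b, c = line
    return (a * point[0] + b * point[1] + c) / math.hypot(a, b)


def separates(line):
    """True when every green point is on the positive side and every red on the negative."""
    return (all(signed_distance(line, p) > 0 for p in GREEN)
            and all(signed_distance(line, p) < 0 for p in RED))


def margin(line):
    """Distance from the line to the closest point of either color."""
    return min(abs(signed_distance(line, p)) for p in GREEN + RED)


for name, line in [("purple", PURPLE), ("yellow", YELLOW)]:
    print(name, separates(line), round(margin(line), 3))
```

Both lines classify every point correctly, so counting errors cannot tell them apart. The distance to the closest point can, and only the nearest points affect it, which matches the observation that the far-away points can be ignored.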

OK, so now let’s say we have these four points arranged like this, and we want to split them. It seems like a line won’t do the job, since the points already lie along a line, with the red ones on the sides and the green ones in the middle. So we need to think outside the box. One way to think outside the box is to use a curve like this to split them. Another one is to actually think outside the plane, and to think of the points as lying in a three-dimensional space. So here are the points over the plane, and here we add an extra axis, the z-axis, for the third dimension. If we can find a way to lift the two green points, then we’d be able to separate them with a plane. So what seems like a better solution, the curve over here or the plane over here? Well, it turns out that these two are actually the same method. Don’t worry if it seems confusing; we’ll get into a little bit more detail later. This method is called the kernel trick, and it’s widely used in support vector machines.

So let’s study one of them in more detail. Let’s start with the curve trick. Let’s begin by putting coordinates on the points: this one is the point (0, 3), this one is (1, 2), this one is (2, 1), and this one is (3, 0). What we need is a way to separate the green points from the red points. So if the points’ coordinates are (x, y), then we need an equation in the variables x and y that gives us large values for the green points and small values for the red points, or vice versa. So, quiz: which of the following equations could come to our rescue? x plus y, the product x times y, or x squared, the first coordinate squared?

This is not an easy question, so let’s actually make a table with the values of these equations on each of the four points. Here’s our table. We have the four points on the top row, and each of the other rows will be one of the functions. Here’s the sum, x plus y. We fill in the first row the following way: 0 plus 3 is 3, 1 plus 2 is 3, 2 plus 1 is 3, and 3 plus 0 is 3. For the second row we’re going to take the products: 0 times 3 is 0, 1 times 2 is 2, 2 times 1 is 2, and 3 times 0 is 0. And for the third row, x squared is the first coordinate squared, so 0 squared is 0, 1 squared is 1, 2 squared is 4, and 3 squared is 9.

So let’s think: which one of these equations separates the green and the red points? We look at the sum, x plus y, and that gives us 3 at every point, so it doesn’t really separate the points. We can look at x squared, and that gives us different values for every point, but we get 0 and 9 for the red points and 1 and 4 for the green ones, so this one doesn’t separate them either. But now we look at the product, x times y, and that gives us 0 for the red points and 2 for the green ones. So that one seems to do the job, right? It’s a function that can tell them apart, so that’s the equation we’re going to use. You can see the products here. Now, for the red points (x, y) we have that the product xy equals 0, and for the green points we have that the product xy equals 2. And what separates a 0 and a 2? Well, a 1. So the equation xy = 1 will separate them. And what is xy = 1? It’s the same as y = 1/x, and the graph of y = 1/x is precisely this hyperbola over here. That is the curve we wanted. So that is the kernel trick.
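
The table and the check for which function tells the colors apart can be reproduced with a short script. The points and the three candidate functions come from the talk; the helper names and the "a single threshold splits the values" test are just one way to phrase the check.

```python
# The four points from the talk; red points give xy = 0, green give xy = 2.
RED = [(0, 3), (3, 0)]
GREEN = [(1, 2), (2, 1)]

CANDIDATES = {
    "x + y": lambda x, y: x + y,
    "x * y": lambda x, y: x * y,
    "x ** 2": lambda x, y: x ** 2,
}


def separates(func):
    """True when a single threshold on func splits red values from green values."""
    red_values = [func(x, y) for x, y in RED]
    green_values = [func(x, y) for x, y in GREEN]
    return max(red_values) < min(green_values) or max(green_values) < min(red_values)


for name, func in CANDIDATES.items():
    row = [func(x, y) for x, y in [(0, 3), (1, 2), (2, 1), (3, 0)]]
    print(name, row, "separates" if separates(func) else "does not separate")


def side_of_hyperbola(x, y):
    """Classify a point by which side of the curve xy = 1 it falls on."""
    return "green" if x * y > 1 else "red"
```

Only the product passes the check, and `side_of_hyperbola` then turns the threshold of 1 into the curve xy = 1 described above.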

want it so that is the kernel trick now we can also see it in 3d here we have

we can also see it in 3d here we have

we can also see it in 3d here we have the point 0 3 1 2 2 1 and 3 0 and we’re

the point 0 3 1 2 2 1 and 3 0 and we’re

the point 0 3 1 2 2 1 and 3 0 and we’re going to consider them in 3 space so

going to consider them in 3 space so

going to consider them in 3 space so we’re going to take the map that takes

we’re going to take the map that takes

we’re going to take the map that takes the point X comma Y 2 X comma Y comma X

the point X comma Y 2 X comma Y comma X

the point X comma Y 2 X comma Y comma X times y so where does 0 3 go 0 3 goes to

times y so where does 0 3 go 0 3 goes to

times y so where does 0 3 go 0 3 goes to 0 comma 3 comma 0 since the product of 0

0 comma 3 comma 0 since the product of 0

0 comma 3 comma 0 since the product of 0 & 3 is 0 1 2 goes to 1 comma 2 comma 2

& 3 is 0 1 2 goes to 1 comma 2 comma 2

& 3 is 0 1 2 goes to 1 comma 2 comma 2 so it goes all the way up since the

so it goes all the way up since the

so it goes all the way up since the third coordinate is the height the point

third coordinate is the height the point

third coordinate is the height the point 2 1 also goes to 2 comma 1 comma 2 and

2 1 also goes to 2 comma 1 comma 2 and

2 1 also goes to 2 comma 1 comma 2 and the point 3 0 goes to 3 comma 0 comma 0

the point 3 0 goes to 3 comma 0 comma 0

the point 3 0 goes to 3 comma 0 comma 0 so there we go we can split them using a

so there we go we can split them using a

so there we go we can split them using a plane so you can think of a support

So you can think of a support vector machine with a kernel method as a surgeon who is slightly confused trying to split some apples and oranges. All of a sudden she comes up with a great idea. The idea consists of moving the apples up and the oranges down, and then successfully cutting a line between them.

OK, so let’s move to the next example. Let’s say we have a chain of pizza parlors and we want to put three of them in this city. So we make a study and realize that the people who eat pizza the most live in these locations, and so we need to know where the optimal places are to put our three pizza parlors. Well, it seems like the houses are nicely split into three groups, the red, the blue, and the yellow, so it makes sense to put one pizza parlor in each one of the three clusters. But we’re teaching a computer how to do this. A computer can’t just eyeball the three clusters; we need an algorithm. So here’s one algorithm that’ll work.

Let’s start by choosing three random locations for the pizza parlors, so they’re here where the stars are located: red, blue, and yellow. Now it makes sense to say each house should go to the pizza parlor that is closest to it. In that case we can look at the map like this, where the yellow houses go to the yellow pizza parlor, the blue houses go to the blue pizza parlor, and the red houses go to the red pizza parlor. But now look at where the yellow houses are located. It would make a lot of sense to move the yellow pizza parlor to the center of these houses. Same thing with the blue houses and the red houses. So let’s do that: let’s move every pizza parlor to the center of the houses that it serves, as follows.

But now look at these blue points. They are a lot closer to the yellow pizza parlor than to the blue one, so we might as well color them yellow. And look at these red points. They’re closer to the blue pizza parlor than to the red, so let’s color them blue. And now let’s do the step again that sends each pizza parlor to the center of the houses that it is serving, in this way. But then again, look at these red houses. They’re so much closer to the blue pizza parlor, so let’s turn them blue. And then again, let’s move every pizza parlor to the center of the houses it serves. And now we’ve reached an optimal solution. So starting with random points and iterating this process helped us reach the best locations for the pizza parlors. This algorithm is called k-means clustering.
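
The two steps above (send each house to its closest parlor, then move each parlor to the center of its houses) can be sketched in plain Python. This is a minimal sketch under stated assumptions: the house coordinates and starting locations are made up for illustration, and the starting locations are fixed rather than random so the run is repeatable. In general the final locations depend on where the parlors start.

```python
import math


def closest(point, centers):
    """Index of the center nearest to the given point."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def mean(points):
    """Coordinate-wise average of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(houses, start_centers, max_rounds=100):
    centers = list(start_centers)
    for _ in range(max_rounds):
        # Step 1: each house goes to the parlor closest to it.
        labels = [closest(h, centers) for h in houses]
        # Step 2: each parlor moves to the center of the houses it serves.
        new_centers = []
        for i, old in enumerate(centers):
            served = [h for h, lab in zip(houses, labels) if lab == i]
            # A parlor serving no houses stays where it is.
            new_centers.append(mean(served) if served else old)
        if new_centers == centers:
            break  # nothing moved, so repeating the steps changes nothing
        centers = new_centers
    # Final assignment for the final parlor locations.
    labels = [closest(h, centers) for h in houses]
    return centers, labels


# Hypothetical house locations forming three loose groups.
houses = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9), (1, 8), (2, 9), (1, 9)]
# Hypothetical starting parlor locations (fixed here instead of random).
starts = [(0, 0), (5, 5), (0, 5)]

centers, labels = k_means(houses, starts)
print(centers)
print(labels)
```

Here each parlor ends up at the average position of one group of three houses.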

But now let’s just say we don’t want to specify the number of clusters to begin with. There’s a different way to group the houses. So say they’re arranged like this. It would make sense to say the following: if two houses are close, they should be served by the same pizza parlor. So if we go by this rule, let’s try to group the houses. Let’s look at which houses are the closest to each other. It’s these two over here, so we group them. Now what are the next two closest houses? It’s these two over here, so we group them. The next two closest houses are these two, so again we group them. The next two closest houses are these two, so we unite the groups. Now the next two houses are right here, so we group them. The next two clusters are here, so we join the groups. The next two closest houses are here, but now let’s just say that’s too big. So all we need to do is specify a distance and say: this distance is too far; when you reach this distance, stop clustering the houses. And now we get our clusters. This algorithm is called hierarchical clustering.
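
The grouping rule above can be sketched as follows. This is a rough illustration, not the exact procedure from the video: it assumes the distance between two groups is the distance between their two closest houses (often called single linkage), and the house coordinates and the `too_far` cutoff are made up.

```python
import math


def hierarchical_clusters(houses, too_far):
    """Repeatedly join the two closest groups until the next join is too far."""
    # Every house starts as its own group.
    clusters = [[h] for h in houses]

    def gap(a, b):
        # Assumed rule: distance between groups = distance between
        # their two closest houses.
        return min(math.dist(p, q) for p in a for q in b)

    while len(clusters) > 1:
        # Find the closest pair of groups.
        d, i, j = min(
            (gap(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if d >= too_far:
            break  # the next join would reach the cutoff distance, so stop
        # Join group j into group i (j > i, so deleting j keeps i valid).
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical house locations and cutoff distance.
houses = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (10, 0)]
clusters = hierarchical_clusters(houses, too_far=3)
print(clusters)
```

With this data the number of clusters is never specified; it falls out of the cutoff distance, and a larger cutoff would join more of the groups.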

So congratulations! In this video we’ve learned many of the main algorithms of machine learning. We learned to find house prices using linear regression. We learned to detect spam email using naive Bayes. We learned to recommend apps using decision trees. We learned to create a model for an admissions office using logistic regression. We learned how to improve it using neural networks, and we learned how to improve it even more using support vector machines. And finally, we learned how to locate pizza parlors around the city using clustering algorithms.

So many questions may arise in your head, such as: Are there more algorithms? The answer is yes. Which ones to use? That’s not easy. Given a data set, how do we know which algorithm to pick? How to compare them and evaluate them: given two algorithms, how do you know which one is better than another on a data set, given the running time, their accuracy, etc.? Are there examples or projects with real data that I can get my hands dirty with? The answers to all these questions and more are in the Udacity Machine Learning Nanodegree. So if this interests you, you should take a look at it. Thank you.

”
