Hi, and welcome to the machine learning [unclear] at Udacity. What we’re going to talk about today is: what is machine learning?

Well, this is the world, and in the world we have humans and we have computers. One of the main differences between humans and computers is that humans learn from past experience, whereas computers need to be told what to do. They need to be programmed, so they follow instructions. Now the question is, can we get computers to learn from experience too? And the answer is yes, we can, and that’s precisely what machine learning is. Of course, for computers, past experiences have a name: data.

So in the next few minutes, I’m going to show you a few examples in which we can teach the computer how to learn from previous data. Most importantly, I’m going to show you that these algorithms are actually pretty easy, and that machine learning is really nothing to fear.

So let’s go to the first example. Let’s say we’re studying the housing market, and our task is to predict the price of a house given its size. We have a small house that costs $70,000, we have a big house that costs $160,000, and we’d like to estimate the price of this medium-sized house here. So how do we do it?

Well, first we put them in a grid where the x-axis represents the size of the house in square feet and the y-axis represents the price of the house in dollars. To help us out, we have collected some previous data in the form of these blue dots. These are other houses that we’ve looked at, and we’ve recorded their prices with respect to their size. In this graph we can see that the small house is priced at $70,000 and the big house is priced at $160,000.

So now it’s time for a small quiz. What do you think is the best guess for the price of the medium house given this data? Would it be $80,000, $120,000, or $190,000?

Well, to help us out, we can see that these blue points kind of form a line, so we can draw the line that best fits the data. Now, on this line, we can say that our best guess for the price of the house is this point over here, which corresponds to $120,000. So if you said $120,000, that is correct. This method is known as linear regression.
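
As a rough sketch of the house-price example in code: only the $70,000 and $160,000 prices come from the talk. Every house size, the other three data points, and the name `fit_line` are made up for illustration. The line is computed with the standard closed-form least-squares formula, which is one common way to get a best-fitting line, not necessarily how the line on the slide was drawn.

```python
# Hypothetical (size in square feet, price in dollars) pairs standing in for the blue dots.
# Only the $70,000 and $160,000 prices appear in the talk; all sizes and other prices are invented.
houses = [(1000, 70_000), (1200, 92_000), (1400, 108_000), (1600, 132_000), (1900, 160_000)]


def fit_line(points):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(houses)
medium_size = 1500  # hypothetical size of the medium house
estimate = slope * medium_size + intercept
print(f"Estimated price: ${estimate:,.0f}")
```

With these invented numbers the estimate comes out close to $120,000, which matches the quiz answer only because the made-up data was chosen to lie near that line.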

Now, you may ask, how do we find this line? Well, let’s look at a simple example with these three points. We’re going to try to find the best line that fits through those three points. Obviously, “best line” is subjective, but we’ll try to find a line that works well. Since we’re teaching the computer how to do it, and the computer can’t really eyeball the line, we have to get it to draw a random line and then see how bad this line is. In order to see how bad the line is, we calculate the error. To calculate the error, we look at the lengths of the distances from the line to the three points, and we’re just going to say that the error of this line is the sum of those three red lengths.

Now what we’re going to do is move the line around and see if we can reduce this error. Let’s say we move it in this direction and calculate the error, which is given by the yellow distances. We add them up and realize that we’ve increased the error, so that’s not a good direction to go. Let’s try moving in the other direction. We move it here and calculate the error, which is now given by the sum of these three green distances, and we see that the error is smaller, so we actually reduced it. Let’s say we take that step. We’re a little closer to our solution. If we continue doing this procedure several times, we will always be decreasing the error, and we’ll finally arrive at a good solution in the form of this line. This general procedure is known as gradient descent.

Now, in real life we don’t want to deal with negative distances, corresponding to a point being on one or the other side of the line. So what we do to solve this is add the square of the distance from the point to the line instead, and this procedure is called least squares.
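
Here is a minimal sketch of that procedure, under a few assumptions of mine: the three points are invented (the talk gives no coordinates), “distance” is taken to be the vertical distance from each point to the line, and the step size `learning_rate` and the starting line are arbitrary choices.

```python
# Three hypothetical points; the talk does not give their coordinates.
points = [(1.0, 2.0), (2.0, 3.0), (3.0, 5.0)]


def squared_error(slope, intercept):
    """Sum of squared vertical distances from the points to the line."""
    return sum((slope * x + intercept - y) ** 2 for x, y in points)


def gradient_step(slope, intercept, learning_rate=0.01):
    """Move the line a small step in the direction that reduces the squared error."""
    grad_slope = sum(2 * (slope * x + intercept - y) * x for x, y in points)
    grad_intercept = sum(2 * (slope * x + intercept - y) for x, y in points)
    return slope - learning_rate * grad_slope, intercept - learning_rate * grad_intercept


slope, intercept = 0.0, 0.0  # arbitrary starting line (the talk starts from a random one)
errors = [squared_error(slope, intercept)]
for _ in range(5000):
    slope, intercept = gradient_step(slope, intercept)
    errors.append(squared_error(slope, intercept))

print(f"slope={slope:.3f}, intercept={intercept:.3f}, error={errors[-1]:.4f}")
```

Instead of trying both directions and comparing errors, this version uses the slope of the error to decide which way to move, which is the usual way gradient descent is written. Each step lowers the error only because the step size is small enough for these points; too large a step can overshoot.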

So we can think of gradient descent as trying to descend from a mountain. This is our mountain, Mount Everest. On this mountain, the higher we are, the larger the error is, so descending means reducing the error. So what are we doing in gradient descent? Well, we look at our surroundings and try to figure out which way we can descend more. For example, here we can go in two directions, to the right or to the left. If we go to the left, then we’re going up, so our error is increasing. This is equivalent to moving the line downwards and getting farther from the three points. But if we go to the right instead, then we’re actually descending, which means our error is decreasing. This is equivalent to moving the line upwards and getting closer to the three points. So we decide to take a step towards the right. Then we can start this procedure again and again and again until we successfully descend from the mountain. This is equivalent to reducing the error until we find its minimum value, which gives us the best line fit.

So you can think of linear regression as a painter that will look at your data and draw the best-fitting line. Now, this method is actually much more powerful than that. If the data doesn’t form a line, then with a very similar method we can draw a circle through it, or a parabola, or even a higher-degree curve. For example, for the data here, we can actually fit a cubic polynomial.

Okay, so let’s move to the next example. In this example we’re going to build an email spam detection classifier, something that will tell us if an email is spam or not. How do we do this? We do this by looking at previous data. The previous data is 100 emails that we’ve looked at already. Out of these 100 emails, we have flagged 25 as spam and 75 as not spam.

Now let’s try to think of features that spam emails may be likely to display, and analyze these features. One feature could be containing the word “cheap”. It seems reasonable to think that an email containing the word “cheap” is likely to be spam, so let’s analyze this claim. We look for the word “cheap” in all these 100 emails and find that 20 of the spam ones and 5 of the non-spam ones contain that word. So we can forget about all the rest of the emails and focus only on the ones that contain the word “cheap”.

Okay, so time for a quiz. Here’s the question: based on our data, if an email contains the word “cheap”, what is the probability of this email being spam? Is it 40%, 60%, or 80%? Well, to help us out, we can see that out of the 25 emails with the word “cheap”, 20 are spam while 5 are not. These form an 80/20 split, so the correct answer is 80%. If you said 80%, you were correct.

So, from analyzing the data, we can conclude a rule. The rule says: if an email contains the word “cheap”, then we’re going to say the probability of it being spam is 80%. We then associate this feature with the probability 80%, and we’re going to use it to flag future messages as spam or not spam.

We can also look at other features and try to find their associated probabilities. Let’s say we look at emails containing a spelling mistake and realize that the probability of an email containing a spelling mistake being spam is 70%. Or let’s say we look at emails that are missing a title and find that the probability of those being spam is 95%, etc. So now, when future emails come, we can combine these features to guess if they’re spam or not. This algorithm is known as the naive Bayes algorithm.
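
The talk doesn’t spell out how the features get combined, so here is one possible sketch, under assumptions I’m adding: the features are treated as independent of each other within spam and within non-spam (the “naive” part), and the overall spam rate is the 25 out of 100 from the example. The function name `combine_features` is hypothetical.

```python
def combine_features(prior, per_feature_probs):
    """Combine per-feature spam probabilities under the naive independence assumption.

    prior: overall fraction of spam (25/100 in the talk's data).
    per_feature_probs: P(spam | feature) for each feature present in the email.
    """
    prior_odds = prior / (1 - prior)
    odds = prior_odds
    for p in per_feature_probs:
        # Each feature multiplies the odds by how far it shifts them from the prior.
        odds *= (p / (1 - p)) / prior_odds
    return odds / (1 + odds)


# An email that contains "cheap" (80%) and a spelling mistake (70%).
both = combine_features(0.25, [0.8, 0.7])
print(f"{both:.3f}")
```

With a single feature the function just returns that feature’s probability (80% for “cheap”). With both features it returns about 97%, higher than either alone, because each one separately pushes the odds up from the 25% baseline.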

Okay, so now another example. We are the App Store or Google Play, and our goal is to recommend apps to users. To each user, we’re going to try to recommend the app that they are most likely to download. We have gathered a table of data that we’re going to use to make the rules. The table contains six people, and for each one of those six people we have recorded their gender, their age, and the app they downloaded. For example, the first person is a 15-year-old female, and she downloaded Pokemon Go.

So here’s a small quiz. Between gender and age, which one seems like the more decisive feature for predicting what app the users will download? Well, to help us out, first let’s look at gender. If we split them by gender, then the females downloaded Pokemon Go and WhatsApp, whereas the males downloaded Pokemon Go and Snapchat. So not much of a split here. On the other hand, if we look at age, we realize that everybody who’s under 20 years old downloaded Pokemon Go, whereas everybody who is 20 or older didn’t. That’s a nice split. So the feature that best splits the data is age. Therefore, if you said age, that was correct.

So what we’re going to do is add a question here. The question is: are you younger than 20? If yes, then we’ll recommend Pokemon Go to you. If not, then we’ll see. So what happens if you’re 20 or older? Then we look at the gender. It seems like here, if you’re female, you downloaded WhatsApp, whereas if you’re male, you downloaded Snapchat. So we add another question here. The question is: are you female or male? If you’re female, we recommend WhatsApp, and if you’re male, we recommend Snapchat.

So what we end up with here is a decision tree. The decisions are given by the questions we asked, and this decision tree was built with the data. Now whenever we have any user, we can put them through the decision tree and recommend whatever app the tree suggests. For example, if you have a young person, you recommend them Pokemon Go. If you have an older person, you check their gender: if it’s a female, you recommend WhatsApp, and if it’s a male, you recommend Snapchat. Obviously, there won’t always be a tree that perfectly fits our data, but in this class we’re going to learn an algorithm that will help us find the best-fitting tree for a table of data.
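
Written as code, the finished tree is just two nested questions. This is a sketch of the tree from this example only. The function name `recommend_app` and the string values used for gender are illustrative choices of mine, and nothing here learns the tree from data.

```python
def recommend_app(age, gender):
    """Walk the two-question tree from the talk and return the app to recommend."""
    if age < 20:               # first question: are you younger than 20?
        return "Pokemon Go"
    if gender == "female":     # second question, only asked of users 20 or older
        return "WhatsApp"
    return "Snapchat"


print(recommend_app(15, "female"))  # the 15-year-old from the table
```

Asking about age first reflects the quiz result: age splits the six people more cleanly than gender, so it goes at the top of the tree.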

Okay, so let’s go to the next example. Now let’s say we’re the admissions office at a university and we’re trying to figure out which students to admit. We’re going to admit them or reject them based on two pieces of information: one is an entrance exam that we give them, the test, and the other one is their grades from school. For example, here we have student 1, with scores of 9 out of 10 on the test and 8 out of 10 in the grades, and that student got accepted. We also have student 2, with scores of 3 on the test and 4 in the grades, and that student did not get accepted. And then a new student comes in, student 3. This person has scores of 7 and 6, and the question is, should we accept them or not?
so let’s first put them in a grid or the
so let’s first put them in a grid or the x-axis represents our score on the tests
x-axis represents our score on the tests
x-axis represents our score on the tests and the y-axis represents their grades
and the y-axis represents their grades
and the y-axis represents their grades here we can see that student 1 would lie
here we can see that student 1 would lie
here we can see that student 1 would lie over here in the point with coordinates
over here in the point with coordinates
over here in the point with coordinates 9 8 since their scores were 9 and 8 and
9 8 since their scores were 9 and 8 and
9 8 since their scores were 9 and 8 and the student 2 would lie right here in
the student 2 would lie right here in
the student 2 would lie right here in the point with coordinates 3 4 since
the point with coordinates 3 4 since
the point with coordinates 3 4 since their scores were 3 & 4 so in order to
their scores were 3 & 4 so in order to
their scores were 3 & 4 so in order to see if we should accept or reject Stu
see if we should accept or reject Stu
see if we should accept or reject Stu and 3 we should try to find it training
and 3 we should try to find it training
and 3 we should try to find it training that data so we look at the previous
that data so we look at the previous
that data so we look at the previous data in the form of all the students
data in the form of all the students
data in the form of all the students we’ve already accepted or rejected and
we’ve already accepted or rejected and
we’ve already accepted or rejected and it turns out that the previous data
it turns out that the previous data
it turns out that the previous data looks like this the green dots represent
looks like this the green dots represent
looks like this the green dots represent students that we’ve previously accepted
students that we’ve previously accepted
students that we’ve previously accepted and the red dots represent students that
and the red dots represent students that
and the red dots represent students that we’ve previously rejected so time for a
we’ve previously rejected so time for a
we’ve previously rejected so time for a quiz
quiz
Based on the previous data, do we think student 3 gets accepted, yes or no? To answer this question, let’s look closely at the data. The red and green dots seem to be nicely separated by a line. Here’s the line: most of the points over it are green and most of the points under it are red, with some exceptions. That makes sense, since the students who got high scores are over the line and they got accepted, and the students who got low scores are under the line and they didn’t get accepted. So we’re going to say that that line is going to be our model, and now every time we get a new student, we check their scores and plot them in this graph. If they end up over the line, we predict that they’ll get accepted, and if they end up below the line, we predict that they’ll get rejected. Since student 3 has scores of 7 and 6, this person will end up here at the point (7, 6), which is over the line, so we conclude that this student gets accepted. So if you said yes, that’s the correct answer. This method is known as logistic regression.
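As a rough sketch of the "over the line or under the line" check, here is how it could look in Python. The lesson never gives the equation of its line, so the weights and bias below are made-up assumptions, chosen only so that the three students from the example land on the sides described. Full logistic regression would also turn the score into a probability; this sketch only shows the side-of-the-line decision.

```python
# Hypothetical boundary line: test + grades = 9.
# These weights and the bias are assumptions for illustration only;
# the lesson does not give the equation of its line.
W_TEST, W_GRADES, BIAS = 1.0, 1.0, -9.0


def score(test, grades):
    """Positive when the point is over the line, negative when under it."""
    return W_TEST * test + W_GRADES * grades + BIAS


def predict_accepted(test, grades):
    """Predict acceptance for points that land over the line."""
    return score(test, grades) > 0


# The three students from the example: (test score, grades).
students = {"student 1": (9, 8), "student 2": (3, 4), "student 3": (7, 6)}
predictions = {name: predict_accepted(*scores) for name, scores in students.items()}
print(predictions)
```

With this assumed line, students 1 and 3 come out as accepted and student 2 as rejected, matching the picture in the example.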
Another question is: how do you find this line that best cuts the data in two? So let’s look at a simple example with six points, three green and three red, and we’re going to try to draw a line that best separates the green points from the red points. Again, a computer can’t really eyeball the line, so it can just start by drawing a random line like this one. Given this line, let’s just randomly say that we label the region over the line as green and the region under the line as red. So just like with linear regression, we’re going to try to see how bad this first line is, and the measure of how bad the line is will be how many points we are misclassifying. We’re going to call that number of misclassified points the error. This line, for example, misclassifies two points, one red and one green, so we’ll say that it has two errors.
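Counting misclassified points can be sketched in a few lines of Python. The six points and both lines below are invented for illustration (the slide's actual coordinates are not given); they are picked so that the first line misclassifies one red and one green point, as in the example.

```python
# Made-up stand-ins for the six points on the slide: ((x, y), color).
POINTS = [((5, 6), "green"), ((6, 3), "green"), ((3, 5), "green"),
          ((1, 2), "red"), ((4, 1), "red"), ((2, 4), "red")]


def predicted_color(line, point):
    """Label the region over the line a*x + b*y + c = 0 green, the rest red."""
    a, b, c = line
    x, y = point
    return "green" if a * x + b * y + c > 0 else "red"


def count_errors(line, points=POINTS):
    """Number of points whose predicted color differs from their true color."""
    return sum(predicted_color(line, p) != color for p, color in points)


first_line = (0.0, 1.0, -3.5)   # the horizontal line y = 3.5, an arbitrary first guess
other_line = (1.0, 1.0, -7.0)   # a different hypothetical line, x + y = 7
print(count_errors(first_line), count_errors(other_line))
```

On this toy data the first line has two errors and the second line has none, so the error count gives a simple way to compare candidate lines.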
So again, like with linear regression, what we’ll do is move the line around and try to minimize the number of errors using gradient descent. If we move the line a bit in this direction, we can see that we start correctly classifying one of the points, bringing down the number of errors to one, and if we move it a little more, we correctly classify the other one of the points, bringing down the number of errors to zero. In reality, since we use calculus for the gradient descent method, it turns out that the number of errors is not what we need to minimize, but instead something that captures the number of errors, called the log loss function. The idea behind the log loss function is that it’s a function which assigns a large value to the misclassified points and a small value to the correctly classified points.
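Here is a minimal sketch of the log loss and of a few gradient descent steps, under some assumptions: the six points, the starting line, the learning rate, and the number of steps are all made up for illustration, and the score is turned into a probability with the usual sigmoid function. It is meant to show the idea, not the exact procedure used in the lesson.

```python
import math

# Made-up stand-ins for the six points: ((x, y), label), green = 1, red = 0.
POINTS = [((5, 6), 1), ((6, 3), 1), ((3, 5), 1),
          ((1, 2), 0), ((4, 1), 0), ((2, 4), 0)]


def prob_green(line, point):
    """Sigmoid of the score a*x + b*y + c: close to 1 far over the line."""
    a, b, c = line
    x, y = point
    return 1.0 / (1.0 + math.exp(-(a * x + b * y + c)))


def point_loss(line, point, label):
    """Log loss of one point: large when the point is on the wrong side."""
    p = prob_green(line, point)
    return -math.log(p) if label == 1 else -math.log(1.0 - p)


def log_loss(line):
    """Average log loss over all points."""
    return sum(point_loss(line, pt, lab) for pt, lab in POINTS) / len(POINTS)


def gradient_step(line, learning_rate=0.01):
    """Move the line a little in the direction that lowers the log loss."""
    a, b, c = line
    grad_a = grad_b = grad_c = 0.0
    for (x, y), label in POINTS:
        diff = prob_green(line, (x, y)) - label  # derivative of the loss w.r.t. the score
        grad_a += diff * x
        grad_b += diff * y
        grad_c += diff
    n = len(POINTS)
    return (a - learning_rate * grad_a / n,
            b - learning_rate * grad_b / n,
            c - learning_rate * grad_c / n)


line = (0.0, 1.0, -3.5)   # arbitrary first guess: the line y = 3.5
initial_loss = log_loss(line)
for _ in range(100):
    line = gradient_step(line)
final_loss = log_loss(line)
print(initial_loss, final_loss)
```

Unlike the raw error count, which only changes when a point crosses the line, the log loss changes smoothly as the line moves, which is what lets calculus tell us which way to nudge it.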
OK, so let’s look more carefully at this model for accepting or rejecting students. Let’s say we have a student 4, who got 9 on the test and 1 in the grades. This student gets accepted according to our model, since they are over here on top of the line. But that seems wrong, since a student who got very low grades shouldn’t get accepted no matter what their test score was. So maybe it’s simplistic to think this data can be separated by just one line, right? Maybe the real data should look more like this, where these students over here who got a low test score or low grades don’t get accepted. So now it seems like a line won’t cut the data in two. So what’s the next thing after a line? Maybe a circle; a circle could work. Maybe two lines; that could work too. Actually, it looks like that works better, so let’s go with that. Now the question is: how do we find these two lines? Again, we can do it using gradient descent to minimize a similar log loss function as before. This is called a neural network. Now, why is it called a neural network?
Well, let’s see. We have this green area here, bounded by two lines. This area can be constructed as an intersection, namely the intersection between the green area on top of one of the lines and the green area to the right of the other one of the lines. So we’re going to graph it like this: we have two nodes, each node is a line that separates the plane into two regions, and from the two nodes we get the intersection, which is the desired area.
The reason why this is called a neural network is that it mimics the behavior of the brain. In the brain we have neurons, which connect to each other and either fire electricity or not. They resemble the nodes in our graph, which split the plane into regions and fire if a given point belongs to one of those regions, and won’t fire if it doesn’t. So we can think of logistic regression as a ninja who will look at your data and cut it in half based on the labels, and we can think of a neural network as a team of ninjas who will look at your data and cut it into regions based on the labels.
Okay, so let’s dive a bit deeper into the art of splitting data in two. We can look at these points, three green and three red, and there seem to be many lines that can split them. For example, there is this yellow line and there is this purple line. So, quiz: which of these two lines do you think cuts the data better, the purple or the yellow one? Well, if we look at the yellow line, it seems that it’s close to failing. It’s too close to two of the points, so if we were to wiggle it a little bit, we would misclassify some of the points. The purple one, on the other hand, seems to be nicely spaced and as far as it can be from all the points, so it seems like the best line is the purple one. Now the question is: how do we find the purple line? Well, the first observation is that we don’t really need to worry about these points, because they’re too far from the boundary, so we can forget about them and only focus on the points that are close. And now what we’re going to use is not gradient descent; we’re going to use linear optimization to find the line that maximizes the distance from the boundary points. This method is called a support vector machine.
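To make "as far as it can be from the points" concrete, here is a small sketch that measures the distance from a line to its closest point, often called the margin. The points and the two lines standing in for the yellow and purple ones are made up for illustration. The sketch only compares two given lines; it does not perform the optimization that searches for the best one.

```python
import math

# Made-up points: three green and three red.
GREEN = [(4, 6), (6, 4), (6, 6)]
RED = [(1, 2), (2, 1), (1, 1)]


def signed_distance(line, point):
    """Signed distance from a point to the line a*x + b*y + c = 0."""
    a, b, c = line
    x, y = point
    return (a * x + b * y + c) / math.hypot(a, b)


def separates(line):
    """True if every green point is on the positive side and every red on the negative."""
    return (all(signed_distance(line, p) > 0 for p in GREEN)
            and all(signed_distance(line, p) < 0 for p in RED))


def margin(line):
    """Distance from the line to the closest point of either color."""
    return min(abs(signed_distance(line, p)) for p in GREEN + RED)


yellow = (1.0, 1.0, -9.5)   # x + y = 9.5: separates, but hugs two green points
purple = (1.0, 1.0, -6.5)   # x + y = 6.5: separates with room on both sides
print(margin(yellow), margin(purple))
```

Both lines classify every point correctly, but the purple stand-in has the larger margin, so a small wiggle is much less likely to push a point onto the wrong side.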
So you can think of a support vector machine as a surgeon who will see your data and cut it, but first she will carefully look at what’s the best way to separate the data in two, and then make the cut.
Okay, so now let’s say we have these four points arranged like this and we want to split them. It seems like a line won’t do the job, since they’re all already on a line, and the red ones are on the sides and the green ones are in the middle. So we need to think outside the box. One way to think outside the box is to use a curve like this to split them. Another one is to actually think outside the plane, and to think of the points as lying in a three-dimensional space. So here are the points on the plane, and here we add an extra axis, the z-axis, for the third dimension. If we can find a way to lift the two green points, then we’d be able to separate them with a plane. So what seems like a better solution, the curve over here or the plane over here?
Well, it turns out that these two are actually the same method. Don’t worry if it seems confusing; we’ll get into a little bit more detail later. This method is called the kernel trick, and it’s very widely used in support vector machines. So let’s study one of them in more detail. Let’s start with the curve trick. Let’s start by putting coordinates on the points: this one is the point (0, 3), this one is (1, 2), this one is (2, 1), and this one is (3, 0). What we need is a way to separate the green points from the red points. So if the points’ coordinates are (x, y), then we need an equation on the variables x and y that gives us large values for the green points and small values for the red points, or vice versa.
So, quiz: which of the following equations could come to our rescue? The sum x + y, the product x times y, or x squared, the first coordinate squared? This is not an easy question, so let’s actually make a table with the values of these equations on each of the four points. So here’s our table. We have the four points on the top row, and each of the other rows will be one of the functions. Here’s the sum x + y. We fill in the first row the following way: 0 + 3 is 3, 1 + 2 is 3, 2 + 1 is 3, and 3 + 0 is 3. For the second row we’re going to get the products: 0 times 3 is 0, 1 times 2 is 2, 2 times 1 is 2, and 3 times 0 is 0. And for the third row, x squared is the first coordinate squared, so 0 squared is 0, 1 squared is 1, 2 squared is 4, and 3 squared is 9. So let’s think about which one of these equations separates the green and the red points. We look at the sum x + y, and that gives us 3 at every point, so it doesn’t really separate the points. We can look at x squared, and that gives us different values for every point, but we get 0 and 9 for the red points and 1 and 4 for the green ones, so this one also doesn’t separate them. But now we look at the product x times y, and that gives us 0 for the red points and 2 for the green ones, so that one seems to do the job, right? It’s a function that can tell them apart, so that’s the equation we’re going to use. You can see the products here. For the red points (x, y) we have that the product xy equals 0, and for the green points we have that the product xy equals 2. And what separates a 0 and a 2? Well, a 1. So the equation xy = 1 will separate them.
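The table from the quiz can be reproduced with a short Python sketch. The points and colors are the ones from the example; the helper name `separates` and the dictionary layout are just choices made for this illustration.

```python
# The four points from the example, with the red ones on the sides.
POINTS = [(0, 3), (1, 2), (2, 1), (3, 0)]
COLORS = ["red", "green", "green", "red"]

# The three candidate functions from the quiz.
CANDIDATES = {
    "x + y": lambda x, y: x + y,
    "x * y": lambda x, y: x * y,
    "x ** 2": lambda x, y: x ** 2,
}

# One row of values per function, one column per point.
table = {name: [f(x, y) for x, y in POINTS] for name, f in CANDIDATES.items()}


def separates(values):
    """True if a single cutoff puts all red values on one side and all green on the other."""
    reds = [v for v, color in zip(values, COLORS) if color == "red"]
    greens = [v for v, color in zip(values, COLORS) if color == "green"]
    return max(reds) < min(greens) or max(greens) < min(reds)


winners = [name for name, values in table.items() if separates(values)]
for name, values in table.items():
    print(name, values)
print(winners)
```

Only the product row passes the check: its red values are both 0 and its green values are both 2, so any cutoff in between, such as 1, tells the two colors apart.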
And what is xy = 1? It’s the same as y = 1/x, and the graph of y = 1/x is precisely this hyperbola over here. That is the curve we wanted, so that is the kernel trick. Now, we can also see it in 3D. Here we have the points (0, 3), (1, 2), (2, 1), and (3, 0), and we’re going to consider them in 3-space. We’re going to take the map that takes the point (x, y) to (x, y, xy). So where does (0, 3) go? It goes to (0, 3, 0), since the product of 0 and 3 is 0. (1, 2) goes to (1, 2, 2), so it goes all the way up, since the third coordinate is the height. The point (2, 1) also goes up, to (2, 1, 2), and the point (3, 0) goes to (3, 0, 0). So there we go: we can split them using a plane.
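The lifting map is easy to sketch in Python. The map (x, y) to (x, y, xy) and the four points come from the example; choosing the horizontal plane at height 1 is an assumption made here so that it matches the curve xy = 1, since the lesson does not say which plane it draws.

```python
# The four points from the example and their colors.
POINTS = {(0, 3): "red", (1, 2): "green", (2, 1): "green", (3, 0): "red"}

# Assumed separating plane z = 1, chosen to match the curve xy = 1.
PLANE_HEIGHT = 1.0


def lift(x, y):
    """Send a point in the plane to 3-space, using the product as the height."""
    return (x, y, x * y)


def predicted_color(x, y):
    """Points lifted above the plane are green, points below it are red."""
    _, _, height = lift(x, y)
    return "green" if height > PLANE_HEIGHT else "red"


lifted = [lift(x, y) for x, y in POINTS]
print(lifted)
print(all(predicted_color(x, y) == color for (x, y), color in POINTS.items()))
```

Checking whether the height xy is above 1 in 3-space is the same test as checking which side of the hyperbola xy = 1 the point is on in the plane, which is why the curve and the plane are two views of the same method.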
So you can think of a support vector machine with a kernel method as a surgeon who is slightly confused trying to split some apples and oranges. All of a sudden she comes up with a great idea: the idea consists of moving the apples up and the oranges down, and then successfully cutting a line between them.
between them ok so let’s move the next example let’s say we have a chain of
example let’s say we have a chain of
Let’s say we have a chain of pizza parlors and we want to put three of them in this city. So we make a study and realize that the people who eat pizza the most live in these locations. And so we need to know where the optimal places are to put our three pizza parlors. Well, it seems like the houses are nicely split into three groups, the red, the blue, and the yellow, so it makes sense to put one pizza parlor in each one of the three clusters. But we’re teaching a computer how to do this. A computer can’t just eyeball the three clusters; we need an algorithm. So here’s one algorithm that’ll work.
Let’s start by choosing three random locations for the pizza parlors. So they’re here, where the stars are located: red, blue, and yellow. Now, it makes sense to say each house should go to the pizza parlor that is closest to it. In that case, we can look at the map like this, where the yellow houses go to the yellow pizza parlor, the blue houses go to the blue pizza parlor, and the red houses go to the red pizza parlor.

But now look at where the yellow houses are located. It would make a lot of sense to move the yellow pizza parlor to the center of these houses. Same thing with the blue houses and the red houses. So let’s do that: let’s move every pizza parlor to the center of the houses that it serves, as follows.

But now look at these blue points. They’re a lot closer to the yellow pizza parlor than to the blue one, so we might as well color them yellow. And look at these red points. They’re closer to the blue pizza parlor than to the red one, so let’s color them blue. And now let’s do the step again that sends each pizza parlor to the center of the houses that it is serving, in this way. But then again, look at these red houses. They’re so much closer to the blue pizza parlor, so let’s turn them blue. And then again, let’s move every pizza parlor to the center of the houses it serves.

And now we’ve reached an optimal solution. So starting with random points and iterating this process helped us reach the best locations for the pizza parlors. This algorithm is called k-means clustering.
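The two alternating steps (send each house to its closest parlor, then move each parlor to the center of its houses) can be sketched in a few lines of Python. This is a minimal illustration and not the code behind the video: the names `k_means`, `closest`, and `mean`, the `start` and `seed` parameters, and the choice to start the parlors at randomly picked houses are all assumptions made here. In general the result depends on the starting locations, so a different start can settle on a different, possibly worse, grouping.

```python
import math
import random


def closest(point, centers):
    """Index of the center nearest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def mean(points):
    """Coordinate-wise average of a non-empty list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def k_means(houses, k, start=None, seed=0, max_iters=100):
    """Return (parlor locations, parlor index for each house)."""
    if start is None:
        # Assumption: begin with the parlors at k randomly chosen houses.
        start = random.Random(seed).sample(houses, k)
    centers = list(start)
    for _ in range(max_iters):
        # Step 1: each house goes to the parlor closest to it.
        labels = [closest(h, centers) for h in houses]
        # Step 2: each parlor moves to the center of the houses it serves.
        new_centers = []
        for i in range(k):
            served = [h for h, lab in zip(houses, labels) if lab == i]
            # A parlor that serves no houses simply stays where it is.
            new_centers.append(mean(served) if served else centers[i])
        if new_centers == centers:  # no parlor moved, so we stop
            break
        centers = new_centers
    return centers, [closest(h, centers) for h in houses]


houses = [(0, 0), (0, 1), (1, 0),
          (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]
centers, labels = k_means(houses, k=3, start=[(1, 1), (9, 9), (19, 1)])
print(centers)
print(labels)
```

Passing `start` explicitly makes the run repeatable; leaving it out falls back to the seeded random choice, which mirrors the "three random locations" in the walkthrough.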
But now let’s just say we don’t want to specify the number of clusters to begin with. There’s just a different way to group the houses. So say they’re arranged like this. It would make sense to say the following: if two houses are close, they should be served by the same pizza parlor. So if we go by this rule, let’s try to group the houses.

Let’s look at which houses are the closest to each other. It’s these two over here, so we group them. Now, what are the next two closest houses? It’s these two over here, so we group them. The next two closest houses are these two, so again we group them. The next two closest houses are these two, so we unite the groups. Now the next two closest houses are right here, so we group them. The next two closest are here, so we join the groups. The next two closest houses are here, but now let’s just say that’s too big. So all we need to do is specify a distance and say: this distance is too far; when you reach this distance, stop clustering the houses. And now we get our clusters. This algorithm is called hierarchical clustering.
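The grouping rule above (keep joining the two closest houses, and stop once the next pair is too far apart) can be sketched as follows. Measuring the gap between two groups by their closest pair of houses is usually called single linkage; that name, the function name `hierarchical_clusters`, and the parameter `too_far` are labels chosen here for illustration and do not come from the talk.

```python
import math


def hierarchical_clusters(houses, too_far):
    """Merge the closest groups until the next merge would span too_far or more."""
    clusters = [[h] for h in houses]  # every house starts as its own group
    while len(clusters) > 1:
        # Find the two groups containing the closest pair of houses.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(math.dist(a, b)
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d >= too_far:  # this distance is too far: stop clustering
            break
        # Unite the two groups (j > i, so deleting j leaves i in place).
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


houses = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (30, 30)]
groups = hierarchical_clusters(houses, too_far=5)
print(groups)
```

Note that the number of groups is never given as an input; it falls out of the chosen cutoff distance.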
So, congratulations! In this video we’ve learned many of the main algorithms of machine learning. We learned to find house prices using linear regression. We learned to detect spam email using naive Bayes. We learned to recommend apps using decision trees. We learned to create a model for an admissions office using logistic regression. We learned how to improve it using neural networks, and we learned how to improve it even more using support vector machines. And finally, we learned how to locate pizza parlors around the city using clustering algorithms.

So many questions may arise in your head, such as: Are there more algorithms? The answer is yes. Which ones to use? That’s not easy. Given a data set, how do we know which algorithm to pick? How do we compare them and evaluate them? Given two algorithms, how do you know which one is better than the other on a data set, given their running time, their accuracy, etc.? Are there examples or projects with real data that I can get my hands dirty with? The answers to all these questions and more are in the Udacity Machine Learning Nanodegree. So if this interests you, you should take a look at it. Thank you.