So hi everyone. As you heard, my name is Michael Green, and I’m indeed here to tell you about a different approach to building algorithms and, really, to building machine learning methods. I’m also going to argue that they are fundamentally the same thing, and you’ll see that a little bit later in my talk.
But let’s get cracking. Basically, I’ll give an overview of AI and machine learning. I’m not the first one to do this, and lots of people have their take on it, but this will be my take. I’ll also try to convey why this is not enough. We are very good at telling ourselves that we have come really far in AI, and I would actually tend to disagree with that. I think we’re playing around in the paddling pool, and it’s simply not good enough: we need to innovate in this area, we need to be better. I will also talk about how perception versus inference can work in a computer. I will make a short note about our Bayesian brains, because that’s fundamentally how we reason as people, at least from a macroscopic perspective. I’ll also talk a little bit about probabilistic programming and why I see it as a key point in marrying two very different, or differentiated, fields today. And in the end I’ll tie all of it together so that you can see how you can actually deploy a solution like this in practice.
But basically, if we just go back to basics: I know a lot of different definitions of artificial intelligence, and there are a lot of them out there, but none of them says "the ability to drive a car without crashing." That’s simply not artificial intelligence. It is something that solves a domain-specific problem that is challenging, yes, but it’s not AI. Neither is diagnosing a disease in a patient who comes into the ER; that’s also not AI. Neither, actually, is what I do in my company; that’s also not AI. All of those are examples of narrow AI, where we try to use machines to do more clever things than an individual person could do at the same task.
My definition of AI is basically the behavior shown by an agent that you put into an environment, where that behavior in itself seems to optimize the concept of future freedom. That is the closest definition of artificial intelligence that I can come up with, because it doesn’t say anything like "optimize the least-squares error" or "do backpropagation to make sure the cross-entropy error looks good." All of those things are man-made, and I assure you our brains do not do backpropagation. It’s simply not true.
No one is telling our children how to stand up; they’re not getting smacked on the hands for failing. My son failed several times this morning, but he actually succeeded when I left the room, so without my encouragement he actually did better. That might say something about my pedagogical skills, or about the fact that he doesn’t need my training to do these things. So there’s a fundamental thing that’s missing, a missing piece in our understanding of how knowledge is represented, accumulated and acted upon, and that is what fascinates me more than anything.
I’m sure you’ve seen this before; it’s just a definition of what AI is today. There’s a lot in it, but basically we are at the top level there: every single application you have ever seen or heard of today is in this field, artificial narrow intelligence. There is no such thing as artificial general intelligence. It doesn’t exist today, and if someone says they have it, they’re lying, because we don’t have a representation of how to capture knowledge. No one has that. You simply cannot express this in Python or R or whatever language you want; it doesn’t exist. We need to figure out how to represent this.
So artificial general intelligence is really the task of asking: how could we take an AI that knows how to drive a car, put it into a different environment, and make it apply the skills it learned driving the car to a completely different field? That is domain transfer, and it is something that no AI can do today.
Now, artificial superintelligence: the only reason I’m mentioning it is that it’s really, really far away. The only thing super about it is that it’s super far away, far into the future. A lot of people have been battling about this. One of the famous ones, Elon Musk, is more of a doomsday kind of guy on this topic, and he should be, because that gets money into his company. It’s a very smart move to say that AI is going to destroy the world and then create a start-up that’s going to regulate it. Imagine how hard it was to raise money for that venture.
There are other things to consider about superintelligence, and one is that it is conceptually possible. Sooner or later, if we do capture how to represent knowledge, how to transfer knowledge and how to accumulate knowledge, there is no stopping us from deploying this into the world, and, now sounding a lot like Musk, what we release at that point would for all practical purposes be a god to us. The scary part about that is: will it be a nice god? Nobody knows. But then again, there’s very little evidence in history that intelligence breeds violence. If anything, the world is a safer place than it’s ever been, and I would like to see that as an evolution of our intelligence and an evolution of our compassion. I don’t see intelligence as a prerequisite for murderous robots, so I’m not very afraid of that scenario. I know we won’t be the smartest cookies in the world anymore, but maybe that’s not so bad. That was always going to happen, and evolution will make sure of that no matter what.
But basically, the landscape looks like this. You have this term, artificial intelligence, that is sort of ubiquitous and describes everything from doing a linear regression in Excel to a self-driving car to identifying melanoma on a cell phone. None of these things are artificial intelligence; it has just become a buzzword, just like big data, and I very much agree with the previous speakers about this.
The way I see it, AI today is two things: there are perception machines and there are inference machines. By inference I don’t only mean forecasting or prediction; I mean real inference, where you can predict without actually having any data. On the perception side we’ve come a long way. Perception machines are everywhere: those are the machines that know how to drive a car, the machines that know how to identify the kites in the images that we saw. All of those deep learning applications are basically perception machines. They can conceptualize something they get as input, through either visual or auditory stimuli, and they can sort of categorize it, but they cannot make sense of it, and I’ll show you examples of that. That’s why I reason that we need more: we need to move into proper inference, where we actually have a causal understanding, a representation of the world that we’re living in. Only then can we actually talk about pure intelligence, but we can get closer, and I’ll show you how to do that.
The biggest problems in data science today, which is also another term for applied artificial intelligence, start with data: it is actually not as ubiquitous and available as you might think. For many interesting domains there is simply no data, and the data that is there is exceedingly noisy. It might be a flat-out lie; it might be based on surveys, and we know that people lie in surveys. That’s also a problem.
Structure: the problem with structure is how you represent the concept in a mathematical structure, not necessarily in parameter space but just structurally. How do you construct the layers of a neural network, for example?
Identifiability: what I mean by that is that for any given data set there are millions of models that fit that data set and generalize from it equally well, and many of them do not correspond to the physical reality that we’re living in. So there are statistical truths, parameter truths, and there are physical realities, and they’re not the same thing.
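A toy illustration of non-identifiability, as a minimal sketch under our own assumptions (the model y = a·b·x, the data and all numbers are hypothetical, not from the talk): the data only constrain the product a·b, so very different parameter pairs fit exactly equally well.

```python
import numpy as np

# Hypothetical toy data: y depends on x only through the product a*b (= 6 here).
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=200)
y = 6.0 * x + rng.normal(0.0, 0.1, size=200)

def sse(a: float, b: float) -> float:
    """Sum of squared errors of the model y_hat = a * b * x."""
    return float(np.sum((y - a * b * x) ** 2))

# Three very different parameter pairs that share the same product a*b = 6.
candidates = [(1.0, 6.0), (2.0, 3.0), (-12.0, -0.5)]
losses = [sse(a, b) for a, b in candidates]

for (a, b), loss in zip(candidates, losses):
    print(f"a={a:6.1f}  b={b:5.1f}  SSE={loss:.6f}")
```

All three pairs report the same loss, so the data alone cannot say which one is "physically real"; separating them needs outside knowledge, such as a prior or a physical constraint on a or b.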
That’s why my previous field, theoretical physics, is sometimes problematic: quantum theory, which I specialized in, has many different interpretations, and nobody really knows what’s going on. We know we can calculate stuff from it, so it makes sense in the math, but as soon as we push the question of what’s really happening, we’re basically screwed, because no one knows, and a lot of people like to pretend that they know. And then there are some people, like proponents of the Copenhagen interpretation, who say "just shut up and do the math", which basically means don’t ask the questions, because they cannot be answered. Hawking adheres to this school, by the way. He’s also one of the guys who’s super scared of superintelligence, funnily enough, because he’s a clever cookie.
There’s also the matter of priors. Every time you address a problem as a human, whatever problem I give you as an individual, you will have a lot of prior knowledge: half a lifetime or a whole lifetime, depending on how old you are, of knowledge that you’ve accumulated. This knowledge might have been transferred from another person who just told you about something, but you can apply it to the problem at hand; you can represent that knowledge in the domain of the problem that you’re trying to solve. That is something we can actually mimic today through the concept of priors, which is basically a way of encoding an idea or a piece of knowledge as a statistical prior, a statistical distribution that can be put on par with data. I’ll show you later how to do that as well.
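To make "a prior that can be put on par with data" concrete, here is a minimal sketch using the textbook Beta-Binomial model. This is our choice of example, not necessarily the approach shown later in the talk, and the prior strength and counts are hypothetical. A Beta(alpha, beta) prior on a success rate adds alpha and beta as pseudo-counts, so alpha + beta acts as an effective prior sample size for the posterior mean.

```python
from dataclasses import dataclass

@dataclass
class BetaPrior:
    """Beta(alpha, beta) belief about an unknown success probability."""
    alpha: float
    beta: float

    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    def update(self, successes: int, failures: int) -> "BetaPrior":
        # Conjugate update: observed counts are added to the prior pseudo-counts,
        # which is what puts prior knowledge and data on the same footing.
        return BetaPrior(self.alpha + successes, self.beta + failures)

# Hypothetical prior knowledge: "roughly 30% success", worth about 20 observations.
prior = BetaPrior(alpha=6.0, beta=14.0)
# Hypothetical new data: 3 successes out of 5 trials.
posterior = prior.update(successes=3, failures=2)

print(f"prior mean     = {prior.mean():.3f}")      # 0.300
print(f"data-only rate = {3 / 5:.3f}")             # 0.600
print(f"posterior mean = {posterior.mean():.3f}")  # 0.360
```

Five observations move the estimate only part of the way from 0.30 towards 0.60, because the prior carries the weight of 20 earlier observations; a weaker prior (smaller alpha + beta) would let the data dominate sooner.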
The last part, but not the least important one, is uncertainty. I cannot stress enough how important uncertainty is for optimal decision-making: you basically cannot make optimal decisions without knowing what you don’t know, and I will stress that point several times during this talk, during the remaining thirty-nine minutes of it.
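A minimal sketch of why the point estimate alone is not enough, assuming a hypothetical stocking decision with asymmetric costs (this newsvendor-style example is ours, not the speaker’s). Two beliefs about demand share the same mean of 100 but differ in spread, and the cost-minimizing decision differs between them.

```python
import numpy as np

# Hypothetical costs (assumptions for illustration): running short by one
# unit costs 4, over-provisioning by one unit costs 1.
COST_UNDER, COST_OVER = 4.0, 1.0

def expected_cost(order: float, demand: np.ndarray) -> float:
    """Monte Carlo estimate of the expected cost of ordering `order` units."""
    shortfall = np.maximum(demand - order, 0.0)
    surplus = np.maximum(order - demand, 0.0)
    return float(np.mean(COST_UNDER * shortfall + COST_OVER * surplus))

def best_order(demand: np.ndarray) -> float:
    """Grid search for the order quantity with the lowest expected cost."""
    grid = np.linspace(0.0, 200.0, 401)  # step of 0.5 units
    costs = [expected_cost(q, demand) for q in grid]
    return float(grid[int(np.argmin(costs))])

rng = np.random.default_rng(1)
# Two hypothetical beliefs about demand: same mean (100), different spread.
confident = rng.normal(100.0, 5.0, size=20_000)
uncertain = rng.normal(100.0, 30.0, size=20_000)

q_confident = best_order(confident)
q_uncertain = best_order(uncertain)
print(f"narrow belief: order {q_confident:.1f}")
print(f"wide belief:   order {q_uncertain:.1f}")
```

For this kind of piecewise-linear cost the optimum is the COST_UNDER / (COST_UNDER + COST_OVER) = 0.8 quantile of the demand distribution, roughly 100 + 0.84σ, so about 104 for the narrow belief and about 125 for the wide one. A model that reported only "expected demand is 100" would give the same answer in both cases.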
nine minutes of it it’s really great I
nine minutes of it it’s really great I can actually see how little time I have
can actually see how little time I have
can actually see how little time I have left so I will not show you more
left so I will not show you more
left so I will not show you more equations and and it’s it’s not because
equations and and it’s it’s not because
equations and and it’s it’s not because I I’m particularly fond of them but they
I I’m particularly fond of them but they
I I’m particularly fond of them but they do help express ideas so in in the top
do help express ideas so in in the top
do help express ideas so in in the top level that’s basically a complete a
level that’s basically a complete a
level that’s basically a complete a compact way of describing any problem
compact way of describing any problem
compact way of describing any problem that you might approach it’s basically a
that you might approach it’s basically a
that you might approach it’s basically a probability distribution over the data
probability distribution over the data
probability distribution over the data that you’re a Fed they are the X’s the
that you’re a Fed they are the X’s the
that you’re a Fed they are the X’s the Y’s those are the things that you want
Y’s those are the things that you want
Y’s those are the things that you want to be able to explain and the Thetas
to be able to explain and the Thetas
to be able to explain and the Thetas they represent all of the different
they represent all of the different
they represent all of the different parameters of your model stuff you don’t
parameters of your model stuff you don’t
parameters of your model stuff you don’t know it can also be latent variables
know it can also be latent variables
know it can also be latent variables concept that you know exists but that
concept that you know exists but that
concept that you know exists but that you don’t have observational data for
you don’t have observational data for
you don’t have observational data for all that is the definition of a problem
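In symbols, one standard way to write such a problem definition (a sketch of common notation, not a copy of the slide) is the joint distribution over the outputs Y and the unknowns θ given the inputs X, which factors into a likelihood and a prior:

```latex
p(Y, \theta \mid X) = p(Y \mid X, \theta)\, p(\theta \mid X)
```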
All of that is the definition of a problem space. Now, what machine learning has traditionally done, ever since Fisher, is look at this with a question that everybody knew was wrong. It basically asked: what is the probability distribution of the data that I got, pretending that it is random, given a fixed hypothesis that I don’t know and that I’m actually searching for? So for all machine learning applications the problem actually became: which hypothesis could I generate that is the most consistent with a data set that looks like my data set, but that’s really not my data set? You can ask whether that is a reasonable question, and I will tell you it is not. It is poppycock; that question is not worth asking. Why? Because you’re basically just trying to find explanations to fit your truth, and that is not science, ladies and gentlemen. There is only one way to do science: you postulate an idea, and then you observe data to see whether you can verify that idea or discard it. You cannot look at a data set, then generate the hypothesis that best explains it, and think that it somehow has any physical representation in this world, because it doesn’t.
And that’s why a lot of machine learning approaches, a lot of statistical approaches, have actually found, after several years of hardcore science, that the biggest risk factor for dying from coronary artery disease is actually going to the hospital. Yeah.
That’s just not true. And, you know, nobody stopped. So why did this happen? Is it because the researchers are brain-damaged? That could have been the reason, but it wasn’t. It was the methodology: they were asking the wrong question. Because if you ask that question, I can assure you that before you died at the hospital, you had to go there. So the result makes perfect sense, but it has no representation of the problem you’re trying to solve. What you should have said is: given that you’re sick, and you go to the hospital because you have something that’s worth visiting the hospital for. Now that is predictive of you actually being predisposed to dying from coronary artery disease.

So how do we fix this? We fix this by doing what we should have been doing from the beginning, and this is not new. The formula down below asks a different question. What does it ask? It asks: what is the probability distribution of the parameters of my model, the ones I don’t know, given that I have observed a data set that is real? It is not fake, it is not random; it is a data set that has been observed. What is the probability distribution of my parameters? Now that is an interesting question to ask, and that is a scientific question to ask. But what does it require? It requires you to state your mind: the last part in the numerator, p(θ | X), which says what you believe is true about your parameters given the data set that you have. That’s very, very important, ladies and gentlemen, because this is the difference between something great and something completely insane.
Now then, you might ask: okay, why didn’t we do this? Because it couldn’t be done. We simply didn’t have the computational power to do it, and it’s not because of the guy on the right-hand side there. It’s also not because of the guy on the left-hand side of the numerator, and you can see that the guy on the left-hand side of the numerator, the likelihood, is exactly what machine learning is doing today. Now why is that? It’s because they knew that the guy in the denominator is an integral from hell, and it cannot be solved: it looks at every single value of every single parameter that you have and sums them out. That ends up in a scenario where we have to calculate more things than there are atoms in the universe, and there are a lot of atoms in the universe, even in the part that we can see.
But that basically meant that all of this was out of the question. So someone realized: hey, I don’t need to calculate that. I don’t care about probabilities; I can just say that the point that is the maximum will be the same, because the other thing is just a normalizing factor, a constant. Okay, good enough, we remove that.
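A minimal sketch of why dropping the denominator is harmless when all you want is the maximum: on a one-parameter problem the normalizing constant can still be computed by brute-force summation over a grid, and dividing by it leaves the location of the peak unchanged. The coin-flip data (7 heads in 10 flips) and the Beta(2, 2) prior are illustrative assumptions, not taken from the talk.

```python
import math

HEADS, FLIPS = 7, 10  # illustrative data: 7 heads in 10 coin flips
STEP = 1 / 1000
GRID = [i / 1000 for i in range(1, 1000)]  # theta in (0, 1), endpoints excluded

def log_likelihood(theta: float) -> float:
    # Bernoulli log-likelihood of HEADS successes in FLIPS trials
    return HEADS * math.log(theta) + (FLIPS - HEADS) * math.log(1.0 - theta)

def log_prior(theta: float) -> float:
    # Beta(2, 2) prior density: 6 * theta * (1 - theta)
    return math.log(6.0) + math.log(theta) + math.log(1.0 - theta)

# Numerator of Bayes' rule: likelihood * prior at every grid point
unnorm = [math.exp(log_likelihood(t) + log_prior(t)) for t in GRID]

# Denominator: a Riemann sum over every grid value. Cheap for one
# parameter, but a d-parameter grid would need len(GRID) ** d terms.
evidence = sum(u * STEP for u in unnorm)
posterior = [u / evidence for u in unnorm]

def argmax(values: list[float]) -> int:
    return max(range(len(values)), key=values.__getitem__)

map_unnormalized = GRID[argmax(unnorm)]
map_normalized = GRID[argmax(posterior)]
print(map_unnormalized, map_normalized)  # dividing by a constant does not move the peak
```

The exact posterior here is Beta(9, 5), whose mode is 2/3, so both maxima land on the grid point nearest 0.667.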
So, done deal. And then they said: but what about the prior? What if I don’t know anything? What if I don’t want to say anything, I don’t want to state my mind and put my knowledge into the problem? Then that’s just the uniform distribution from minus infinity to infinity, and whoopty-doo, this equation has been reduced to only the likelihood. But you made a lot of assumptions there, and people just forgot that these assumptions are not true. It also ends in maximum likelihood, which is a horrible way of doing things, basically because you assume that everything is independent. You assume that, even when you’re doing time-series regression, observation one is independent of observation two.
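A minimal sketch of the two simplifications just described, on an assumed toy data set: with a flat prior the log-posterior differs from the log-likelihood only by a constant, so its maximum is the maximum-likelihood estimate; and the independence assumption is what lets the log-likelihood be written as a sum over observations. For Gaussian data with a known standard deviation, that maximum is the sample mean.

```python
import math

DATA = [2.1, 1.9, 2.4, 2.0, 2.6]  # illustrative observations
SIGMA = 0.5                        # assumed known noise standard deviation

def log_likelihood(mu: float) -> float:
    # Independence assumption: p(x1, ..., xn | mu) = prod_i p(xi | mu),
    # so the log-likelihood is a sum of per-observation Gaussian terms.
    return sum(
        -0.5 * math.log(2 * math.pi * SIGMA**2) - (x - mu) ** 2 / (2 * SIGMA**2)
        for x in DATA
    )

def log_flat_prior(mu: float) -> float:
    # Improper uniform prior on (-inf, inf): the same constant everywhere
    return 0.0

grid = [i / 100 for i in range(0, 501)]  # candidate means 0.00 ... 5.00
mle = max(grid, key=log_likelihood)
map_flat = max(grid, key=lambda m: log_likelihood(m) + log_flat_prior(m))
sample_mean = sum(DATA) / len(DATA)
print(mle, map_flat, round(sample_mean, 2))
```

For a time series, the product in the first comment is exactly the step that stops being true.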
That’s like saying that last year I was not one year younger than I am today. Of course I was, and that’s important. All of those things that are temporally related are extremely important. And the reason why I’m saying this today is that there’s no need to cheat anymore. There’s no need for these crazy statistical results; you can state your mind, you can do the inference, and all of it can be done with probabilistic programming. There are many frameworks for this today, including in Python, and also building on top of TensorFlow, by the way.
There’s really no excuse not to do this, and the best thing about it is that it’s actually easier than adhering to normal statistics. In the normal statistics you were taught, the tools were prescribed: if you have two populations and they are sort of varying together, then you use this magical tool; if they are independent, then you use another magical tool. Nobody really understood why. In this case it’s the t-test; in this one it’s a paired t-test; this one is the Wilcoxon test; in this case you should do a logistic regression; in this one you should just do a normal linear regression; in this one you use a support vector machine. They are all the same thing. They are not different. There are different assumptions in the likelihood functions, there are different assumptions in your priors, and there are different assumptions in the physical structure of your model. That is all; there is no other difference. All of it comes back to probabilistic modeling, and if you can learn how to make these assumptions explicit, then you have a modeling language without limitations. Then you don’t have to know the difference between logistic regression and linear regression, because there is none; it is exactly the same thing. And that’s perhaps the most important thing.
Now wait: the most important thing that I’m going to say today, given that you think it’s important, is that you cannot do science without assumptions. That is impossible. And this is not my belief; these are just hardcore facts. You cannot do science without assumptions, and don’t rest your minds until you understand this. Without actually risking something, you can get no answers.
So let’s have a look at neural networks. How many of you have taken a neural networks class in your day? Okay, then most of you have solved this problem. How many people have solved this problem before? Okay, a few guys and girls. So basically, this problem is highly nonlinear. It’s a classification task: your job is to separate the blue dots from the red dots by some line. You can see this is sort of a spiral; it’s non-stationary, and it’s quite nasty, isn’t it? How many hidden nodes do you think I need in a one-layer neural network to solve this? 10, 20, 50, 100? Let’s see.
Well, with ten hidden nodes I can learn how to separate this: not great, but there is some signal there. If you use thirty hidden nodes up here, you can do a lot better. Not surprising, but it’s still not good, because we know that this problem can be solved exactly, right? So with a hundred hidden nodes you have almost perfect classification, and if you look at the accuracy table you will see that the area under the curve is 100% with the 100 nodes. Now, what is the problem with this? And this is on a test data set, mind you. The problem with this is that this looks great.
This looks amazing. I mean, your job is done, right? Okay, so let’s look at the decision surfaces that were generated by these networks. On the left-hand side you have the decision surface based on 10 hidden neurons, and on the right-hand side you have the decision surface based on 100 hidden nodes. Do those decision surfaces look good to you? Does it look like they have actually captured what you wanted them to capture? No, they did not. And this is exactly how neural networks work: they are over-parameterized, very flexible mathematical models that will do everything they can to minimize that sum of squares or the cross-entropy, so there’s no penalization for finding results that are purely statistical. And what is the worst thing about this? The worst thing here is the regions in the outskirts that are colored red. That is a signal that the neural network is sure exists, yet there was no data out there at all.
But it knows that that region belongs to a differentiated class. Now, this might not be a problem if you’re trying to classify, you know, whether it will rain extra much tomorrow. But what if you have a drone with one target: kill insurgents, let civilians live? What if it identifies one of those outer regions, which just makes sense to the network but was never part of the training set?
This is a truth that has been learned by a network where the data never actually showed this, and there’s no penalization for it. And the reason I’m saying this is not to tell you not to use AI or machine learning; in fact, I’m saying the opposite.
but what I want to say here is that be responsible
responsible
Every time you deploy a machine learning algorithm, you have to understand exactly what it does, because lack of understanding is the most dangerous thing that can exist today. And it doesn't have to be artificial superintelligence; all it takes is a screw-up by the engineer or the scientist who built the network, and it can have dramatic consequences, especially today, in the time of self-driving cars and all these things. Here I will show you another example of why I think this is interesting.
This is just a representation, and mind you, this is only a single-layer neural network, by the way, not one of those super deep structures that would have even more parameters. I just want to show you that this problem, represented in Cartesian coordinates, is what was being fed to the neural network, and what the neural network should have realized is that in polar coordinates it looks a lot simpler, doesn't it? Once I know that, I can separate it with just one hidden node. And this is my point: you can over-parameterize and throw a lot of data at things, or you can start to think about the problem at hand and teach machines how to think, how to reason, and how to look at data instead of just crunching numbers. This is why today I'm not scared of artificial intelligence or artificial superintelligence: I could have solved this in half a second. Even if you don't have a degree in physics, you should realize that these are just two sine functions with increasing radius. It's not hard, but a neural network would never get this, nor would any other machine learning algorithm, by the way. It's impossible, because they don't work that way; that's not their goal.
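The Cartesian-versus-polar point can be checked with a short sketch. It assumes a synthetic stand-in for the slide's data: two noise-free Archimedean spirals, one at radius t and angle t, the other rotated by pi. After converting each point to polar coordinates (r, phi), the single hand-built feature cos(r - phi) is close to +1 on one spiral and close to -1 on the other, so one threshold, playing the role of the single hidden node, separates them. The function names and spiral parameters are illustrative assumptions.

```python
import math

def make_spirals(n_per_class=200, t_min=0.5, t_max=4 * math.pi):
    """Two interleaved spirals: class 0 has radius t at angle t, class 1 is rotated by pi."""
    points, labels = [], []
    for i in range(n_per_class):
        t = t_min + (t_max - t_min) * i / (n_per_class - 1)
        for label, offset in ((0, 0.0), (1, math.pi)):
            points.append((t * math.cos(t + offset), t * math.sin(t + offset)))
            labels.append(label)
    return points, labels

def polar_feature(x, y):
    """Convert to polar coordinates and return cos(r - phi).

    On spiral 0, r - phi is a multiple of 2*pi, so the feature is about +1;
    on spiral 1 it is pi plus a multiple of 2*pi, so the feature is about -1.
    """
    r = math.hypot(x, y)
    phi = math.atan2(y, x)
    return math.cos(r - phi)

def one_node_classifier(x, y):
    # A single threshold on a single feature: the "one hidden node".
    return 0 if polar_feature(x, y) > 0 else 1

points, labels = make_spirals()
predictions = [one_node_classifier(x, y) for x, y in points]
accuracy = sum(p == l for p, l in zip(predictions, labels)) / len(labels)
print(f"accuracy with one polar feature: {accuracy:.3f}")
```

The feature is hand-chosen from knowing how the data was generated, which is exactly the kind of understanding a generic network trained on raw (x, y) coordinates does not bring to the problem.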
By the way, we can't be angry at them for not solving that. I just want to show you a take on probabilistic programming with this, and also explain what probabilistic programming is. It's basically an attempt to unify general-purpose programming (and by general purpose I mean Turing-complete programs, which we all like because they can basically compute anything) with probabilistic modeling, which is what everyone should be doing. Whatever model you are creating, you are doing probabilistic modeling; you have just accepted a lot of assumptions that you didn't make yourself. Even though you can choose not to care about that, you have to know about it: you have to know the assumptions behind the algorithms that you're using. That's why, even though it's very tempting to fire up your favorite programming language and load scikit-learn or TensorFlow or MXNet, whatever framework you're using, it doesn't matter, it's still important to understand the cost. You don't have to be an expert in the math behind it, that's not what I'm saying, but you have to understand conceptually what they do and, more importantly, what they don't do, because that makes all the difference.
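One concrete instance of an assumption you accept without stating it: fitting by least squares is the same as maximum-likelihood estimation under a Gaussian noise model with fixed variance. The sketch below uses made-up data and a simple grid search in place of a real optimizer, and checks that the value minimizing the sum of squared errors also maximizes the Gaussian log-likelihood.

```python
import math

def sum_squared_errors(mu, data):
    return sum((x - mu) ** 2 for x in data)

def gaussian_log_likelihood(mu, data, sigma=1.0):
    # Sum of log N(x | mu, sigma^2) over the data, with sigma held fixed.
    n = len(data)
    return (-0.5 * n * math.log(2 * math.pi * sigma ** 2)
            - sum_squared_errors(mu, data) / (2 * sigma ** 2))

data = [2.1, 1.7, 2.6, 3.0, 1.9]          # hypothetical observations
grid = [i / 100 for i in range(0, 501)]   # candidate values of mu, 0.00 .. 5.00

best_by_squares = min(grid, key=lambda mu: sum_squared_errors(mu, data))
best_by_likelihood = max(grid, key=lambda mu: gaussian_log_likelihood(mu, data))
print(best_by_squares, best_by_likelihood)
```

The log-likelihood is a constant minus the squared error scaled by 1 / (2 sigma^2), so the two criteria always agree; choosing "squared error" silently chooses "Gaussian noise".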
So this is just to say that you could have written this model a lot more easily. This is also a breaking point of HTML5 presentations, by the way: this is actually supposed to be on the right-hand side, so thank you, Windows. Even so, the few lines of code up there are basically a probabilistic way of specifying the model that solves it exactly, and this can be expressed in a probabilistic programming language. The neural network I wrote to fix that took a lot more coding, I can assure you.
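A minimal, hand-rolled sketch of what specifying the spiral model probabilistically can look like. It is not the code from the slide and uses no probabilistic programming language: the generative story (a class label, a latent position t along the spiral, Gaussian noise with an assumed scale NOISE) is written as ordinary Python, and inference is Bayes' rule with t integrated out on a grid. A real PPL would automate that inference step; all constants and names here are assumptions.

```python
import math
import random

T_MIN, T_MAX, NOISE = 0.5, 4 * math.pi, 0.3  # assumed model constants

def spiral_point(t, label):
    """Deterministic part of the model: spiral `label` is rotated by label * pi."""
    angle = t + label * math.pi
    return t * math.cos(angle), t * math.sin(angle)

def simulate(label, rng):
    """Generative direction: draw the latent position t, then add Gaussian noise."""
    t = rng.uniform(T_MIN, T_MAX)
    x, y = spiral_point(t, label)
    return x + rng.gauss(0.0, NOISE), y + rng.gauss(0.0, NOISE)

def likelihood(point, label, n_grid=2000):
    """p(point | label) up to a constant, integrating out t on a uniform grid."""
    total = 0.0
    for i in range(n_grid):
        t = T_MIN + (T_MAX - T_MIN) * i / (n_grid - 1)
        mx, my = spiral_point(t, label)
        sq_dist = (point[0] - mx) ** 2 + (point[1] - my) ** 2
        total += math.exp(-sq_dist / (2 * NOISE ** 2))
    return total / n_grid  # shared constant factors cancel in the posterior

def posterior_label_1(point):
    """Bayes' rule with equal prior probability for both spirals."""
    l0, l1 = likelihood(point, 0), likelihood(point, 1)
    return l1 / (l0 + l1)

rng = random.Random(0)
for true_label in (0, 1):
    point = simulate(true_label, rng)
    print(true_label, round(posterior_label_1(point), 3))
```

Because the model states how the data arises, the same few functions both simulate new points and classify observed ones, and the output is a probability rather than a bare label.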
So the take-home message here is that if you go back to basics and view things as what they are, probabilistic statements about data, about concepts, about what you're trying to model, you basically gain a generative model. You gain an understanding of what is actually happening, and that also means you don't get any crazy, purely statistical solutions due to identifiability problems. This is something we really have to get away from; identifiability is something that will be problematic.
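Identifiability in its simplest form: if only a combination of parameters affects the predictions, the data cannot pin down the individual parameters. In the hypothetical model y = a * b * x below, only the product a * b is identifiable, so every pair with a * b = 6 fits the made-up data equally well.

```python
def predict(a, b, x):
    # Hypothetical model in which only the product a * b affects the output.
    return a * b * x

def mse(a, b, xs, ys):
    return sum((predict(a, b, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 6.0, 12.0, 18.0]  # generated with a * b = 6

for a, b in [(1.0, 6.0), (2.0, 3.0), (3.0, 2.0), (-2.0, -3.0)]:
    print(a, b, mse(a, b, xs, ys))  # each pair reports an error of 0.0
```

Data alone cannot choose among these pairs; only added structure, such as fixing one parameter or placing a prior on them, can.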
So I'm not going to talk about deep learning. I just want to show you what it is, but I think you've had enough talks about that, so max pooling and all of that, I'm pretty sure we can skip. What I do want to say, though, is that neural networks are by default degenerate. What I mean by that is that in the energy landscape they're running around in, where they are trying to optimize things, there are multiple locations corresponding to parameters that minimize the error. They're equivalent, yet they can correspond to very different physical realities, so how is the neural network supposed to know? And this is not something that we can design our way out of, because this degeneracy is part of the whole idea of a neural network: the optimization happens in such a problematic space.
I just want to visualize, with the simple neural network here, why this happens. You can see these two networks describe exactly the same thing; they solve exactly the same problem, but the parameters are different. If you go from x1 to hidden node 1 and hidden node 2, you can either have weight w11 equal to 5 and go to hidden node 1, or you can have that weight go to the other hidden node instead. If you basically turn this on its head and shift these weights around, you get exactly the same solution.
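The weight-swapping argument can be verified directly. The sketch below uses a hypothetical network with 2 inputs, 2 tanh hidden units, and 1 linear output, with made-up parameter values. Swapping the two hidden units (their incoming weights, biases, and outgoing weights together) gives a different parameter vector but the same function. The last lines count the orderings: a hidden layer of width n has n! equivalent permutations, and independent layers multiply.

```python
import math

def forward(x, w_in, b_hidden, w_out, b_out):
    """One hidden layer of tanh units followed by a linear output."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_in, b_hidden)]
    return sum(v * h for v, h in zip(w_out, hidden)) + b_out

# Hypothetical parameters: 2 inputs, 2 hidden units, 1 output.
w_in = [[5.0, -1.0],   # weights into hidden node 1 (w11 = 5)
        [0.5, 2.0]]    # weights into hidden node 2
b_hidden = [0.1, -0.3]
w_out = [1.5, -2.0]
b_out = 0.2

# Swap the hidden nodes: permute incoming rows, biases and outgoing weights together.
perm = [1, 0]
w_in_swapped = [w_in[i] for i in perm]
b_hidden_swapped = [b_hidden[i] for i in perm]
w_out_swapped = [w_out[i] for i in perm]

for x in ([0.0, 0.0], [1.0, -2.0], [0.3, 0.7]):
    original = forward(x, w_in, b_hidden, w_out, b_out)
    swapped = forward(x, w_in_swapped, b_hidden_swapped, w_out_swapped, b_out)
    print(x, original, swapped)

# Three hypothetical hidden layers of 100 units: each contributes 100! orderings.
layer_widths = [100, 100, 100]
n_equivalent = math.prod(math.factorial(n) for n in layer_widths)
print(f"about 10^{len(str(n_equivalent)) - 1} permutation-equivalent parameter settings")
```

This counts only permutations; symmetries such as flipping the sign of a tanh unit's incoming and outgoing weights add even more equivalent solutions.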
Now, this is one source of degeneracy, and there are many of those. Just imagine that you're stacking a lot of layers on top of each other and you have hundreds of neurons: how many permutations do you think you will be able to reach? A lot is the answer. I didn't do the math, but just trust me, it's a lot. In energy space, in one dimension, it looks like the plot on the left-hand side: you see two distinct points that are equivalent in the solution space, and you cannot differentiate between them. This is also why regularization is such a good idea in neural networks, because it basically forces you into one of those attractors. In two-dimensional space, you can see that this corresponds to the two attractors in this colorized plot.
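The picture of two distinct but equivalent points can be reproduced with a hypothetical one-unit network, y_hat = v * tanh(w * x). Because tanh is odd, (w, v) and (-w, -v) produce identical predictions, so the loss landscape has mirror-image minima. Plain gradient descent, with a learning rate and step count assumed for illustration, ends up in one or the other depending only on where it starts. The data are made up.

```python
import math

# Hypothetical data generated by y = 2 * tanh(1.5 * x).
xs = [i / 10 for i in range(-20, 21)]
ys = [2.0 * math.tanh(1.5 * x) for x in xs]

def loss(w, v):
    """Mean squared error of the one-unit model y_hat = v * tanh(w * x)."""
    return sum((v * math.tanh(w * x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradient_descent(w, v, lr=0.05, steps=5000):
    n = len(xs)
    for _ in range(steps):
        grad_w = grad_v = 0.0
        for x, y in zip(xs, ys):
            h = math.tanh(w * x)
            r = v * h - y                               # residual
            grad_v += 2 * r * h / n                     # d(loss)/dv
            grad_w += 2 * r * v * (1 - h * h) * x / n   # d(loss)/dw
        w -= lr * grad_w
        v -= lr * grad_v
    return w, v

# Mirror-image starting points end in two different minima with the same loss.
w_a, v_a = gradient_descent(0.5, 0.5)
w_b, v_b = gradient_descent(-0.5, -0.5)
print((w_a, v_a), loss(w_a, v_a))
print((w_b, v_b), loss(w_b, v_b))
```

Nothing in the loss prefers one minimum over the other; only the starting point, or some extra preference added to the objective, decides.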
And if you visualize this in all the dimensions that the neural network is actually operating in, which is typically thousands of dimensions, you can just imagine how many of those attractors you have, and at how many different depths.
So I want to end on my point. In case you missed it, I've tried to state it several times, but sometimes I'm very clumsy in the way I state things, so I'm going to be very blunt. This is one of the best neural networks as of 2015 or 2016, a version of LeNet that was trained to recognize digits, and it does that perfectly. Like we said before, we've come so far in this area of perception that we don't have to worry about not being able to do it; it's actually done, and it's much better than humans at recognizing these things. Okay, so let's put it to the test, shall we? Let's generate some random noise images and ask it what each one is. In every single image here, you see the network is 99% sure that it's a particular digit, from 0 all the way up to 9. For all four images under the 0, it is 99% convinced that this is a 0. Can you in any way understand why this is a zero? I can't, and neither can the network, because it was never penalized for finding structures that don't actually describe your data. Nothing tells it that it has to stay true to some sort of physical reality, and this happens.
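Why can noise receive a 99% score? Softmax confidence depends only on the gap between the top logit and the others, not on whether the input resembles anything in the training data. The sketch below is a simplified illustration of that mechanism, not the procedure from the paper mentioned in the talk. It assumes a hypothetical stand-in classifier (a random linear layer with softmax over 28 x 28 inputs) and ignores the valid pixel range. It scales a uniform-noise "image" until the top class leads by a logit margin of 10, which forces the top probability above 1 / (1 + 9 e^-10).

```python
import math
import random

N_PIXELS, N_CLASSES = 28 * 28, 10
rng = random.Random(42)

# Hypothetical stand-in for a trained digit classifier: random linear layer + softmax.
weights = [[rng.gauss(0.0, 0.05) for _ in range(N_PIXELS)] for _ in range(N_CLASSES)]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def logits_for(image):
    return [sum(w * p for w, p in zip(row, image)) for row in weights]

noise = [rng.random() for _ in range(N_PIXELS)]  # a random "image", not a digit
z = logits_for(noise)
top, second = sorted(z, reverse=True)[:2]

# Scaling the input scales every logit by the same factor, so the margin grows with it.
scale = 10.0 / (top - second)
probs = softmax([scale * v for v in z])
print(f"predicted class {probs.index(max(probs))} with confidence {max(probs):.4f}")
```

The high score comes from the geometry of the logits, and nothing in the training objective penalizes confident answers on inputs far from any real digit.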
Now back to my point: what if it's not the number zero? What if it's recognizing the face of a known terrorist, with a kill-on-sight command? And this is just numbers, ladies and gentlemen; imagine the complexity of faces. This shows exactly how dangerous this technology is if you don't respect it. It's not about the machines being too intelligent; it's about us not being stupid. That is really important to remember. We have a responsibility to build applications that do not have this confirmation bias in them, and that is something I hope all of you will think of when you go out and build the next awesome machine learning application, because I can't see any numbers in these images anywhere. If you want to, you can read the paper by these authors (as I said, you'll get the slides afterwards). It's a very interesting paper: they basically tried everything they could to see how the network would generalize to things it hadn't seen before, and to different areas of what it was supposed to see.
Another thing I want to say is that events are not temporally independent. Everything that you do today, everything that you see, hear, perceive, and think about, is affected by what you saw yesterday, and it's the same with data. Data is not independent; you cannot assume that two data points are independent. That is a wild and crazy assumption that we have been allowed to make for far too long.
This is just a small visualization from the domain I was working in, where we were trying to work out how TV exposure affects people's purchasing behavior over time. Of course, if you see a TV commercial today, it might lead you to buy something far into the future, and it might not lead anyone to do anything today. Those are causal or temporal dependencies that also need to be taken into account.
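A common way to encode this kind of carry-over in media modeling is a geometric "adstock" transform, where a fixed fraction of each day's advertising effect persists into the next. This is an assumed textbook formulation, not necessarily the model described in the talk, and the exposure series and decay rate are made up. The lag-1 autocorrelation at the end shows that consecutive values of the carried-over effect are positively correlated, so treating days as independent observations would be wrong.

```python
def geometric_adstock(exposure, decay):
    """Carry-over transform: effect_t = exposure_t + decay * effect_{t-1}.

    decay in [0, 1) sets how much of yesterday's advertising pressure survives today.
    """
    effect, carried = [], 0.0
    for x in exposure:
        carried = x + decay * carried
        effect.append(carried)
    return effect

def lag1_autocorrelation(series):
    n = len(series)
    mean = sum(series) / n
    num = sum((series[t] - mean) * (series[t - 1] - mean) for t in range(1, n))
    den = sum((v - mean) ** 2 for v in series)
    return num / den

tv_spots = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0]  # hypothetical daily TV exposure
effect = geometric_adstock(tv_spots, decay=0.5)
print(effect)
print(lag1_autocorrelation(tv_spots), lag1_autocorrelation(effect))
```

A single spot on day 0 still contributes 0.0625 to the effect on day 4, which is the "buy something far into the future" behavior in miniature.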
If you think about causal dependencies, if you think about concepts, if you really think about the structure of things, then you end up with something that looks like a deep learning neural network, but where you actually have structure that is inherent to the problem at hand. That's basically you forging connections between concepts, between variables, between parameters, which solve the problem at hand but don't have this over-parameterization. This is a visualization of one of the models that we were running at Blackwood for one of our clients, and this is the sort of complexity you need to solve everyday problems. Every node you see here represents a variable or a latent variable, and the relationships between them are the edges.
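A structured model like this is a directed graph: nodes are observed or latent variables, and each edge says which variables a node depends on. The sketch below uses made-up variable names and linear-Gaussian relationships, not the client model shown in the talk, and demonstrates the basic generative operation on such a graph: sample each variable after its parents, in topological order, using the standard-library `graphlib` module (Python 3.9+).

```python
import random
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical structure: each variable lists its parents (incoming edges).
parents = {
    "tv_exposure": [],
    "price": [],
    "awareness": ["tv_exposure"],        # latent variable
    "sales": ["awareness", "price"],
}

# Hypothetical conditional distributions: sample a node given its parents' values.
conditionals = {
    "tv_exposure": lambda p, rng: rng.uniform(0.0, 10.0),
    "price": lambda p, rng: rng.uniform(5.0, 15.0),
    "awareness": lambda p, rng: 0.6 * p["tv_exposure"] + rng.gauss(0.0, 0.5),
    "sales": lambda p, rng: 20.0 + 2.0 * p["awareness"] - 1.5 * p["price"] + rng.gauss(0.0, 1.0),
}

def ancestral_sample(rng):
    """Sample every node after its parents, following a topological order."""
    values = {}
    for node in TopologicalSorter(parents).static_order():
        values[node] = conditionals[node]({k: values[k] for k in parents[node]}, rng)
    return values

rng = random.Random(1)
print(ancestral_sample(rng))
```

Every edge here is a stated assumption about the problem, which is the opposite of a fully connected layer where every input may influence every unit.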
There's basically no point in this thing spinning; I just thought it looked cool, and it helped me raise money back in the day. Actually, I think the spinning was the differentiator, because in one of the pitches I did, it didn't spin and we didn't get the money, and then all of a sudden it was spinning and we got the money. I don't know if that's all of the reason, but in my mind the spinning helped. It adds no visual improvement, though. How many people have seen this before?
Okay, well, that's just no fun. Interestingly enough, before I saw it the first time, I had not seen it. So the problem here is that you're supposed to judge whether A and B, the squares there, are the same shade or not. From my point of view they are extremely different; they look very different. But the thing is, they're not: they're actually the same. The reason a lot of people think they are different is that we are predicting based on the shadow being cast from a light source whose position we know, because we have recognized this pattern earlier in our lives. That is also a kind of confirmation bias, but it's a good one, because that's what allows us to actually live our lives. Sometimes we are wrong, as in these contrived images, but it does prove a point: our brains are very biased by what we already know, and we make predictions based on what we know.
So, basically, what is probabilistic programming? It allows us to specify any kind of model we want. You don’t have to think about layers, you don’t have to think about pooling, you don’t have to think about all the wiring. All you have to think about is specifying how variables might relate to each other, which parameters might be there, and how they relate to the variables at hand. If you have that freedom, then there’s nothing you cannot model.

The problem with this is that you cannot fit it with maximum likelihood. You cannot adapt it, because you can’t assume independent observations, and you can’t assume that everything is uniform. Well, you can, but it’s not very smart. You can’t assume that any given parameter can take any value from minus infinity to plus infinity; in general that just makes no sense. Think about predicting house prices, for example. If you allow your model to predict something negative, you have something that might make sense in statistical space, because there’s no reason why you shouldn’t be able to mirror things, right? You just look at the positive part. But what about the part of your model that says negative sale prices are also possible? That’s just nonsense, and you shouldn’t allow it. That’s why you should specify your priors and the structure of your models very rigorously.
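As an illustration of specifying variables, parameters and priors explicitly, here is a minimal sketch in plain Python rather than in a probabilistic programming language. The toy data, the choice to model log(price), and the prior scales (Normal(5, 2) for the intercept, Normal(0, 1) for the slope, half-normal(1) for the noise scale) are assumptions made for this example, not the speaker’s model. Modelling the logarithm keeps every predicted price positive, and the half-normal prior gives the noise scale sigma zero probability below zero. The last lines show how much probability a naive Normal model on the raw price puts on impossible negative prices.

```python
import math
from statistics import NormalDist

# Hypothetical toy data: floor area in units of 100 m^2, sale price in thousands.
SIZES = [0.8, 1.0, 1.2, 1.5, 2.0]
PRICES = [210.0, 260.0, 300.0, 380.0, 520.0]


def normal_logpdf(x, mu, sigma):
    """Log density of Normal(mu, sigma) evaluated at x."""
    z = (x - mu) / sigma
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * z * z


def log_posterior(alpha, beta, sigma):
    """Unnormalized log posterior for log(price) ~ Normal(alpha + beta * size, sigma).

    Modelling log(price) means every predicted price exp(...) is positive.
    """
    if sigma <= 0:
        return -math.inf  # a noise scale must be positive: rule it out explicitly
    lp = normal_logpdf(alpha, 5.0, 2.0)                 # prior: intercept on the log scale
    lp += normal_logpdf(beta, 0.0, 1.0)                 # prior: slope
    lp += math.log(2) + normal_logpdf(sigma, 0.0, 1.0)  # prior: half-normal on sigma
    for size, price in zip(SIZES, PRICES):
        lp += normal_logpdf(math.log(price), alpha + beta * size, sigma)
    return lp


# Contrast: a Normal model for the raw price puts mass on negative prices.
raw_price_model = NormalDist(mu=150.0, sigma=100.0)
print(f"P(price < 0) under Normal(150, 100): {raw_price_model.cdf(0.0):.3f}")
```

A probabilistic programming language lets you declare the same priors and likelihood in a few lines and hands the resulting log posterior to a sampler for you.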
And the best thing about probabilistic programming is that we no longer have to be experts in Markov chain Monte Carlo. Before, you had to be, but today you don’t. You don’t have to understand what a Hamiltonian is in this space, and you don’t have to understand the physics behind it; you just have to learn how to program in a probabilistic programming language, which is very easy, by the way. Super easy: if you know Python or R or Julia or C++ or C or Java, learning a probabilistic programming language is a walk in the park, and it’s still Turing complete, mind you.

There are a lot of different things we get out of this. We can get full Bayesian inference with Markov chain Monte Carlo through algorithms such as Hamiltonian Monte Carlo and the No-U-Turn Sampler; that’s what you really want to do. The problem is that, still today, it takes some time.
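To make the sampling side concrete, here is a minimal sketch of plain Hamiltonian Monte Carlo for a one-dimensional target, assuming a standard-normal momentum, a fixed step size and a fixed number of leapfrog steps. It is not the No-U-Turn Sampler: NUTS chooses the trajectory length adaptively, and production samplers also tune the step size during warm-up. The Normal(3, 2) target is an assumption chosen because its answer is known.

```python
import math
import random
import statistics


def hmc_sample(log_prob, grad_log_prob, x0, n_samples, step_size=0.2,
               n_leapfrog=10, seed=0):
    """Plain 1-D Hamiltonian Monte Carlo (no NUTS, no step-size adaptation)."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_samples):
        p0 = rng.gauss(0.0, 1.0)  # fresh momentum ~ Normal(0, 1)
        x_new, p = x, p0
        # Leapfrog integration: half momentum step, alternating full steps, half step.
        p += 0.5 * step_size * grad_log_prob(x_new)
        for i in range(n_leapfrog):
            x_new += step_size * p
            if i < n_leapfrog - 1:
                p += step_size * grad_log_prob(x_new)
        p += 0.5 * step_size * grad_log_prob(x_new)
        # Metropolis correction based on the change in total energy H = -log p + p^2 / 2.
        current_h = -log_prob(x) + 0.5 * p0 * p0
        proposed_h = -log_prob(x_new) + 0.5 * p * p
        if rng.random() < math.exp(min(0.0, current_h - proposed_h)):
            x = x_new
        samples.append(x)
    return samples


TARGET_MEAN, TARGET_SD = 3.0, 2.0


def log_prob(x):
    """Unnormalized log density of the Normal(3, 2) target."""
    return -0.5 * ((x - TARGET_MEAN) / TARGET_SD) ** 2


def grad_log_prob(x):
    return -(x - TARGET_MEAN) / TARGET_SD ** 2


draws = hmc_sample(log_prob, grad_log_prob, x0=0.0, n_samples=5000)[500:]  # drop warm-up
print(f"mean {statistics.fmean(draws):.2f}, sd {statistics.stdev(draws):.2f}")
```

The fixed trajectory (10 steps of 0.2) is the hand-tuned part that NUTS removes. In a probabilistic programming language you only write the model; the library supplies the gradient and the tuning.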
There’s another emerging tool called automatic differentiation variational inference (ADVI), which is just a lot of words for turning the inference problem into an optimization problem: you maximize an objective, the evidence lower bound, instead of sampling. People have gotten quite far with that, and it makes these inference machines a lot easier to fit. The best thing is that the math library already automates the differentiation, so you don’t have to write out the gradients yourself either. Again, all you have to do is learn a probabilistic programming language, or learn a framework in Python that supports it, like Edward, for example. There are many other frameworks that do the same thing.
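A minimal sketch of the ingredients that ADVI automates, assuming a one-dimensional target and a Gaussian variational family q = Normal(m, s). The gradient of the log posterior comes from a tiny forward-mode automatic differentiation class, and the evidence lower bound (ELBO) is maximized by stochastic gradient ascent with the reparameterization theta = m + s * eps. The Normal(2, 0.5) target, learning rate and step counts are illustrative assumptions; real ADVI also transforms constrained parameters to an unconstrained space, which is omitted here.

```python
import math
import random


class Dual:
    """Forward-mode automatic differentiation: a value paired with its derivative."""

    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    @staticmethod
    def _lift(other):
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        o = Dual._lift(other)
        return Dual(self.val + o.val, self.der + o.der)

    __radd__ = __add__

    def __sub__(self, other):
        o = Dual._lift(other)
        return Dual(self.val - o.val, self.der - o.der)

    def __mul__(self, other):
        o = Dual._lift(other)
        return Dual(self.val * o.val, self.der * o.val + self.val * o.der)

    __rmul__ = __mul__


def grad(f, x):
    """Derivative of f at x, computed by running f on a dual number."""
    return f(Dual(x, 1.0)).der


def log_post(theta):
    # Hypothetical unnormalized log posterior: Normal(2.0, 0.5), i.e. precision 4.
    return -0.5 * 4.0 * (theta - 2.0) * (theta - 2.0)


def fit_gaussian_vi(log_post, n_steps=3000, n_draws=10, lr=0.01, seed=0):
    """Maximize the ELBO over q = Normal(m, exp(omega)) by stochastic gradient ascent."""
    rng = random.Random(seed)
    m, omega = 0.0, 0.0
    for _ in range(n_steps):
        s = math.exp(omega)
        g_m = g_omega = 0.0
        for _ in range(n_draws):
            eps = rng.gauss(0.0, 1.0)
            g = grad(log_post, m + s * eps)  # reparameterization: theta = m + s * eps
            g_m += g / n_draws
            g_omega += g * s * eps / n_draws
        g_omega += 1.0  # d(entropy of q) / d(log s) = 1
        m += lr * g_m
        omega += lr * g_omega
    return m, math.exp(omega)


m, s = fit_gaussian_vi(log_post)
print(f"q(theta) = Normal({m:.2f}, {s:.2f})")
```

Because this target is itself Gaussian, the fit should land close to m = 2 and s = 0.5. For non-Gaussian posteriors a Gaussian q is only an approximation, which is the price paid for the speed-up over sampling.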
A note about uncertainty. Now, what if I gave you a task? Your task right now is to take 1 million American dollars and invest them in either a radio campaign or a TV campaign. I’m going to tell you that the average performance of each campaign has been 0.5: the return on investment for an average radio campaign has been 0.5, and the return on investment for an average TV campaign has also been 0.5. My question to you is: how would you invest? Does it matter? Based on this information, I would just split it 50/50. I mean, why not? They have the same performance, right?

But what if I also told you what the distribution looks like? If you look over all the different radio campaigns and all the different TV campaigns that have been run, beyond the average, at the individual results, what do you have then? Well, then you have that radio and TV have both historically had a minimum return on investment of 0, which basically means the campaign didn’t work. That could be like some of the commercials you sometimes see on TV that are less than good. You know, sometimes you see these naked gnomes running on a grass field trying to sell cell phone subscriptions, and I never understood the connection. That didn’t work, I’m sure. I didn’t quantify that, but it didn’t work on me.

Then I’m going to tell you the maximum radio and TV performance that has been observed: radio has had, in its history, a return on investment of 9.3, whereas TV has only had 1.4. How would you invest now? Would you still split it fifty-fifty? I wouldn’t.
Now, what if I tell you that this is probably not the real solution either? In order to answer this question, you have to ask another question in return: what is the probability of me realizing a return on investment greater than, for example, 0.3? Let’s just take that as what I want to achieve. Now that we have specified what our question is, we can give it a probabilistic answer, and the answer is that, for any given campaign, it’s about 40 percent probable for radio to get a return on investment above 0.3, but about 90 percent for TV. How does that go hand in hand with the fact that radio has historically outperformed TV at the maximum, and that they have the same average? Well, it’s because things are distributions, and they are not Gaussian.

Now, this here is the source of failure of every statistical method you have probably tried before, because it assumes that everything is symmetric and Gaussian. Nature makes no such promise. It has never said "thou shalt not use Cauchy"; never has that been part of any sort of commandment or information given to us by nature. There is nothing special about the Gaussian distribution. Well, there are a few things special about it, but let’s just ignore the central limit theorem for now, because we don’t have enough data to actually approach that anyway. The point here is that the distribution for radio looks like this, and the distribution for TV looks like the one below. Here you can see they have the same average but very different minima and maxima, and very different skewness.
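The radio-versus-TV argument can be reproduced in spirit with a short simulation. The two distributions below are invented for illustration and are not the campaign data from the talk: radio ROI is a right-skewed log-normal and TV ROI is 1.4 times a Beta(5, 9) variable, both with a mean ROI of 0.5 and both bounded below by 0. Under these assumptions the estimated P(ROI > 0.3) comes out at roughly 43% for radio and roughly 87% for TV, even though radio has by far the larger maximum.

```python
import math
import random
from statistics import fmean

# Hypothetical ROI models (not the talk's data), both with mean ROI 0.5.
RADIO_SIGMA = 1.2
RADIO_MU = math.log(0.5) - RADIO_SIGMA ** 2 / 2  # log-normal mean = exp(mu + sigma^2 / 2) = 0.5
TV_SCALE, TV_A, TV_B = 1.4, 5, 9                 # TV ROI = 1.4 * Beta(5, 9), mean 1.4 * 5/14 = 0.5


def simulate_roi(n=100_000, seed=1):
    """Draw n campaign ROIs for each channel."""
    rng = random.Random(seed)
    radio = [rng.lognormvariate(RADIO_MU, RADIO_SIGMA) for _ in range(n)]
    tv = [TV_SCALE * rng.betavariate(TV_A, TV_B) for _ in range(n)]
    return radio, tv


def prob_above(samples, threshold):
    """Monte Carlo estimate of P(ROI > threshold)."""
    return sum(x > threshold for x in samples) / len(samples)


radio, tv = simulate_roi()
for name, roi in (("radio", radio), ("tv", tv)):
    print(f"{name:5s} mean={fmean(roi):.2f} max={max(roi):.2f} "
          f"P(ROI > 0.3)={prob_above(roi, 0.3):.2f}")
```

Asking for P(ROI > threshold) instead of the mean is what separates the two channels here; the averages alone are identical by construction.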
This is why you cannot make optimal decisions without knowing what you don’t know. You cannot make optimal decisions without knowing the uncertainty, even if you knew the average performance. Average performance is such a huge culprit in bad science and bad inference; I cannot state this enough. That’s also why you should never, ever treat the parameters of your model as if they were constants, because they are not. It’s also not interesting to ask how uncertain my data is about this fixed parameter; that is a nonsense question, not interesting. That is why we have to go back to basics and do it right, because until we do, we will never get further.
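Treating a parameter as a distribution rather than a constant can be sketched with the conjugate Beta-Binomial model. The data (7 campaigns out of 20 beating their target) and the flat Beta(1, 1) prior are hypothetical, and the closed-form update is used only to keep the example short, where a probabilistic programming language would obtain the same posterior by sampling. The result is a whole distribution for the success rate, from which a question such as P(rate > 0.3) is answered directly.

```python
from math import comb


def beta_posterior(successes, failures, prior_a=1, prior_b=1):
    """Conjugate update: Beta(a, b) prior + binomial data -> Beta(a + s, b + f)."""
    return prior_a + successes, prior_b + failures


def beta_sf(x, a, b):
    """P(theta > x) for theta ~ Beta(a, b) with integer a, b >= 1.

    Uses the identity P(Beta(a, b) <= x) = P(Binomial(a + b - 1, x) >= a),
    so P(theta > x) = P(Binomial(a + b - 1, x) <= a - 1).
    """
    n = a + b - 1
    return sum(comb(n, k) * x ** k * (1 - x) ** (n - k) for k in range(a))


# Hypothetical data: 7 of 20 campaigns beat their target.
a, b = beta_posterior(successes=7, failures=13)
print(f"posterior Beta({a}, {b}), mean {a / (a + b):.3f}")
print(f"P(success rate > 0.3) = {beta_sf(0.3, a, b):.3f}")
```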
So, if I can tie this all together: I created a way for you to start playing around with this. I made a Docker image, basically, which is called R Bayesian; R is the host language here, but you can basically use whatever language you want, it doesn’t really matter. What I want to show here is how easy it is to deploy a Docker container with a Bayesian inference engine that can model any problem known to man. There is nothing you cannot do with this framework, nothing. It is more general than anything you have ever tried, because it can simulate everything you have ever tried: most of the things you have ever tried come from probability theory, and this is just a pure application of probability theory. So this is a very easy way to just pull that Docker container. And the best thing is that the functions you write in there are automatically converted to a REST API that you can expose through this Docker service. So you have a REST-API-ready inference machine that is very true to the scientific principle, with no limitations, and the only price you pay is that you have to think twice. Now, for those of you who don’t like R, I can make a version with Python or Julia or whatever. It’s not about the language.
What I really want to convey is that modeling needs to be rebooted. We need to think again about how we define our models, how we specify our models, how we think about our models and how we relate to our models. We can never relate to our models without uncertainty; if we do, we will always fail. That’s why I think playing around with this is a good way to learn more about these things.

This is just an example of how you would actually use it. I wrote a very, very stupid container called "stupid weather", and it’s stupid because it always gives you the same answer: no matter what you send in as a parameter, it always gives you something stupid. It’s just to show you how you write a function; it’s not supposed to convey any intelligence. It’s just a placeholder, boilerplate code for you to plug your algorithm into, but it shows neatly how you’re transforming this into a REST API, and it’s as simple as this: just docker run, and then you have it. So even if you’re not a back-end developer or a full-stack developer, it’s still easy to deploy and run your own solutions. And a Docker container can run anywhere in the cloud: it can run on Google, it can run on Amazon, and I think it can even run on Microsoft’s cloud. I didn’t try that, but I would assume they can run Docker containers.
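The talk does not show how the Docker image turns functions into REST endpoints, so the following is only a hand-written stand-in in Python using the standard library’s http.server module. The endpoint path /stupid_weather, the port and the returned JSON are assumptions for illustration; like the container described above, the function ignores its input and always returns the same answer.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


def stupid_weather(params):
    """Placeholder model: ignores every parameter and returns a fixed answer."""
    return {"forecast": "rain", "confidence": 1.0}


class ModelHandler(BaseHTTPRequestHandler):
    """Maps GET /stupid_weather?key=value to a call of stupid_weather(params)."""

    def do_GET(self):
        url = urlparse(self.path)
        if url.path != "/stupid_weather":
            self.send_error(404, "unknown endpoint")
            return
        # parse_qs returns a list per key; keep the first value for each.
        params = {key: values[0] for key, values in parse_qs(url.query).items()}
        body = json.dumps(stupid_weather(params)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Run the API until interrupted; 0.0.0.0 makes it reachable from outside a container."""
    HTTPServer(("0.0.0.0", port), ModelHandler).serve_forever()
```

To try it, call serve() and request http://localhost:8000/stupid_weather?city=Oslo; inside a container you would also publish the port, for example with docker run -p 8000:8000 followed by your own image name. In this sketch, replacing the body of stupid_weather with a call into a fitted model is all it takes to serve real predictions.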
So if I can leave you with one conclusion, it is basically this: think again about everything you were ever taught. Every statistics class you had, every applied machine learning class, all of it: rethink it, reevaluate it, and be critical of whatever you were told, because I can assure you that in most cases it was a flat-out lie. That lie didn’t happen because people wanted to lie to you. It’s based on ignorance and on decades of malpractice in this field. Computation has caught up with us. Before, it was OK to do what was done, because we had no other choice. Today it is no longer OK; we have all the choices in the world. It’s not hard to get a computational cluster with 200 gigabytes of RAM and 64 CPUs, or even 5,000 GPUs. Those things are at our disposal, so we don’t need to take the same shortcuts as we did, dangerous shortcuts no less. So I hope you will think about that.
Another thing: whenever you’re solving a problem, I would like you to keep in mind that whatever problem you’re solving, whatever machine learning application you’re writing, it is an application of the scientific principle. Please stay true to that; there’s a reason why we have it. Science is a way for us to not be biased. Science is a way for us to discover truths about the world that we live in. This should not be ignored or taken lightly, and when it is, crazy people like Trump can get away with saying that there is no such thing as global warming, because he does not adhere to the scientific principle. So you can either be Trump or you can stay true to the scientific principle, and those two are the only extremes, my friends.
So another thing that I want to say is: always state your mind. Whatever you know about the problem, I assure you that knowledge is critical and important. Do not pretend, and do not fall into the trap of “I want to do unbiased research.” There’s no such thing. No such thing, understand this: there is no bias-free research, and there is no scientific result that can be achieved without assumptions. You are free to evaluate your assumptions again and restate them. That’s good, that’s progress, that is science. But before you observe data, state your mind, and you have to, because otherwise you’ve got nothing. You got a result out, but it was just picked out of thin air. There’s nothing special about the coefficients that came out, nothing at all. Until people realize this, we will still have applications that believe that Central Park is the red light, and it is not, even though it might look like it from a different scale. We need to do better, and we can do better.
And maybe the most important thing of all is that with this framework and this principle of thinking, you are free. You are able to be creative, and most of all, you are able to have so much more fun building your models, because you are not forced into a paradigm that someone else defined for you just because it made the math nice. Thanks.
I think we have time for one question. Somebody asked: where can I read more about this, any good resources? Yes, there are a few great books that I can recommend, and I will go through them in order of how much mathematics they require. If you’re a hardcore mathematician, a theoretical physicist, or anyone with a computational background and a deep understanding of mathematics, you can go directly to a book called Handbook of Markov Chain Monte Carlo. That is a very technical book, and it describes the processes behind probabilistic modeling. If you are a little less mathematical but still have quite a bit of mathematics, you should read the chapter on graphical models in Bishop’s book Pattern Recognition and Machine Learning. But perhaps the most important book of all to read is Statistical Rethinking. That book explains a lot of the concepts I’ve been badgering you about, the ones we somehow lost along the way. It has both text that is consumable by people and a little bit of math, so you can put it in context. Those are really the books I would recommend on this.
Okay, thank you. And I’ll tweet the resources to the GOTO hashtag, #GOTOcph. Okay, thank you. Thank you.
[Applause]