GOTO 2017 • Improving Business Decision Making with Bayesian Artificial Intelligence • Michael Green

[Music]

[Music] [Applause]

so hi everyone as you heard my name is

so hi everyone as you heard my name is michael green and i’m indeed here to

michael green and i’m indeed here to

michael green and i’m indeed here to tell you about a different approach to

tell you about a different approach to

tell you about a different approach to to building algorithms and building

to building algorithms and building

to building algorithms and building machine learning methods really I’m also

machine learning methods really I’m also

machine learning methods really I’m also going to argue that they are

going to argue that they are

going to argue that they are fundamentally the same thing and you’ll

fundamentally the same thing and you’ll

fundamentally the same thing and you’ll see that a little bit later in my talk

see that a little bit later in my talk

see that a little bit later in my talk but that’s let’s get cracking basically

but that’s let’s get cracking basically

but that’s let’s get cracking basically I will I’ll talk about the overview of

I will I’ll talk about the overview of

I will I’ll talk about the overview of AI and machine learning and I’m not the

AI and machine learning and I’m not the

AI and machine learning and I’m not the first one to do this and there are lots

first one to do this and there are lots

first one to do this and there are lots of people who who have their take on it

of people who who have their take on it

of people who who have their take on it but this will be my take I’ll also try

but this will be my take I’ll also try

but this will be my take I’ll also try to extend to you the idea and concept of

to extend to you the idea and concept of

to extend to you the idea and concept of why this is not enough we are very good

why this is not enough we are very good

why this is not enough we are very good at telling ourselves that we have come

at telling ourselves that we have come

at telling ourselves that we have come really far in AI and I would actually

really far in AI and I would actually

really far in AI and I would actually tend to disagree with that I think we’re

tend to disagree with that I think we’re

tend to disagree with that I think we’re we’re playing around in the pedaling

we’re playing around in the pedaling

we’re playing around in the pedaling pool and it’s simply not good enough we

pool and it’s simply not good enough we

pool and it’s simply not good enough we need to innovate this area we need to be

need to innovate this area we need to be

need to innovate this area we need to be better I will also talk about how

better I will also talk about how

better I will also talk about how perception versus inference can work in

perception versus inference can work in

perception versus inference can work in a computer I will make a short note

a computer I will make a short note

a computer I will make a short note about our patient brains because that’s

about our patient brains because that’s

about our patient brains because that’s fundamentally how how we reason as

fundamentally how how we reason as

fundamentally how how we reason as people at least from macroscopic

people at least from macroscopic

people at least from macroscopic perspective I’ll also talk a little bit

perspective I’ll also talk a little bit

perspective I’ll also talk a little bit about probabilistic programming and why

about probabilistic programming and why

about probabilistic programming and why I see that as a very key point to to

I see that as a very key point to to

I see that as a very key point to to marrying two very different field or

marrying two very different field or

marrying two very different field or differentiated field today and in the

differentiated field today and in the

differentiated field today and in the end I’ll tie all of it together so that

end I’ll tie all of it together so that

end I’ll tie all of it together so that you can see how you can actually

you can see how you can actually

you can see how you can actually practically deploy a solution like this

but basically if we just go back to

but basically if we just go back to basic so I know a lot of different

basic so I know a lot of different

basic so I know a lot of different definitions of artificial intelligence

definitions of artificial intelligence

definitions of artificial intelligence there there are a lot of them out there

there there are a lot of them out there

there there are a lot of them out there and none of them says the ability to

and none of them says the ability to

and none of them says the ability to drive a car while not crashing that’s

drive a car while not crashing that’s

drive a car while not crashing that’s simply not artificial intelligence that

simply not artificial intelligence that

simply not artificial intelligence that is that is something that solves a

is that is something that solves a

is that is something that solves a domain-specific problem that is

domain-specific problem that is

domain-specific problem that is challenging yes but it’s not AI neither

challenging yes but it’s not AI neither

challenging yes but it’s not AI neither is diagnosing a health disease in in a

is diagnosing a health disease in in a

is diagnosing a health disease in in a page

page

page that comes into the ER that’s also not a

that comes into the ER that’s also not a

that comes into the ER that’s also not a I neither is actually well what I do in

I neither is actually well what I do in

I neither is actually well what I do in my company that’s also not AI all of

my company that’s also not AI all of

my company that’s also not AI all of those are examples of narrow AI where we

those are examples of narrow AI where we

those are examples of narrow AI where we try to use machines to do more clever

try to use machines to do more clever

try to use machines to do more clever things than an individual person could

things than an individual person could

things than an individual person could do at the same task but my definition of

do at the same task but my definition of

do at the same task but my definition of AI is is basically that it’s sort of the

AI is is basically that it’s sort of the

AI is is basically that it’s sort of the behavior as shown by an agent that you

behavior as shown by an agent that you

behavior as shown by an agent that you stuff into an environment and that

stuff into an environment and that

stuff into an environment and that behavior in itself seems to optimize the

behavior in itself seems to optimize the

behavior in itself seems to optimize the concept of future freedom now that is

concept of future freedom now that is

concept of future freedom now that is the closest definition to to artificial

the closest definition to to artificial

the closest definition to to artificial intelligence that I that I can come to

intelligence that I that I can come to

intelligence that I that I can come to because that doesn’t say anything you

because that doesn’t say anything you

because that doesn’t say anything you know yeah optimize the least square

know yeah optimize the least square

know yeah optimize the least square error do black back propagation to to

error do black back propagation to to

error do black back propagation to to make sure that the croissant repairer

make sure that the croissant repairer

make sure that the croissant repairer looks good all of those things are

looks good all of those things are

looks good all of those things are man-made and I assure you our brains do

man-made and I assure you our brains do

man-made and I assure you our brains do not do Brack propagation it’s simply not

not do Brack propagation it’s simply not

not do Brack propagation it’s simply not true

true

true no one is telling our children how to

no one is telling our children how to

no one is telling our children how to stand up they’re not getting smacked on

stand up they’re not getting smacked on

stand up they’re not getting smacked on the hands for failing my son he failed

the hands for failing my son he failed

the hands for failing my son he failed several times this morning but he

several times this morning but he

several times this morning but he actually succeeded when I left the room

actually succeeded when I left the room

actually succeeded when I left the room so without my encouragement he actually

so without my encouragement he actually

so without my encouragement he actually did better that might say something

did better that might say something

did better that might say something about my pedagogical skills or the fact

about my pedagogical skills or the fact

about my pedagogical skills or the fact that it doesn’t need my training to do

that it doesn’t need my training to do

that it doesn’t need my training to do these things so there’s a fundamental

these things so there’s a fundamental

these things so there’s a fundamental thing that’s missing there’s a missing

thing that’s missing there’s a missing

thing that’s missing there’s a missing piece in our understanding of how

piece in our understanding of how

piece in our understanding of how knowledge is represented accumulated and

knowledge is represented accumulated and

knowledge is represented accumulated and acted upon and that is what fascinates

acted upon and that is what fascinates

acted upon and that is what fascinates me more than anything I’m sure you’ve

me more than anything I’m sure you’ve

me more than anything I’m sure you’ve seen this before it’s just a definition

seen this before it’s just a definition

seen this before it’s just a definition of what AI is today so there’s a lot of

of what AI is today so there’s a lot of

of what AI is today so there’s a lot of things but but basically we are in the

things but but basically we are in the

things but but basically we are in the top level there every single application

top level there every single application

top level there every single application you have ever seen heard of today is in

you have ever seen heard of today is in

you have ever seen heard of today is in this field artificial narrow

this field artificial narrow

this field artificial narrow intelligence there is no such thing as

intelligence there is no such thing as

intelligence there is no such thing as artificial general intelligence it

artificial general intelligence it

artificial general intelligence it doesn’t exist today and if someone says

doesn’t exist today and if someone says

doesn’t exist today and if someone says they have it they’re lying because we

they have it they’re lying because we

they have it they’re lying because we don’t have the representation of how to

don’t have the representation of how to

don’t have the representation of how to capture knowledge no one has that you

capture knowledge no one has that you

capture knowledge no one has that you simply cannot express this in Python or

simply cannot express this in Python or

simply cannot express this in Python or R or whatever language you want it

R or whatever language you want it

R or whatever language you want it doesn’t exist we need to figure out how

doesn’t exist we need to figure out how

doesn’t exist we need to figure out how to represent this

to represent this

to represent this so artificial general intelligence that

so artificial general intelligence that

so artificial general intelligence that is really the task of saying how could

is really the task of saying how could

is really the task of saying how could we actually take an AI that knows how to

we actually take an AI that knows how to

we actually take an AI that knows how to drive a car stuff that into a different

drive a car stuff that into a different

drive a car stuff that into a different environment and make it utilize the

environment and make it utilize the

environment and make it utilize the skills that they had learning how to

skills that they had learning how to

skills that they had learning how to drive the car and apply that to a

drive the car and apply that to a

drive the car and apply that to a completely different field that is the

completely different field that is the

completely different field that is the main transfer and that is something that

main transfer and that is something that

main transfer and that is something that no AI can do today

no AI can do today

no AI can do today now artificial superintelligence and the

now artificial superintelligence and the

now artificial superintelligence and the only reason I’m mentioning this is

only reason I’m mentioning this is

only reason I’m mentioning this is because it’s really really far away

because it’s really really far away

because it’s really really far away the only thing super about this house

the only thing super about this house

the only thing super about this house super far away it is into the future and

super far away it is into the future and

super far away it is into the future and and there’s been a lot of people you

and there’s been a lot of people you

and there’s been a lot of people you know battling about this one of the one

know battling about this one of the one

know battling about this one of the one of the famous guys Elon Musk he is more

of the famous guys Elon Musk he is more

of the famous guys Elon Musk he is more of a doomsday kind of guy with respect

of a doomsday kind of guy with respect

of a doomsday kind of guy with respect to this and he and he should be because

to this and he and he should be because

to this and he and he should be because that gets him money into his company so

that gets him money into his company so

that gets him money into his company so it’s it’s a very it’s a very smart smart

it’s it’s a very it’s a very smart smart

it’s it’s a very it’s a very smart smart move that he says that AI is going to

move that he says that AI is going to

move that he says that AI is going to destroy the world so I’m creating a

destroy the world so I’m creating a

destroy the world so I’m creating a start-up that’s going to sort of

start-up that’s going to sort of

start-up that’s going to sort of regulate that so imagine how hard it was

regulate that so imagine how hard it was

regulate that so imagine how hard it was to raise money for that venture there

to raise money for that venture there

to raise money for that venture there are other things to consider about super

are other things to consider about super

are other things to consider about super intelligence and that’s that it is

intelligence and that’s that it is

intelligence and that’s that it is conceptually possible it is something

conceptually possible it is something

conceptually possible it is something that sooner or later if we do capture

that sooner or later if we do capture

that sooner or later if we do capture how to represent knowledge how to

how to represent knowledge how to

how to represent knowledge how to transfer knowledge how to accumulate

transfer knowledge how to accumulate

transfer knowledge how to accumulate knowledge if we know that then there is

knowledge if we know that then there is

knowledge if we know that then there is no stopping us from deploying this into

no stopping us from deploying this into

no stopping us from deploying this into the world and for all practical purposes

the world and for all practical purposes

the world and for all practical purposes now sounding a lot like musk what we

now sounding a lot like musk what we

now sounding a lot like musk what we released at that time would basically be

released at that time would basically be

released at that time would basically be a god to us and the whole thing in the

a god to us and the whole thing in the

a god to us and the whole thing in the scary part about that is will it be a

scary part about that is will it be a

scary part about that is will it be a nice God nobody knows but then again

nice God nobody knows but then again

nice God nobody knows but then again there’s very little proof in history

there’s very little proof in history

there’s very little proof in history that intelligence feeds violence so if

that intelligence feeds violence so if

that intelligence feeds violence so if anything the world is a safer place than

anything the world is a safer place than

anything the world is a safer place than it’s ever been before and and I would

it’s ever been before and and I would

it’s ever been before and and I would like to see that as an evolution of our

like to see that as an evolution of our

like to see that as an evolution of our intelligence as an evolution of our

intelligence as an evolution of our

intelligence as an evolution of our compassion I don’t see intelligence

compassion I don’t see intelligence

compassion I don’t see intelligence being a necessity for murderous robots

being a necessity for murderous robots

being a necessity for murderous robots so I’m not very afraid of that scenario

so I’m not very afraid of that scenario

so I’m not very afraid of that scenario I know we won’t be the smartest cookies

I know we won’t be the smartest cookies

I know we won’t be the smartest cookies anymore in the world but maybe that’s

anymore in the world but maybe that’s

anymore in the world but maybe that’s not so bad

not so bad

not so bad that was always going to happen and

that was always going to happen and

that was always going to happen and evolution will make sure that no matter

evolution will make sure that no matter

evolution will make sure that no matter what

what

what but basically the landscape looks like

but basically the landscape looks like

but basically the landscape looks like this so you know you have this this

this so you know you have this this

this so you know you have this this disturb artificial intelligence that

disturb artificial intelligence that

disturb artificial intelligence that sort of ubiquitous and describes

sort of ubiquitous and describes

sort of ubiquitous and describes everything from doing a linear

everything from doing a linear

everything from doing a linear regression in Excel to a self-driving

regression in Excel to a self-driving

regression in Excel to a self-driving car to identifying melanoma on a cell

car to identifying melanoma on a cell

car to identifying melanoma on a cell phone and and and all of these things

phone and and and all of these things

phone and and and all of these things are are not artificial intelligence but

are are not artificial intelligence but

are are not artificial intelligence but bets just become a buzzword just like

bets just become a buzzword just like

bets just become a buzzword just like big data I very much agree with the

big data I very much agree with the

big data I very much agree with the previous speakers about this the way I

previous speakers about this the way I

previous speakers about this the way I see it is that AI today is two things

see it is that AI today is two things

see it is that AI today is two things it’s perception machines and there’s

it’s perception machines and there’s

it’s perception machines and there’s inference machines and by inference that

inference machines and by inference that

inference machines and by inference that only mean forecasting or sort of

only mean forecasting or sort of

only mean forecasting or sort of prediction I mean really inference where

prediction I mean really inference where

prediction I mean really inference where you actually predict without actually

you actually predict without actually

you actually predict without actually having any data now under the perception

having any data now under the perception

having any data now under the perception part we’ve come a long way perception

part we’ve come a long way perception

part we’ve come a long way perception machines are everywhere those are the

machines are everywhere those are the

machines are everywhere those are the machines that I should know how to drive

machines that I should know how to drive

machines that I should know how to drive a car those are the machines that know

a car those are the machines that know

a car those are the machines that know how to identify the kites in the in the

how to identify the kites in the in the

how to identify the kites in the in the images that we saw all of those deep

images that we saw all of those deep

images that we saw all of those deep learning applications that they’re

learning applications that they’re

learning applications that they’re basically perception machine they can

basically perception machine they can

basically perception machine they can conceptualize something that they

conceptualize something that they

conceptualize something that they actually get as input either through

actually get as input either through

actually get as input either through visual stimuli or auditory stimuli they

visual stimuli or auditory stimuli they

visual stimuli or auditory stimuli they can sort of categorize it but they

can sort of categorize it but they

can sort of categorize it but they cannot make sense of it and I’ll show

cannot make sense of it and I’ll show

cannot make sense of it and I’ll show you examples of that and that’s why I

you examples of that and that’s why I

you examples of that and that’s why I reasoned that we need more we need to

reasoned that we need more we need to

reasoned that we need more we need to move into proper inference where we

move into proper inference where we

move into proper inference where we actually have a causal understanding a

actually have a causal understanding a

actually have a causal understanding a representation of the world that we’re

representation of the world that we’re

representation of the world that we’re living in and only then can we actually

living in and only then can we actually

living in and only then can we actually talk about pure intelligence but we can

talk about pure intelligence but we can

talk about pure intelligence but we can get you know closer and I’ll show you

get you know closer and I’ll show you

get you know closer and I’ll show you how to do that the biggest problems in

how to do that the biggest problems in

how to do that the biggest problems in data science today which is also another

data science today which is also another

data science today which is also another term for applied artificial intelligence

term for applied artificial intelligence

term for applied artificial intelligence is that data is actually not as

is that data is actually not as

is that data is actually not as ubiquitous and available as you might

ubiquitous and available as you might

ubiquitous and available as you might think

think

think for many interesting domains there is

for many interesting domains there is

for many interesting domains there is simply no data and the data this there

simply no data and the data this there

simply no data and the data this there is exceedingly noisy it might be a

is exceedingly noisy it might be a

is exceedingly noisy it might be a flat-out lie it might be based on

flat-out lie it might be based on

flat-out lie it might be based on surveys and we know that people lie in

surveys and we know that people lie in

surveys and we know that people lie in service that’s also a problem structure

service that’s also a problem structure

service that’s also a problem structure the problem with with structure is also

the problem with with structure is also

the problem with with structure is also that how do you represent the concept in

that how do you represent the concept in

that how do you represent the concept in the mathematical structure not

the mathematical structure not

the mathematical structure not necessarily in parameter space but just

necessarily in parameter space but just

necessarily in parameter space but just structurally how do you construct your

structurally how do you construct your

structurally how do you construct your layers in a neural network for example

identifiability what I mean by that is

identifiability what I mean by that is that for any given data sets there are

that for any given data sets there are

that for any given data sets there are millions of models that fit that data

millions of models that fit that data

millions of models that fit that data set generalizes from that data set

set generalizes from that data set

set generalizes from that data set equally well and many of them do not

equally well and many of them do not

equally well and many of them do not correspond to the physical reality that

correspond to the physical reality that

correspond to the physical reality that we’re living in

we’re living in

we’re living in so there are statistical truths

so there are statistical truths

so there are statistical truths parameter truths and there are physical

parameter truths and there are physical

parameter truths and there are physical realities and they’re not the same thing

realities and they’re not the same thing

realities and they’re not the same thing that’s why my previous field theoretical

that’s why my previous field theoretical

that’s why my previous field theoretical physics is sometimes problematic because

physics is sometimes problematic because

physics is sometimes problematic because quantum quantum theory that I sort of

quantum quantum theory that I sort of

quantum quantum theory that I sort of specialized in that’s has many different

specialized in that’s has many different

specialized in that’s has many different interpretations and then nobody really

interpretations and then nobody really

interpretations and then nobody really knows what’s going on but we know we can

knows what’s going on but we know we can

knows what’s going on but we know we can calculate stuff from it so it makes

calculate stuff from it so it makes

calculate stuff from it so it makes sense in the math but as soon as we push

sense in the math but as soon as we push

sense in the math but as soon as we push this button but what’s really happening

this button but what’s really happening

this button but what’s really happening then you know well we’re basically

then you know well we’re basically

then you know well we’re basically screwed because no one knows and a lot

screwed because no one knows and a lot

screwed because no one knows and a lot of people like to pretend that they know

of people like to pretend that they know

of people like to pretend that they know and then there are some people like the

and then there are some people like the

and then there are some people like the Copenhagen interpretation that says that

Copenhagen interpretation that says that

Copenhagen interpretation that says that well just shut up and do the math which

well just shut up and do the math which

well just shut up and do the math which is basically don’t ask the question

is basically don’t ask the question

is basically don’t ask the question because they cannot be answered

because they cannot be answered

because they cannot be answered Hawking adheres to this school by the

Hawking adheres to this school by the

Hawking adheres to this school by the way he’s also one of one of the guys

way he’s also one of one of the guys

way he’s also one of one of the guys who’s super scared of super intelligence

who’s super scared of super intelligence

who’s super scared of super intelligence funnily enough because he’s a clever

funnily enough because he’s a clever

funnily enough because he’s a clever cookie there’s also the thing about

cookie there’s also the thing about

cookie there’s also the thing about priors so every time that you you

priors so every time that you you

priors so every time that you you address a problem as as a human whatever

address a problem as as a human whatever

address a problem as as a human whatever problem I give you as an individual you

problem I give you as an individual you

problem I give you as an individual you will have a lot of prior knowledge

will have a lot of prior knowledge

will have a lot of prior knowledge you’ll have a half or whole life

you’ll have a half or whole life

you’ll have a half or whole life depending on how old you are of

depending on how old you are of

depending on how old you are of knowledge that you’ve accumulated this

knowledge that you’ve accumulated this

knowledge that you’ve accumulated this knowledge might transfer from another

knowledge might transfer from another

knowledge might transfer from another person that they just told you about

person that they just told you about

person that they just told you about something but you can apply this

something but you can apply this

something but you can apply this knowledge to the problem at hand you can

knowledge to the problem at hand you can

knowledge to the problem at hand you can represent that knowledge in the domain

represent that knowledge in the domain

represent that knowledge in the domain of the problem that you’re trying to

of the problem that you’re trying to

of the problem that you’re trying to solve and that is something that we also

solve and that is something that we also

solve and that is something that we also can actually mimic today through the

can actually mimic today through the

can actually mimic today through the concept of priors and that is that

concept of priors and that is that

concept of priors and that is that basically the way of encoding an idea or

basically the way of encoding an idea or

basically the way of encoding an idea or a sort of knowledge as the statistical

a sort of knowledge as the statistical

a sort of knowledge as the statistical prior and as a statistical distribution

prior and as a statistical distribution

prior and as a statistical distribution that can be put on par with data I’ll

that can be put on par with data I’ll

that can be put on par with data I’ll show you later how to do that as well

show you later how to do that as well

show you later how to do that as well the last part but not the least

the last part but not the least

the last part but not the least important one is uncertainty I cannot

important one is uncertainty I cannot

important one is uncertainty I cannot stress

stress

stress enough how important uncertainty is to

enough how important uncertainty is to

enough how important uncertainty is to do optimal decision-making you basically

do optimal decision-making you basically

do optimal decision-making you basically cannot make optimal decisions without

cannot make optimal decisions without

cannot make optimal decisions without knowing what you don’t know and I will

knowing what you don’t know and I will

knowing what you don’t know and I will stress that point several times during

stress that point several times during

stress that point several times during this talk during the remaining thirty

this talk during the remaining thirty

this talk during the remaining thirty nine minutes of it it’s really great I

nine minutes of it it’s really great I

nine minutes of it it’s really great I can actually see how little time I have

can actually see how little time I have

can actually see how little time I have left so I will not show you more

left so I will not show you more

left so I will not show you more equations and and it’s it’s not because

equations and and it’s it’s not because

equations and and it’s it’s not because I I’m particularly fond of them but they

I I’m particularly fond of them but they

I I’m particularly fond of them but they do help express ideas so in in the top

do help express ideas so in in the top

do help express ideas so in in the top level that’s basically a complete a

level that’s basically a complete a

level that’s basically a complete a compact way of describing any problem

compact way of describing any problem

compact way of describing any problem that you might approach it’s basically a

that you might approach it’s basically a

that you might approach it’s basically a probability distribution over the data

probability distribution over the data

probability distribution over the data that you’re a Fed they are the X’s the

that you’re a Fed they are the X’s the

that you’re a Fed they are the X’s the Y’s those are the things that you want

Y’s those are the things that you want

Y’s those are the things that you want to be able to explain and the Thetas

to be able to explain and the Thetas

to be able to explain and the Thetas they represent all of the different

they represent all of the different

they represent all of the different parameters of your model stuff you don’t

parameters of your model stuff you don’t

parameters of your model stuff you don’t know it can also be latent variables

know it can also be latent variables

know it can also be latent variables concept that you know exists but that

concept that you know exists but that

concept that you know exists but that you don’t have observational data for

you don’t have observational data for

you don’t have observational data for all that is the definition of a problem

all that is the definition of a problem

all that is the definition of a problem space now what machine learning has

space now what machine learning has

space now what machine learning has traditionally done ever since Fisher

traditionally done ever since Fisher

traditionally done ever since Fisher it’s basically that they that they

it’s basically that they that they

it’s basically that they that they looked at this with a question that

looked at this with a question that

looked at this with a question that everybody knew was wrong they basically

everybody knew was wrong they basically

everybody knew was wrong they basically said that what is the probability

said that what is the probability

said that what is the probability distribution of the data that I got

distribution of the data that I got

distribution of the data that I got pretending that is random given a fixed

pretending that is random given a fixed

pretending that is random given a fixed hypothesis that I don’t know that I’m

hypothesis that I don’t know that I’m

hypothesis that I don’t know that I’m actually searching for so then the

actually searching for so then the

actually searching for so then the problem actually became for all

problem actually became for all

problem actually became for all machining applications which sort of

machining applications which sort of

machining applications which sort of hypothesis could i generate that’s the

hypothesis could i generate that’s the

hypothesis could i generate that’s the most consistent with the data set that

most consistent with the data set that

most consistent with the data set that looks like my data set but that’s really

looks like my data set but that’s really

looks like my data set but that’s really not my data set and you can you can ask

not my data set and you can you can ask

not my data set and you can you can ask the question is that a reasonable

the question is that a reasonable

the question is that a reasonable question and then I will tell you it is

question and then I will tell you it is

question and then I will tell you it is not it is poppycock that question is not

not it is poppycock that question is not

not it is poppycock that question is not worth asking why because you’re

worth asking why because you’re

worth asking why because you’re basically just trying to find

basically just trying to find

basically just trying to find explanations to fit your truth that is

explanations to fit your truth that is

explanations to fit your truth that is not science ladies and gentlemen there

not science ladies and gentlemen there

not science ladies and gentlemen there is only one way to do science you

is only one way to do science you

is only one way to do science you postulate an idea and then you observe

postulate an idea and then you observe

postulate an idea and then you observe data to see if you can verify that idea

data to see if you can verify that idea

data to see if you can verify that idea or disregard it you cannot look at a

or disregard it you cannot look at a

or disregard it you cannot look at a data set then generate a hypothesis that

data set then generate a hypothesis that

data set then generate a hypothesis that best explains it and think that that’s

best explains it and think that that’s

best explains it and think that that’s somehow is any physical representation

somehow is any physical representation

somehow is any physical representation in this world because it doesn’t

in this world because it doesn’t

in this world because it doesn’t and and that’s why a lot of a lot of

and and that’s why a lot of a lot of

and and that’s why a lot of a lot of machine learning approaches a lot of

machine learning approaches a lot of

machine learning approaches a lot of statistical approaches has actually

statistical approaches has actually

statistical approaches has actually figured out after you know several

figured out after you know several

figured out after you know several several years of hardcore science they

several years of hardcore science they

several years of hardcore science they found out that the biggest risk for

found out that the biggest risk for

found out that the biggest risk for dying from coronary artery disease is

dying from coronary artery disease is

dying from coronary artery disease is actually going to the hospital yeah

actually going to the hospital yeah

actually going to the hospital yeah that’s just not true and you know nobody

that’s just not true and you know nobody

that’s just not true and you know nobody nobody stopped and and instead you know

nobody stopped and and instead you know

nobody stopped and and instead you know why did this happen is it because the

why did this happen is it because the

why did this happen is it because the the researchers are brain damaged could

the researchers are brain damaged could

the researchers are brain damaged could have been the reason but but but but it

have been the reason but but but but it

have been the reason but but but but it wasn’t it was the methodology it was

wasn’t it was the methodology it was

wasn’t it was the methodology it was they were asking the wrong question

they were asking the wrong question

they were asking the wrong question because if you ask that question I can

because if you ask that question I can

because if you ask that question I can assure you that before you died at the

assure you that before you died at the

assure you that before you died at the hospital you had to go there so this

hospital you had to go there so this

hospital you had to go there so this makes perfect sense but it has no

makes perfect sense but it has no

makes perfect sense but it has no representation of the problem you’re

representation of the problem you’re

representation of the problem you’re trying to solve what you should have

trying to solve what you should have

trying to solve what you should have said is given that you’re sick and you

said is given that you’re sick and you

said is given that you’re sick and you go to the hospital and given that jack

go to the hospital and given that jack

go to the hospital and given that jack to have something that’s worth visiting

to have something that’s worth visiting

to have something that’s worth visiting the hospital for now that is predictive

the hospital for now that is predictive

the hospital for now that is predictive of you being actually disposed to dying

of you being actually disposed to dying

of you being actually disposed to dying for coronary artery disease so how do we

for coronary artery disease so how do we

for coronary artery disease so how do we fix this we fix this by doing what we

fix this we fix this by doing what we

fix this we fix this by doing what we should have been doing from the

should have been doing from the

should have been doing from the beginning and this is not new this

beginning and this is not new this

beginning and this is not new this formula down here below asks a different

formula down here below asks a different

formula down here below asks a different question what does it ask it asks what

question what does it ask it asks what

question what does it ask it asks what is the probability distribution of the

is the probability distribution of the

is the probability distribution of the parameters on my model that I don’t know

parameters on my model that I don’t know

parameters on my model that I don’t know by the way given that I have observed a

by the way given that I have observed a

by the way given that I have observed a data set that is real it is not fake it

data set that is real it is not fake it

data set that is real it is not fake it is not random it is a data set as been

is not random it is a data set as been

is not random it is a data set as been observed what is the probability

observed what is the probability

observed what is the probability distribution of my parameters now that

distribution of my parameters now that

distribution of my parameters now that is an interesting question to ask and

is an interesting question to ask and

is an interesting question to ask and that is a scientific question to ask but

that is a scientific question to ask but

that is a scientific question to ask but what does that require it requires you

what does that require it requires you

what does that require it requires you to state your mind the last part on the

to state your mind the last part on the

to state your mind the last part on the denominator which is the P theta given X

denominator which is the P theta given X

denominator which is the P theta given X that says what do you believe is true

that says what do you believe is true

that says what do you believe is true about your parameters given the data set

about your parameters given the data set

about your parameters given the data set that you have that’s very very important

that you have that’s very very important

that you have that’s very very important ladies and gentlemen because this is the

ladies and gentlemen because this is the

ladies and gentlemen because this is the difference between something great and

difference between something great and

difference between something great and something completely insane

something completely insane

something completely insane now then you might ask but okay why

now then you might ask but okay why

now then you might ask but okay why didn’t we do this because it couldn’t be

didn’t we do this because it couldn’t be

didn’t we do this because it couldn’t be done we simply didn’t have the

done we simply didn’t have the

done we simply didn’t have the computational power to do this and it’s

computational power to do this and it’s

computational power to do this and it’s not because of the guy to the right hand

not because of the guy to the right hand

not because of the guy to the right hand side there

side there

side there it’s also not to the guy on the left

it’s also not to the guy on the left

it’s also not to the guy on the left hand side and denominator and you can

hand side and denominator and you can

hand side and denominator and you can see that the guy on the left hand side

see that the guy on the left hand side

see that the guy on the left hand side and nominated it’s exactly what machine

and nominated it’s exactly what machine

and nominated it’s exactly what machine learning is doing today now why is that

learning is doing today now why is that

learning is doing today now why is that it’s because of the fact that they knew

it’s because of the fact that they knew

it’s because of the fact that they knew that the the guy in the denominator that

that the the guy in the denominator that

that the the guy in the denominator that is an integral from hell and it cannot

is an integral from hell and it cannot

is an integral from hell and it cannot be solved it it looks at every single

be solved it it looks at every single

be solved it it looks at every single value of every single parameter that you

value of every single parameter that you

value of every single parameter that you have and sums that out now this will end

have and sums that out now this will end

have and sums that out now this will end up in a scenario we have to calculate a

up in a scenario we have to calculate a

up in a scenario we have to calculate a lot of more things than the number of

lot of more things than the number of

lot of more things than the number of atoms in the universe and there are a

atoms in the universe and there are a

atoms in the universe and there are a lot of atoms in the universe even the

lot of atoms in the universe even the

lot of atoms in the universe even the the part that one that we can see but

the part that one that we can see but

the part that one that we can see but that basically meant that all of this is

that basically meant that all of this is

that basically meant that all of this is out of the question so someone realized

out of the question so someone realized

out of the question so someone realized hey that I don’t need to calculate that

hey that I don’t need to calculate that

hey that I don’t need to calculate that I don’t know I don’t care about

I don’t know I don’t care about

I don’t know I don’t care about probabilities you know I can just say

probabilities you know I can just say

probabilities you know I can just say that the point that is the maximum will

that the point that is the maximum will

that the point that is the maximum will be the same because the other thing is

be the same because the other thing is

be the same because the other thing is just a normalizing factor it’s a

just a normalizing factor it’s a

just a normalizing factor it’s a constant okay good enough we remove that

constant okay good enough we remove that

constant okay good enough we remove that so done deal and then they said but the

so done deal and then they said but the

so done deal and then they said but the prior ever what if I don’t know anything

prior ever what if I don’t know anything

prior ever what if I don’t know anything what if I I don’t want to say anything I

what if I I don’t want to say anything I

what if I I don’t want to say anything I don’t want to you know state my mind and

don’t want to you know state my mind and

don’t want to you know state my mind and you know put my knowledge into the

you know put my knowledge into the

you know put my knowledge into the problem so that’s just the uniform

problem so that’s just the uniform

problem so that’s just the uniform distribution over minus infinity and

distribution over minus infinity and

distribution over minus infinity and infinity and whoopty this this equation

infinity and whoopty this this equation

infinity and whoopty this this equation here has been transferred to only the

here has been transferred to only the

here has been transferred to only the likelihood but you made a lot of

likelihood but you made a lot of

likelihood but you made a lot of assumptions there but people just forgot

assumptions there but people just forgot

assumptions there but people just forgot that these assumptions are not true and

that these assumptions are not true and

that these assumptions are not true and it also in a maximum likelihood which is

it also in a maximum likelihood which is

it also in a maximum likelihood which is you know horrible way of doing things

you know horrible way of doing things

you know horrible way of doing things it’s basically because you assume that

it’s basically because you assume that

it’s basically because you assume that everything is independent you assume

everything is independent you assume

everything is independent you assume that even when you’re doing time series

that even when you’re doing time series

that even when you’re doing time series regression that observation one is

regression that observation one is

regression that observation one is independent of observation – that’s

independent of observation – that’s

that’s like saying you know I wasn’t

that’s like saying you know I wasn’t last year I was not one year younger

last year I was not one year younger

last year I was not one year younger than I am today of course I was and

than I am today of course I was and

than I am today of course I was and that’s important

that’s important

that’s important all of those things that are temporally

all of those things that are temporally

all of those things that are temporally related are extremely important and the

related are extremely important and the

related are extremely important and the reason why I’m saying this today is that

reason why I’m saying this today is that

reason why I’m saying this today is that there’s no need to cheat anymore there’s

there’s no need to cheat anymore there’s

there’s no need to cheat anymore there’s no need for these crazy statistical

no need for these crazy statistical

no need for these crazy statistical results only you can state your mind you

results only you can state your mind you

results only you can state your mind you can do the inference and all of it can

can do the inference and all of it can

can do the inference and all of it can be done with probabilistic programming

be done with probabilistic programming

be done with probabilistic programming and there are many frameworks for this

and there are many frameworks for this

and there are many frameworks for this today including in Python and also

today including in Python and also

today including in Python and also building on top of tensorflow by the

building on top of tensorflow by the

building on top of tensorflow by the there’s really no excuse not to do this

there’s really no excuse not to do this

there’s really no excuse not to do this and the best thing about it is that it’s

and the best thing about it is that it’s

and the best thing about it is that it’s actually easier than than adhering to

actually easier than than adhering to

actually easier than than adhering to normal statistics because the normal

normal statistics because the normal

normal statistics because the normal statistics you were taught tools they

statistics you were taught tools they

statistics you were taught tools they said that if you have two populations

said that if you have two populations

said that if you have two populations and they are sort of varying together

and they are sort of varying together

and they are sort of varying together then you use this magical tool if they

then you use this magical tool if they

then you use this magical tool if they are independent then you use another

are independent then you use another

are independent then you use another magical tool nobody really understood

magical tool nobody really understood

magical tool nobody really understood why they just but in here is the t-test

why they just but in here is the t-test

why they just but in here is the t-test in this one it’s a paired t-test and

in this one it’s a paired t-test and

in this one it’s a paired t-test and this one is the Wilcox in this point you

this one is the Wilcox in this point you

this one is the Wilcox in this point you should do a general logistic regression

should do a general logistic regression

should do a general logistic regression in this one you should just do a normal

in this one you should just do a normal

in this one you should just do a normal linear regression in this one is uses

linear regression in this one is uses

linear regression in this one is uses port vector machine they are all the

port vector machine they are all the

port vector machine they are all the same thing they are not different there

same thing they are not different there

same thing they are not different there are different assumptions in the

are different assumptions in the

are different assumptions in the likelihood functions there are different

likelihood functions there are different

likelihood functions there are different assumptions in your priors there are

assumptions in your priors there are

assumptions in your priors there are different assumptions in the physical

different assumptions in the physical

different assumptions in the physical structure of your model that is all

structure of your model that is all

structure of your model that is all there is no other difference all of it

there is no other difference all of it

there is no other difference all of it comes back to probabilistic modeling and

comes back to probabilistic modeling and

comes back to probabilistic modeling and if you can learn how to make these

if you can learn how to make these

if you can learn how to make these assumptions explicitly then you have a

assumptions explicitly then you have a

assumptions explicitly then you have a modeling language without limitations

modeling language without limitations

modeling language without limitations then you don’t have to know the

then you don’t have to know the

then you don’t have to know the difference between logistic regressions

difference between logistic regressions

difference between logistic regressions and linear regressions because there is

and linear regressions because there is

and linear regressions because there is none it is exactly the same thing and

none it is exactly the same thing and

none it is exactly the same thing and that’s perhaps the most important thing

that’s perhaps the most important thing

that’s perhaps the most important thing now wait the most important thing that

now wait the most important thing that

now wait the most important thing that I’m gonna say today given that you think

I’m gonna say today given that you think

I’m gonna say today given that you think it’s important is that you cannot do

it’s important is that you cannot do

it’s important is that you cannot do science without assumptions that is

science without assumptions that is

science without assumptions that is impossible just you know this dis is not

impossible just you know this dis is not

impossible just you know this dis is not my belief this is just hardcore facts

my belief this is just hardcore facts

my belief this is just hardcore facts you cannot do science without assumption

you cannot do science without assumption

you cannot do science without assumption and and don’t rest your minds until you

and and don’t rest your minds until you

and and don’t rest your minds until you understand this so without actually

understand this so without actually

understand this so without actually risking something you can get no answers

risking something you can get no answers

risking something you can get no answers so let’s have a look at neural networks

so let’s have a look at neural networks

so let’s have a look at neural networks I’m sure how many of you have taken a

I’m sure how many of you have taken a

I’m sure how many of you have taken a neural networks class in their days ok

neural networks class in their days ok

neural networks class in their days ok then most of you have have solved this

then most of you have have solved this

then most of you have have solved this problem I’m sure how many people have

problem I’m sure how many people have

problem I’m sure how many people have solved this problem before ok a few guys

solved this problem before ok a few guys

solved this problem before ok a few guys and girls so basically this problem is

and girls so basically this problem is

and girls so basically this problem is is highly nonlinear it’s it’s a

is highly nonlinear it’s it’s a

is highly nonlinear it’s it’s a classification task your job is to

classification task your job is to

classification task your job is to separate the the blue dots from the red

separate the the blue dots from the red

separate the the blue dots from the red dots by some line you can see this is

dots by some line you can see this is

dots by some line you can see this is sort of a spiral that that’s that’s non

sort of a spiral that that’s that’s non

sort of a spiral that that’s that’s non stationary it’s

stationary it’s

stationary it’s quite nasty isn’t it Anna neural network

quite nasty isn’t it Anna neural network

quite nasty isn’t it Anna neural network will how many hidden notes do you think

will how many hidden notes do you think

will how many hidden notes do you think I have to have in a one-layer no natural

I have to have in a one-layer no natural

I have to have in a one-layer no natural to solve this 10 20 50 100 let’s see

to solve this 10 20 50 100 let’s see

to solve this 10 20 50 100 let’s see well with ten hit notes I can learn how

well with ten hit notes I can learn how

well with ten hit notes I can learn how to separate this not great but there is

to separate this not great but there is

to separate this not great but there is some signal there if you use up here

some signal there if you use up here

some signal there if you use up here thirty hidden notes you can do a lot

thirty hidden notes you can do a lot

thirty hidden notes you can do a lot better not surprising but still it’s

better not surprising but still it’s

better not surprising but still it’s still not good because we know that this

still not good because we know that this

still not good because we know that this problem can be solved exactly right so

problem can be solved exactly right so

problem can be solved exactly right so with a hundred hidden notes you almost

with a hundred hidden notes you almost

with a hundred hidden notes you almost have perfect classification right and if

have perfect classification right and if

have perfect classification right and if you look at the accuracy table you will

you look at the accuracy table you will

you look at the accuracy table you will see that the area under the curve is

see that the area under the curve is

see that the area under the curve is 100% with the 100 nodes now what is the

100% with the 100 nodes now what is the

100% with the 100 nodes now what is the problem with this and this is on a this

problem with this and this is on a this

problem with this and this is on a this is on a test data set mind you now the

is on a test data set mind you now the

is on a test data set mind you now the problem with this is that this looks

problem with this is that this looks

problem with this is that this looks great

great

great this looks amazing I mean your job is

this looks amazing I mean your job is

this looks amazing I mean your job is done right okay so let’s look at the

done right okay so let’s look at the

done right okay so let’s look at the decision surfaces that were generated

decision surfaces that were generated

decision surfaces that were generated from these guys now to the left-hand

from these guys now to the left-hand

from these guys now to the left-hand side you have the decision surface based

side you have the decision surface based

side you have the decision surface based on 10 hidden neurons and on the right

on 10 hidden neurons and on the right

on 10 hidden neurons and on the right hand side you have the decision surfaces

hand side you have the decision surfaces

hand side you have the decision surfaces based on 100 hidden nodes now you can

based on 100 hidden nodes now you can

based on 100 hidden nodes now you can see here does those decision surfaces

see here does those decision surfaces

see here does those decision surfaces look good to you does it look like they

look good to you does it look like they

look good to you does it look like they actually have captured what you wanted

actually have captured what you wanted

actually have captured what you wanted them to capture no it did not and this

them to capture no it did not and this

them to capture no it did not and this is exactly how neural networks work they

is exactly how neural networks work they

is exactly how neural networks work they are over parameterised very flexible

are over parameterised very flexible

are over parameterised very flexible mathematical models that will do

mathematical models that will do

mathematical models that will do everything they can to minimize that sum

everything they can to minimize that sum

everything they can to minimize that sum square or the croissant repair so

square or the croissant repair so

square or the croissant repair so there’s no penalisation for finding

there’s no penalisation for finding

there’s no penalisation for finding statistical only results and what is the

statistical only results and what is the

statistical only results and what is the worst thing with this the worst thing

worst thing with this the worst thing

worst thing with this the worst thing here is that you see the regions in the

here is that you see the regions in the

here is that you see the regions in the in the outskirts that are colored red

in the outskirts that are colored red

in the outskirts that are colored red that is a signal that the neural network

that is a signal that the neural network

that is a signal that the neural network is sure exists there’s there was no data

is sure exists there’s there was no data

is sure exists there’s there was no data out there at all

out there at all

out there at all but it knows that that has a

but it knows that that has a

but it knows that that has a differentiated class now this might not

differentiated class now this might not

differentiated class now this might not be a problem if you’re if you’re trying

be a problem if you’re if you’re trying

be a problem if you’re if you’re trying to classify you know

to classify you know

to classify you know maybe if there will rain extra much

maybe if there will rain extra much

maybe if there will rain extra much tomorrow the what if you have a droid

tomorrow the what if you have a droid

tomorrow the what if you have a droid with one target kill insurgents let

with one target kill insurgents let

with one target kill insurgents let civilians live what if they identify one

civilians live what if they identify one

civilians live what if they identify one of those asks you know one of those

of those asks you know one of those

of those asks you know one of those outer regions that that just makes sense

outer regions that that just makes sense

outer regions that that just makes sense that was never part of the training set

that was never part of the training set

that was never part of the training set this is a truth that has been learned by

this is a truth that has been learned by

this is a truth that has been learned by a network where data never actually

a network where data never actually

a network where data never actually showed at this and there’s no

showed at this and there’s no

showed at this and there’s no penalisation for this and the reason why

penalisation for this and the reason why

penalisation for this and the reason why I’m saying this is not to be you know

I’m saying this is not to be you know

I’m saying this is not to be you know don’t use AI or don’t use machine

don’t use AI or don’t use machine

don’t use AI or don’t use machine learning in fact I’m saying the opposite

learning in fact I’m saying the opposite

learning in fact I’m saying the opposite but what I want to say here is that be

but what I want to say here is that be

but what I want to say here is that be responsible

responsible

responsible every time you deploy a machine learning

every time you deploy a machine learning

every time you deploy a machine learning algorithm you have to understand exactly

algorithm you have to understand exactly

algorithm you have to understand exactly what it does because lack of

what it does because lack of

what it does because lack of understanding is the most dangerous

understanding is the most dangerous

understanding is the most dangerous thing that can exist today and it

thing that can exist today and it

thing that can exist today and it doesn’t have to be artificial

doesn’t have to be artificial

doesn’t have to be artificial superintelligence all that requires is a

superintelligence all that requires is a

superintelligence all that requires is a screw-up in the engineer or the

screw-up in the engineer or the

screw-up in the engineer or the scientists built this network and it can

scientists built this network and it can

scientists built this network and it can have dramatic consequences especially

have dramatic consequences especially

have dramatic consequences especially today in the in the time of self-driving

today in the in the time of self-driving

today in the in the time of self-driving cars and and all these things and this

cars and and all these things and this

cars and and all these things and this here I will show you another example of

here I will show you another example of

here I will show you another example of why I think that this is interesting so

why I think that this is interesting so

why I think that this is interesting so this is just a representation and mind

this is just a representation and mind

this is just a representation and mind you this is only a single layer neural

you this is only a single layer neural

you this is only a single layer neural network by the way no no you know super

network by the way no no you know super

network by the way no no you know super deep structures where would have even

deep structures where would have even

deep structures where would have even more parameters so I just want want to

more parameters so I just want want to

more parameters so I just want want to show you that this problem here

show you that this problem here

show you that this problem here represented in Cartesian coordinates is

represented in Cartesian coordinates is

represented in Cartesian coordinates is what was being fed to the neural network

what was being fed to the neural network

what was being fed to the neural network and what the neural network should have

and what the neural network should have

and what the neural network should have realized is that in polar coordinates it

realized is that in polar coordinates it

realized is that in polar coordinates it looks a lot simpler doesn’t it now I

looks a lot simpler doesn’t it now I

looks a lot simpler doesn’t it now I know that problem I can separate that

know that problem I can separate that

know that problem I can separate that with with just one hidden node and this

with with just one hidden node and this

with with just one hidden node and this is my point you can over parameterize

is my point you can over parameterize

is my point you can over parameterize and throw a lot of data things but if

and throw a lot of data things but if

and throw a lot of data things but if you start to think about the problem at

you start to think about the problem at

you start to think about the problem at hand and if we teach machines to learn

hand and if we teach machines to learn

hand and if we teach machines to learn how to think how to reason how to look

how to think how to reason how to look

how to think how to reason how to look at data instead of just number crunching

at data instead of just number crunching

at data instead of just number crunching and this is why today I’m not scared of

and this is why today I’m not scared of

and this is why today I’m not scared of artificial intelligence artificial

artificial intelligence artificial

artificial intelligence artificial superintelligence because i could have

superintelligence because i could have

superintelligence because i could have solved this in half a second you know

solved this in half a second you know

solved this in half a second you know even if you don’t have a degree in

even if you don’t have a degree in

even if you don’t have a degree in physics you should realize that that

physics you should realize that that

physics you should realize that that these are just two sine functions with

these are just two sine functions with

these are just two sine functions with with increasing radius it’s not hard but

with increasing radius it’s not hard but

with increasing radius it’s not hard but a neural network would never get this

a neural network would never get this

a neural network would never get this nor would any other machine learning

nor would any other machine learning

nor would any other machine learning algorithm by the way impossible because

algorithm by the way impossible because

algorithm by the way impossible because they don’t work that way that’s not

they don’t work that way that’s not

they don’t work that way that’s not their goal

their goal

their goal the way we can’t we can’t be angry at

the way we can’t we can’t be angry at

the way we can’t we can’t be angry at them for not solving that I just want to

them for not solving that I just want to

them for not solving that I just want to show you a take on probabilistic program

show you a take on probabilistic program

show you a take on probabilistic program with this and and also explain to you

with this and and also explain to you

with this and and also explain to you what public programming is it’s

what public programming is it’s

what public programming is it’s basically an attempt to unify

basically an attempt to unify

basically an attempt to unify general-purpose programming and by

general-purpose programming and by

general-purpose programming and by general purpose I mean like Turing

general purpose I mean like Turing

general purpose I mean like Turing complete programs that we all like

complete programs that we all like

complete programs that we all like because they can basically compute

because they can basically compute

because they can basically compute anything and marrying that was

anything and marrying that was

anything and marrying that was probabilistic modeling which is what

probabilistic modeling which is what

probabilistic modeling which is what everyone should be doing everyone

everyone should be doing everyone

everyone should be doing everyone whatever model you are crazy you are

whatever model you are crazy you are

whatever model you are crazy you are doing probabilistic modeling you just

doing probabilistic modeling you just

doing probabilistic modeling you just accepted a lot of assumptions that you

accepted a lot of assumptions that you

accepted a lot of assumptions that you didn’t make and and that is a

didn’t make and and that is a

didn’t make and and that is a realization that that even though you

realization that that even though you

realization that that even though you can choose not to care about it you have

can choose not to care about it you have

can choose not to care about it you have to know about it you have to know the

to know about it you have to know the

to know about it you have to know the assumptions behind the algorithms that

assumptions behind the algorithms that

assumptions behind the algorithms that you’re using and that’s why even though

you’re using and that’s why even though

you’re using and that’s why even though it’s very attempting to to fire up your

it’s very attempting to to fire up your

it’s very attempting to to fire up your favorite programming language load

favorite programming language load

favorite programming language load scikit-learn or tensorflow or you know

scikit-learn or tensorflow or you know

scikit-learn or tensorflow or you know whatever framework you’re using MX net

whatever framework you’re using MX net

whatever framework you’re using MX net doesn’t matter it’s still important to

doesn’t matter it’s still important to

doesn’t matter it’s still important to understand the cost you don’t have to be

understand the cost you don’t have to be

understand the cost you don’t have to be an expert in the math behind it that’s

an expert in the math behind it that’s

an expert in the math behind it that’s not what I’m saying but you have to

not what I’m saying but you have to

not what I’m saying but you have to understand conceptually what they do and

understand conceptually what they do and

understand conceptually what they do and more importantly what they don’t do

more importantly what they don’t do

more importantly what they don’t do because that makes all the difference so

because that makes all the difference so

because that makes all the difference so this is just to say that you could have

this is just to say that you could have

this is just to say that you could have written this model a lot easier now this

written this model a lot easier now this

written this model a lot easier now this is this is also a breaking point of the

is this is also a breaking point of the

is this is also a breaking point of the html5 presentations by the way this is

html5 presentations by the way this is

html5 presentations by the way this is actually really supposed to be on the

actually really supposed to be on the

actually really supposed to be on the right hand side so thank you windows

right hand side so thank you windows

right hand side so thank you windows even so that few code up there is

even so that few code up there is

even so that few code up there is basically a probabilistic way of

basically a probabilistic way of

basically a probabilistic way of specifying the model that solves it

specifying the model that solves it

specifying the model that solves it exactly and this can be expressed in a

exactly and this can be expressed in a

exactly and this can be expressed in a probabilistic programming language the

probabilistic programming language the

probabilistic programming language the neural network I wrote to fix that took

neural network I wrote to fix that took

neural network I wrote to fix that took a lot more coding I can assure you

so the take-home messages here is that

so the take-home messages here is that if you view things if you go back to

if you view things if you go back to

if you view things if you go back to basic and view them as what they are

basic and view them as what they are

basic and view them as what they are probabilistic statements about data

probabilistic statements about data

probabilistic statements about data about concepts about what you’re trying

about concepts about what you’re trying

about concepts about what you’re trying to model you gain basically a generative

to model you gain basically a generative

to model you gain basically a generative model you gain an understanding of what

model you gain an understanding of what

model you gain an understanding of what is actually happening and and that also

is actually happening and and that also

is actually happening and and that also means that you don’t get any crazy

means that you don’t get any crazy

means that you don’t get any crazy statistical only solutions due to

statistical only solutions due to

statistical only solutions due to identifiability problems and and this is

identifiability problems and and this is

identifiability problems and and this is something we really have to get away

something we really have to get away

something we really have to get away from identifiability is something that

from identifiability is something that

from identifiability is something that will be problematic so I’m not going to

will be problematic so I’m not going to

will be problematic so I’m not going to talk about deep learning I just want to

talk about deep learning I just want to

talk about deep learning I just want to show you what it is but I think you’ve

show you what it is but I think you’ve

show you what it is but I think you’ve had enough talks about that so max

had enough talks about that so max

had enough talks about that so max pooling and all of that we can I’m

pooling and all of that we can I’m

pooling and all of that we can I’m pretty sure we can skip what I do want

pretty sure we can skip what I do want

pretty sure we can skip what I do want to say though that neural networks per

to say though that neural networks per

to say though that neural networks per per default are degenerate and what I

per default are degenerate and what I

per default are degenerate and what I mean by that is that the the energy

mean by that is that the the energy

mean by that is that the the energy landscape that they’re running around in

landscape that they’re running around in

landscape that they’re running around in where they are trying to optimize things

where they are trying to optimize things

where they are trying to optimize things there are multiple locations in this

there are multiple locations in this

there are multiple locations in this energy landscape corresponding to the

energy landscape corresponding to the

energy landscape corresponding to the parameters that that minimizes the error

parameters that that minimizes the error

parameters that that minimizes the error and they’re equivalent that they

and they’re equivalent that they

and they’re equivalent that they correspond to very different physical

correspond to very different physical

correspond to very different physical realities so how the how’s the neural

realities so how the how’s the neural

realities so how the how’s the neural networks supposed to know and this is

networks supposed to know and this is

networks supposed to know and this is not something that you know that that we

not something that you know that that we

not something that you know that that we can design our way out of because the

can design our way out of because the

can design our way out of because the whole idea with the neural network is

whole idea with the neural network is

whole idea with the neural network is this degeneracy because the optimization

this degeneracy because the optimization

this degeneracy because the optimization is such a problem problematic space and

is such a problem problematic space and

is such a problem problematic space and I just want to visualize with the simple

I just want to visualize with the simple

I just want to visualize with the simple neural network here why this happens you

neural network here why this happens you

neural network here why this happens you can see these two networks describe

can see these two networks describe

can see these two networks describe exactly the same thing they solve

exactly the same thing they solve

exactly the same thing they solve exactly the same problem but the

exactly the same problem but the

exactly the same problem but the parameters are different and that’s why

parameters are different and that’s why

parameters are different and that’s why if you take you from X 1 and go to the

if you take you from X 1 and go to the

if you take you from X 1 and go to the hidden 2 and hidden 1 you can either

hidden 2 and hidden 1 you can either

hidden 2 and hidden 1 you can either have weight 1 1 be equal to 5 and go to

have weight 1 1 be equal to 5 and go to

have weight 1 1 be equal to 5 and go to a hidden node 1 or you can have weight 1

a hidden node 1 or you can have weight 1

a hidden node 1 or you can have weight 1 1 before and go to hidden 8 so if you

1 before and go to hidden 8 so if you

1 before and go to hidden 8 so if you try if you basically turn this on its

try if you basically turn this on its

try if you basically turn this on its head and shift around these weights you

head and shift around these weights you

head and shift around these weights you get exactly the same solution now this

get exactly the same solution now this

get exactly the same solution now this is one source of degeneracy and there

is one source of degeneracy and there

is one source of degeneracy and there are many of those so just imagine now

are many of those so just imagine now

are many of those so just imagine now that you’re stacking a lot of layers on

that you’re stacking a lot of layers on

that you’re stacking a lot of layers on top of each other you’re having hundreds

top of each other you’re having hundreds

top of each other you’re having hundreds of neurons how many permutations do you

of neurons how many permutations do you

of neurons how many permutations do you think you will be able to reach a lot is

think you will be able to reach a lot is

think you will be able to reach a lot is the answer I didn’t do it I didn’t do

the answer I didn’t do it I didn’t do

the answer I didn’t do it I didn’t do the math but just

the math but just

the math but just trust me it’s a lot so in in energy

trust me it’s a lot so in in energy

trust me it’s a lot so in in energy space in one dimension it looks like the

space in one dimension it looks like the

space in one dimension it looks like the one on the left-hand side you see two

one on the left-hand side you see two

one on the left-hand side you see two distinct points they are equivalent in

distinct points they are equivalent in

distinct points they are equivalent in the solution space and you cannot

the solution space and you cannot

the solution space and you cannot differentiate between them this is also

differentiate between them this is also

differentiate between them this is also why regularization is such a good idea

why regularization is such a good idea

why regularization is such a good idea in neural networks because it basically

in neural networks because it basically

in neural networks because it basically forces you to enter one of those

forces you to enter one of those

forces you to enter one of those tractors and in in two-dimensional space

tractors and in in two-dimensional space

tractors and in in two-dimensional space you can see that it corresponds to these

you can see that it corresponds to these

you can see that it corresponds to these two attractors in this colorized plot

two attractors in this colorized plot

two attractors in this colorized plot and then if you visualize this in in all

and then if you visualize this in in all

and then if you visualize this in in all the dimensions that the neural network

the dimensions that the neural network

the dimensions that the neural network is actually operating in which is

is actually operating in which is

is actually operating in which is typically the essence of dimensions then

typically the essence of dimensions then

typically the essence of dimensions then you can just imagine how many of those

you can just imagine how many of those

you can just imagine how many of those attractors you have and different depths

attractors you have and different depths

attractors you have and different depths of those attractors so I want to end my

of those attractors so I want to end my

of those attractors so I want to end my point if you missed my points I try to

point if you missed my points I try to

point if you missed my points I try to state it several times but sometimes I’m

state it several times but sometimes I’m

state it several times but sometimes I’m very clumsy in the way I state things so

very clumsy in the way I state things so

very clumsy in the way I state things so I’m gonna be very blunt this is one of

I’m gonna be very blunt this is one of

I’m gonna be very blunt this is one of the best neural networks at given 2016

the best neural networks at given 2016

the best neural networks at given 2016 or 2015 was a version of the Linette

or 2015 was a version of the Linette

or 2015 was a version of the Linette that was trained to recognize digits and

that was trained to recognize digits and

that was trained to recognize digits and it does that perfectly like we said

it does that perfectly like we said

it does that perfectly like we said before we’re so far and in this area

before we’re so far and in this area

before we’re so far and in this area about perception that we don’t have to

about perception that we don’t have to

about perception that we don’t have to worry about not being able to do it it’s

worry about not being able to do it it’s

worry about not being able to do it it’s actually it’s actually done and and and

actually it’s actually done and and and

actually it’s actually done and and and it’s much better than humans at

it’s much better than humans at

it’s much better than humans at recognizing these things okay so let’s

recognizing these things okay so let’s

recognizing these things okay so let’s put it to the test shall we let’s

put it to the test shall we let’s

put it to the test shall we let’s generate some random noise images and

generate some random noise images and

generate some random noise images and ask it what is this and in every single

ask it what is this and in every single

ask it what is this and in every single image here you see the network is 99%

image here you see the network is 99%

image here you see the network is 99% sure that it’s a 1 versus 2 all the way

sure that it’s a 1 versus 2 all the way

sure that it’s a 1 versus 2 all the way up to 9 so all the 4 images under the 0

up to 9 so all the 4 images under the 0

up to 9 so all the 4 images under the 0 it is convinced with the likelihood of

it is convinced with the likelihood of

it is convinced with the likelihood of 99% that this is a 0 can you in any way

99% that this is a 0 can you in any way

99% that this is a 0 can you in any way understand why this is a zero I can’t

understand why this is a zero I can’t

understand why this is a zero I can’t and nor nor can the network because it

and nor nor can the network because it

and nor nor can the network because it was never penalized based on the fact

was never penalized based on the fact

was never penalized based on the fact that you’re not allowed to find

that you’re not allowed to find

that you’re not allowed to find structures that does not sort of dispute

structures that does not sort of dispute

structures that does not sort of dispute your data it has no briefing that it has

your data it has no briefing that it has

your data it has no briefing that it has to stay true to some sort of physical

to stay true to some sort of physical

to stay true to some sort of physical reality and this happens

reality and this happens

reality and this happens now back to my point what if it’s not

now back to my point what if it’s not

now back to my point what if it’s not the number zero

the number zero

the number zero what if it’s recognizing a unknown the

what if it’s recognizing a unknown the

what if it’s recognizing a unknown the face of a known terrorist with a you

face of a known terrorist with a you

face of a known terrorist with a you know kill on sight command and this is

know kill on sight command and this is

know kill on sight command and this is just numbers ladies in them imagine the

just numbers ladies in them imagine the

just numbers ladies in them imagine the complexity of faces so this is the entry

complexity of faces so this is the entry

complexity of faces so this is the entry point exactly how dangerous this

point exactly how dangerous this

point exactly how dangerous this technology is if you don’t respect it

technology is if you don’t respect it

technology is if you don’t respect it and it’s not about you know the machines

and it’s not about you know the machines

and it’s not about you know the machines being too intelligent it’s about us not

being too intelligent it’s about us not

being too intelligent it’s about us not being stupid that is that is really

being stupid that is that is really

being stupid that is that is really important to remember we have a

important to remember we have a

important to remember we have a responsibility to build applications

responsibility to build applications

responsibility to build applications that do not have this confirmation bias

that do not have this confirmation bias

that do not have this confirmation bias in them and that is something I hope

in them and that is something I hope

in them and that is something I hope that all of you will think of when you

that all of you will think of when you

that all of you will think of when you go out and build the next awesome

go out and build the next awesome

go out and build the next awesome machine learning application because I

machine learning application because I

machine learning application because I can’t see any numbers in these images

can’t see any numbers in these images

can’t see any numbers in these images anywhere and if you want to you can read

anywhere and if you want to you can read

anywhere and if you want to you can read the paper by these guys that I said you

the paper by these guys that I said you

the paper by these guys that I said you get the slides afterwards and it’s a

get the slides afterwards and it’s a

get the slides afterwards and it’s a very interesting paper they’ve basically

very interesting paper they’ve basically

very interesting paper they’ve basically tried all they could to see how the

tried all they could to see how the

tried all they could to see how the network could generalize with things

network could generalize with things

network could generalize with things that hadn’t seen before and in different

that hadn’t seen before and in different

that hadn’t seen before and in different areas of what it was supposed to see

areas of what it was supposed to see

areas of what it was supposed to see another thing I want to say is that

another thing I want to say is that

another thing I want to say is that events are not temporally independent

events are not temporally independent

events are not temporally independent everything that you do today everything

everything that you do today everything

everything that you do today everything that you see today here perceive think

that you see today here perceive think

that you see today here perceive think about is affected by what you saw

about is affected by what you saw

about is affected by what you saw yesterday and it’s the same in data data

yesterday and it’s the same in data data

yesterday and it’s the same in data data is not independent you cannot assume

is not independent you cannot assume

is not independent you cannot assume that two data points are independent

that two data points are independent

that two data points are independent that is a wild and crazy assumption that

that is a wild and crazy assumption that

that is a wild and crazy assumption that we have been allowed to do for far too

we have been allowed to do for far too

we have been allowed to do for far too long

long

long and this is just a small visualization

and this is just a small visualization

and this is just a small visualization from the domain that I that I was

from the domain that I that I was

from the domain that I that I was working in where we’re trying to solve

working in where we’re trying to solve

working in where we’re trying to solve how a TV exposure affects the purchasing

how a TV exposure affects the purchasing

how a TV exposure affects the purchasing behavior of people moving into the

behavior of people moving into the

behavior of people moving into the future and of course if you see TV

future and of course if you see TV

future and of course if you see TV commercial today it might affect you to

commercial today it might affect you to

commercial today it might affect you to buy something far into the future and it

buy something far into the future and it

buy something far into the future and it might affect no one to do something

might affect no one to do something

might affect no one to do something today and that’s course or temporal

today and that’s course or temporal

today and that’s course or temporal dependencies that that also needs to be

dependencies that that also needs to be

dependencies that that also needs to be taken into account if you think about

taken into account if you think about

taken into account if you think about causal dependencies and if you think

causal dependencies and if you think

causal dependencies and if you think about concepts if you really think about

about concepts if you really think about

about concepts if you really think about structure of things then you end up with

structure of things then you end up with

structure of things then you end up with something that looks like a deep

something that looks like a deep

something that looks like a deep learning neural network but where you

learning neural network but where you

learning neural network but where you actually have

actually have

actually have structure that is inherent to the

structure that is inherent to the

structure that is inherent to the problem at hand and that’s basically you

problem at hand and that’s basically you

problem at hand and that’s basically you forging connections between concepts

forging connections between concepts

forging connections between concepts between variables between parameters

between variables between parameters

between variables between parameters death sort of solves the problem at hand

death sort of solves the problem at hand

death sort of solves the problem at hand but that doesn’t have this over

but that doesn’t have this over

but that doesn’t have this over characterization this is a visualization

characterization this is a visualization

characterization this is a visualization of one of the one of the models that

of one of the one of the models that

of one of the one of the models that were running and Blackwood for for one

were running and Blackwood for for one

were running and Blackwood for for one of our from one of our clients and and

of our from one of our clients and and

of our from one of our clients and and this is sort of the complexity that you

this is sort of the complexity that you

this is sort of the complexity that you need to have to solve the everyday

need to have to solve the everyday

need to have to solve the everyday problems every node that you see here is

problems every node that you see here is

problems every node that you see here is basically a representation of a variable

basically a representation of a variable

basically a representation of a variable or a latent variable and the

or a latent variable and the

or a latent variable and the relationships between them are basically

relationships between them are basically

relationships between them are basically edges and basically there’s no point in

edges and basically there’s no point in

edges and basically there’s no point in this thing spinning I just thought it

this thing spinning I just thought it

this thing spinning I just thought it looked cool and it helped me raise money

looked cool and it helped me raise money

looked cool and it helped me raise money back in the days

back in the days

back in the days actually the spinning I think was the

actually the spinning I think was the

actually the spinning I think was the differentiate because in one of the

differentiate because in one of the

differentiate because in one of the pitches I did it didn’t it didn’t spin

pitches I did it didn’t it didn’t spin

pitches I did it didn’t it didn’t spin and we didn’t get those money and then

and we didn’t get those money and then

and we didn’t get those money and then all of a sudden it was spinning and we

all of a sudden it was spinning and we

all of a sudden it was spinning and we got those money I don’t know if that’s

got those money I don’t know if that’s

got those money I don’t know if that’s you know all the reason but the spinning

you know all the reason but the spinning

you know all the reason but the spinning in my mind helped so but there’s there’s

in my mind helped so but there’s there’s

in my mind helped so but there’s there’s there’s no visual improvement based on

there’s no visual improvement based on

there’s no visual improvement based on that how many people have seen this

that how many people have seen this

that how many people have seen this before

before

before okay well that’s that’s just no fun okay

okay well that’s that’s just no fun okay

okay well that’s that’s just no fun okay but before before I saw it the first

but before before I saw it the first

but before before I saw it the first time interesting enough I had not seen

time interesting enough I had not seen

time interesting enough I had not seen it so the problem here is that you’re

it so the problem here is that you’re

it so the problem here is that you’re supposed to judge whether a and B the

supposed to judge whether a and B the

supposed to judge whether a and B the squares there are of the same hue or not

squares there are of the same hue or not

squares there are of the same hue or not and from my point of view there are

and from my point of view there are

and from my point of view there are extremely differentiated they look very

extremely differentiated they look very

extremely differentiated they look very differently but the problem is that

differently but the problem is that

differently but the problem is that they’re not they’re actually the same

they’re not they’re actually the same

they’re not they’re actually the same and the reason why why a lot of people

and the reason why why a lot of people

and the reason why why a lot of people think that they are think that they are

think that they are think that they are

think that they are think that they are different is because we are predicting

different is because we are predicting

different is because we are predicting based on the shadow that is being cast

based on the shadow that is being cast

based on the shadow that is being cast from a light source that we know where

from a light source that we know where

from a light source that we know where it is because we have recognized this

it is because we have recognized this

it is because we have recognized this pattern earlier in their lives that is

pattern earlier in their lives that is

pattern earlier in their lives that is also a kind of confirmation bias but

also a kind of confirmation bias but

also a kind of confirmation bias but it’s a good one

it’s a good one

it’s a good one because that’s that’s what allows us to

because that’s that’s what allows us to

because that’s that’s what allows us to actually live our lives and sometimes we

actually live our lives and sometimes we

actually live our lives and sometimes we were wrong like in these contorted

were wrong like in these contorted

were wrong like in these contorted images but but it does prove a point

images but but it does prove a point

images but but it does prove a point that does because our brains are very

that does because our brains are very

that does because our brains are very biased based on what we know already and

biased based on what we know already and

biased based on what we know already and and we would do predictions based on

and we would do predictions based on

and we would do predictions based on what we know

so basically probabilistic programming

so basically probabilistic programming what that is

what that is

what that is it basically allows us to specify any

it basically allows us to specify any

it basically allows us to specify any kind of models that we want no you don’t

kind of models that we want no you don’t

kind of models that we want no you don’t have to think about layers you don’t

have to think about layers you don’t

have to think about layers you don’t have to think about the pooling you

have to think about the pooling you

have to think about the pooling you don’t have to think about all the

don’t have to think about all the

don’t have to think about all the wording all you have to think about is

wording all you have to think about is

wording all you have to think about is that you specify how variables might

that you specify how variables might

that you specify how variables might relate to each other and you specify

relate to each other and you specify

relate to each other and you specify which parameters that might be there and

which parameters that might be there and

which parameters that might be there and how they are relating to the variables

how they are relating to the variables

how they are relating to the variables at hand and if you have that freedom

at hand and if you have that freedom

at hand and if you have that freedom then there’s nothing you cannot model

then there’s nothing you cannot model

then there’s nothing you cannot model the problem with this is that you cannot

the problem with this is that you cannot

the problem with this is that you cannot fit that with Maxim likelihood you

fit that with Maxim likelihood you

fit that with Maxim likelihood you cannot adapt that because you can’t

cannot adapt that because you can’t

cannot adapt that because you can’t assume independent observations you

assume independent observations you

assume independent observations you can’t assume that everything is its

can’t assume that everything is its

can’t assume that everything is its uniform you can’t assume what you can

uniform you can’t assume what you can

uniform you can’t assume what you can but it’s not very smart you can’t assume

but it’s not very smart you can’t assume

but it’s not very smart you can’t assume that any given parameter has a possible

that any given parameter has a possible

that any given parameter has a possible value of minus infinity or plus infinity

value of minus infinity or plus infinity

value of minus infinity or plus infinity now this this in general just makes no

now this this in general just makes no

now this this in general just makes no sense just just think about the fact

sense just just think about the fact

sense just just think about the fact that you’re supposed to predict the

that you’re supposed to predict the

that you’re supposed to predict the house prices for example if you allow

house prices for example if you allow

house prices for example if you allow your model to predict something which is

your model to predict something which is

your model to predict something which is negative then you have something that

negative then you have something that

negative then you have something that might make sense again it statistical

might make sense again it statistical

might make sense again it statistical space because there’s no reason why you

space because there’s no reason why you

space because there’s no reason why you shouldn’t be able to mirror things right

shouldn’t be able to mirror things right

shouldn’t be able to mirror things right you just look at the positive part but

you just look at the positive part but

you just look at the positive part but what about the part in the of your model

what about the part in the of your model

what about the part in the of your model that says that negative sales prices are

that says that negative sales prices are

that says that negative sales prices are also positive that that’s just nonsense

also positive that that’s just nonsense

also positive that that’s just nonsense and and these things you shouldn’t allow

and and these things you shouldn’t allow

and and these things you shouldn’t allow so that’s why you should specify your

so that’s why you should specify your

so that’s why you should specify your priors and the concept of your models

priors and the concept of your models

priors and the concept of your models very rigorously

very rigorously

very rigorously and the best thing about probabilistic

and the best thing about probabilistic

and the best thing about probabilistic programming is that we no longer have to

programming is that we no longer have to

programming is that we no longer have to be experts in Markov chain Monte Carlo

be experts in Markov chain Monte Carlo

be experts in Markov chain Monte Carlo before you have to do that but today you

before you have to do that but today you

before you have to do that but today you don’t you know you don’t have to

don’t you know you don’t have to

don’t you know you don’t have to understand what what a Hamiltonian is in

understand what what a Hamiltonian is in

understand what what a Hamiltonian is in this space you don’t have to understand

this space you don’t have to understand

this space you don’t have to understand quantum mechanics you just have to learn

quantum mechanics you just have to learn

quantum mechanics you just have to learn how to program a probabilistic

how to program a probabilistic

how to program a probabilistic programming language which is very easy

programming language which is very easy

programming language which is very easy by the way super easy if you know Python

by the way super easy if you know Python

by the way super easy if you know Python or R or Julia or C++ or C or Java

or R or Julia or C++ or C or Java

or R or Julia or C++ or C or Java learning how to program a probabilistic

learning how to program a probabilistic

learning how to program a probabilistic programming language is a walk in the

programming language is a walk in the

programming language is a walk in the park and it’s still true and complete

park and it’s still true and complete

park and it’s still true and complete mind you there are a lot of different

mind you there are a lot of different

mind you there are a lot of different things we get out of this we can get the

things we get out of this we can get the

things we get out of this we can get the full Bayesian inference with the market

full Bayesian inference with the market

full Bayesian inference with the market in Monte Carlo through algorithms such

in Monte Carlo through algorithms such

in Monte Carlo through algorithms such as Hamiltonian Markov chain Monte Carlo

as Hamiltonian Markov chain Monte Carlo

as Hamiltonian Markov chain Monte Carlo didn’t know you turn sampler that’s what

didn’t know you turn sampler that’s what

didn’t know you turn sampler that’s what you really want to do the problem with

you really want to do the problem with

you really want to do the problem with this is that still today it takes

this is that still today it takes

this is that still today it takes it takes some time there’s a there’s

it takes some time there’s a there’s

it takes some time there’s a there’s another emerging tool that’s called

another emerging tool that’s called

another emerging tool that’s called automated differentiation variational

automated differentiation variational

automated differentiation variational inference which is just a lot of

inference which is just a lot of

inference which is just a lot of different words that says that turn the

different words that says that turn the

different words that says that turn the inference problem into a maximization

inference problem into a maximization

inference problem into a maximization problem and and they would have gotten

problem and and they would have gotten

problem and and they would have gotten somewhere with that which makes these

somewhere with that which makes these

somewhere with that which makes these inference machine a lot easier to fit

inference machine a lot easier to fit

inference machine a lot easier to fit the best thing is that also the math

the best thing is that also the math

the best thing is that also the math library already has this to automate the

library already has this to automate the

library already has this to automate the differentiation so you don’t have to be

differentiation so you don’t have to be

differentiation so you don’t have to be expressing that either again all you

expressing that either again all you

expressing that either again all you have to do is learn a probabilistic

have to do is learn a probabilistic

have to do is learn a probabilistic programming language or learn a

programming language or learn a

programming language or learn a framework in in Python that supports it

framework in in Python that supports it

framework in in Python that supports it like Edward for example there are many

like Edward for example there are many

like Edward for example there are many other frameworks that do the same thing

other frameworks that do the same thing

other frameworks that do the same thing a note about uncertainty now what if I

a note about uncertainty now what if I

a note about uncertainty now what if I gave you a task your task right now is

gave you a task your task right now is

gave you a task your task right now is to take 1 million American dollars

to take 1 million American dollars

to take 1 million American dollars and you’re going to invest them in

and you’re going to invest them in

and you’re going to invest them in either a radio campaign or a TV campaign

either a radio campaign or a TV campaign

either a radio campaign or a TV campaign now I’m going to tell you that the

now I’m going to tell you that the

now I’m going to tell you that the average performance of each campaign has

average performance of each campaign has

average performance of each campaign has been 0.5 so the return of investment for

been 0.5 so the return of investment for

been 0.5 so the return of investment for an average radio campaign has been 0.5

an average radio campaign has been 0.5

an average radio campaign has been 0.5 the return on investment on an average

the return on investment on an average

the return on investment on an average TV campaign has also been 0.5 now my

TV campaign has also been 0.5 now my

TV campaign has also been 0.5 now my question to you is how would you invest

question to you is how would you invest

question to you is how would you invest does it matter well based on this

does it matter well based on this

does it matter well based on this information I would save I will just

information I would save I will just

information I would save I will just split it 5050 I mean why not they have

split it 5050 I mean why not they have

split it 5050 I mean why not they have the same performance right but what if I

the same performance right but what if I

the same performance right but what if I also told you that actually if you look

also told you that actually if you look

also told you that actually if you look at our is the distribution if you look

at our is the distribution if you look

at our is the distribution if you look over all the different radio campaigns

over all the different radio campaigns

over all the different radio campaigns that have been run and all the different

that have been run and all the different

that have been run and all the different TV campaigns that have been run if you

TV campaigns that have been run if you

TV campaigns that have been run if you look beyond the average and look at the

look beyond the average and look at the

look beyond the average and look at the individual results what do you have then

individual results what do you have then

individual results what do you have then well then you have that radio for

well then you have that radio for

well then you have that radio for example and TV they both have had

example and TV they both have had

example and TV they both have had historically a return investment of 0

historically a return investment of 0

historically a return investment of 0 which basically means it didn’t work

which basically means it didn’t work

which basically means it didn’t work that could be like your some of their

that could be like your some of their

that could be like your some of their some of the commercials you see on TV

some of the commercials you see on TV

some of the commercials you see on TV sometimes that are less than good you

sometimes that are less than good you

sometimes that are less than good you know sometimes you see these these naked

know sometimes you see these these naked

know sometimes you see these these naked gnomes running on a grass field and

gnomes running on a grass field and

gnomes running on a grass field and they’re trying to sell cell phone

they’re trying to sell cell phone

they’re trying to sell cell phone subscriptions and every law understood

subscriptions and every law understood

subscriptions and every law understood the connection but that didn’t work I’m

the connection but that didn’t work I’m

the connection but that didn’t work I’m sure I didn’t quantify that but but it

sure I didn’t quantify that but but it

sure I didn’t quantify that but but it didn’t work on me

didn’t work on me

didn’t work on me then I’m going to tell you that the

then I’m going to tell you that the

then I’m going to tell you that the maximum radio and TV performance that

maximum radio and TV performance that

maximum radio and TV performance that has been observed is that radio has had

has been observed is that radio has had

has been observed is that radio has had in his history and return investment of

in his history and return investment of

in his history and return investment of nine point three meanwhile TV has only

nine point three meanwhile TV has only

nine point three meanwhile TV has only had one point four how would you invest

had one point four how would you invest

had one point four how would you invest now would you still split it fifty-fifty

now would you still split it fifty-fifty

now would you still split it fifty-fifty I wouldn’t

now what if I tell you that this is

now what if I tell you that this is probably not the the real solution

probably not the the real solution

probably not the the real solution either in order to answer this question

either in order to answer this question

either in order to answer this question you have to ask another question in

you have to ask another question in

you have to ask another question in return you have to ask the question what

return you have to ask the question what

return you have to ask the question what is the probability of me realizing a

is the probability of me realizing a

is the probability of me realizing a return on investment greater than for

return on investment greater than for

return on investment greater than for example 0.3 let’s just take that that is

example 0.3 let’s just take that that is

example 0.3 let’s just take that that is what I want to to achieve now now we

what I want to to achieve now now we

what I want to to achieve now now we have a specified what our question is

have a specified what our question is

have a specified what our question is and then we can give it a probabilistic

and then we can give it a probabilistic

and then we can give it a probabilistic answer and then the answer to this

answer and then the answer to this

answer and then the answer to this question is that it’s about 40 percent

question is that it’s about 40 percent

question is that it’s about 40 percent probable for radio to get a return on

probable for radio to get a return on

probable for radio to get a return on investment for any given instance above

investment for any given instance above

investment for any given instance above 0.3 but it’s it’s about 90% for TV how

0.3 but it’s it’s about 90% for TV how

0.3 but it’s it’s about 90% for TV how does that go hand-in-hand with the fact

does that go hand-in-hand with the fact

does that go hand-in-hand with the fact that radio is outperform TV historically

that radio is outperform TV historically

that radio is outperform TV historically as a maximum and they have the same

as a maximum and they have the same

as a maximum and they have the same average well it’s because of the fact

average well it’s because of the fact

average well it’s because of the fact that things are distributions things are

that things are distributions things are

that things are distributions things are distributions and they are not caution

distributions and they are not caution

distributions and they are not caution now this here is the source of failure

now this here is the source of failure

now this here is the source of failure of every statistical method that you

of every statistical method that you

of every statistical method that you probably have tried before because it

probably have tried before because it

probably have tried before because it assumes that everything is symmetric in

assumes that everything is symmetric in

assumes that everything is symmetric in caution nature makes no such promise it

caution nature makes no such promise it

caution nature makes no such promise it has never said thou shalt not use Kashi

has never said thou shalt not use Kashi

has never said thou shalt not use Kashi never has that been part of any sort of

never has that been part of any sort of

never has that been part of any sort of commandment or information given to us

commandment or information given to us

commandment or information given to us by nature there is nothing special about

by nature there is nothing special about

by nature there is nothing special about the Gaussian distribution there is a few

the Gaussian distribution there is a few

the Gaussian distribution there is a few things special about it but you know

things special about it but you know

things special about it but you know let’s just ignore the central limit

let’s just ignore the central limit

let’s just ignore the central limit theorem for now because of the fact that

theorem for now because of the fact that

theorem for now because of the fact that we don’t have enough data to actually

we don’t have enough data to actually

we don’t have enough data to actually approach that anyway so let’s just

approach that anyway so let’s just

approach that anyway so let’s just ignore that for now now the point here

ignore that for now now the point here

ignore that for now now the point here is that the distribution of radio looks

is that the distribution of radio looks

is that the distribution of radio looks like this and the distribution for TV

like this and the distribution for TV

like this and the distribution for TV looks like the one below and here you

looks like the one below and here you

looks like the one below and here you can see they have the same average very

can see they have the same average very

can see they have the same average very different minima and Maxima and very

different minima and Maxima and very

different minima and Maxima and very different skewness and

different skewness and

different skewness and this is why you cannot make optimal

this is why you cannot make optimal

this is why you cannot make optimal decisions without knowing what you don’t

decisions without knowing what you don’t

decisions without knowing what you don’t know you cannot make optimal decisions

know you cannot make optimal decisions

know you cannot make optimal decisions without knowing uncertainty even though

without knowing uncertainty even though

without knowing uncertainty even though if you knew the average performance

if you knew the average performance

if you knew the average performance average performance is such a huge

average performance is such a huge

average performance is such a huge culprit in bad science and bad inference

culprit in bad science and bad inference

culprit in bad science and bad inference I cannot state this enough and that’s

I cannot state this enough and that’s

I cannot state this enough and that’s also why you should never ever ever ever

also why you should never ever ever ever

also why you should never ever ever ever ever treat the parameters of your model

ever treat the parameters of your model

ever treat the parameters of your model as if they were constants because they

as if they were constants because they

as if they were constants because they are not it’s also not interesting to ask

are not it’s also not interesting to ask

are not it’s also not interesting to ask the question how uncertain is my data

the question how uncertain is my data

the question how uncertain is my data about this parameter about this fixed

about this parameter about this fixed

about this parameter about this fixed parameter also a nonsense question not

parameter also a nonsense question not

parameter also a nonsense question not interesting and that is why we have to

interesting and that is why we have to

interesting and that is why we have to go back to basics and do it right

go back to basics and do it right

go back to basics and do it right because until we do we will never get

because until we do we will never get

because until we do we will never get further so if I can tie this all

further so if I can tie this all

further so if I can tie this all together I I created sort of a a way for

together I I created sort of a a way for

together I I created sort of a a way for for you to start playing around with

for you to start playing around with

for you to start playing around with this I am I made a docker image

this I am I made a docker image

this I am I made a docker image basically which is called our Bayesian

basically which is called our Bayesian

basically which is called our Bayesian or is the host language figure you can

or is the host language figure you can

or is the host language figure you can basically use whatever language you want

basically use whatever language you want

basically use whatever language you want it doesn’t really matter what I want to

it doesn’t really matter what I want to

it doesn’t really matter what I want to show here is basically how easy it is to

show here is basically how easy it is to

show here is basically how easy it is to deploy a docker container with a

deploy a docker container with a

deploy a docker container with a Bayesian inference engine that can model

Bayesian inference engine that can model

Bayesian inference engine that can model any problem known to man there is

any problem known to man there is

any problem known to man there is nothing you cannot do with this

nothing you cannot do with this

nothing you cannot do with this framework nothing it is more general

framework nothing it is more general

framework nothing it is more general than anything that you have ever tried

than anything that you have ever tried

than anything that you have ever tried because it can simulate everything that

because it can simulate everything that

because it can simulate everything that you have ever tried and most of the

you have ever tried and most of the

you have ever tried and most of the things you have ever tried comes from

things you have ever tried comes from

things you have ever tried comes from probability theory and this is just a

probability theory and this is just a

probability theory and this is just a pure application of probability theory

pure application of probability theory

pure application of probability theory so this is a very easy way to just snap

so this is a very easy way to just snap

so this is a very easy way to just snap that docker container and the best thing

that docker container and the best thing

that docker container and the best thing is that the functions do you write

is that the functions do you write

is that the functions do you write theory in there are automatically

theory in there are automatically

theory in there are automatically converted to rest API so that you can

converted to rest API so that you can

converted to rest API so that you can expose through this docker service so

expose through this docker service so

expose through this docker service so you have a REST API ready inference

you have a REST API ready inference

you have a REST API ready inference machine that is very much true to the

machine that is very much true to the

machine that is very much true to the scientific principle with no limitations

scientific principle with no limitations

scientific principle with no limitations and the only thing you have to pay for

and the only thing you have to pay for

and the only thing you have to pay for it is that you have to think twice now

it is that you have to think twice now

it is that you have to think twice now for those of you doesn’t like or I can

for those of you doesn’t like or I can

for those of you doesn’t like or I can make one version with Python or Julie or

make one version with Python or Julie or

make one version with Python or Julie or whatever it’s it’s not about the

whatever it’s it’s not about the

whatever it’s it’s not about the language are

language are

language are whatever I really want to convey is that

whatever I really want to convey is that

whatever I really want to convey is that modeling needs to be rebooted we need to

modeling needs to be rebooted we need to

modeling needs to be rebooted we need to think again on how we define our models

think again on how we define our models

think again on how we define our models how we specify our malls how we think

how we specify our malls how we think

how we specify our malls how we think about our models how we relate to our

about our models how we relate to our

about our models how we relate to our models we can never ever relate to our

models we can never ever relate to our

models we can never ever relate to our models without uncertainty we will

models without uncertainty we will

models without uncertainty we will always fail that’s why I think that

always fail that’s why I think that

always fail that’s why I think that playing around with this is it’s a good

playing around with this is it’s a good

playing around with this is it’s a good way to to learn more about these things

way to to learn more about these things

way to to learn more about these things this is just an example of how you would

this is just an example of how you would

this is just an example of how you would actually use this so I wrote a very very

actually use this so I wrote a very very

actually use this so I wrote a very very stupid container that it’s called the

stupid container that it’s called the

stupid container that it’s called the stupid weather and it’s stupid because

stupid weather and it’s stupid because

stupid weather and it’s stupid because it always gives you the same answer so

it always gives you the same answer so

it always gives you the same answer so no matter what you send in as parameter

no matter what you send in as parameter

no matter what you send in as parameter it always gives you something stupid so

it always gives you something stupid so

it always gives you something stupid so that that’s just to show you how you

that that’s just to show you how you

that that’s just to show you how you write a function it’s not supposed to

write a function it’s not supposed to

write a function it’s not supposed to convey any intelligence it’s just a

convey any intelligence it’s just a

convey any intelligence it’s just a placeholder it’s just boilerplate code

placeholder it’s just boilerplate code

placeholder it’s just boilerplate code for you to ingest your algorithm but it

for you to ingest your algorithm but it

for you to ingest your algorithm but it shows neatly how how you’re transforming

shows neatly how how you’re transforming

shows neatly how how you’re transforming this to rest api and it’s as simple as

this to rest api and it’s as simple as

this to rest api and it’s as simple as this just talk around and then you have

this just talk around and then you have

this just talk around and then you have it so even if you’re not you know a

it so even if you’re not you know a

it so even if you’re not you know a back-end developer or a full-stack

back-end developer or a full-stack

back-end developer or a full-stack developer it’s still easy to deploy and

developer it’s still easy to deploy and

developer it’s still easy to deploy and run your own solutions and you know

run your own solutions and you know

run your own solutions and you know docker container can run anywhere in the

docker container can run anywhere in the

docker container can run anywhere in the cloud can run on Google they can run on

cloud can run on Google they can run on

cloud can run on Google they can run on Amazon I think even it can run on on

Amazon I think even it can run on on

Amazon I think even it can run on on Microsoft’s cloud sure probably I didn’t

Microsoft’s cloud sure probably I didn’t

Microsoft’s cloud sure probably I didn’t try that but but but I would assume that

try that but but but I would assume that

try that but but but I would assume that they that they can run docker containers

so if I can leave you with one

so if I can leave you with one conclusion it is basically think again

conclusion it is basically think again

conclusion it is basically think again about everything that you were ever

about everything that you were ever

about everything that you were ever taught every statistics class you had

taught every statistics class you had

taught every statistics class you had every applied machine learning class all

every applied machine learning class all

every applied machine learning class all of it

of it

of it rethink its reevaluate it be critical to

rethink its reevaluate it be critical to

rethink its reevaluate it be critical to whatever you were told because I got I

whatever you were told because I got I

whatever you were told because I got I can assure you that in most cases it was

can assure you that in most cases it was

can assure you that in most cases it was a flatulent lie and that lie didn’t

a flatulent lie and that lie didn’t

a flatulent lie and that lie didn’t happen because of the fact that people

happen because of the fact that people

happen because of the fact that people wanted to lie to you it’s based on

wanted to lie to you it’s based on

wanted to lie to you it’s based on ignorance and it’s based on you know

ignorance and it’s based on you know

ignorance and it’s based on you know decades of malpractice in this field

decades of malpractice in this field

decades of malpractice in this field because computation has caught up with

because computation has caught up with

because computation has caught up with us before it was ok to do

us before it was ok to do

us before it was ok to do was done because we had no other choice

was done because we had no other choice

was done because we had no other choice today no longer okay we have all the

today no longer okay we have all the

today no longer okay we have all the choices in the world it’s not hard

choices in the world it’s not hard

choices in the world it’s not hard getting a computational cluster with 200

getting a computational cluster with 200

getting a computational cluster with 200 gigabytes of RAM and the 64 CPUs or even

gigabytes of RAM and the 64 CPUs or even

gigabytes of RAM and the 64 CPUs or even 5000 GPUs those things are at our

5000 GPUs those things are at our

5000 GPUs those things are at our disposal we don’t need to take the same

disposal we don’t need to take the same

disposal we don’t need to take the same shortcuts as we did dangerous shortcuts

shortcuts as we did dangerous shortcuts

shortcuts as we did dangerous shortcuts no less so I hope you will think about

no less so I hope you will think about

no less so I hope you will think about that another thing is that whenever

that another thing is that whenever

that another thing is that whenever you’re solving a problem I would like

you’re solving a problem I would like

you’re solving a problem I would like you to think about that whatever problem

you to think about that whatever problem

you to think about that whatever problem you’re solving whatever machine learning

you’re solving whatever machine learning

you’re solving whatever machine learning application you’re writing it is an

application you’re writing it is an

application you’re writing it is an application of the scientific principle

application of the scientific principle

application of the scientific principle please stay true to that there’s a

please stay true to that there’s a

please stay true to that there’s a reason why we have it science is a way

reason why we have it science is a way

reason why we have it science is a way for us to not be biased science is a way

for us to not be biased science is a way

for us to not be biased science is a way for us to discover truths about the

for us to discover truths about the

for us to discover truths about the world that we live in this should not be

world that we live in this should not be

world that we live in this should not be ignored or taken lightly and that’s why

ignored or taken lightly and that’s why

ignored or taken lightly and that’s why you know crazy people like Trump can get

you know crazy people like Trump can get

you know crazy people like Trump can get away with saying that there is no such

away with saying that there is no such

away with saying that there is no such thing as global warming because he does

thing as global warming because he does

thing as global warming because he does not adhere to the scientific principle

not adhere to the scientific principle

not adhere to the scientific principle so you know you can either be Trump or

so you know you can either be Trump or

so you know you can either be Trump or you can stay true to the scientific

you can stay true to the scientific

you can stay true to the scientific principle and those two are the only

principle and those two are the only

principle and those two are the only extremes my friends so another thing

extremes my friends so another thing

extremes my friends so another thing that I want to say is always state your

that I want to say is always state your

that I want to say is always state your mind whatever you know about the problem

mind whatever you know about the problem

mind whatever you know about the problem I assure you that that knowledge is

I assure you that that knowledge is

I assure you that that knowledge is critical and important do not pretend

critical and important do not pretend

critical and important do not pretend and fall into this trap more I want to

and fall into this trap more I want to

and fall into this trap more I want to do unbiased research there’s no such

do unbiased research there’s no such

do unbiased research there’s no such thing no such thing understand this

thing no such thing understand this

thing no such thing understand this there is no bias free research there is

there is no bias free research there is

there is no bias free research there is no scientific result that can be

no scientific result that can be

no scientific result that can be achieved without assumption you are free

achieved without assumption you are free

achieved without assumption you are free to evaluate your assumptions again

to evaluate your assumptions again

to evaluate your assumptions again restate them that’s good

restate them that’s good

restate them that’s good that’s progress that is science but

that’s progress that is science but

that’s progress that is science but before you’re observing data state your

before you’re observing data state your

before you’re observing data state your mind and you have to because otherwise

mind and you have to because otherwise

mind and you have to because otherwise you got nothing you got a result out but

you got nothing you got a result out but

you got nothing you got a result out but that was just picked out of thin air

that was just picked out of thin air

that was just picked out of thin air it’s nothing special about those

it’s nothing special about those

it’s nothing special about those coefficients that came

coefficients that came

coefficients that came nothing at all and until people realize

nothing at all and until people realize

nothing at all and until people realize this we will still have applications

this we will still have applications

this we will still have applications that believe that Central Park is the

that believe that Central Park is the

that believe that Central Park is the red light and it is not even though that

red light and it is not even though that

red light and it is not even though that might look like it from from a different

might look like it from from a different

might look like it from from a different scale we need to do better and we can’t

scale we need to do better and we can’t

scale we need to do better and we can’t do better and maybe the most important

do better and maybe the most important

do better and maybe the most important thing of all is that with this framework

thing of all is that with this framework

thing of all is that with this framework and with this principle of thinking you

and with this principle of thinking you

and with this principle of thinking you are able to be free you are able to be

are able to be free you are able to be

are able to be free you are able to be creative and most of all you are able to

creative and most of all you are able to

creative and most of all you are able to have so much more fun building your

have so much more fun building your

have so much more fun building your models because you are not forced into a

models because you are not forced into a

models because you are not forced into a paradigm that someone else defined for

paradigm that someone else defined for

paradigm that someone else defined for you because it made the math nice thanks

I think we have time for one question

I think we have time for one question somebody asked where can I read more

somebody asked where can I read more

somebody asked where can I read more about this any good resources yes there

about this any good resources yes there

about this any good resources yes there are a few great books that I can slowly

are a few great books that I can slowly

are a few great books that I can slowly recommend and I will do them in

recommend and I will do them in

recommend and I will do them in mathematical requirement order so if

mathematical requirement order so if

mathematical requirement order so if you’re a hardcore mathematician or a

you’re a hardcore mathematician or a

you’re a hardcore mathematician or a theoretical physicist or anyone with a

theoretical physicist or anyone with a

theoretical physicist or anyone with a computational background with a deep

computational background with a deep

computational background with a deep understanding of mathematics then you

understanding of mathematics then you

understanding of mathematics then you can go directly to read a book called

can go directly to read a book called

can go directly to read a book called the handbook of Markov chain Monte Carlo

the handbook of Markov chain Monte Carlo

the handbook of Markov chain Monte Carlo that is a very technical book and it

that is a very technical book and it

that is a very technical book and it describes the processes behind the

describes the processes behind the

describes the processes behind the probabilistic modeling if you are a

probabilistic modeling if you are a

probabilistic modeling if you are a little bit less mathematical but still

little bit less mathematical but still

little bit less mathematical but still has quite a bit of mathematics you

has quite a bit of mathematics you

has quite a bit of mathematics you should read the section about graphical

should read the section about graphical

should read the section about graphical models made by bishop and then a book

models made by bishop and then a book

models made by bishop and then a book called machine learning and pattern

called machine learning and pattern

called machine learning and pattern recognition but the most important book

recognition but the most important book

recognition but the most important book of all perhaps to read is is one of the

of all perhaps to read is is one of the

of all perhaps to read is is one of the books called statistical rethinking and

books called statistical rethinking and

books called statistical rethinking and that book explains a lot of the concepts

that book explains a lot of the concepts

that book explains a lot of the concepts that I’ve been badgering now that you

that I’ve been badgering now that you

that I’ve been badgering now that you know somewhere along the line we just

know somewhere along the line we just

know somewhere along the line we just got lost that has both text that you

got lost that has both text that you

got lost that has both text that you know is consumable by by people and it

know is consumable by by people and it

know is consumable by by people and it has a little bit of math so you can sort

has a little bit of math so you can sort

has a little bit of math so you can sort of put it in context those are really

of put it in context those are really

of put it in context those are really the books I would recommend in this okay

the books I would recommend in this okay

the books I would recommend in this okay thank you and I’ll tweet their resources

thank you and I’ll tweet their resources

thank you and I’ll tweet their resources to the go to hashtag go to CPA okay

to the go to hashtag go to CPA okay

to the go to hashtag go to CPA okay thank you my thank you

thank you my thank you

thank you my thank you [Applause]

”

GOTO 2017 • Improving Business Decision Making with Bayesian Artificial Intelligence • Michael Green

Be First to Comment

Leave a Reply Cancel reply