
GOTO 2016 • Deep Learning – What is it and What It Can Do For You • Diogo Moitinho de Almeida


Cool, thank you. I'm Diogo. As a bit of background, I have a very math-and-computer background, which is very good for deep learning. On the list of achievements for why you should listen to me: I currently work at a super cool company that uses deep learning to make faster and more accurate medical diagnoses, and in past lives I've won a whole lot of international math competitions, and some programming competitions as well. Today I'm going to chat about what deep learning is and what it can do for you. Feel free to ask questions at any time. I can ask that, right? I can ask that, right, Simon? Cool. So yeah, feel free to ask questions at any time, just shout them out, especially if you think I'm lying to you.

So, what is deep learning? I'm going to start with a disclaimer: deep learning is actually pretty complicated, and it's hard to be very general about everything and still be correct, so when in doubt I'm going to favor generality. If you're familiar with deep learning already, it's going to sound like I'm lying a lot, but in reality this is just to give the really high-level view of it. Also, whenever possible, I'm going to favor shortcuts that might not be a hundred percent correct but should give you the correct mental model of how these things work. If you have any questions, feel free to ask.

So from a super high level, there are a lot of different levels of hierarchy here in the ecosystem. There's artificial intelligence, which is a superset of everything; an example would be IBM Watson, where lots of hand-coded rules and extremely large amounts of expert manpower are used to build something that does a specific task. There's machine learning, which is a subset of that; an example of this would be Google ad click prediction, where rather than using tons and tons of hard-coded rules, you start using examples to figure out how to combine some hand-coded statistics to predict the probability of, for example, an ad click. At a slightly deeper level you have representation learning, which is sometimes seen as one-layer deep learning (sometimes these levels are called shallow learning, if you're trying to start a fight). An example of this would be Netflix movie recommendation, where the statistics of what you even know about each movie are learned from data, but you're still learning a simple combination of how those features go together. And after a few levels you get into deep learning. An example of this would be figuring out diseases from images, where instead of having one layer of manually chosen statistics that are learned and then combined together, you might learn all of these statistics at the same time, in tens, hundreds, or even thousands of steps, which is what some people use nowadays.

This is probably not a common view of what deep learning is, but I think the easiest way to see it is that deep learning exists as an interface, and this interface has roughly two methods. As the first method you have a forward pass, and this is definitely the easy part: given arbitrary input, make arbitrary output. Anyone can do this part; it's really easy. The trick that makes it work is the backwards pass: given a desired change in the output, you want to be able to transform that into a desired change in the input. Once you have these, you can make arbitrarily complex things by chaining them up into a directed acyclic graph. If this sounds too good to be true, especially with how we design the forward pass (because if you just say arbitrary input and arbitrary output, of course you can do anything you want), the hard part is how you define that backwards pass: as you make your forward pass more and more complicated, like some really crazy function, it becomes hard to define how to map changes in the outputs back into changes in the inputs. So by keeping these things simple and combining them together, we get an almost composable language of modules that allows us to do the things we want to do.
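That two-method interface can be sketched in a few lines of Python. The names here (`Module`, `Scale`, `Chain`) are my own illustration, not something from the talk:

```python
# Sketch of the two-method interface: a forward pass and a backward pass,
# with modules chained together. All names and numbers are illustrative.

class Module:
    def forward(self, x):          # arbitrary input -> arbitrary output
        raise NotImplementedError
    def backward(self, grad_out):  # desired change in output -> desired change in input
        raise NotImplementedError

class Scale(Module):
    """Multiply the input by a fixed constant."""
    def __init__(self, c):
        self.c = c
    def forward(self, x):
        return self.c * x
    def backward(self, grad_out):
        # d(c*x)/dx = c, so scale the output gradient by c
        return self.c * grad_out

class Chain(Module):
    """Compose modules into a pipeline: forward left-to-right, backward right-to-left."""
    def __init__(self, *modules):
        self.modules = modules
    def forward(self, x):
        for m in self.modules:
            x = m.forward(x)
        return x
    def backward(self, grad_out):
        for m in reversed(self.modules):
            grad_out = m.backward(grad_out)
        return grad_out

net = Chain(Scale(2.0), Scale(3.0))
print(net.forward(1.0))   # 6.0
print(net.backward(1.0))  # 6.0, the derivative of 6x with respect to x
```

The point of the sketch is that `Chain` only ever talks to each module through these two methods, which is what makes arbitrary composition possible.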

So once you have this interface, you can build up from it: you have a bunch of modules that satisfy the interface. As a side note, a bunch of these modules will be parametric, which means they have parameters, which roughly means they're stateful. Being stateful means the state can change, and it's this change in state that lets the function go from something you just cobbled together to something that gets closer and closer to what you want. Once you have a fairly general language for what you want to do, you can start doing the tasks you care about. In deep learning you always define a loss, or a cost, depending on how you want to name it, and this is something you want to minimize, for reasons I'd happily explain. It always has to be a scalar, so it can't be several costs at the same time; you have to squash everything down into a single thing that you care about. And once you've squished everything you care about in the world into a single number, you can start using deep learning to optimize it. You create an architecture, which is the function that you want, and this is how you compose together the modules I talked about: the way you connect them changes the function you have and the kind of representational power it has, and that becomes the hard part. After that, you initialize the parameters and you train the architecture by repeatedly updating the parameters to minimize the cost. You go forward through the network to get the things you care about, and you go backwards through the network to change the parameters to be slightly better for your cost, and you repeat this many times until you get a function that you're really, really happy with and that solves whatever problem you want. At the end, you just use that function, just the forward pass.
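The "squash everything into one scalar" step can be sketched concretely; the weights and cost values below are made up for illustration, not from the talk:

```python
# Sketch: several things you care about must be combined into the single
# scalar number that gets minimized. Values and weights are illustrative.

def total_cost(costs, weights):
    # A weighted sum is the usual way to squash multiple costs into one scalar.
    return sum(w * c for w, c in zip(weights, costs))

costs = [0.5, 2.0, 0.1]       # e.g. prediction error, regularizer, some penalty
weights = [1.0, 0.1, 10.0]    # how much you care about each term
print(total_cost(costs, weights))  # 1.7
```

The weights encode the trade-off between objectives; changing them changes which single number the whole training procedure drives down.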

How do we implement the backwards pass? In general, we almost always use the chain rule, which is really nice because it makes implementing the backwards pass easy. How this works: if you have your output as some function f of some x, and you have the partial derivative of your output, dL/df, you can get dL/dx by simply multiplying by the partial derivative df/dx. The nice part about this is that dL/df comes from the rest of your network, and df/dx comes just from your module, so this allows you to chain these things together in a way that only requires local information in order to get the backward pass. It's very nice, and there are theoretical reasons why this is a good way to do it. Perhaps the best part is that some frameworks make this completely automatic: by defining a forward pass, using automatic differentiation you can figure out the backward pass automatically. So it becomes basically as easy as defining arbitrary functions: you really do get the benefit of taking arbitrary things and returning arbitrary things, and as long as all the operations you do are differentiable, you can make it work like magic and optimize it. And this is literally how people do this in practice.
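That chain-rule step can be checked numerically; the particular functions and numbers here are made up for illustration:

```python
# Chain rule sketch: the network computes L = g(f(x)), with f(x) = 3*x as
# "our module" and g(f) = f**2 standing in for "the rest of the network".

def f(x):
    return 3.0 * x

def df_dx(x):
    return 3.0          # local derivative, known to the module alone

def dL_df(fx):
    return 2.0 * fx     # derivative of g(f) = f**2, supplied by the rest of the network

x = 2.0
fx = f(x)                        # forward pass: fx = 6.0
grad_x = dL_df(fx) * df_dx(x)    # backward pass: dL/dx = dL/df * df/dx
print(grad_x)                    # 36.0, matching d/dx of 9*x**2 at x = 2
```

Note that neither factor needs global knowledge: `df_dx` is purely local, and `dL_df` arrives from downstream, which is exactly why the modules compose.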

Updating the parameters: these are just minor details, but to get an understanding of how it works, once you have your existing parameters, you get your gradient and you take a step in the opposite direction of the gradient. The partial derivatives tell us how to change the parameters to increase or decrease the cost that we care about.
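That update rule, step opposite to the gradient, can be sketched on a toy cost; the quadratic cost and learning rate below are illustrative choices, not from the talk:

```python
# Gradient descent sketch: minimize cost(w) = (w - 4)**2 by repeatedly
# stepping in the opposite direction of the gradient d(cost)/dw = 2*(w - 4).

def grad(w):
    return 2.0 * (w - 4.0)

w = 0.0            # initialize the parameter
lr = 0.1           # step size (learning rate)
for _ in range(100):
    w = w - lr * grad(w)   # the update: parameters minus gradient times step size

print(round(w, 3))  # 4.0, the minimizer of the cost
```

Each step shrinks the distance to the minimum by a constant factor here; in a real network the same update is applied to every parameter at once, using the gradients backprop provides.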

An important word to know, though, one that people always use and that I think makes this sound more complicated than it is, is backpropagation, or backprop for short. It has a longer name, reverse-mode automatic differentiation, which sounds pretty complicated, but it's just the chain rule plus dynamic programming. I just talked about the chain rule, and some people are familiar with dynamic programming, but that is just caching. The idea is that when you have a computation graph (here is a very simple one: y = c times d, c = a plus b, d = b plus 1), you traverse the graph from the top to the bottom, and by doing it from the top to the bottom instead of the bottom to the top, you can cache the intermediates that are used many times in the graph. By caching these intermediates you get something that's much more efficient than a naive solution, and this allows you to get gradients that are computable in linear time in the size of your graph: you basically evaluate each node once, which is a really nice property. It makes it all really efficient.
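For the toy graph just described (y = c*d, c = a+b, d = b+1), the top-down traversal looks like this; the input values are arbitrary:

```python
# Reverse-mode sketch on the graph: y = c * d, with c = a + b and d = b + 1.
a, b = 2.0, 1.0

# Forward pass: compute and cache each intermediate once.
c = a + b          # 3.0
d = b + 1.0        # 2.0
y = c * d          # 6.0

# Backward pass: traverse top to bottom, reusing the cached values of c and d.
dy_dc = d                      # d(c*d)/dc = d
dy_dd = c                      # d(c*d)/dd = c
dy_da = dy_dc * 1.0            # d(a+b)/da = 1
dy_db = dy_dc * 1.0 + dy_dd * 1.0   # b feeds both c and d, so sum the two paths

print(y, dy_da, dy_db)         # 6.0 2.0 5.0
```

The caching shows up in `dy_db`: the partial through c is computed once and reused, rather than re-deriving the whole path from scratch, which is what keeps the total work linear in the size of the graph.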

And that's basically it for the basics. From a high level, deep learning is just composing optimizable subcomponents; optimizable almost always means differentiable; differentiable means you can do backprop; and backprop is just the chain rule plus dynamic programming. Once you get to practical deep learning, you normally have to combine this with gradient descent, software, and a dataset that you care about. On the software side there is a very rich space of tools, which I'll talk a little bit about later, but these things are solved for you, so you can do deep learning without even knowing how to calculate the gradient yourself.

So while we can do arbitrarily complicated things, there are a few standard modules that are the main workhorse of deep learning today, and the goal of this section is to get a high-level understanding of each, since all of them can be incredibly nuanced, but these standard modules will cover almost all of what's happening in papers. Perhaps the simplest of them is just matrix multiplication. It has many names: the fully connected layer, sometimes shortened to FC; sometimes called dense, because you have lots of connections; the linear layer, because it's a linear transformation; or affine, because sometimes there's a bias. Every time you see a neural network diagram, all of those arrows correspond to a matrix multiplication, so when a diagram looks complicated, that's where it comes from. You can interpret this as a weight from every input to every output: if you have m inputs and n outputs, you have m-by-n weights to transform your inputs into outputs, and its implementation is literally a matrix multiplication. W in this case is generally a parameter, which means you learn the connections from inputs to outputs.

learn the connections from inputs to outputs this on its own is not powerful
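The fully connected layer as described, a matrix multiply plus an optional bias, can be sketched in a few lines of NumPy (the names `fully_connected`, `W`, and `b` are illustrative, not from the talk):

```python
import numpy as np

def fully_connected(x, W, b=None):
    """Affine (fully connected / dense / linear) layer:
    maps m inputs to n outputs via an m-by-n weight matrix."""
    y = x @ W          # matrix multiply: every input connects to every output
    if b is not None:  # "affine" because sometimes there's a bias
        y = y + b
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal(3)        # m = 3 inputs
W = rng.standard_normal((3, 2))   # m-by-n learned weights
b = np.zeros(2)
print(fully_connected(x, W, b).shape)  # (2,) -> n = 2 outputs
```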

This on its own is not powerful enough, so you need at least one more thing, which is a nonlinearity. The original nonlinearity is called a sigmoid; it's just this function. It has the nice property that it maps the reals into the interval (0, 1), so it can be interpreted as a probability, but that's not as important; what matters is just that it's nonlinear. The reason the nonlinearity is important is that when you stack up the layers back-to-back in this kind of neural network, if you had no nonlinearity in the middle, this would just be two matrix multiplies back-to-back, and you could just combine them into a single matrix multiply. So if you have a 100-layer purely linear network of just matrix multiplications, while this thing looks pretty complicated and does all the work of a real neural network, you could actually flatten it into a single weight matrix, because a composition of linear functions is linear. So the sigmoid was the original one; people liked it because it's very similar to what people used before they really understood how machine learning works, which was just a binary threshold, whether a unit "fired", back in the day.

And the cool part is that with just those two units you know how to make a neural network. You simply take your input, apply the matrix multiply, apply a sigmoid, apply another matrix multiply, and you have one. These are called multi-layer perceptrons, when you only have matrix multiplies and nonlinearities. And the cool part is that there's a theorem that this simple architecture, literally three functions, can approximate arbitrary functions, which means it can solve any problem that you care about. The idea is that if you make the middle big enough, you can compute basically any function. The downside is that just because it can doesn't mean it will, and a single-hidden-layer multi-layer perceptron often causes more problems than it solves. This is why there was an AI winter in the 90s, just because these things were kind of terrible, but people have gotten a lot better at it, and now neural networks are cool. As a disclaimer: these are neural networks, and this would be a multi-layer perceptron. Everything I'm talking about today is still a neural network, but specifically when you talk about the multi-layer perceptron, this is what it is.
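That recipe, matrix multiply, sigmoid, matrix multiply, can be written out directly. A minimal NumPy sketch (the function names are mine):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))  # maps the reals into (0, 1)

def mlp(x, W1, W2):
    """One-hidden-layer multi-layer perceptron: literally three functions."""
    return sigmoid(x @ W1) @ W2

rng = np.random.default_rng(2)
x = rng.standard_normal(4)
W1 = rng.standard_normal((4, 16))  # "make the middle big enough"
W2 = rng.standard_normal((16, 1))
print(mlp(x, W1, W2).shape)  # (1,)
```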

So since then people have made better nonlinearities; this is probably the majority of the improvement between 1990 and 2012, unfortunately. You have a kind of smarter nonlinearity: instead of taking this weird squiggly function, you just do a threshold, so anything that's negative you just turn into 0 (the rectified linear unit, or ReLU). This is actually the most popular nonlinearity nowadays; it does incredibly well. It has some really nice optimization properties: in particular, everywhere it's not zero it is linear, so it works very well with the chain rule, and this thing is used almost everywhere nowadays, especially in the middle of a neural network.
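The thresholding nonlinearity described here is one line (a sketch; the name `relu` is the conventional one, not used in the talk):

```python
import numpy as np

def relu(z):
    """Rectified linear unit: anything negative becomes 0, the rest passes through."""
    return np.maximum(0.0, z)

print(relu(np.array([-2.0, -0.5, 0.0, 3.0])))  # [0. 0. 0. 3.]
```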

There is also a softmax, which you can think of as converting a bunch of numbers into a discrete probability distribution. The math of it is: p equals, you exponentiate your input, and then you divide by the sum of the exponentiated inputs. You can think of the exponentiation as turning all the numbers positive, and the division by the sum as a normalization term. There are some very nice properties to this; it's used as the final layer for classification problems, and it appears in almost every neural network.

Cool, that was the easy part; this gets complicated, so feel free to ask questions during this part. I normally explain this with a whiteboard, and it's normally complicated even with a whiteboard, but I'll try to go through it. A convolution is the main workhorse for deep learning on images, and deep learning on images is basically where this revolution started, so it's very, very important; it's probably the place where deep learning is the most advanced. So it's a very important primitive, and I think a very cool primitive to understand, because you really realize how beautiful the framework is when you see that this thing sounds pretty complicated but you can just plug it in, and you don't need to know how it works when someone has coded it up for you, which is what I do. This is a linear operation for 2D images. With a multi-layer perceptron you have a mapping from every input to every output, but in the case of images your inputs are structured: you have this spatial relationship between your inputs, and a mapping from every input to every output kind of throws away that spatial relationship. So the idea would be: what if, rather than having a connection from every input to every output, the output looked like an image as well, and every output was only locally connected to the inputs it corresponds to? That is insight number one: local connections. Insight number two is that every output is a local function of its input: what if, instead of having every output be its own function, which would be the general case, every output was the same function of its input? What this then becomes is equivalent to a well-known operation in computer vision, a convolution. You have a kernel, which you can think of as a local weight matrix; it's often represented as a square in an image-like thing, which means it's capturing that local input. You do an elementwise multiply between all of the weights of the kernel and the local region, and you sum up the results, so this is just a dot product, and then you do that at every single location of the input. So it's kind of like tiling your input with the same function, or it can be interpreted as extracting the same features at every location, which is the more common way to interpret it.

This is very powerful and very parameter-efficient, because you have a lot of weight sharing, and you can end up having much larger outputs than you could with a normal matrix multiplication. You also don't lose spatial information, which is a very important part of the structure of images. So these are some really nice properties. And as a side note, you might think that this thing is really complicated (how do I take a gradient of it?), but it is actually equivalent to a very constrained matrix multiplication. If you take your input image and unroll it, because with a matrix multiply you lose that spatial structure, you basically have few connections: every output is connected to maybe nine of your inputs, and that becomes equivalent to the diagram with lots of arrows, but with most of the arrows being zero or missing. So this is still completely differentiable, and it still fits very nicely into this framework, so that you can plug it in with all the other nonlinearities.
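The operation described, slide a kernel and take the same local dot product at every location, as a naive NumPy sketch (a "valid" convolution with no padding or stride; real libraries use far faster implementations):

```python
import numpy as np

def conv2d(image, kernel):
    """Naive 2D convolution: the same local dot product at every location."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1   # output shrinks without padding
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i:i + kh, j:j + kw]   # the local region
            out[i, j] = np.sum(patch * kernel)  # elementwise multiply, then sum
    return out

image = np.arange(25.0).reshape(5, 5)
kernel = np.ones((3, 3)) / 9.0      # a 3x3 averaging kernel: 9 shared weights
print(conv2d(image, kernel).shape)  # (3, 3)
```

Note how few parameters there are: 9 weights cover every output location, instead of one weight per input-output pair.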

Cool, it's going to get a little bit harder. Another very fundamental building block is called a recurrent neural network. I don't know why this building block is called a "network" when everything else is called a "layer", but that's just convention. This is solving a problem that basically had not been solved in machine learning before, which is that we want functions to take in variable-size input, but they can only take in fixed-size input. This becomes a problem when your function is parametric, like a fully connected layer is, because if you want a connection from every input to every output but your input size changes, then the number of weights you have changes, and that means that if you get a longer example at inference time, you don't know what to do with it. It also might be inefficient, because you might have a really large number of inputs and not need all the power of having every connection there. A recurrent neural network is a way to solve this problem, and the solution is recursion. You have an initial state, let's call it h in this example, and you have a bunch of inputs x, and there's a variable number of them, so you don't really know what capital T is. You can make a function that takes in a fixed size, and because each x is fixed-size, you can make that function take both h and x, and now you can recurse through the list: the new state is a function of the previous state and the current input, and then you just return the final one.
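That recursion, new state = f(previous state, current input), is a short loop. A sketch of a vanilla RNN cell (the tanh nonlinearity and the weight names are conventional choices, not from the talk):

```python
import numpy as np

def rnn(xs, h0, W_xh, W_hh):
    """Fold a fixed-size function over a variable-length sequence of inputs."""
    h = h0
    for x in xs:                          # works for any sequence length T
        h = np.tanh(x @ W_xh + h @ W_hh)  # h_t = f(h_{t-1}, x_t)
    return h                              # return the final state

rng = np.random.default_rng(3)
W_xh = rng.standard_normal((4, 8)) * 0.1
W_hh = rng.standard_normal((8, 8)) * 0.1
h0 = np.zeros(8)

short = [rng.standard_normal(4) for _ in range(3)]
long = [rng.standard_normal(4) for _ in range(10)]
# The same fixed-size weights handle both sequence lengths:
print(rnn(short, h0, W_xh, W_hh).shape, rnn(long, h0, W_xh, W_hh).shape)  # (8,) (8,)
```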

What this allows you to do is, with a fixed function that takes fixed-size input, you can now turn it into a function that takes variable-size input, by applying that function a variable number of times. This is a pretty obvious insight, and you could do it with any kind of machine learning algorithm; you could apply a random forest an arbitrary number of times. But the cool part here is that because this function is differentiable, the recursive function is also differentiable, so you can take the derivatives with respect to each of the inputs, and you can even take the derivatives with respect to the weight matrices you use at each step, and you get a diagram that looks kind of like this. You can think of it as applying an FC layer for each input that takes the input and the state so far. This diagram might not be very clear, but there are many different diagrams for RNNs, and they're all equally confusing if you're unfamiliar with them. The one on the left is my favorite, because you can think of it as a stateful function, except the state only lasts for the duration of your input. But the unrolled version is the version you use if you're taking gradients; this is equivalent to just passing the gradients through this very long graph.

long graph a last complicated slide long

long graph a last complicated slide long short term memory units this puts me in

short term memory units this puts me in

short term memory units this puts me in a really hard position because I can’t

a really hard position because I can’t

a really hard position because I can’t not talk about them because they are so

not talk about them because they are so

not talk about them because they are so big it is but they’re also extremely

big it is but they’re also extremely

big it is but they’re also extremely complicated and there they take more

complicated and there they take more

complicated and there they take more building blocks and I’ve UNIX even

building blocks and I’ve UNIX even

building blocks and I’ve UNIX even explain but there is this great blog

explain but there is this great blog

explain but there is this great blog post I think slides will be published so

post I think slides will be published so

post I think slides will be published so you don’t have to worry about that this

you don’t have to worry about that this

you don’t have to worry about that this great great blog post tries to explain

great great blog post tries to explain

great great blog post tries to explain it but I’m going to try to give like a

it but I’m going to try to give like a

it but I’m going to try to give like a high-level intuition of them just so

high-level intuition of them just so

high-level intuition of them just so like even higher than what I’ve said so

like even higher than what I’ve said so

like even higher than what I’ve said so far um just so that you can kind of

far um just so that you can kind of

far um just so that you can kind of understand where where it’s coming from

understand where where it’s coming from

understand where where it’s coming from when I talk about these things being

when I talk about these things being

when I talk about these things being used and the idea would be its kind of

used and the idea would be its kind of

used and the idea would be its kind of like an RNN and in practice no one uses

like an RNN and in practice no one uses

like an RNN and in practice no one uses the RNN that I’ve just described it’s a

the RNN that I’ve just described it’s a

the RNN that I’ve just described it’s a very simple function and there’s much

very simple function and there’s much

very simple function and there’s much more complicated versions it’s an RN n

more complicated versions it’s an RN n

more complicated versions it’s an RN n where the function is just really

where the function is just really

where the function is just really complicated so this entire thing here is

complicated so this entire thing here is

complicated so this entire thing here is a representation of that function I’m

a representation of that function I’m

a representation of that function I’m not going to get into the details of it

not going to get into the details of it

not going to get into the details of it but it involves a lot of different

but it involves a lot of different

but it involves a lot of different mechanisms in order to make optimization

mechanisms in order to make optimization

mechanisms in order to make optimization easier and the idea is that if you’d

easier and the idea is that if you’d

easier and the idea is that if you’d apply if you designed this function well

apply if you designed this function well

apply if you designed this function well the function is applied at each time

the function is applied at each time

the function is applied at each time step it can make the problem much much

step it can make the problem much much

step it can make the problem much much easier to optimize and you can have like

easier to optimize and you can have like

easier to optimize and you can have like a much much more powerful function and

a much much more powerful function and

a much much more powerful function and the key is that by having a path which

the key is that by having a path which

the key is that by having a path which is relatively simple so this is what

is relatively simple so this is what

is relatively simple so this is what represents with the top path where these

represents with the top path where these

represents with the top path where these variable operations being done to it it

variable operations being done to it it

variable operations being done to it it makes it easier to stack these things

makes it easier to stack these things

makes it easier to stack these things want back to back and that makes easier

want back to back and that makes easier

want back to back and that makes easier to learn long-term relationships between

to learn long-term relationships between

to learn long-term relationships between the functions

the functions
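As a rough sketch of that intuition, here is a toy scalar LSTM step with a single made-up shared weight (real LSTMs learn separate weight matrices per gate; this is only a shape of the idea, not the talk's diagram). The "relatively simple top path" is the cell state `c`, which is only touched by gated multiply-and-add:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h, c, w=0.5):
    """One toy LSTM step. w is a single hypothetical weight shared by
    all gates purely for brevity."""
    f = sigmoid(w * x + w * h)    # forget gate: how much old cell state to keep
    i = sigmoid(w * x + w * h)    # input gate: how much new candidate to write
    o = sigmoid(w * x + w * h)    # output gate: how much cell state to expose
    g = math.tanh(w * x + w * h)  # candidate value to write
    c = f * c + i * g             # the simple "top path": only gated
                                  # element-wise ops touch the cell state,
                                  # which helps gradients survive many steps
    h = o * math.tanh(c)          # hidden output for this timestep
    return h, c

h, c = 0.0, 0.0
for x in [1.0, -1.0, 0.5]:
    h, c = lstm_step(x, h, c)
```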

Whew, okay, that was the complicated part. You now know ninety-five percent of the building blocks that everyone uses for state-of-the-art deep learning; with just these building blocks you could probably do new state-of-the-art things in new domains. So congratulations, you're ready for the next part. In this part I want to talk about what deep learning is really good at and what you should use it on. The answer is: a whole lot. I'm going to cover just the rough themes of where deep learning really shines, but there's much, much more to it, which I think is part of the awesomeness, because it all falls under this extremely simple framework that I've just described. I don't think you could describe any framework as simple as what I've just done and have it solve this many complicated tasks that were, basically, unsolved before 2012.

complicated unsolved tasks before 2012 basically so convolutional neural

basically so convolutional neural

basically so convolutional neural networks this is a general architecture

networks this is a general architecture

networks this is a general architecture commonly referred to as CNN’s this

commonly referred to as CNN’s this

commonly referred to as CNN’s this actually means a network in this case

actually means a network in this case

actually means a network in this case and not just a layer the idea is that

and not just a layer the idea is that

and not just a layer the idea is that you take your image you apply

you take your image you apply

you take your image you apply convolution you apply your value your

convolution you apply your value your

convolution you apply your value your rectified linear unit you’re probably

rectified linear unit you’re probably

rectified linear unit you’re probably convolution you apply lu and you

convolution you apply lu and you

convolution you apply lu and you basically repeat this convo you until

basically repeat this convo you until

basically repeat this convo you until you solve all the problems in computer

you solve all the problems in computer

you solve all the problems in computer vision that isn’t quite true since at

vision that isn’t quite true since at

vision that isn’t quite true since at the end you need to tack on some sort of

the end you need to tack on some sort of

the end you need to tack on some sort of outfit layer and the other player

outfit layer and the other player

outfit layer and the other player depends and what kind of input you’re

depends and what kind of input you’re

depends and what kind of input you’re trying to solve the cannot like a really

trying to solve the cannot like a really
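The repeated conv → ReLU pattern can be sketched on a 1-D signal with hand-picked toy filters (a real CNN learns its filters and works on 2-D images, and would end with an output layer such as a softmax classifier):

```python
def conv1d(xs, kernel):
    """Valid 1-D convolution (really cross-correlation, as in most
    deep learning libraries)."""
    k = len(kernel)
    return [sum(xs[i + j] * kernel[j] for j in range(k))
            for i in range(len(xs) - k + 1)]

def relu(xs):
    """Rectified linear unit: clip negatives to zero."""
    return [max(0.0, x) for x in xs]

def convnet(xs, kernels):
    """Repeat conv -> ReLU once per layer, the core CNN pattern."""
    for kernel in kernels:
        xs = relu(conv1d(xs, kernel))
    return xs

signal = [0.0, 1.0, 3.0, 1.0, 0.0, -1.0, 0.0]
# First (hypothetical) kernel responds to "bumps", second sums neighbors.
features = convnet(signal, kernels=[[-1.0, 2.0, -1.0], [1.0, 1.0]])
# features -> [4.0, 4.0, 0.0, 0.0]
```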

trying to solve the cannot like a really old school task is that you its face

old school task is that you its face

old school task is that you its face recognition trying to determine like

recognition trying to determine like

recognition trying to determine like whose face this is and this is a really

whose face this is and this is a really

whose face this is and this is a really cool task because makes the

cool task because makes the

cool task because makes the representations very visual and you can

representations very visual and you can

representations very visual and you can see how the network learns over time so

see how the network learns over time so

see how the network learns over time so at the first layer you when you start

at the first layer you when you start

at the first layer you when you start with the pixels at the first layer your

with the pixels at the first layer your

with the pixels at the first layer your filters tend to just match for edges and

filters tend to just match for edges and

filters tend to just match for edges and very simple things so convolutions can

very simple things so convolutions can

very simple things so convolutions can match edges and other very simple shapes

match edges and other very simple shapes

match edges and other very simple shapes and as you get deeper and deeper into

and as you get deeper and deeper into

and as you get deeper and deeper into network you learn more complicated

network you learn more complicated

network you learn more complicated functions of the input so after that you

functions of the input so after that you

functions of the input so after that you can start combining edges into corners

can start combining edges into corners

can start combining edges into corners or blobs so this is still extremely

or blobs so this is still extremely

or blobs so this is still extremely simple but after you get to another

simple but after you get to another

simple but after you get to another layer somehow like combining two corners

layer somehow like combining two corners

layer somehow like combining two corners the right way becomes kind of like an

the right way becomes kind of like an

the right way becomes kind of like an eye like shape or if you have like two

eye like shape or if you have like two

eye like shape or if you have like two corners in a blob that becomes more I

corners in a blob that becomes more I

corners in a blob that becomes more I like and you can build up from

like and you can build up from

like and you can build up from edges to corners to object parts and

edges to corners to object parts and

edges to corners to object parts and eventually into the objects you care

eventually into the objects you care

eventually into the objects you care about and as you get really really deep

about and as you get really really deep

about and as you get really really deep networks you actually have intermediates

networks you actually have intermediates

networks you actually have intermediates that are extremely semantic objects for

that are extremely semantic objects for

that are extremely semantic objects for example people have made a lot of tools

example people have made a lot of tools

example people have made a lot of tools for visualization of neural networks

for visualization of neural networks

for visualization of neural networks where they visualize what these in with

where they visualize what these in with

where they visualize what these in with the neural networks learn and you have

the neural networks learn and you have

the neural networks learn and you have for example if you have a neural network

for example if you have a neural network

for example if you have a neural network that doesn’t learn to classify books at

that doesn’t learn to classify books at

that doesn’t learn to classify books at all but lanes classified bookshelves

all but lanes classified bookshelves

all but lanes classified bookshelves some of the intermediate features

some of the intermediate features

some of the intermediate features actually become book classifiers which

actually become book classifiers which

actually become book classifiers which is really interesting like it can learn

is really interesting like it can learn

is really interesting like it can learn or I like a hierarchical representation

or I like a hierarchical representation

or I like a hierarchical representation of your input space such that these are

of your input space such that these are

of your input space such that these are useful things to combine together in

useful things to combine together in

useful things to combine together in order to make a robust classifier and by

order to make a robust classifier and by

order to make a robust classifier and by combining so maybe if you combine like

combining so maybe if you combine like

combining so maybe if you combine like three books together as well as a square

three books together as well as a square

three books together as well as a square this becomes a bookshelf so these are

this becomes a bookshelf so these are

this becomes a bookshelf so these are kind of like what the local operations

kind of like what the local operations

kind of like what the local operations do with each neural network and the

do with each neural network and the

do with each neural network and the beauty of it is that it’s all learned

beauty of it is that it’s all learned

beauty of it is that it’s all learned automatically for you don’t need to

automatically for you don’t need to

automatically for you don’t need to program like I have a book shelf

program like I have a book shelf

program like I have a book shelf bookshelf normally have books they have

bookshelf normally have books they have

bookshelf normally have books they have books sorry they have like square stuff

books sorry they have like square stuff

books sorry they have like square stuff maybe they’re often decide flowers this

maybe they’re often decide flowers this

maybe they’re often decide flowers this all can like happen in a data set

all can like happen in a data set

all can like happen in a data set automatically for you and these

automatically for you and these

automatically for you and these convolutional neural networks are

convolutional neural networks are

convolutional neural networks are absolutely amazing they just when I

absolutely amazing they just when I

absolutely amazing they just when I wasn’t joking when they save basically

wasn’t joking when they save basically

wasn’t joking when they save basically all of computer vision right now it all

all of computer vision right now it all

all of computer vision right now it all started with imagenet this was in 2012

started with imagenet this was in 2012

started with imagenet this was in 2012 this is when deep learning actually the

this is when deep learning actually the

this is when deep learning actually the entire hype train started where you had

entire hype train started where you had

entire hype train started where you had traditional machine learning solving

traditional machine learning solving

traditional machine learning solving this very hard very large computer

this very hard very large computer

this very hard very large computer vision data set and it was kind of

vision data set and it was kind of

vision data set and it was kind of plateauing over the years and all of a

plateauing over the years and all of a

plateauing over the years and all of a sudden deep learning comes in and it

sudden deep learning comes in and it

sudden deep learning comes in and it just blows everything away and ever

just blows everything away and ever

just blows everything away and ever since then everything has been

since then everything has been

since then everything has been everything in computer vision has been

everything in computer vision has been

everything in computer vision has been deep learning like nothing can even

deep learning like nothing can even

deep learning like nothing can even compare and recently we’ve been even

compare and recently we’ve been even

compare and recently we’ve been even being able to get superhuman results

being able to get superhuman results

being able to get superhuman results which is pretty impressive because

which is pretty impressive because

which is pretty impressive because humans are pretty good at seeing things

humans are pretty good at seeing things

humans are pretty good at seeing things it’s kind of what we’ve evolved to do

it’s kind of what we’ve evolved to do

it’s kind of what we’ve evolved to do and the same architectures can do all

and the same architectures can do all

and the same architectures can do all sorts of really interesting structured

sorts of really interesting structured

sorts of really interesting structured tasks so using almost the same

tasks so using almost the same

tasks so using almost the same architecture you can use a concept to

architecture you can use a concept to

architecture you can use a concept to determine you know like you can break up

determine you know like you can break up

determine you know like you can break up your input space into a what’s called a

your input space into a what’s called a

your input space into a what’s called a semantic segmentation of like all of the

semantic segmentation of like all of the

semantic segmentation of like all of the relevant parts that you

relevant parts that you

relevant parts that you have and using basically the same

have and using basically the same

have and using basically the same architecture as well you can do crazy

architecture as well you can do crazy

architecture as well you can do crazy things like super resolution where you

things like super resolution where you

things like super resolution where you takin like a low-resolution image and

takin like a low-resolution image and

takin like a low-resolution image and make it you can fill in the details

make it you can fill in the details

make it you can fill in the details which is pretty is that’s a pretty not

which is pretty is that’s a pretty not

which is pretty is that’s a pretty not only is it incredible even though it

only is it incredible even though it

only is it incredible even though it sounds pretty easy it’s incredible that

sounds pretty easy it’s incredible that

sounds pretty easy it’s incredible that like that you can use the same

like that you can use the same

like that you can use the same architecture that takes an image and

architecture that takes an image and

architecture that takes an image and tells you whether or not there’s a dog

tells you whether or not there’s a dog

tells you whether or not there’s a dog in it to take an image and return like a

in it to take an image and return like a

in it to take an image and return like a new higher resolution image and this is

new higher resolution image and this is

new higher resolution image and this is basically the same library the same

basically the same library the same

basically the same library the same components it’s just very very

components it’s just very very

components it’s just very very composable and that’s really good

composable and that’s really good

composable and that’s really good awesome you can also use this to solve

awesome you can also use this to solve

awesome you can also use this to solve really hard medical tasks tasks that

really hard medical tasks tasks that

really hard medical tasks tasks that people could not solve before here we’re

people could not solve before here we’re

people could not solve before here we’re detecting classifying lung cancer in CT

detecting classifying lung cancer in CT

detecting classifying lung cancer in CT scans these are the kinds of things that

scans these are the kinds of things that

scans these are the kinds of things that I like to work on and it’s not only

I like to work on and it’s not only

I like to work on and it’s not only limited division there’s been a lot of

limited division there’s been a lot of

limited division there’s been a lot of work in language understanding so is

work in language understanding so is

work in language understanding so is something that deep learning is really

something that deep learning is really

something that deep learning is really good at this language modeling roughly

good at this language modeling roughly

good at this language modeling roughly this means how probable is a how much

this means how probable is a how much

this means how probable is a how much sense this statement-making a certain

sense this statement-making a certain

sense this statement-making a certain language so it might have to do with a

language so it might have to do with a

language so it might have to do with a question response how are you I’m fine

question response how are you I’m fine

question response how are you I’m fine it might have other things such as what

it might have other things such as what

it might have other things such as what would be a weird thing my laptop is

would be a weird thing my laptop is

would be a weird thing my laptop is squishy might be a very improbable

squishy might be a very improbable

squishy might be a very improbable sentence to say so a neural network

sentence to say so a neural network

sentence to say so a neural network could probably determine squishy is a

could probably determine squishy is a

could probably determine squishy is a very bad adjective for a laptop this is

very bad adjective for a laptop this is

very bad adjective for a laptop this is a very improbable sentence but if I said

a very improbable sentence but if I said

a very improbable sentence but if I said my laptop is hot that would probably be

my laptop is hot that would probably be

my laptop is hot that would probably be a much more likely sentence and this

a much more likely sentence and this

a much more likely sentence and this already has some human-like seal to it

already has some human-like seal to it

already has some human-like seal to it because language was designed for humans

because language was designed for humans
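A crude way to see what "how probable is this sentence" means is a bigram count model; this is only a toy stand-in for the neural language models the talk describes, with a small hypothetical corpus, but the scoring idea is the same:

```python
from collections import Counter

def train_bigrams(corpus):
    """Count adjacent word pairs over a tiny corpus of sentences."""
    pairs = Counter()
    for sentence in corpus:
        words = sentence.split()
        for a, b in zip(words, words[1:]):
            pairs[(a, b)] += 1
    return pairs

def score(sentence, pairs):
    """Higher = more probable-looking: sum of bigram counts (unseen
    pairs count zero, which is fine for a toy comparison)."""
    words = sentence.split()
    return sum(pairs[(a, b)] for a, b in zip(words, words[1:]))

corpus = [
    "my laptop is hot",
    "my laptop is hot",
    "my laptop is slow",
    "the pillow is squishy",
]
pairs = train_bigrams(corpus)
# "hot" follows "is" in this corpus more often than "squishy" does.
assert score("my laptop is hot", pairs) > score("my laptop is squishy", pairs)
```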

because language was designed for humans and being able to have like if you can

and being able to have like if you can

and being able to have like if you can do language understanding as in

do language understanding as in

do language understanding as in determining the probability of like any

determining the probability of like any

determining the probability of like any sentence given a context you can and if

sentence given a context you can and if

sentence given a context you can and if you do this perfectly you can solve

you do this perfectly you can solve

you do this perfectly you can solve basically any task and this is a it’s

basically any task and this is a it’s

basically any task and this is a it’s really interesting domain where it’s

really interesting domain where it’s

really interesting domain where it’s being applied because previous if you

being applied because previous if you

being applied because previous if you look at what how language understanding

look at what how language understanding

look at what how language understanding was done before deep learning was around

was done before deep learning was around

was done before deep learning was around it was just incredibly simplistic tons

it was just incredibly simplistic tons

it was just incredibly simplistic tons and tons of rules no robustness two data

and tons of rules no robustness two data

and tons of rules no robustness two data sets you’d have to make custom rules for

sets you’d have to make custom rules for

sets you’d have to make custom rules for every language and now you could you can

every language and now you could you can

every language and now you could you can use the same tricks for English as you

use the same tricks for English as you

use the same tricks for English as you can

can

can for Chinese characters as you can for

for Chinese characters as you can for

for Chinese characters as you can for byte code so that is just pretty

byte code so that is just pretty

byte code so that is just pretty incredible they’ve obviously been much

incredible they’ve obviously been much

incredible they’ve obviously been much more complicated tasks a pretty popular

more complicated tasks a pretty popular

more complicated tasks a pretty popular use for machine for deep learning that

use for machine for deep learning that

use for machine for deep learning that this people are really putting a lot of

this people are really putting a lot of

this people are really putting a lot of effort in is aunt went language

effort in is aunt went language

effort in is aunt went language understanding from scratch so the idea

understanding from scratch so the idea

understanding from scratch so the idea is you use an RN to compress a sentence

is you use an RN to compress a sentence

is you use an RN to compress a sentence in your source language into a vector

in your source language into a vector

in your source language into a vector like I described in the RNN section and

like I described in the RNN section and

like I described in the RNN section and then you use a different RNN to decode

then you use a different RNN to decode

then you use a different RNN to decode it into a target language and while it’s

it into a target language and while it’s

it into a target language and while it’s not surprising that you can design a

not surprising that you can design a

not surprising that you can design a neural network that plausibly can output

neural network that plausibly can output

neural network that plausibly can output this it is quite surprising that it

this it is quite surprising that it

this it is quite surprising that it works so well and you’ve been able to

works so well and you’ve been able to

works so well and you’ve been able to have neural networks that in the man in

have neural networks that in the man in

have neural networks that in the man in a span of a few grad student months

a span of a few grad student months

a span of a few grad student months match the performance of systems that

match the performance of systems that

match the performance of systems that people have spent decades engineering

people have spent decades engineering
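The encoder-decoder shape described above can be sketched structurally; this is a toy with scalar state and made-up weights (real systems use learned LSTMs over word embeddings and emit target-language words until an end-of-sentence token), just to show the two-RNN layout:

```python
import math

def encode(tokens, w=0.7):
    """Encoder RNN: fold the whole source sentence into a single
    state (here a toy scalar standing in for a vector)."""
    h = 0.0
    for t in tokens:
        h = math.tanh(w * t + w * h)
    return h

def decode(h, steps, w=0.9):
    """Decoder RNN: start from the encoder's state and emit one value
    per step, feeding each output back in."""
    outputs = []
    y = 0.0
    for _ in range(steps):
        h = math.tanh(w * y + w * h)
        y = h  # toy "output" is just the state itself
        outputs.append(y)
    return outputs

source = [0.2, -0.5, 1.0]  # stand-ins for source-word embeddings
translation = decode(encode(source), steps=4)
```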

people have spent decades engineering and / happened nowadays I think that

and / happened nowadays I think that

and / happened nowadays I think that deep learning systems are not into and

deep learning systems are not into and

deep learning systems are not into and deploying systems are not what’s used

deploying systems are not what’s used

deploying systems are not what’s used for this right now but they’re a very

for this right now but they’re a very

for this right now but they’re a very important component so people still use

important component so people still use

important component so people still use a bit of hard coded stuff but it’s only

a bit of hard coded stuff but it’s only

a bit of hard coded stuff but it’s only a matter of time and the beauty is that

a matter of time and the beauty is that

a matter of time and the beauty is that but if we have a new task or a new

but if we have a new task or a new

but if we have a new task or a new language now it can just automatically

language now it can just automatically

language now it can just automatically work like what if we you know we find

work like what if we you know we find

work like what if we you know we find out some lost language from a thousand

out some lost language from a thousand

out some lost language from a thousand years ago and we have like a good amount

years ago and we have like a good amount

years ago and we have like a good amount of their texts can we actually learn how

of their texts can we actually learn how

of their texts can we actually learn how to translate it or understand it without

to translate it or understand it without

to translate it or understand it without any knowledge of this and it seems like

any knowledge of this and it seems like

any knowledge of this and it seems like purely from data we can and that’s

purely from data we can and that’s

purely from data we can and that’s really cool we don’t need an

really cool we don’t need an

really cool we don’t need an understanding of something in order to

understanding of something in order to

understanding of something in order to we don’t need a understanding prior to

we don’t need a understanding prior to

we don’t need a understanding prior to applying our machine learning models in

applying our machine learning models in

applying our machine learning models in order to have an understanding

order to have an understanding

order to have an understanding afterwards and that is just really

afterwards and that is just really

afterwards and that is just really really awesome I’ve actually been

really awesome I’ve actually been

really awesome I’ve actually been chatting with the people at SETI the

chatting with the people at SETI the

chatting with the people at SETI the search for extraterrestrial intelligence

search for extraterrestrial intelligence

search for extraterrestrial intelligence and one of the tasks that they’re doing

and one of the tasks that they’re doing

and one of the tasks that they’re doing is trying to understand dolphins the

is trying to understand dolphins the

is trying to understand dolphins the rationale is that if we can dolphins of

rationale is that if we can dolphins of

rationale is that if we can dolphins of language aliens might have language if

language aliens might have language if

language aliens might have language if we if we see alien communication we

we if we see alien communication we

we if we see alien communication we probably won’t understand it

probably won’t understand it

probably won’t understand it perhaps we can use dolphins to the proxy

perhaps we can use dolphins to the proxy

perhaps we can use dolphins to the proxy for aliens to try and understand them so

for aliens to try and understand them so

for aliens to try and understand them so there’s some really cool tasks that are

there’s some really cool tasks that are

there’s some really cool tasks that are happening there it’s not limited to that

happening there it’s not limited to that

happening there it’s not limited to that there’s some really cool things being

there’s some really cool things being

there’s some really cool things being done with art in deep learning actually

done with art in deep learning actually

done with art in deep learning actually I think that companies have started up

I think that companies have started up

I think that companies have started up that their entire business model is

that their entire business model is

that their entire business model is creating awesome deep learning art and

creating awesome deep learning art and

creating awesome deep learning art and they seem to be doing well from what

they seem to be doing well from what

they seem to be doing well from what I’ve heard in this case this is a

I’ve heard in this case this is a

I’ve heard in this case this is a hallucination purely from a conf lap

hallucination purely from a conf lap

hallucination purely from a conf lap trained to do image classification so an

trained to do image classification so an

trained to do image classification so an image that continent you know something

image that continent you know something

image that continent you know something that takes an image tells you like what

that takes an image tells you like what

that takes an image tells you like what breed of dog it is with objects or in it

breed of dog it is with objects or in it

breed of dog it is with objects or in it you can use it with a few tricks to

you can use it with a few tricks to

you can use it with a few tricks to create this kind of crazy art and this

create this kind of crazy art and this

create this kind of crazy art and this was a pretty big splash it’s a very

was a pretty big splash it’s a very

was a pretty big splash it’s a very unintuitive that a neural network that

unintuitive that a neural network that

unintuitive that a neural network that isn’t even made trained to make art

isn’t even made trained to make art

isn’t even made trained to make art actually can turn out making this kind

actually can turn out making this kind

actually can turn out making this kind of thing they’ve been more popular use

of thing they’ve been more popular use

of thing they’ve been more popular use cases such as style transfer the idea

cases such as style transfer the idea

cases such as style transfer the idea would be you can take a neural network

would be you can take a neural network

would be you can take a neural network still train for classification the idea

still train for classification the idea

still train for classification the idea would be classification has some priors

would be classification has some priors

would be classification has some priors about what images some priors about the

about what images some priors about the

about what images some priors about the natural world so the what you do then is

natural world so the what you do then is

natural world so the what you do then is you say i want my image to kind of match

you say i want my image to kind of match

you say i want my image to kind of match the distribution from a different image

the distribution from a different image

the distribution from a different image and then you get this kind of style

and then you get this kind of style

and then you get this kind of style transfer where you can mix together

transfer where you can mix together

transfer where you can mix together these kinds of components and while this

these kinds of components and while this

these kinds of components and while this is actually pretty ugly example there’s

is actually pretty ugly example there’s

is actually pretty ugly example there’s there’s some good ones i promise there’s

there’s some good ones i promise there’s

there’s some good ones i promise there’s some much more complicated things you

some much more complicated things you

some much more complicated things you can do it’s not just like taking two

can do it’s not just like taking two

can do it’s not just like taking two images together and merging them

images together and merging them
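A common way to make one image “match the distribution” of another’s features, as in Gatys-style neural style transfer, is to match Gram matrices of convnet activations. Here is a toy NumPy sketch where random arrays stand in for real convnet feature maps:

```python
import numpy as np

def gram_matrix(features):
    """Channel-by-channel correlations of feature maps.

    features: array of shape (channels, height, width).
    Returns a (channels, channels) Gram matrix, normalized
    by the number of spatial positions.
    """
    c, h, w = features.shape
    flat = features.reshape(c, h * w)   # flatten the spatial dimensions
    return flat @ flat.T / (h * w)

def style_loss(generated, style):
    """Mean squared difference between the Gram matrices: small when
    the generated image's feature statistics match the style image's."""
    g1, g2 = gram_matrix(generated), gram_matrix(style)
    return np.mean((g1 - g2) ** 2)

rng = np.random.default_rng(0)
style_feats = rng.standard_normal((8, 16, 16))      # stand-in activations
generated_feats = rng.standard_normal((8, 16, 16))

print(style_loss(style_feats, style_feats))        # 0.0 (exact match)
print(style_loss(generated_feats, style_feats) > 0)
```

In the real method this loss is minimized by gradient descent on the pixels of the generated image, usually alongside a content loss on deeper-layer features.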

images together and merging them together you can do things like

together you can do things like

together you can do things like transforming a perhaps not super great

transforming a perhaps not super great

transforming a perhaps not super great drawing something that you could

drawing something that you could

drawing something that you could probably do in paint fairly quickly into

probably do in paint fairly quickly into

probably do in paint fairly quickly into something that looks like an artist did

something that looks like an artist did

something that looks like an artist did or something that’s really awesome and

or something that’s really awesome and

or something that’s really awesome and the idea would be that you can actually

the idea would be that you can actually

the idea would be that you can actually take these arbitrary doodles and convert

take these arbitrary doodles and convert

take these arbitrary doodles and convert them into these things that look like

them into these things that look like

them into these things that look like paintings and this kind of stuff is

paintings and this kind of stuff is

paintings and this kind of stuff is really awesome and I think it’s just the

really awesome and I think it’s just the

really awesome and I think it’s just the beginning of the kind of stuff that we

beginning of the kind of stuff that we

beginning of the kind of stuff that we can do with neural network art but after

can do with neural network art but after

can do with neural network art but after basically less than a year of work on

basically less than a year of work on

basically less than a year of work on this you’re making applications that are

this you’re making applications that are

this you’re making applications that are already very tangible very awesome very

already very tangible very awesome very

already very tangible very awesome very this is already something that if I made

this is already something that if I made

this is already something that if I made this I would probably hang up in my

this I would probably hang up in my

this I would probably hang up in my living room and this has only been one

living room and this has only been one

living room and this has only been one year of work imagine what happens that

year of work imagine what happens that

year of work imagine what happens that in 10 years I saved the best for last in

in 10 years I saved the best for last in

in 10 years I saved the best for last in terms of art we can combine our pictures

terms of art we can combine our pictures

terms of art we can combine our pictures without of Pokemon so clearly the future

without of Pokemon so clearly the future

without of Pokemon so clearly the future is here um this is one of my crowning

is here um this is one of my crowning

is here um this is one of my crowning achievements I think primarily because

achievements I think primarily because

achievements I think primarily because I’ve done this with like dozens of

I’ve done this with like dozens of

I’ve done this with like dozens of people and only mine turned out well but

people and only mine turned out well but

people and only mine turned out well but yeah the I think this is really awesome

yeah the I think this is really awesome

yeah the I think this is really awesome there’s like just so many things to do

there’s like just so many things to do

there’s like just so many things to do here and so few people are working it on

here and so few people are working it on

here and so few people are working it on it and that the sky is really the limit

it and that the sky is really the limit

it and that the sky is really the limit so it’s just really exciting what on the

so it’s just really exciting what on the

so it’s just really exciting what on the kinds of stuff that we can be created

kinds of stuff that we can be created

kinds of stuff that we can be created here there’s been other huge achievement

here there’s been other huge achievement

here there’s been other huge achievement game playing has been really big if

game playing has been really big if

game playing has been really big if anyone saw deep mines 500 million dollar

anyone saw deep mines 500 million dollar

anyone saw deep mines 500 million dollar acquisition in 2013 roughly the only

acquisition in 2013 roughly the only

acquisition in 2013 roughly the only paper that they had at the time was

paper that they had at the time was

paper that they had at the time was learning to play Atari games from pixels

learning to play Atari games from pixels

learning to play Atari games from pixels which is might be harder than it sounds

which is might be harder than it sounds

which is might be harder than it sounds because humans have a prior of how to

because humans have a prior of how to

because humans have a prior of how to play the game right like they have a

play the game right like they have a

play the game right like they have a prior that this is maybe a ball and

prior that this is maybe a ball and

prior that this is maybe a ball and that’s a paddle and I want to destroy

that’s a paddle and I want to destroy

that’s a paddle and I want to destroy certain things where they were prior

certain things where they were prior

certain things where they were prior that a key opens doors or that roads are

that a key opens doors or that roads are

that a key opens doors or that roads are something I want to stay on in a driving

something I want to stay on in a driving

something I want to stay on in a driving game but neural networks not given any

game but neural networks not given any

game but neural networks not given any of these priors it’s literally only

of these priors it’s literally only

of these priors it’s literally only given the pixels given these images it

given the pixels given these images it

given the pixels given these images it learns to play at what is on median a

learns to play at what is on median a

learns to play at what is on median a superhuman level and the techniques have

superhuman level and the techniques have
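The learning rule behind results like the Atari player can be sketched without the neural network part. This is plain tabular Q-learning with made-up states, actions, and rewards, purely to illustrate the update that value-based agents like DQN perform:

```python
# Tabular Q-learning sketch: the core update behind value-based agents
# like DQN, minus the neural network. States/actions/rewards are made up.
n_states, n_actions = 5, 2
Q = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma = 0.5, 0.9   # learning rate, discount factor

def update(s, a, reward, s_next):
    """One Q-learning step: move Q[s][a] toward the bootstrapped target."""
    target = reward + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])

# Hand-made transitions: reaching state 4 pays off.
update(3, 1, 1.0, 4)   # Q[3][1]: 0 -> 0.5
update(2, 0, 0.0, 3)   # value propagates backward via max(Q[3])
print(Q[3][1])         # 0.5
print(Q[2][0])         # 0.225
```

In DQN the table is replaced by a convnet that maps raw pixels to Q-values, trained on the same target from replayed experience.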

superhuman level and the techniques have been continuing to get better and this

been continuing to get better and this

been continuing to get better and this kind of stuff very similar tricks have

kind of stuff very similar tricks have

kind of stuff very similar tricks have been applied to the much more recent

been applied to the much more recent

been applied to the much more recent result of google deepmind alfa GO

result of google deepmind alfa GO

result of google deepmind alfa GO network which was not that huge of a

network which was not that huge of a

network which was not that huge of a deal in the West but if you ever talk to

deal in the West but if you ever talk to

deal in the West but if you ever talk to people from the more Eastern world you

people from the more Eastern world you

people from the more Eastern world you can talk to them about here are the

can talk to them about here are the

can talk to them about here are the achievements of deep learning you talk

achievements of deep learning you talk

achievements of deep learning you talk about smart inbox and they’re like oh

about smart inbox and they’re like oh

about smart inbox and they’re like oh that’s pretty okay you talk about image

that’s pretty okay you talk about image

that’s pretty okay you talk about image search yeah that’s pretty okay and then

search yeah that’s pretty okay and then

search yeah that’s pretty okay and then you talk tell them about like oh yeah

you talk tell them about like oh yeah

you talk tell them about like oh yeah it’d also be the world champion it go

it’d also be the world champion it go

it’d also be the world champion it go and they’re like whoa we’d beat plays go

and they’re like whoa we’d beat plays go

and they’re like whoa we’d beat plays go that’s amazing and people predicted that

that’s amazing and people predicted that

that’s amazing and people predicted that even beating human

even beating human

even beating human go would probably be depending on the

go would probably be depending on the

go would probably be depending on the expert 10 to 100 years off and it

expert 10 to 100 years off and it

expert 10 to 100 years off and it happened it just happened it’s already

happened it just happened it’s already

happened it just happened it’s already done it’s already that like humans have

done it’s already that like humans have

done it’s already that like humans have lost that go and as a side effect goez

lost that go and as a side effect goez

lost that go and as a side effect goez also caused more fear over AI safety

also caused more fear over AI safety

also caused more fear over AI safety than any other neural network I believe

than any other neural network I believe

than any other neural network I believe and this is probably a good

and this is probably a good

and this is probably a good representation of that I don’t know how

representation of that I don’t know how

representation of that I don’t know how yes let’s medium clear this is an XKCD

yes let’s medium clear this is an XKCD

yes let’s medium clear this is an XKCD of like how hard people used to think

of like how hard people used to think

of like how hard people used to think these games were and you can see go as

these games were and you can see go as

these games were and you can see go as basically being the last on the level of

basically being the last on the level of

basically being the last on the level of computer still lose to top humans and

computer still lose to top humans and

computer still lose to top humans and then not all of these are solved but

then not all of these are solved but

then not all of these are solved but that is just pretty incredible that

that is just pretty incredible that

that is just pretty incredible that that’s now solved people have been

that’s now solved people have been

that’s now solved people have been trying to ask like if it can do this

trying to ask like if it can do this

trying to ask like if it can do this what can’t it do because go is a task

what can’t it do because go is a task

what can’t it do because go is a task that requires a lot of reasoning and

that requires a lot of reasoning and

that requires a lot of reasoning and these kinds of achievements have been

these kinds of achievements have been

these kinds of achievements have been being transferred into the physical

being transferred into the physical

being transferred into the physical world as well this is a google has like

world as well this is a google has like

world as well this is a google has like a farm with like a bunch of robots that

a farm with like a bunch of robots that

a farm with like a bunch of robots that have learned on their own to grasp

have learned on their own to grasp

have learned on their own to grasp objects and basically robotics control

objects and basically robotics control

objects and basically robotics control is usually pretty hard especially when

is usually pretty hard especially when

is usually pretty hard especially when you’re trying to make it generalize and

you’re trying to make it generalize and

you’re trying to make it generalize and they’ve been able to do that just by you

they’ve been able to do that just by you

they’ve been able to do that just by you know throwing the robots into a dark

know throwing the robots into a dark

know throwing the robots into a dark warehouse having a train for a while

warehouse having a train for a while

warehouse having a train for a while designing a cute objective function and

designing a cute objective function and

designing a cute objective function and it just learned to grasp things better

it just learned to grasp things better

it just learned to grasp things better than their hand design controllers did

than their hand design controllers did

than their hand design controllers did which was pretty awesome and more

which was pretty awesome and more

which was pretty awesome and more recently actually I think there was a

recently actually I think there was a

recently actually I think there was a video like that came out last week of

video like that came out last week of

video like that came out last week of nvidia using just deep learning for

nvidia using just deep learning for

nvidia using just deep learning for self-driving cars so the idea was like

self-driving cars so the idea was like

self-driving cars so the idea was like with just a single camera in front of

with just a single camera in front of

with just a single camera in front of your car now your car can learn to drive

your car now your car can learn to drive

your car now your car can learn to drive can can drive itself from learning from

can can drive itself from learning from

can can drive itself from learning from how other people drove and this is a

how other people drove and this is a

how other people drove and this is a very interesting result because even

very interesting result because even

very interesting result because even google has been working for i don’t know

google has been working for i don’t know

google has been working for i don’t know if it might have been a decade already

if it might have been a decade already

if it might have been a decade already that they’ve been working on

that they’ve been working on

that they’ve been working on self-driving cars using you know lidar

self-driving cars using you know lidar

self-driving cars using you know lidar and slam and all of that stuff and

and slam and all of that stuff and

and slam and all of that stuff and Nvidia’s by some measures caught up to

Nvidia’s by some measures caught up to

Nvidia’s by some measures caught up to them entirely within i think it’s been

them entirely within i think it’s been

them entirely within i think it’s been less than a year since they’ve been

less than a year since they’ve been

less than a year since they’ve been investing in this so a lot of thing it

investing in this so a lot of thing it

investing in this so a lot of thing it seems to be changing a lot of things

seems to be changing a lot of things

seems to be changing a lot of things especially these kinds of perception

especially these kinds of perception

especially these kinds of perception tasks because research is moving so fast

tasks because research is moving so fast

tasks because research is moving so fast I also have to spend some time and

I also have to spend some time and

I also have to spend some time and things that are not yet practical but

things that are not yet practical but

things that are not yet practical but may very well soon be as a disclaimer

may very well soon be as a disclaimer

may very well soon be as a disclaimer I’ve been traveling this weekend so i’m

I’ve been traveling this weekend so i’m

I’ve been traveling this weekend so i’m not sure if some of these things belong

not sure if some of these things belong

not sure if some of these things belong in the already solved category

in the already solved category

in the already solved category generation is a big one there’s tons and

generation is a big one there’s tons and

generation is a big one there’s tons and tons of stuff happening generation so i

tons of stuff happening generation so i

tons of stuff happening generation so i definitely can’t give it justice there’s

definitely can’t give it justice there’s

definitely can’t give it justice there’s really cool stuff and like just

really cool stuff and like just

really cool stuff and like just generating images from scratch and

generating images from scratch and

generating images from scratch and generating arbitrary other domains from

generating arbitrary other domains from

generating arbitrary other domains from scratch images are just the most visual

scratch images are just the most visual

scratch images are just the most visual so i have them here but some of the

so i have them here but some of the

so i have them here but some of the coolest and perhaps most practical

coolest and perhaps most practical

coolest and perhaps most practical examples are conditional generation

examples are conditional generation

examples are conditional generation something i’m really excited about is

something i’m really excited about is

something i’m really excited about is image to text so the idea is you’ve

image to text so the idea is you’ve

image to text so the idea is you’ve taken an input image and the output is

taken an input image and the output is

taken an input image and the output is not like yes or no whether or not the

not like yes or no whether or not the

not like yes or no whether or not the dogs engine but you output a description

dogs engine but you output a description

dogs engine but you output a description of the image and that’s like an

of the image and that’s like an

of the image and that’s like an extremely human task it to be extremely

extremely human task it to be extremely

extremely human task it to be extremely useful if you do this task right it

useful if you do this task right it

useful if you do this task right it seems like this all the whole ton of

seems like this all the whole ton of

seems like this all the whole ton of possibilities I’m very excited about

possibilities I’m very excited about

possibilities I’m very excited about like taking in a medical image and like

like taking in a medical image and like

like taking in a medical image and like outputting like a pleat report of it

outputting like a pleat report of it

outputting like a pleat report of it which would be really awesome and some

which would be really awesome and some

which would be really awesome and some people that are really excited about

people that are really excited about

people that are really excited about this that has applications in the very

this that has applications in the very

this that has applications in the very short term is I don’t know the right way

short term is I don’t know the right way

short term is I don’t know the right way to say it but like the poor eyesight

to say it but like the poor eyesight

to say it but like the poor eyesight community so web pages nowadays have

community so web pages nowadays have

community so web pages nowadays have been pretty bad about stuff for people

been pretty bad about stuff for people

been pretty bad about stuff for people with disabilities and imagine if you had

with disabilities and imagine if you had

with disabilities and imagine if you had a neural network that can just describe

a neural network that can just describe

a neural network that can just describe an image for you describe a page for you

an image for you describe a page for you

an image for you describe a page for you tell you what’s on the page in a very

tell you what’s on the page in a very

tell you what’s on the page in a very semantic summarized way and there’s also

semantic summarized way and there’s also

semantic summarized way and there’s also a really cool opposite problem which is

a really cool opposite problem which is

a really cool opposite problem which is instead of taking an image and

instead of taking an image and

instead of taking an image and outputting a description you take in a

outputting a description you take in a

outputting a description you take in a description and not put an image which

description and not put an image which

description and not put an image which as a terrible idea artist I’m probably a

as a terrible idea artist I’m probably a

as a terrible idea artist I’m probably a bit more excited about because instead

bit more excited about because instead

bit more excited about because instead of like I can describe pictures I can’t

of like I can describe pictures I can’t

of like I can describe pictures I can’t really draw them and like these are much

really draw them and like these are much

really draw them and like these are much better already than I can draw but

better already than I can draw but

better already than I can draw but that’s probably a low bar but in this

that’s probably a low bar but in this

that’s probably a low bar but in this kind of network you actually take in

kind of network you actually take in

kind of network you actually take in like a sentences text and all of these

like a sentences text and all of these

like a sentences text and all of these images are generated from that network

images are generated from that network

images are generated from that network and that’s pretty incredible some of

and that’s pretty incredible some of

and that’s pretty incredible some of them are not super great but like these

them are not super great but like these

them are not super great but like these birds are actually there’s I believe

birds are actually there’s I believe

birds are actually there’s I believe they’re real the flower is not the

they’re real the flower is not the

they’re real the flower is not the purple ones

purple ones

purple ones but they actually see him close like if

but they actually see him close like if

but they actually see him close like if I if it was zoomed out enough I could

I if it was zoomed out enough I could

I if it was zoomed out enough I could see this is being pretty real and can

see this is being pretty real and can

see this is being pretty real and can you imagine in a future where instead of

you imagine in a future where instead of

you imagine in a future where instead of having to spend millions of dollars in a

having to spend millions of dollars in a

having to spend millions of dollars in a movie you just like type it up and then

movie you just like type it up and then

movie you just like type it up and then a neural network just generates the

a neural network just generates the

a neural network just generates the movie for you we’re quite away from that

movie for you we’re quite away from that

movie for you we’re quite away from that but perhaps not that far away especially

but perhaps not that far away especially

but perhaps not that far away especially like with some focused work and this

like with some focused work and this

like with some focused work and this could name like all sorts of like new

could name like all sorts of like new

could name like all sorts of like new forms of creativity that people don’t

forms of creativity that people don’t

forms of creativity that people don’t even know about while language

even know about while language

even know about while language understanding does quite well there is a

understanding does quite well there is a

understanding does quite well there is a deeper language understanding which we

deeper language understanding which we

deeper language understanding which we can kind of solvent oi tasks but it’s

can kind of solvent oi tasks but it’s

can kind of solvent oi tasks but it’s kind of harder for real task so QA so

kind of harder for real task so QA so

kind of harder for real task so QA so cute question answering that requires

cute question answering that requires

cute question answering that requires more complicated reasoning such as if

more complicated reasoning such as if

more complicated reasoning such as if you have like a story here and you ask

you have like a story here and you ask

you have like a story here and you ask something question complicated like

something question complicated like

something question complicated like where is the football then yes like go

where is the football then yes like go

where is the football then yes like go back in the store and figure out where

back in the store and figure out where

back in the store and figure out where that kind of thing happened that thing’s

that kind of thing happened that thing’s

that kind of thing happened that thing’s kind of complicated people are very good

kind of complicated people are very good

kind of complicated people are very good at this task models can solve these

at this task models can solve these

at this task models can solve these simple ones quite well but they can’t

simple ones quite well but they can’t

simple ones quite well but they can’t real do real question answering yet

real do real question answering yet

real do real question answering yet which is unfortunate but something

which is unfortunate but something

which is unfortunate but something people really care about and we’re not

people really care about and we’re not

people really care about and we’re not quite there yet but I also love how

quite there yet but I also love how

quite there yet but I also love how awesome this problem sounds is that like

awesome this problem sounds is that like

awesome this problem sounds is that like our machines that we have like basically

our machines that we have like basically

our machines that we have like basically spent no work on only automatically

spent no work on only automatically

spent no work on only automatically learn a shallow level of reasoning like

learn a shallow level of reasoning like

learn a shallow level of reasoning like that’s like such a first real problem

that’s like such a first real problem
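To make the task concrete, here is a hand-written toy baseline for this kind of story QA (the story and parsing rules are invented for illustration; real systems such as memory networks learn this instead of hard-coding it):

```python
# Toy bAbI-style QA: answer "where is X?" by tracking who last picked
# X up and where that person most recently went.
story = [
    "mary picked up the football",
    "mary went to the kitchen",
    "mary went to the garden",
]

def where_is(obj, story):
    """Scan the story in order, keeping minimal state."""
    holder, location = None, None
    for line in story:
        words = line.split()
        actor = words[0]
        if "picked" in words and words[-1] == obj:
            holder = actor            # remember who carries the object
        if "went" in words and actor == holder:
            location = words[-1]      # object moves with its holder
    return location

print(where_is("football", story))    # garden
```

The point is that even this trivial question requires carrying state across sentences, which is exactly what plain feedforward models lack.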

that’s like such a first real problem while there’s like language

while there’s like language

while there’s like language understanding there’s also visual

understanding there’s also visual

understanding there’s also visual understanding that is um kind of

understanding that is um kind of

understanding that is um kind of unsolved there is a there’s some awesome

unsolved there is a there’s some awesome

unsolved there is a there’s some awesome data set that involves images and

data set that involves images and

data set that involves images and questions and the goal is to find an

questions and the goal is to find an

questions and the goal is to find an answer and the models there are models

answer and the models there are models

answer and the models there are models that can do pretty okay at this task but

that can do pretty okay at this task but

that can do pretty okay at this task but still very not good and still like face

still very not good and still like face

still very not good and still like face significantly worse than people do so

significantly worse than people do so

significantly worse than people do so this kind of thing is something that we

this kind of thing is something that we

this kind of thing is something that we can’t do just yet while game playing is

can’t do just yet while game playing is

can’t do just yet while game playing is solved harder game playing is still an

solved harder game playing is still an

solved harder game playing is still an open problem and you might think harder

open problem and you might think harder

open problem and you might think harder game playing my five-year-old brother

game playing my five-year-old brother

game playing my five-year-old brother can play minecraft and he almost

can play minecraft and he almost

can play minecraft and he almost certainly can’t beat the world champion

certainly can’t beat the world champion

certainly can’t beat the world champion at go

at go

at go the harder in this case means stateful

the harder in this case means stateful

the harder in this case means stateful it turns out that humans are really good

it turns out that humans are really good

it turns out that humans are really good at remembering something while neural

at remembering something while neural

at remembering something while neural networks have some difficulty with it so

networks have some difficulty with it so

networks have some difficulty with it so the neural networks that people have

the neural networks that people have

the neural networks that people have been using for playing games have been

been using for playing games have been

been using for playing games have been completely stateless so when you have a

completely stateless so when you have a

completely stateless so when you have a partially observed world like Minecraft

partially observed world like Minecraft

partially observed world like Minecraft where you like only have one direction

where you like only have one direction

where you like only have one direction that you’re looking at if you like look

that you’re looking at if you like look

that you’re looking at if you like look to the left it forgets what was on the

to the left it forgets what was on the

to the left it forgets what was on the right and this is something that people

right and this is something that people

right and this is something that people are still working to solve it’s the same

are still working to solve it’s the same

are still working to solve it’s the same thing with doom and the work has been

thing with doom and the work has been

thing with doom and the work has been done a bit but it’s far from being a

done a bit but it’s far from being a

done a bit but it’s far from being a solved problem and I do believe it

solved problem and I do believe it

solved problem and I do believe it they’re still subhuman at this task

they’re still subhuman at this task
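To make the stateless-versus-stateful point concrete, here is a toy sketch (a hypothetical example added for illustration, not something from the talk): a stateless policy decides from the current view only, while a stateful one carries memory forward.

```python
# Toy partially observed world: the agent sees a monster, then looks away.

def stateless_policy(observation):
    # Decision depends only on what is visible right now.
    return "flee" if observation == "monster" else "explore"

def stateful_policy(observation, memory):
    # Decision depends on the current view AND remembered past views.
    memory = memory | {observation}
    action = "flee" if "monster" in memory else "explore"
    return action, memory

views = ["monster", "empty"]

# Stateless: once the monster leaves the field of view, it is forgotten.
print(stateless_policy(views[-1]))  # -> explore

# Stateful: the memory of the monster persists across views.
memory = set()
for view in views:
    action, memory = stateful_policy(view, memory)
print(action)  # -> flee
```

In real systems the "memory" is a recurrent hidden state rather than a set, but the failure mode being described is the same.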

they’re still subhuman at this task there’s some really cool stuff with

there’s some really cool stuff with

there’s some really cool stuff with automatically discovering hierarchical

automatically discovering hierarchical

automatically discovering hierarchical structure so in language the

structure so in language the

structure so in language the hierarchical structures may be clear to

hierarchical structures may be clear to

hierarchical structures may be clear to us because we use language like

us because we use language like

us because we use language like character a word limit of character your

character a word limit of character your

character a word limit of character your senses are made of words paragraphs are

senses are made of words paragraphs are

senses are made of words paragraphs are made of senses this is like this

made of senses this is like this

made of senses this is like this semantic hierarchy which makes it easy

semantic hierarchy which makes it easy

semantic hierarchy which makes it easy to break down a problem into simpler

to break down a problem into simpler

to break down a problem into simpler problems but this is not the case in

problems but this is not the case in

problems but this is not the case in many domains and there have been people

many domains and there have been people

many domains and there have been people who’ve designed neural networks that can

who’ve designed neural networks that can

who’ve designed neural networks that can actually automatically discover this

actually automatically discover this

actually automatically discover this hierarchy and this could be really

hierarchy and this could be really

hierarchy and this could be really useful for tasks where we don’t know how

useful for tasks where we don’t know how

useful for tasks where we don’t know how to interpret that so something I’ve

to interpret that so something I’ve

to interpret that so something I’ve worked a bit on is genomics and we

worked a bit on is genomics and we

worked a bit on is genomics and we really don’t even know how to read

really don’t even know how to read

really don’t even know how to read genomics right but if a neural network

genomics right but if a neural network

genomics right but if a neural network and automatically break it up into like

and automatically break it up into like

and automatically break it up into like this part goes together with that part

this part goes together with that part

this part goes together with that part you know there’s connections between

you know there’s connections between

you know there’s connections between here and here this could actually help a

here and here this could actually help a

here and here this could actually help a whole lot with all sorts of different

whole lot with all sorts of different

whole lot with all sorts of different kinds of scientific tasks just purely

kinds of scientific tasks just purely

This is when it gets a little bit computery, but these are things that I'm excited about as a computer scientist. There's this model called Neural Turing Machines, which learns to use a big memory buffer, which is very cool: you can actually watch how the network reads and writes in order to copy an input. There are also ways to implement differentiable data structures, so instead of having a black box of arbitrary activations with matrix multiplies, you can plug a data structure into a network, and now your network can learn to do things like pushing and popping on a stack, or taking from both ends of a queue, and this could potentially enable all sorts of very cool use cases.
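As one illustration of what "differentiable" means here, consider a stack whose push and pop take continuous strengths instead of hard 0/1 decisions. This is a heavily simplified sketch in the spirit of published differentiable stacks, not code from the talk:

```python
import numpy as np

# Continuous stack: each element carries a strength in [0, 1], so all
# operations are arithmetic and gradients could flow through them.

def push(values, strengths, v, d):
    """Push vector v with continuous strength d."""
    return values + [v], strengths + [d]

def pop(values, strengths, u):
    """Remove strength u from the stack, starting at the top."""
    strengths = list(strengths)
    for i in reversed(range(len(strengths))):
        take = min(strengths[i], u)
        strengths[i] -= take
        u -= take
    return values, strengths

def read(values, strengths):
    """Read a weighted blend of the topmost unit of strength."""
    out, remaining = np.zeros_like(values[0]), 1.0
    for i in reversed(range(len(values))):
        w = min(strengths[i], remaining)
        out += w * values[i]
        remaining -= w
    return out

vals, strs = push([], [], np.array([1.0, 0.0]), 1.0)
vals, strs = push(vals, strs, np.array([0.0, 1.0]), 0.5)
print(read(vals, strs))        # -> [0.5 0.5]
vals, strs = pop(vals, strs, 0.5)
print(read(vals, strs))        # -> [1. 0.]
```

The point is only that every operation is built from arithmetic on strengths; the real, vectorized versions replace min with smooth surrogates so the whole thing trains end to end.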

One of those is learning to program. People have done some work where you can create models that not only have simple input-output mappings, but as an intermediate step in that mapping they can learn subroutines and play with pointers, and this actually makes them very general computers: potentially they could handle all the problems we care about, because if you can learn subroutines and manipulate pointers, you could learn abstraction automatically. By putting these things together, people have been able to do things like learning to actually execute code. The idea is that, given code as a string and targets for that code (what its output is), you can actually learn an interpreter for that language. This is really exciting to me as a programming-language guy: maybe I could design a programming language not by implementing it, but by just showing a whole bunch of examples, with the implementation happening automatically for me. Or perhaps I could just write the test cases for the language, and a neural network could generate an efficient implementation for me.
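The training data for that "learning to execute" setup is just pairs of program text and its output. A hypothetical generator for such pairs might look like this (toy example added for illustration, not from the talk):

```python
import random

# Each training example is (program string, what an interpreter would
# print). A model trained on enough of these acts as a learned interpreter.

def random_program(rng):
    a, b = rng.randint(1, 99), rng.randint(1, 99)
    code = f"print({a}+{b})"
    target = str(a + b)  # ground truth from real execution
    return code, target

rng = random.Random(0)
for code, target in (random_program(rng) for _ in range(3)):
    print(repr(code), "->", target)
```

The published work uses much richer program families (loops, branching, variables), but the supervision signal is exactly this shape.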

Something else related to all of this, and it's really early, but I think a lot of people are really excited about it, is neural module networks. Instead of having a single architecture that you play with, you have a library of components, and for every single example you build a custom architecture. For example, in a visual question answering task you have an image and a question like "where is the dog?". Instead of using an arbitrary network that takes in the image and the question and produces the answer, you convert the question into a custom neural network that combines a "dog" module with a "where" module and outputs the answer. This kind of thing is very early but really promising.
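A drastically simplified, non-neural sketch of that compose-per-question idea (a hypothetical toy; the real modules are learned networks, not hand-written functions):

```python
# A library of "modules" plus a per-question pipeline built from them.

def find_dog(image):
    # "dog" module: return positions containing a dog in a toy grid image.
    return [pos for pos, obj in image.items() if obj == "dog"]

def where(positions):
    # "where" module: describe the located positions.
    return positions[0] if positions else "nowhere"

MODULES = {"dog": find_dog, "where": where}

def answer(question, image):
    # Convert the question into a custom pipeline, then run it.
    if question == "where is the dog":
        pipeline = [MODULES["dog"], MODULES["where"]]
    else:
        raise NotImplementedError("toy parser knows one question")
    x = image
    for module in pipeline:
        x = module(x)
    return x

image = {"top-left": "cat", "bottom-right": "dog"}
print(answer("where is the dog", image))  # -> bottom-right
```

The interesting part in the real work is that both the modules and the question-to-pipeline mapping are learned, not hard-coded as here.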

So that's it for the future part; hopefully you're now pumped to deep-learn some problems. There's a lot of software to help you. I'm not going to talk about it right now, because there are plenty of tutorials out there and I think the high-level understanding is much more important. My recommendation: if you want to customize a lot of things, Theano and TensorFlow are the best, because they give you the automatic differentiation I was talking about, so you basically never have to worry about the backward pass. If you just want to use the modules I talked about, plus a few others, Keras can solve that; you can do a lot of these things with Keras. If you want to do this, there's a lot more learning to do, and the devil is really in the details. I was super high level with lots of this stuff, but there are many little things you need to know, such as: how do you perform the updates in a way that doesn't cause your parameters to grow too large? How do you initialize the parameters so that the network doesn't start out as a trivial function? How do you avoid overfitting your training set? There are a lot of resources out there; my favorite is the Stanford class by Andrej Karpathy, CS231n. It's specifically on convolutional networks, but it's constantly updated with state-of-the-art material and is generally very high quality, so I think it's approachable for anyone, from beginner to very advanced. And if you want to do this, you probably need a GPU, or fifty.
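The initialization detail mentioned above can be seen in a few lines (a toy single-layer sketch added for illustration, not from the talk): with all-zero weights every hidden unit computes the same thing, so the network starts out as a trivial function.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))          # batch of 4 inputs, 3 features each

W_zero = np.zeros((3, 5))            # zero init: 5 identical hidden units
h_zero = np.tanh(x @ W_zero)
print(np.allclose(h_zero, 0.0))      # -> True: the layer outputs nothing

W_rand = rng.normal(scale=0.1, size=(3, 5))  # small random init
h_rand = np.tanh(x @ W_rand)
print(np.allclose(h_rand, 0.0))      # -> False: units already differ
```

With zero init the units also receive identical gradients, so they stay identical forever; small random initialization breaks that symmetry.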

I think that's it for time, so sorry I was rushing at the end. Any questions? I also have these slides up; should I leave this slide on? There are some questions here, so let's go. One is: how can we avoid autonomous cars picking up human bad habits?

That is a very interesting question, and it's very dependent on how the cars are trained. If you train a car to copy humans, which is by far the easiest thing to do, that's not the most correct thing to do: the most correct thing would be to learn how to drive optimally from scratch, but that unfortunately involves trial and error, which you probably don't want in self-driving cars, so we can skip that, or use hard-coded rules. What can happen is that if you train it to learn from humans, it will mimic those humans. But suppose humans don't make consistent mistakes: different humans make different kinds of mistakes, or the same human only makes a mistake sometimes. Then a neural network can predict the expectation of what a human would do, rather than the worst-case scenario. You can think of humans as an ensemble in this case: if you're predicting what the average of a bunch of humans would do, you can drive better than any one human can. But if humans consistently make the same mistakes, there's nothing you can do about that other than get more data.
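The humans-as-an-ensemble point is the usual variance-reduction argument; here is a toy numeric sketch (numbers invented for illustration, not from the talk): if individual drivers make independent errors around the correct action, the average of their behavior is closer to correct than a typical individual.

```python
import numpy as np

rng = np.random.default_rng(42)
correct = 1.0                          # the "right" steering angle
drivers = correct + rng.normal(scale=0.3, size=1000)  # noisy imitators

individual_error = np.abs(drivers - correct).mean()   # typical human error
ensemble_error = abs(drivers.mean() - correct)        # error of the average

print(individual_error > ensemble_error)  # -> True
```

This only helps when the errors are inconsistent; a bias shared by all drivers survives the averaging, which is exactly the "consistent mistakes" caveat above.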

I think we have time for one more: what do you think about chatbots, and is it possible to build one with only deep learning? Yes, there are actually many startups doing this right now. It seems to be the next hot wave in startups, where people are trying to use chatbots to do all sorts of things in very specific domains. It has some really nice properties from a business point of view, because the goal is to replace humans who chat, and when you have a bunch of those humans they generate a bunch of data, so replacing them with an algorithm is very plausible. It's still hard, though; it's kind of like the game-playing problem, in that it's hard for a chatbot to keep a memory of what you said. So if you say "oh, try opening this menu, and go here and here and here", five sentences later the chatbot might say the same thing again, because the neural networks still have memory issues. Cool, I think that's it, so please remember to vote, and let's give a big applause to Diogo.
