Cool, thank you. As a bit of background, I have a very math-and-computer background, which is very good for deep learning. The list of achievements for why you should listen to me: I currently work at a super cool company that uses deep learning to make faster and more accurate medical diagnoses, and in past lives I've won a whole lot of international math competitions and some programming competitions as well. Today I'm going to chat about what deep learning is and what it can do for you. Feel free to ask questions at any time. I can ask that, right? They can ask questions, right, Simon? Cool. So yeah, feel free to ask questions at any time, just yell them out, especially if you think I'm lying to you.

So what is deep learning? I'm going to start with a disclaimer: deep learning is actually pretty complicated, and it's hard to be very general about everything and be correct. So when in doubt, I'm going to favor generality. If you're familiar with deep learning already, it's going to sound like I'm lying a lot, but in reality this is just to give the really high-level view of it. Also, whenever possible, I'm going to favor shortcuts that might not be a hundred percent correct but should give the correct mental model of how these things work. But if you have any questions, feel free to ask.
So from a super high level, there are a lot of different levels of hierarchy in this ecosystem. There's artificial intelligence, which is a superset of everything. An example would be IBM Watson, where lots of hand-coded rules and extremely large amounts of expert manpower are used to build something for a specific task. There's machine learning, which is a subset of that. An example of this would be Google ad click prediction: rather than using tons and tons of hard-coded rules, you start using examples to figure out how to combine some hand-coded statistics to predict the probability of, for example, an ad click. At a slightly deeper level you have representation learning, which is sometimes seen as one-layer deep learning (these levels are sometimes called shallow learning, if you're trying to start a fight). An example of this would be Netflix movie recommendation, where the statistics of what you even know about each movie are learned from data, but you're still learning a simple combination of how these features go together. And after a few levels you get into deep learning. This would be, for example, figuring out diseases from images, where instead of having one layer of manual statistics that are learned and then combined together, you might learn all of these statistics at the same time, in tens, hundreds, or even thousands of steps, which is what some people use nowadays.
This is probably not a common view of what deep learning is, but I think the easiest way to see it is that deep learning is an interface, and this interface has roughly two methods. The first method is the forward pass, and this is definitely the easy part: given arbitrary input, make arbitrary output. Anyone can do this part; it's really easy. The trick that makes it work is the backward pass: given a desired change in the output, you want to be able to transform this into a desired change in the input. And once you have these, you can make arbitrarily complex things by chaining them up into a directed acyclic graph. If this sounds too good to be true, especially with how we design the forward pass (because if you just say arbitrary input, arbitrary output, of course you can do anything you want), the hard part is how you define that backward pass: as you make your forward pass more and more complicated, like some really crazy function, it becomes hard to define how to map changes in the outputs back to the inputs. So by keeping these things simple and combining them together, we get an almost composable language of modules that allows us to do the things we want to do.
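As a minimal sketch, this two-method interface might look like the following in Python. The class names and the toy Scale module are my own illustration, not from any particular framework:

```python
class Module:
    """The deep learning "interface": a forward pass and a backward pass."""

    def forward(self, x):
        """Given an input, produce an output."""
        raise NotImplementedError

    def backward(self, grad_output):
        """Given a desired change in our output, return the
        corresponding desired change in our input."""
        raise NotImplementedError


class Scale(Module):
    """A toy module that multiplies its input by a constant c."""

    def __init__(self, c):
        self.c = c

    def forward(self, x):
        return self.c * x

    def backward(self, grad_output):
        # f(x) = c * x, so df/dx = c: scale the output change by c
        return self.c * grad_output


m = Scale(3.0)
y = m.forward(2.0)    # forward pass: 6.0
dx = m.backward(1.0)  # backward pass: 3.0
```

Anything satisfying these two methods can be chained into a graph with other modules.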
So once you have this interface, you can just build up from it: you can have a bunch of modules that satisfy this interface. As a side note, a bunch of these modules will be parametric, which means they have parameters, which roughly means they're stateful. Being stateful means the state can change, and it's this change in state that allows you to turn the function from something you just cobbled together into something that gets closer and closer to what you want it to do.
And once you have this general language for what you want to do, you can start doing the tasks that you care about. In deep learning you always define a loss, or a cost, depending on how you want to name it, and this is something you want to minimize, for reasons that I'd happily explain. This always has to be a scalar, so it can't be several costs at the same time: you have to squash it down into a single thing that you care about. And once you squish everything you care about in the world into a single number, you can start using deep learning to optimize it.
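For instance, mean squared error is one common way to squash predictions and targets into a single scalar. This example is my own illustration, not something from the slides:

```python
def mean_squared_error(predictions, targets):
    """Squash all errors into one scalar: the average squared difference."""
    assert len(predictions) == len(targets)
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(predictions)


loss = mean_squared_error([2.0, 4.0], [1.0, 5.0])  # (1^2 + (-1)^2) / 2 = 1.0
```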
You create an architecture, which is the function you want to compute, and this is how you compose together the modules I talked about. The way you connect them together changes the function you have and the kind of representational power it has, and that becomes the hard part. After that, you initialize the parameters, and you train this architecture by repeatedly updating the parameters to minimize the cost: you go forward through the network to get the things you care about, and you go backward through the network to change the parameters to be slightly better for your cost. You repeat this many times until you get a function that you're really, really happy with and that solves whatever problem you want, and at the end you just use that function, just the forward pass.
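The whole loop can be shown concretely on a toy problem. Everything here (the data, the learning rate, the hand-derived gradient) is made up for illustration:

```python
# Fit y = w * x to data generated with w = 2, using squared error
# and a hand-derived gradient.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # (input, target) pairs

w = 0.0    # initialize the parameter
lr = 0.05  # learning rate (step size)

for step in range(200):
    for x, target in data:
        pred = w * x                      # forward pass
        grad_w = 2 * (pred - target) * x  # backward pass: d/dw of (pred - target)^2
        w -= lr * grad_w                  # step against the gradient

# w is now very close to 2.0; the trained "function" is just the forward pass
```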
How do we implement the backward pass? In general, we almost always use the chain rule. This is really nice because it makes implementing the backward pass easy. How this works: if you have your output as some function f of some x, and you have the partial derivative of your loss with respect to your output, dL/df, you can get dL/dx by simply multiplying by the partial derivative df/dx. The nice part about this is that dL/df comes from the rest of your network, and df/dx comes just from your module. So this lets you chain these things together in a way that only requires local information to get the backward pass, which is very nice, and there are theoretical reasons why this is a good way to do it. Perhaps the best part is that some frameworks make this completely automatic: by defining a forward pass, they can use automatic differentiation to figure out the backward pass automatically. So it becomes basically as easy as defining arbitrary functions. You actually do get the benefit of taking arbitrary things and returning arbitrary things: as long as all the operations you do are differentiable, you can just make it work like magic and optimize it, and this is literally how people do this in practice.
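As a concrete, hand-rolled instance of the local rule dL/dx = dL/df · df/dx (the particular functions are arbitrary choices of mine):

```python
import math

# Forward: x -> f = x**2 -> L = sin(f)
x = 1.5
f = x ** 2
L = math.sin(f)

# Backward, using only local derivatives:
dL_df = math.cos(f)    # comes from the rest of the "network" (the sin module)
df_dx = 2 * x          # local to the squaring module
dL_dx = dL_df * df_dx  # chain rule

# Sanity check against a numerical derivative of the whole composition
eps = 1e-6
numeric = (math.sin((x + eps) ** 2) - math.sin((x - eps) ** 2)) / (2 * eps)
```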
Updating the parameters is just a minor detail, but to get an understanding of how it works: once you have your existing parameters, you get your gradient and you take a step in the opposite direction of the gradient. The partial derivatives tell us how changing the parameters would increase or decrease the cost that we care about.
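The update rule itself, written out for a whole list of parameters (a sketch; the learning rate value is arbitrary):

```python
def sgd_step(params, grads, lr=0.1):
    """Move each parameter a small step against its gradient."""
    return [p - lr * g for p, g in zip(params, grads)]


new_params = sgd_step([1.0, -2.0], [0.5, -1.0])  # -> [0.95, -1.9]
```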
An important word to know, though, one that people always use and that I think kind of makes it sound more complicated because it's a big word, is backpropagation, or backprop for short. This has a longer name, reverse-mode automatic differentiation, which sounds pretty complicated, but it's just the chain rule plus dynamic programming. I just talked about the chain rule, and some people are familiar with dynamic programming, but it's just caching. The idea is that when you have a computation graph (here's a very simple one: y = c * d, with c = a + b and d = b + 1), you traverse the graph from the top to the bottom, and by doing it from the top to the bottom instead of the bottom to the top, you can cache the intermediates that are used many times in the graph. By caching these intermediates you get something much more efficient than the naive solution, and this gives you gradients that are computable in linear time in the size of your graph: you basically evaluate each node once, which is a really nice property and makes it all really efficient.
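For the example graph (y = c * d, with c = a + b and d = b + 1), the top-down traversal looks like this; each node's gradient is computed once and reused wherever it is needed:

```python
# Reverse-mode AD on: y = c * d, where c = a + b and d = b + 1.
a, b = 2.0, 3.0

# Forward pass (cache every intermediate)
c = a + b   # 5.0
d = b + 1   # 4.0
y = c * d   # 20.0

# Backward pass, top to bottom; dy/dy = 1 seeds the traversal
dy_dy = 1.0
dy_dc = dy_dy * d  # y = c * d  =>  dy/dc = d
dy_dd = dy_dy * c  # y = c * d  =>  dy/dd = c
dy_da = dy_dc * 1.0                # c = a + b  =>  dc/da = 1
dy_db = dy_dc * 1.0 + dy_dd * 1.0  # b feeds both c and d, so sum both paths
```

Note how dy_dc is computed once and reused for both dy_da and dy_db; that reuse is the "dynamic programming" part.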
And that's basically it for the basics. From a high level, deep learning is just composing optimizable subcomponents; optimizable almost always means differentiable; differentiable means that you can do backprop; and backprop is just the chain rule plus dynamic programming. Once you get to practical deep learning, you normally have to combine this with gradient descent, software, and a data set that you care about. There's a very rich space of software, which I'll talk a little bit about later, but these things are solved for you, so you can do deep learning without even knowing how to calculate the gradient yourself.
So while we can do arbitrarily complicated things, there are a few standard modules that are the main workhorse of deep learning today, and the goal of this section is to get a high-level understanding of each, since all of them can be incredibly nuanced. But these standard modules will cover almost all of what's happening in papers.

Perhaps the simplest of them is just matrix multiplication. It has many names: the fully connected layer, sometimes shortened to FC; sometimes called dense, because you have lots of connections; the linear layer, because it's a linear transformation; or affine, because sometimes there's a bias. The matrix multiplication is basically this: every time you see a neural network diagram, all of those arrows correspond to a matrix multiplication, so when you have a diagram that looks complicated, that's what it comes from. You can interpret it as a weight from every input to every output: if you have m inputs and n outputs, you have m-by-n weights to transform your inputs into outputs, and its implementation is literally a matrix multiplication. W in this case is generally a parameter, which means you learn the connections from inputs to outputs.
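In code, the layer really is one matrix multiply (plus the optional bias). The shapes and values below are my own toy example:

```python
import numpy as np

def fully_connected(x, W, b):
    """Fully connected / dense / linear / affine layer: y = W @ x + b.

    x: (m,) inputs; W: (n, m) learned weights; b: (n,) learned bias.
    Every output is a weighted sum of every input."""
    return W @ x + b


x = np.array([1.0, 2.0])       # m = 2 inputs
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])     # n = 3 outputs -> 3x2 weight matrix
b = np.array([0.0, 0.0, 1.0])
y = fully_connected(x, W, b)   # -> [1.0, 2.0, 4.0]
```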
This on its own is not powerful enough, so you need at least one more thing, which is a nonlinearity. The original nonlinearity is called the sigmoid. It's just this function; it has the nice property that it maps the reals into the space (0, 1), and it can be interpreted as a probability, but that's not as important: what matters is just that it's nonlinear.
nonlinear and the reason the non-linearity is important is if you
non-linearity is important is if you
non-linearity is important is if you have this kind of like neural network
have this kind of like neural network
have this kind of like neural network when you stack up the layers
when you stack up the layers
when you stack up the layers back-to-back if you had no non-linearity
back-to-back if you had no non-linearity
back-to-back if you had no non-linearity in the middle this would just be two
in the middle this would just be two
in the middle this would just be two matrix multiplies back-to-back and what
matrix multiplies back-to-back and what
matrix multiplies back-to-back and what would happen is you could just combine
would happen is you could just combine
would happen is you could just combine this into a single matrix multiply so if
this into a single matrix multiply so if
this into a single matrix multiply so if you have that 100 layer purely linear
you have that 100 layer purely linear
you have that 100 layer purely linear network of just matrix multiplications
network of just matrix multiplications
network of just matrix multiplications while this thing is pretty complicated
while this thing is pretty complicated
while this thing is pretty complicated and you do all the work of a real neural
and you do all the work of a real neural
and you do all the work of a real neural network you could actually flatten it
network you could actually flatten it
network you could actually flatten it into a single weight matrix because of a
into a single weight matrix because of a
into a single weight matrix because of a linear composition of linearity so this
linear composition of linearity so this
linear composition of linearity so this was the original one people like it
was the original one people like it
was the original one people like it because it’s very similar to what people
because it’s very similar to what people
because it’s very similar to what people use before they really got how machine
use before they really got how machine
use before they really got how machine learning words which was just binary
learning words which was just binary
learning words which was just binary threshold injustice fired back in the
threshold injustice fired back in the
threshold injustice fired back in the day and the cool part is with just those
day and the cool part is with just those
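A quick NumPy sketch of that collapse (arbitrary shapes, just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(5, 4))   # first linear layer
W2 = rng.normal(size=(3, 5))   # second linear layer
x = rng.normal(size=4)

two_layers = W2 @ (W1 @ x)     # two matrix multiplies back to back...
one_layer = (W2 @ W1) @ x      # ...flatten into a single weight matrix
print(np.allclose(two_layers, one_layer))  # True
```

This is just associativity of matrix multiplication, which is exactly why a purely linear stack adds no expressive power.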
day and the cool part is with just those two units you know how to make a neural
two units you know how to make a neural
two units you know how to make a neural network you can just simply do you get
network you can just simply do you get
network you can just simply do you get your input you apply the matrix multiply
your input you apply the matrix multiply
your input you apply the matrix multiply you apply a sigmoid you apply another
you apply a sigmoid you apply another
you apply a sigmoid you apply another matrix multiply and you have one and
matrix multiply and you have one and
matrix multiply and you have one and these are called multi-layer perceptrons
these are called multi-layer perceptrons
these are called multi-layer perceptrons when you only have matrix multiplies and
when you only have matrix multiplies and
when you only have matrix multiplies and nonlinearities and the cool part is that
nonlinearities and the cool part is that
nonlinearities and the cool part is that there’s a serum on this that this simple
there’s a serum on this that this simple
there’s a serum on this that this simple architecture like literally three
architecture like literally three
architecture like literally three functions can solve can approximate
functions can solve can approximate
functions can solve can approximate arbitrary functions which means it can
arbitrary functions which means it can
arbitrary functions which means it can solve any problem that you care about
solve any problem that you care about
solve any problem that you care about there’s a cool theorem on this the idea
there’s a cool theorem on this the idea
there’s a cool theorem on this the idea is that if you make the middle big
is that if you make the middle big
is that if you make the middle big enough you can calculate basically any
enough you can calculate basically any
enough you can calculate basically any function the downside is that just
function the downside is that just
function the downside is that just because it can it doesn’t mean it will
because it can it doesn’t mean it will
because it can it doesn’t mean it will and a single layer multi-layer
and a single layer multi-layer
and a single layer multi-layer perceptron often causes more problems
perceptron often causes more problems
perceptron often causes more problems than it solves so this is why there was
than it solves so this is why there was
than it solves so this is why there was an AI winter in the 90s just because
an AI winter in the 90s just because
an AI winter in the 90s just because these things are your kind of terrible
these things are your kind of terrible
these things are your kind of terrible but people have gotten a lot better it
but people have gotten a lot better it
but people have gotten a lot better it it’s and that now now neural networks
it’s and that now now neural networks
it’s and that now now neural networks are cool and as a disclaimer these are
are cool and as a disclaimer these are
are cool and as a disclaimer these are these are neural networks when you have
these are neural networks when you have
these are neural networks when you have so this is would be a multi-layer
so this is would be a multi-layer
so this is would be a multi-layer perceptron these are neural networks
perceptron these are neural networks
perceptron these are neural networks everything I’m talking about today is
everything I’m talking about today is
everything I’m talking about today is still a neural network but these are
still a neural network but these are
still a neural network but these are specifically when you talk about the
specifically when you talk about the
specifically when you talk about the multi-layer perceptron that is what this
multi-layer perceptron that is what this
multi-layer perceptron that is what this is so since then people have made better
is so since then people have made better
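Putting those two units together, a toy multi-layer perceptron might look like this (random untrained weights, purely illustrative):

```python
import numpy as np

def sigmoid(z):
    # the original nonlinearity: maps the reals into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 4))   # input -> hidden ("make the middle big enough")
W2 = rng.normal(size=(2, 8))   # hidden -> output

def mlp(x):
    # matrix multiply, sigmoid, matrix multiply
    return W2 @ sigmoid(W1 @ x)

y = mlp(rng.normal(size=4))
print(y.shape)                 # (2,)
```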
is so since then people have made better nonlinearities this is probably majority
nonlinearities this is probably majority
nonlinearities this is probably majority of the improvement between 1990 and 2012
of the improvement between 1990 and 2012
of the improvement between 1990 and 2012 unfortunately which is you have like a
unfortunately which is you have like a
unfortunately which is you have like a kind of smarter non-linearity so instead
kind of smarter non-linearity so instead
kind of smarter non-linearity so instead of taking this weird squiggly function
of taking this weird squiggly function
of taking this weird squiggly function you just do a threshold so anything
you just do a threshold so anything
you just do a threshold so anything that’s negative you just turn it into 0
that’s negative you just turn it into 0
that’s negative you just turn it into 0 this is actually the most popular
this is actually the most popular
this is actually the most popular non-linearity nowadays it does
non-linearity nowadays it does
non-linearity nowadays it does incredibly well
incredibly well
incredibly well some really nice optimization properties
some really nice optimization properties
some really nice optimization properties in particular when you have zero like is
in particular when you have zero like is
in particular when you have zero like is this not only is very linear so it works
this not only is very linear so it works
this not only is very linear so it works very well with a chain rule and this
very well with a chain rule and this
very well with a chain rule and this thing is used almost everywhere nowadays
thing is used almost everywhere nowadays
thing is used almost everywhere nowadays especially in the middle of a neural
especially in the middle of a neural
especially in the middle of a neural network there is a softmax which you can
network there is a softmax which you can
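That threshold, usually called the ReLU, is one line:

```python
import numpy as np

def relu(z):
    # anything negative becomes 0; the rest passes through unchanged
    return np.maximum(0, z)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5])))
```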
network there is a softmax which you can think of as converting a bunch of
think of as converting a bunch of
think of as converting a bunch of numbers into a discrete probability
numbers into a discrete probability
numbers into a discrete probability distribution so the math of it is p
distribution so the math of it is p
distribution so the math of it is p equals you exponentiate your input and
equals you exponentiate your input and
equals you exponentiate your input and then you divide it by the sum of the
then you divide it by the sum of the
then you divide it by the sum of the inputs you can think of the explanation
inputs you can think of the explanation
inputs you can think of the explanation is turning it all the numbers into
is turning it all the numbers into
is turning it all the numbers into positive and the dividing by the sum is
positive and the dividing by the sum is
positive and the dividing by the sum is a normalization term there’s some very
a normalization term there’s some very
a normalization term there’s some very nice properties about this and it’s used
nice properties about this and it’s used
nice properties about this and it’s used as the final layer for classification
as the final layer for classification
as the final layer for classification problems and it’s used in almost every
problems and it’s used in almost every
problems and it’s used in almost every neural network cool that was the easy
neural network cool that was the easy
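A sketch of the softmax as described (the max-subtraction is a standard numerical-stability trick, not mentioned in the talk):

```python
import numpy as np

def softmax(x):
    # subtracting the max doesn't change the result but avoids overflow
    e = np.exp(x - np.max(x))
    # exponentiate (everything becomes positive), then normalize by the sum
    return e / e.sum()

p = softmax(np.array([1.0, 2.0, 3.0]))
print(np.isclose(p.sum(), 1.0))  # True -- a discrete probability distribution
```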
neural network cool that was the easy part this gets complicated I like feel
part this gets complicated I like feel
part this gets complicated I like feel free to ask questions during this I
free to ask questions during this I
free to ask questions during this I normally explain this with a whiteboard
normally explain this with a whiteboard
normally explain this with a whiteboard and it normally is complicated even with
and it normally is complicated even with
and it normally is complicated even with a whiteboard but i’ll try to go through
a whiteboard but i’ll try to go through
a whiteboard but i’ll try to go through this so a convolution is a the main
this so a convolution is a the main
this so a convolution is a the main workhorse for deep learning on images
workhorse for deep learning on images
workhorse for deep learning on images and deep learning and images is
and deep learning and images is
and deep learning and images is basically it’s kind of where this
basically it’s kind of where this
basically it’s kind of where this revolution started so it’s very very
revolution started so it’s very very
revolution started so it’s very very important it’s probably the place where
important it’s probably the place where
important it’s probably the place where deep learning is the most advanced so
deep learning is the most advanced so
deep learning is the most advanced so it’s a very important primitive and I
it’s a very important primitive and I
it’s a very important primitive and I think the very cool primitive to
think the very cool primitive to
think the very cool primitive to understand because you really realize
understand because you really realize
understand because you really realize like how beautiful the framework is when
like how beautiful the framework is when
like how beautiful the framework is when you see like wow this thing sounds
you see like wow this thing sounds
you see like wow this thing sounds pretty complicated but it you can just
pretty complicated but it you can just
pretty complicated but it you can just plug it in and you’re doing that need to
plug it in and you’re doing that need to
plug it in and you’re doing that need to know how it works when someone has coded
know how it works when someone has coded
know how it works when someone has coded up for you which is what I do so this is
up for you which is what I do so this is
up for you which is what I do so this is a linear operation for 2d images so once
a linear operation for 2d images so once
a linear operation for 2d images so once you have a multi-layer perceptron you
you have a multi-layer perceptron you
you have a multi-layer perceptron you have a mapping from every input to every
have a mapping from every input to every
have a mapping from every input to every output but in the case of images your
output but in the case of images your
output but in the case of images your inputs are structured so you have like
inputs are structured so you have like
inputs are structured so you have like this spatial relationship between your
this spatial relationship between your
this spatial relationship between your inputs and if you have a mapping from
inputs and if you have a mapping from
inputs and if you have a mapping from every input every output you kind of
every input every output you kind of
every input every output you kind of throw away the spatial relationship so
throw away the spatial relationship so
throw away the spatial relationship so the idea would be what if rather than
the idea would be what if rather than
the idea would be what if rather than having a connection from every input to
having a connection from every input to
having a connection from every input to every output what if every output the
every output what if every output the
every output what if every output the output look like an image as well and
output look like an image as well and
output look like an image as well and every output was only locally connected
every output was only locally connected
every output was only locally connected to the things that corresponds to so
to the things that corresponds to so
to the things that corresponds to so that is in sight number one so local
that is in sight number one so local
that is in sight number one so local connections and inside number two is
connections and inside number two is
connections and inside number two is every output is
every output is
every output is a local function of its input what if
a local function of its input what if
a local function of its input what if instead of having every output be its
instead of having every output be its
instead of having every output be its own function which would be the general
own function which would be the general
own function which would be the general case what if every output was the same
case what if every output was the same
case what if every output was the same function of its input so what this then
function of its input so what this then
function of its input so what this then becomes is equivalent to like a
becomes is equivalent to like a
becomes is equivalent to like a well-known function in computer vision
well-known function in computer vision
well-known function in computer vision which is a convolution which is you have
which is a convolution which is you have
which is a convolution which is you have a kernel which is you can think of it
a kernel which is you can think of it
a kernel which is you can think of it like a local weight matrix so it’s
like a local weight matrix so it’s
like a local weight matrix so it’s represented yeah oh cool my mouse is
represented yeah oh cool my mouse is
represented yeah oh cool my mouse is here so it’s often represented as the
here so it’s often represented as the
here so it’s often represented as the square in an image like thing which
square in an image like thing which
square in an image like thing which means that they’re capturing that local
means that they’re capturing that local
means that they’re capturing that local input you just do a matrix multiply
input you just do a matrix multiply
input you just do a matrix multiply between all of the weights of the kernel
between all of the weights of the kernel
between all of the weights of the kernel which would be something like this you
which would be something like this you
which would be something like this you do that multiply for everything the
do that multiply for everything the
do that multiply for everything the local region you sum up the results so
local region you sum up the results so
local region you sum up the results so this is just a dot product and then you
this is just a dot product and then you
this is just a dot product and then you do that at every single location at the
do that at every single location at the
do that at every single location at the input so it’s kind of like tiling your
input so it’s kind of like tiling your
input so it’s kind of like tiling your input with the same function or it can
input with the same function or it can
input with the same function or it can be interpreted extracting the same
be interpreted extracting the same
be interpreted extracting the same features at every location which is the
features at every location which is the
features at every location which is the more common way to interpret it this is
more common way to interpret it this is
more common way to interpret it this is it’s very powerful it’s very parameter
it’s very powerful it’s very parameter
it’s very powerful it’s very parameter efficient because you have a lot of
efficient because you have a lot of
efficient because you have a lot of weight sharing between the parameters
weight sharing between the parameters
weight sharing between the parameters and you can end up having much larger
and you can end up having much larger
and you can end up having much larger outputs then you can have with a normal
outputs then you can have with a normal
outputs then you can have with a normal matrix multiplication and you also don’t
matrix multiplication and you also don’t
matrix multiplication and you also don’t lose spatial information which is a very
lose spatial information which is a very
lose spatial information which is a very important structure of images so these
important structure of images so these
important structure of images so these are some really nice properties and as a
are some really nice properties and as a
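A naive sketch of that tiled dot product (strictly speaking this is cross-correlation, which is what deep learning libraries implement under the name "convolution"; real implementations are far more optimized than this double loop):

```python
import numpy as np

def conv2d(image, kernel):
    # slide the kernel over every location; at each location take a dot
    # product between the kernel and the local region (no padding, "valid" mode)
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            region = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(region * kernel)  # the dot product
    return out

image = np.arange(16.0).reshape(4, 4)
kernel = np.ones((3, 3)) / 9.0           # a simple averaging kernel
print(conv2d(image, kernel).shape)       # (2, 2)
```

Note how the same 9 kernel weights are reused at every location: that is the weight sharing that makes convolutions so parameter-efficient.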
are some really nice properties and as a side effect you might think that this
side effect you might think that this
side effect you might think that this thing is really complicated how do I
thing is really complicated how do I
thing is really complicated how do I take a gradient of it because I maybe
take a gradient of it because I maybe
take a gradient of it because I maybe the whole thing is kind of complicated
the whole thing is kind of complicated
the whole thing is kind of complicated but this is actually equivalent to a
but this is actually equivalent to a
but this is actually equivalent to a very constrained matrix multiplication
very constrained matrix multiplication
very constrained matrix multiplication so if you take your input image and you
so if you take your input image and you
so if you take your input image and you unroll it because with a matrix moulton
unroll it because with a matrix moulton
unroll it because with a matrix moulton you lose that spatial structure and you
you lose that spatial structure and you
you lose that spatial structure and you unroll your input you basically have a
unroll your input you basically have a
unroll your input you basically have a few connection like every input every so
few connection like every input every so
few connection like every input every so every output is connected to maybe like
every output is connected to maybe like
every output is connected to maybe like nine of your inputs and that just
nine of your inputs and that just
nine of your inputs and that just becomes equivalent to the really like
becomes equivalent to the really like
becomes equivalent to the really like all of the diagram with lots of arrows
all of the diagram with lots of arrows
all of the diagram with lots of arrows but most of the arrows being zero or
but most of the arrows being zero or
but most of the arrows being zero or missing so this is still completely
missing so this is still completely
missing so this is still completely differentiable and still fits very
differentiable and still fits very
differentiable and still fits very nicely into this framework that you can
nicely into this framework that you can
nicely into this framework that you can plug in with all the other
plug in with all the other
plug in with all the other nonlinearities cool it’s going to get a
nonlinearities cool it’s going to get a
nonlinearities cool it’s going to get a little bit harder
little bit harder
little bit harder another very fundamental building block
another very fundamental building block
another very fundamental building block is called a recurrent neural network I
is called a recurrent neural network I
is called a recurrent neural network I don’t know why the building block is
don’t know why the building block is
don’t know why the building block is called a network when everything else is
called a network when everything else is
called a network when everything else is called layer but that’s just kind of
called layer but that’s just kind of
called layer but that’s just kind of convention and this is solving a a
convention and this is solving a a
convention and this is solving a a problem that is basically has not been
problem that is basically has not been
problem that is basically has not been solved in machine learning before which
solved in machine learning before which
solved in machine learning before which is we want functions to take to take in
is we want functions to take to take in
is we want functions to take to take in variable size input but they can only
variable size input but they can only
variable size input but they can only take in fixed size input and this is
take in fixed size input and this is
take in fixed size input and this is becomes a problem when you’re a function
becomes a problem when you’re a function
becomes a problem when you’re a function is parametric like a fully connected
is parametric like a fully connected
is parametric like a fully connected layer is because if you want a
layer is because if you want a
layer is because if you want a connection from every input to every
connection from every input to every
connection from every input to every output but your input size changes that
output but your input size changes that
output but your input size changes that means you’re the number of weights you
means you’re the number of weights you
means you’re the number of weights you have changes and that means that if you
have changes and that means that if you
have changes and that means that if you get a longer example at the inference
get a longer example at the inference
get a longer example at the inference time you now don’t know what to do with
time you now don’t know what to do with
time you now don’t know what to do with it and this also might be inefficient
it and this also might be inefficient
it and this also might be inefficient because you might have like really
because you might have like really
because you might have like really really big a really large number of
really big a really large number of
really big a really large number of inputs and you might not need all of the
inputs and you might not need all of the
inputs and you might not need all of the power of having like every connection
power of having like every connection
power of having like every connection there so a recurrent neural network is a
there so a recurrent neural network is a
there so a recurrent neural network is a way to solve this problem and the
way to solve this problem and the
way to solve this problem and the solution to this problem is recursion so
solution to this problem is recursion so
solution to this problem is recursion so what you have is an initial state which
what you have is an initial state which
what you have is an initial state which would just be let’s just call it h in
would just be let’s just call it h in
would just be let’s just call it h in this example and you have a bunch of
this example and you have a bunch of
this example and you have a bunch of these inputs X and there’s a variable
these inputs X and there’s a variable
these inputs X and there’s a variable number of them so you don’t really know
number of them so you don’t really know
number of them so you don’t really know what like this capital T is and you can
what like this capital T is and you can
what like this capital T is and you can make a function that takes in a fixed
make a function that takes in a fixed
make a function that takes in a fixed size and because each X is fixed size
size and because each X is fixed size
size and because each X is fixed size you can make that function have taken
you can make that function have taken
you can make that function have taken both h + X and now you can recurse
both h + X and now you can recurse
both h + X and now you can recurse through this list by saying h of t
through this list by saying h of t
through this list by saying h of t equals the function of the previous
equals the function of the previous
equals the function of the previous state sorry the new state is a function
state sorry the new state is a function
state sorry the new state is a function of the previous state and the current
of the previous state and the current
of the previous state and the current input and then you just return the final
input and then you just return the final
input and then you just return the final one and what this allows you to do is it
one and what this allows you to do is it
one and what this allows you to do is it allows you to with a fixed size input
allows you to with a fixed size input
allows you to with a fixed size input you can have it operate on sorry with a
you can have it operate on sorry with a
you can have it operate on sorry with a fixed function that takes fixed size
fixed function that takes fixed size
fixed function that takes fixed size input you can now turn it into a
input you can now turn it into a
input you can now turn it into a function that takes in a variable sized
function that takes in a variable sized
function that takes in a variable sized input by applying that function of
input by applying that function of
input by applying that function of variable number of times this is not the
variable number of times this is not the
variable number of times this is not the this is like a pretty obvious insight
this is like a pretty obvious insight
this is like a pretty obvious insight and you could do that with any kind of
and you could do that with any kind of
and you could do that with any kind of machine learning algorithm you could
machine learning algorithm you could
machine learning algorithm you could like apply a random forest like an
like apply a random forest like an
like apply a random forest like an arbitrary number of times but the cool
arbitrary number of times but the cool
arbitrary number of times but the cool part about this is that because this
part about this is that because this
part about this is that because this function is differentiable this
function is differentiable this
function is differentiable this recursive function is also
recursive function is also
recursive function is also differentiable so you can take the
differentiable so you can take the
differentiable so you can take the derivatives of each of the inputs you
derivatives of each of the inputs you
derivatives of each of the inputs you can take even take the derivative of the
can take even take the derivative of the
can take even take the derivative of the weight matrices you use at each step you
weight matrices you use at each step you
weight matrices you use at each step you can use that f reach up and you get a
can use that f reach up and you get a
can use that f reach up and you get a diagram
diagram
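A sketch of that recursion (the talk leaves f abstract; the tanh-based update below is just one common choice, assumed here for concreteness):

```python
import numpy as np

def rnn(xs, h0, W_h, W_x):
    # recurse through a variable-length list of inputs:
    # new state = f(previous state, current input), same f at every step
    h = h0
    for x in xs:
        h = np.tanh(W_h @ h + W_x @ x)   # one possible f (an assumption)
    return h                             # return the final state

rng = np.random.default_rng(0)
W_h = rng.normal(size=(5, 5))
W_x = rng.normal(size=(5, 3))
h0 = np.zeros(5)

# the same fixed-size function handles sequences of any length
xs_short = [rng.normal(size=3) for _ in range(2)]
xs_long = [rng.normal(size=3) for _ in range(10)]
print(rnn(xs_short, h0, W_h, W_x).shape, rnn(xs_long, h0, W_h, W_x).shape)
```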
diagram it looks kind of like this now you can
it looks kind of like this now you can
it looks kind of like this now you can think of it as applying an FC layer for
think of it as applying an FC layer for
think of it as applying an FC layer for each input that takes the input and the
each input that takes the input and the
each input that takes the input and the state so far and this diagram might not
state so far and this diagram might not
state so far and this diagram might not be very clear but there are many
be very clear but there are many
be very clear but there are many different diagrams for RN ends and
different diagrams for RN ends and
different diagrams for RN ends and they’re all equally confusing if you’re
they’re all equally confusing if you’re
they’re all equally confusing if you’re unfamiliar with them so this is kind of
unfamiliar with them so this is kind of
unfamiliar with them so this is kind of the one on the left is my favorite one
the one on the left is my favorite one
the one on the left is my favorite one because you can kind of think of it as a
because you can kind of think of it as a
because you can kind of think of it as a stateful function except you the state
stateful function except you the state
stateful function except you the state only lasts for the duration of your
only lasts for the duration of your
only lasts for the duration of your input but the unrolled version is the
input but the unrolled version is the
input but the unrolled version is the version you use if you’re taking
version you use if you’re taking
version you use if you’re taking gradients so this is equivalent just
gradients so this is equivalent just
gradients so this is equivalent just passing the gradients through this very
passing the gradients through this very
passing the gradients through this very long graph a last complicated slide long
long graph a last complicated slide long
long graph a last complicated slide long short term memory units this puts me in
short term memory units this puts me in
short term memory units this puts me in a really hard position because I can’t
a really hard position because I can’t
a really hard position because I can’t not talk about them because they are so
not talk about them because they are so
not talk about them because they are so big it is but they’re also extremely
big it is but they’re also extremely
big it is but they’re also extremely complicated and there they take more
complicated and there they take more
complicated and there they take more building blocks and I’ve UNIX even
building blocks and I’ve UNIX even
building blocks and I’ve UNIX even explain but there is this great blog
explain but there is this great blog
explain but there is this great blog post I think slides will be published so
post I think slides will be published so
post I think slides will be published so you don’t have to worry about that this
you don’t have to worry about that this
you don’t have to worry about that this great great blog post tries to explain
great great blog post tries to explain
great great blog post tries to explain it but I’m going to try to give like a
it but I’m going to try to give like a
it but I’m going to try to give like a high-level intuition of them just so
high-level intuition of them just so
high-level intuition of them just so like even higher than what I’ve said so
like even higher than what I’ve said so
like even higher than what I’ve said so far um just so that you can kind of
far um just so that you can kind of
far um just so that you can kind of understand where where it’s coming from
understand where where it’s coming from
understand where where it’s coming from when I talk about these things being
when I talk about these things being
when I talk about these things being used and the idea would be its kind of
used and the idea would be its kind of
used and the idea would be its kind of like an RNN and in practice no one uses
like an RNN and in practice no one uses
like an RNN and in practice no one uses the RNN that I’ve just described it’s a
the RNN that I’ve just described it’s a
the RNN that I’ve just described it’s a very simple function and there’s much
very simple function and there’s much
very simple function and there’s much more complicated versions it’s an RN n
more complicated versions it’s an RN n
more complicated versions it’s an RN n where the function is just really
where the function is just really
where the function is just really complicated so this entire thing here is
complicated so this entire thing here is
complicated so this entire thing here is a representation of that function I’m
a representation of that function I’m
a representation of that function I’m not going to get into the details of it
not going to get into the details of it
not going to get into the details of it but it involves a lot of different
but it involves a lot of different
but it involves a lot of different mechanisms in order to make optimization
mechanisms in order to make optimization
mechanisms in order to make optimization easier and the idea is that if you’d
easier and the idea is that if you’d
easier and the idea is that if you’d apply if you designed this function well
apply if you designed this function well
apply if you designed this function well the function is applied at each time
the function is applied at each time
the function is applied at each time step it can make the problem much much
step it can make the problem much much
step it can make the problem much much easier to optimize and you can have like
easier to optimize and you can have like
easier to optimize and you can have like a much much more powerful function and
a much much more powerful function and
a much much more powerful function and the key is that by having a path which
the key is that by having a path which
the key is that by having a path which is relatively simple so this is what
is relatively simple so this is what
is relatively simple so this is what represents with the top path where these
represents with the top path where these
represents with the top path where these variable operations being done to it it
variable operations being done to it it
variable operations being done to it it makes it easier to stack these things
makes it easier to stack these things
makes it easier to stack these things want back to back and that makes easier
want back to back and that makes easier
want back to back and that makes easier to learn long-term relationships between
to learn long-term relationships between
to learn long-term relationships between the functions
the functions
the functions whoo okay that was that was the
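That intuition can be sketched as a single LSTM step in NumPy. This is a toy with made-up dimensions and randomly initialized weights; the gate layout below is one common convention, not the only one:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step. W maps [x; h_prev] to the four gate pre-activations."""
    z = np.concatenate([x, h_prev]) @ W + b       # all four gates computed at once
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # input, forget, output gates
    g = np.tanh(g)                                # candidate cell update
    c = f * c_prev + i * g   # the "simple path": cell state only rescaled and added to
    h = o * np.tanh(c)       # hidden state read off the cell state
    return h, c

# toy sizes: 3-dim input, 4-dim hidden state
rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = rng.normal(scale=0.1, size=(n_in + n_hid, 4 * n_hid))
b = np.zeros(4 * n_hid)
h, c = np.zeros(n_hid), np.zeros(n_hid)
for t in range(5):                                # unroll over a short sequence
    h, c = lstm_step(rng.normal(size=n_in), h, c, W, b)
```

The `f * c_prev + i * g` line is the relatively simple top path: the state is only rescaled and added to at each step, which is what keeps gradients usable across many stacked steps.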
Whoo, okay, that was the complicated part. You now know ninety-five percent of the building blocks that everyone uses for state-of-the-art deep learning. With just these building blocks you could probably do new state-of-the-art things on new domains, so congratulations. Ready for the next part?

In this part I want to talk about what deep learning is really good at and what you should use it on. The answer is a whole lot, so I'm going to cover just the rough themes of where deep learning really shines, but there's much, much more to it, which I think is part of the awesomeness, because it all falls under the extremely simple framework that I've just described. I don't think you could describe any framework as simple as what I've just done and have it solve this many complicated tasks, basically all unsolved before 2012.
So, convolutional neural networks. This is a general architecture, commonly referred to as CNNs, and here the name means a whole network, not just a layer. The idea is that you take your image, you apply a convolution, you apply your ReLU, your rectified linear unit, then probably another convolution, another ReLU, and you basically repeat this conv-ReLU pattern until you've solved all the problems in computer vision. That isn't quite true, since at the end you need to tack on some sort of output layer, and the output layer depends on what kind of problem you're trying to solve.
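That conv, ReLU, conv, ReLU, output-layer recipe can be sketched end to end in NumPy. This is a toy: the filters are random rather than learned, and the 10-class softmax output layer is an assumed example of the "tacked-on" layer:

```python
import numpy as np

def conv2d(img, kernel):
    """Valid 2-D convolution (really cross-correlation, as in most DL libraries)."""
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    return np.maximum(x, 0)

rng = np.random.default_rng(0)
img = rng.normal(size=(28, 28))      # a fake grayscale image
k1 = rng.normal(size=(3, 3))         # filters would normally be learned
k2 = rng.normal(size=(3, 3))

x = relu(conv2d(img, k1))            # conv -> ReLU
x = relu(conv2d(x, k2))              # conv -> ReLU, repeated
logits = rng.normal(size=(10, x.size)) @ x.ravel()           # tacked-on output layer
probs = np.exp(logits - logits.max()); probs /= probs.sum()  # softmax over 10 classes
```

Swapping that final layer for a per-pixel output, for instance, is what turns the same backbone into a segmentation or super-resolution network.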
A really old-school task is face recognition: trying to determine whose face this is. It's a really cool task because it makes the representations very visual, and you can see how the network learns over time. You start with the pixels, and at the first layer your filters tend to just match edges and very simple things, so the convolutions match edges and other very simple shapes. As you get deeper into the network you learn more complicated functions of the input. After that you can start combining edges into corners or blobs, which is still extremely simple, but after another layer, combining two corners the right way starts to look kind of like an eye-like shape, or two corners and a blob become even more eye-like. You can build up from edges to corners to object parts and eventually to the objects you care about, and in really, really deep networks you actually get intermediates that are extremely semantic.

For example, people have made a lot of tools for visualizing what neural networks learn, and if you have a network that never learns to classify books at all but learns to classify bookshelves, some of the intermediate features actually become book classifiers, which is really interesting. The network learns a hierarchical representation of your input space in which these are useful things to combine in order to make a robust classifier: maybe if you combine three books together with a square, that becomes a bookshelf. That is roughly what the local operations in the network do, and the beauty of it is that it's all learned automatically. You don't need to program in "bookshelves normally have books, they have square stuff, maybe there are flowers beside them"; all of this falls out of the data set automatically for you.
And these convolutional neural networks are absolutely amazing. I wasn't joking when I said they're basically all of computer vision right now. It all started with ImageNet in 2012; that's when the entire deep learning hype train started. You had traditional machine learning tackling this very hard, very large computer vision data set, and progress was kind of plateauing over the years, and then all of a sudden deep learning comes in and just blows everything away. Ever since then, everything in computer vision has been deep learning; nothing can even compare. Recently we've even been able to get superhuman results, which is pretty impressive, because humans are pretty good at seeing things. It's kind of what we've evolved to do.
And the same architectures can do all sorts of really interesting structured tasks. Using almost the same architecture, you can use a convnet to break up your input image into what's called a semantic segmentation of all the relevant parts. And using basically the same architecture you can do crazy things like super-resolution, where you take a low-resolution image and fill in the details. Not only is that incredible on its own, even though it sounds easy; it's incredible that you can take the same architecture that takes an image and tells you whether or not there's a dog in it, and use it to take an image and return a new, higher-resolution image. It's basically the same library, the same components; it's all very, very composable, and that's really good.
Awesome. You can also use this to solve really hard medical tasks, tasks that people could not solve before. Here we're detecting and classifying lung cancer in CT scans; these are the kinds of things that I like to work on.

And it's not limited to vision: there's been a lot of work in language understanding. Something that deep learning is really good at is language modeling. Roughly, this means estimating how probable a statement is, how much sense it makes, in a certain language. It might have to do with a question and a response: "how are you", "I'm fine". It might capture what would be a weird thing to say: "my laptop is squishy" is probably a very improbable sentence, so a neural network could determine that "squishy" is a very bad adjective for a laptop. But if I said "my laptop is hot", that would probably be a much more likely sentence.
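The "how probable is this sentence" idea can be made concrete even with a toy count-based model. The talk is about neural language models; the tiny corpus, bigram counts, and smoothing constant below are made up purely for illustration:

```python
from collections import Counter

# toy corpus; a real language model is trained on vastly more text
corpus = ("my laptop is hot . my laptop is slow . the soup is hot . "
          "the pillow is squishy .").split()

bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def sentence_prob(sentence, alpha=0.1):
    """Product of add-alpha-smoothed bigram probabilities P(w_t | w_{t-1})."""
    words = sentence.split()
    vocab = len(unigrams)
    p = 1.0
    for prev, cur in zip(words, words[1:]):
        p *= (bigrams[(prev, cur)] + alpha) / (unigrams[prev] + alpha * vocab)
    return p

print(sentence_prob("my laptop is hot"))      # more probable under this corpus
print(sentence_prob("my laptop is squishy"))  # less probable under this corpus
```

A neural language model plays the same role, predicting each next word from its context, but with a learned function instead of raw counts, which is what lets it generalize across languages and domains.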
And this already has some human-like feel to it, because language was designed for humans. If you can do language understanding, as in determining the probability of any sentence given a context, and if you can do this perfectly, you can solve basically any task. It's a really interesting domain for deep learning, because if you look at how language understanding was done before deep learning was around, it was just incredibly simplistic: tons and tons of rules, no robustness to new data sets, and you'd have to make custom rules for every language. Now you can use the same tricks for English as you can for Chinese characters as you can for byte code, and that is just pretty incredible.
incredible they’ve obviously been much
incredible they’ve obviously been much more complicated tasks a pretty popular
more complicated tasks a pretty popular
more complicated tasks a pretty popular use for machine for deep learning that
use for machine for deep learning that
use for machine for deep learning that this people are really putting a lot of
this people are really putting a lot of
this people are really putting a lot of effort in is aunt went language
effort in is aunt went language
effort in is aunt went language understanding from scratch so the idea
understanding from scratch so the idea
understanding from scratch so the idea is you use an RN to compress a sentence
is you use an RN to compress a sentence
is you use an RN to compress a sentence in your source language into a vector
in your source language into a vector
in your source language into a vector like I described in the RNN section and
like I described in the RNN section and
like I described in the RNN section and then you use a different RNN to decode
then you use a different RNN to decode
then you use a different RNN to decode it into a target language and while it’s
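The encode-to-a-vector, decode-with-another-RNN shape can be sketched in NumPy. Everything here is a toy assumption: tiny dimensions, random untrained weights, integer token IDs, and a plain tanh RNN standing in for the LSTMs a real system would use:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                                  # size of the sentence vector (toy)
V_src, V_tgt = 20, 20                  # toy vocabulary sizes

# randomly initialized parameters; in reality all of these are learned
E_src = rng.normal(scale=0.1, size=(V_src, d))   # source word embeddings
W_enc = rng.normal(scale=0.1, size=(2 * d, d))   # encoder RNN weights
E_tgt = rng.normal(scale=0.1, size=(V_tgt, d))   # target word embeddings
W_dec = rng.normal(scale=0.1, size=(2 * d, d))   # decoder RNN weights
W_out = rng.normal(scale=0.1, size=(d, V_tgt))   # projects state to target vocab

def rnn_step(x, h, W):
    return np.tanh(np.concatenate([x, h]) @ W)

def encode(src_tokens):
    """Compress the whole source sentence into a single vector."""
    h = np.zeros(d)
    for tok in src_tokens:
        h = rnn_step(E_src[tok], h, W_enc)
    return h

def decode(h, max_len=5):
    """Greedily emit target tokens starting from the sentence vector."""
    out, tok = [], 0                   # token 0 used as a start symbol
    for _ in range(max_len):
        h = rnn_step(E_tgt[tok], h, W_dec)
        tok = int(np.argmax(h @ W_out))  # pick the most likely next token
        out.append(tok)
    return out

translation = decode(encode([3, 7, 12]))  # untrained, so the output is arbitrary
```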
While it's not surprising that you can design a neural network that could plausibly output this, it is quite surprising that it works so well. Neural networks built in a span of a few grad-student months have matched the performance of systems that people spent decades engineering. Nowadays, I think, end-to-end deep learning systems are not what's deployed for this; people still use a bit of hard-coded stuff, with deep learning as a very important component, but it's only a matter of time. And the beauty is that if we have a new task or a new language, it can just automatically work. What if we find some lost language from a thousand years ago, and we have a good amount of their texts: can we actually learn to translate or understand it without any prior knowledge of it? It seems like purely from data we can, and that's really cool. We don't need an understanding of something before applying our machine learning models in order to have an understanding afterwards, and that is just really, really awesome.

I've actually been chatting with the people at SETI, the search for extraterrestrial intelligence, and one of the tasks they're working on is trying to understand dolphins. The rationale is that if dolphins have language, aliens might have language too; if we see alien communication we probably won't understand it, so perhaps we can use dolphins as a proxy for aliens and try to understand them. There are some really cool tasks happening there.
happening there it’s not limited to that
happening there it’s not limited to that there’s some really cool things being
there’s some really cool things being
there’s some really cool things being done with art in deep learning actually
done with art in deep learning actually
done with art in deep learning actually I think that companies have started up
I think that companies have started up
I think that companies have started up that their entire business model is
that their entire business model is
that their entire business model is creating awesome deep learning art and
creating awesome deep learning art and
creating awesome deep learning art and they seem to be doing well from what
they seem to be doing well from what
they seem to be doing well from what I’ve heard in this case this is a
I’ve heard in this case this is a
I’ve heard in this case this is a hallucination purely from a conf lap
hallucination purely from a conf lap
hallucination purely from a conf lap trained to do image classification so an
trained to do image classification so an
trained to do image classification so an image that continent you know something
image that continent you know something
image that continent you know something that takes an image tells you like what
that takes an image tells you like what
that takes an image tells you like what breed of dog it is with objects or in it
breed of dog it is with objects or in it
breed of dog it is with objects or in it you can use it with a few tricks to
you can use it with a few tricks to
you can use it with a few tricks to create this kind of crazy art and this
create this kind of crazy art and this
create this kind of crazy art and this was a pretty big splash it’s a very
was a pretty big splash it’s a very
was a pretty big splash it’s a very unintuitive that a neural network that
unintuitive that a neural network that
unintuitive that a neural network that isn’t even made trained to make art
isn’t even made trained to make art
isn’t even made trained to make art actually can turn out making this kind
actually can turn out making this kind
actually can turn out making this kind of thing they’ve been more popular use
There have been more popular use cases, such as style transfer. The idea is that you can take a neural network, still trained for classification; classification gives it some priors about images, some priors about the natural world. What you do then is say: I want my image to match the feature distribution from a different image. Then you get this kind of style transfer, where you can mix those components together.
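One common way to pin down "matching the distribution" of another image is to compare Gram matrices (feature correlations) of a layer's activations, as in the standard style-transfer formulation. Here is a toy sketch where the features are random stand-ins rather than activations from a real pretrained classifier.

```python
import numpy as np

# "Style" is summarized by the Gram matrix of a layer's activation map;
# the style loss compares these statistics between two images.
def gram(features):
    # features: (channels, positions) activation map from some layer
    return features @ features.T / features.shape[1]

def style_loss(features_a, features_b):
    diff = gram(features_a) - gram(features_b)
    return float(np.sum(diff ** 2))

rng = np.random.default_rng(1)
content_feats = rng.normal(size=(8, 100))   # hypothetical layer activations
style_feats = rng.normal(size=(8, 100))

# A perfect match has zero style loss; a different image does not.
print(style_loss(style_feats, style_feats))        # 0.0
print(style_loss(content_feats, style_feats) > 0)  # True
```

The full method minimizes this style loss plus a content loss by gradient descent on the pixels, reusing the classifier's features without retraining it.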
While this is actually a pretty ugly example (there are some good ones, I promise), there are much more complicated things you can do. It's not just taking two images and merging them together: you can do things like transforming a perhaps-not-super-great drawing, something you could probably do in Paint fairly quickly, into something that looks like an artist made it, something that's really awesome. The idea is that you can take these arbitrary doodles and convert them into things that look like paintings. This kind of stuff is really awesome, and I think it's just the beginning of what we can do with neural network art.
But after basically less than a year of work on this, people are making applications that are already very tangible and very awesome. If I had made this, I would probably hang it up in my living room, and this has only been one year of work; imagine what happens in 10 years. I saved the best for last in terms of art: we can combine our pictures with art of Pokemon, so clearly the future is here. This is one of my crowning achievements, I think, primarily because I've done this with dozens of people and only mine turned out well. I think this is really awesome: there are just so many things to do here, so few people working on it, and the sky is really the limit, so it's just really exciting what kinds of stuff can be created here.
There have been other huge achievements; game playing has been really big. If anyone saw DeepMind's roughly 500-million-dollar acquisition around 2014, the only paper they had at the time was on learning to play Atari games from pixels, which might be harder than it sounds, because humans come in with priors about how to play: that this is a ball and that's a paddle and I want to destroy certain things, that a key opens doors, or that roads are something I want to stay on in a driving game. The neural network is not given any of these priors; it is literally only given the pixels, and from those images it learns to play at what is, on median, a superhuman level, and the techniques have been continuing to get better.
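The Atari result trained a deep Q-network, and the heart of it is the Q-learning update, which works the same whether Q is a lookup table or a convnet over pixels. This toy uses a table on a made-up five-cell corridor (reward at the right end) purely as an illustration of that update.

```python
import numpy as np

# Tabular Q-learning on a 5-cell corridor: actions are 0 = left, 1 = right,
# and reaching the rightmost cell pays reward 1. DQN's contribution was to
# replace this table with a convnet so the same update works on raw pixels.
rng = np.random.default_rng(0)
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
gamma, lr = 0.9, 0.5

def step(s, a):
    s2 = max(0, min(n_states - 1, s + (1 if a == 1 else -1)))
    return s2, (1.0 if s2 == n_states - 1 else 0.0)

for episode in range(500):
    s = 0
    for _ in range(20):
        a = int(rng.integers(n_actions))        # explore with random play
        s2, r = step(s, a)
        # Bellman update: move Q toward reward + discounted best next value.
        Q[s, a] += lr * (r + gamma * np.max(Q[s2]) - Q[s, a])
        s = s2
        if r > 0:
            break

print(np.argmax(Q[:4], axis=1))   # greedy policy in the non-goal cells
```

After random play, acting greedily with respect to Q heads toward the reward; the real agent also needs tricks like experience replay and epsilon-greedy exploration to make this stable at Atari scale.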
Very similar tricks have been applied in the much more recent result of Google DeepMind's AlphaGo network, which was not that huge of a deal in the West. But if you ever talk to people from the more Eastern world, you can tell them about the achievements of deep learning: you talk about Smart Inbox, and they're like, "oh, that's pretty okay"; you talk about image search, "yeah, that's pretty okay"; and then you tell them it also beat the world champion at Go, and they're like, "whoa, it beat a top player at Go? That's amazing." People predicted that beating top humans at Go would be, depending on the expert, 10 to 100 years off, and it happened; it just happened; it's already done; humans have lost at Go.
As a side effect, AlphaGo has also caused more fear over AI safety than any other neural network, I believe, and this is probably a good representation of that: there's an XKCD of how hard people used to think these games were, and you can see Go as basically the last one in the category of "computers still lose to top humans." Not all of the games there are solved, but it's just pretty incredible that Go now is. People have been asking: if it can do this, what can't it do? Because Go is a task that requires a lot of reasoning.
These kinds of achievements have been transferred into the physical world as well. Google has a farm with a bunch of robots that have learned, on their own, to grasp objects. Robotic control is usually pretty hard, especially when you're trying to make it generalize, and they were able to do it just by throwing the robots into a dark warehouse, having them train for a while against a cute objective function, and the robots just learned to grasp things better than the hand-designed controllers did, which was pretty awesome.
More recently, I think there was a video that came out last week of NVIDIA using just deep learning for self-driving cars. The idea was that with just a single camera in front of your car, the car can learn to drive itself from how other people drove. This is a very interesting result, because Google has been working on self-driving cars for, I don't know, maybe a decade already, using lidar and SLAM and all of that stuff, and NVIDIA has by some measures caught up to them entirely within, I think, less than a year of investing in this. So deep learning seems to be changing a lot of things, especially these kinds of perception tasks.
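Learning to steer from how other people drove is, at its core, supervised regression from a camera frame to the steering angle a human chose (behavior cloning). The sketch below fits a linear model to synthetic "frames" and "angles"; a real system uses a convnet on millions of recorded frames, and every number here is invented.

```python
import numpy as np

# Behavior cloning in miniature: regress steering angle on image pixels.
rng = np.random.default_rng(0)
n_frames, n_pixels = 500, 30
frames = rng.normal(size=(n_frames, n_pixels))   # fake camera images
true_w = rng.normal(size=n_pixels)               # fake "human driver" mapping
steering = frames @ true_w                       # the angles the human chose

# Fit weights by least squares: minimize ||frames @ w - steering||^2.
w, *_ = np.linalg.lstsq(frames, steering, rcond=None)

new_frame = rng.normal(size=n_pixels)
predicted = new_frame @ w    # the "car" now steers like the demonstrator
print(abs(predicted - new_frame @ true_w) < 1e-6)
```

The appeal of the end-to-end approach is exactly this simplicity: no lane detector or planner is hand-built, just a supervised mapping from perception to control.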
Because research is moving so fast, I also have to spend some time on things that are not yet practical but may very well soon be. As a disclaimer, I've been traveling this weekend, so I'm not sure whether some of these things already belong in the already-solved category.
Generation is a big one. There's tons and tons of stuff happening in generation, so I definitely can't do it justice. There's really cool work on generating images from scratch, and on generating arbitrary other domains from scratch too; images are just the most visual, so I show them here. But some of the coolest and perhaps most practical examples are conditional generation.
Something I'm really excited about is image to text. The idea is you take an input image, and the output is not a yes-or-no answer about whether a dog is in the image, but a description of the image. That's an extremely human task, and it would be extremely useful to get right; it seems to open up a whole ton of possibilities. I'm very excited about taking in a medical image and outputting a complete report of it, which would be really awesome. Some people who are really excited about this, where it has applications in the very short term (I don't know the right way to say it), are the poor-eyesight community: web pages nowadays have been pretty bad for people with disabilities, and imagine if you had a neural network that could just describe an image for you, describe a page for you, tell you what's on the page in a very semantic, summarized way.
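The usual recipe for image-to-text is an encoder-decoder: a convnet squeezes the image into a feature vector, and a recurrent decoder then emits one word at a time conditioned on that vector and the words so far. The sketch below shows only the decoding loop; the vocabulary, sizes, and random "trained" weights are all stand-ins, so it produces gibberish until actually trained on image-caption pairs.

```python
import numpy as np

# Encoder-decoder captioning skeleton: image features seed the decoder's
# hidden state, then words are emitted greedily until "<end>".
rng = np.random.default_rng(0)
vocab = ["<end>", "a", "dog", "on", "grass"]
feat_dim, hid_dim = 6, 8

W_img = rng.normal(size=(hid_dim, feat_dim))     # image feature -> initial state
W_hh = rng.normal(size=(hid_dim, hid_dim))       # state -> state
W_emb = rng.normal(size=(hid_dim, len(vocab)))   # previous word -> state input
W_out = rng.normal(size=(len(vocab), hid_dim))   # state -> word scores

def caption(image_feat, max_len=5):
    h = np.tanh(W_img @ image_feat)
    word, words = 1, []                  # seed with a start word, decode greedily
    for _ in range(max_len):
        h = np.tanh(W_hh @ h + W_emb[:, word])
        word = int(np.argmax(W_out @ h))
        if vocab[word] == "<end>":
            break
        words.append(vocab[word])
    return " ".join(words)

fake_image_features = rng.normal(size=feat_dim)  # pretend convnet output
print(caption(fake_image_features))              # gibberish until trained
```

Training maximizes the probability of human-written captions given the image, which is what turns this skeleton into the systems described above.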
There's also a really cool opposite problem: instead of taking an image and outputting a description, you take in a description and output an image. As a terrible artist, I'm probably a bit more excited about this one, because while I can describe pictures, I can't really draw them, and these are already much better than anything I can draw (though that's probably a low bar). In this kind of network you actually take in a sentence of text, and all of these images are generated from that network, which is pretty incredible. Some of them are not super great; the birds, I believe, are real, while the purple flowers are not, but they actually seem close: if it were zoomed out enough, I could see this as being pretty real. Can you imagine a future where, instead of having to spend millions of dollars on a movie, you just type it up and a neural network generates the movie for you? We're quite a way from that, but perhaps not that far away, especially with some focused work, and this could enable all sorts of new forms of creativity that people don't even know about.
While language understanding does quite well, there's a deeper level of language understanding that we can kind of solve on toy tasks but that's harder for real tasks. Take QA, question answering that requires more complicated reasoning: if you have a story and you ask a complicated question like "where is the football?", you have to go back through the story and figure out where that kind of thing happened. That's kind of complicated. People are very good at this task; models can solve the simple versions quite well, but they can't do real question answering yet, which is unfortunate, because it's something people really care about and we're not quite there yet. I also love how this problem sounds: our machines, which we've spent basically no work on, only automatically learn a shallow level of reasoning. That's such a first-world problem.
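To make the task concrete, here is the "where is the football?" reasoning written out by hand as an explicit state tracker; a QA network has to learn this bookkeeping implicitly from text alone. The story sentences and the two parsing rules are my own toy stand-ins for the benchmark's tasks.

```python
# Track who holds what and where everyone is, then chain the two facts.
def answer(story, question):
    location = {}    # where each person currently is
    holding = {}     # who holds each object
    for line in story:
        words = line.rstrip(".").split()
        if words[1] == "went":               # "Mary went to the kitchen."
            location[words[0]] = words[-1]
        elif words[1] == "got":              # "Mary got the football."
            holding[words[-1]] = words[0]
    thing = question.rstrip("?").split()[-1]  # "Where is the football?"
    if thing in holding:
        return location[holding[thing]]       # object is wherever its holder is
    return location.get(thing, "unknown")

story = ["Mary got the football.",
         "Mary went to the kitchen.",
         "John went to the garden."]
print(answer(story, "Where is the football?"))   # kitchen
```

The two-step chain (Mary has the football; Mary is in the kitchen; therefore the football is in the kitchen) is exactly the shallow reasoning current models learn on toy data but struggle to do reliably on real text.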
While there's language understanding, there's also visual understanding, which is kind of unsolved. There's some awesome dataset that pairs images with questions, where the goal is to produce an answer. There are models that can do pretty okay at this task, but they're still not good, still significantly worse than people, so this kind of thing is something we can't do just yet.
While game playing is solved, harder game playing is still an open problem. You might think: harder game playing? My five-year-old brother can play Minecraft, and he almost certainly can't beat the world champion at Go. But "harder" in this case means stateful. It turns out that humans are really good at remembering things, while neural networks have some difficulty with it. The neural networks that people have been using for playing games have been completely stateless, so in a partially observed world like Minecraft, where you only see the one direction you're looking in, if the network looks to the left it forgets what was on the right. This is something people are still working to solve; it's the same story with Doom. Some work has been done, but it's far from a solved problem, and I do believe these agents are still subhuman at this task.
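The standard fix is to give the policy a recurrent hidden state, so the action can depend on everything seen so far rather than only the current frame. The sketch below contrasts the two with random, untrained weights; the observation sizes and "histories" are invented, and the point is only that the recurrent policy's state can distinguish two pasts that end in the same frame, while the stateless one cannot.

```python
import numpy as np

# A stateless policy maps the current frame to an action; a recurrent one
# threads a hidden state through time, i.e. it has memory.
rng = np.random.default_rng(0)
obs_dim, hid_dim, n_actions = 4, 8, 3
W_in = rng.normal(size=(hid_dim, obs_dim))
W_hh = rng.normal(size=(hid_dim, hid_dim))
W_act = rng.normal(size=(n_actions, hid_dim))

def stateless_policy(obs):
    return int(np.argmax(W_act @ np.tanh(W_in @ obs)))

def recurrent_policy(obs, h):
    h = np.tanh(W_in @ obs + W_hh @ h)   # memory of everything seen so far
    return int(np.argmax(W_act @ h)), h

# Two histories that end with the same current frame:
frame_now = np.ones(obs_dim)
history_a = [np.zeros(obs_dim), frame_now]      # e.g. nothing was on the right
history_b = [np.full(obs_dim, 5.0), frame_now]  # e.g. a monster was on the right

# The stateless policy only sees frame_now, so it must act the same way
# in both situations:
print(stateless_policy(frame_now))

# The recurrent policy's hidden states differ, so its action can too.
h_a = np.zeros(hid_dim)
for obs in history_a:
    act_a, h_a = recurrent_policy(obs, h_a)
h_b = np.zeros(hid_dim)
for obs in history_b:
    act_b, h_b = recurrent_policy(obs, h_b)
print(np.allclose(h_a, h_b))   # False: the past is retained in the state
```

This is the same motivation behind adding LSTM layers to game-playing agents for partially observed worlds like Minecraft and Doom.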
they’re still subhuman at this task
they’re still subhuman at this task there’s some really cool stuff with
there’s some really cool stuff with
there’s some really cool stuff with automatically discovering hierarchical
automatically discovering hierarchical
automatically discovering hierarchical structure so in language the
structure so in language the
structure so in language the hierarchical structures may be clear to
hierarchical structures may be clear to
hierarchical structures may be clear to us because we use language like
us because we use language like
us because we use language like character a word limit of character your
character a word limit of character your
character a word limit of character your senses are made of words paragraphs are
senses are made of words paragraphs are
senses are made of words paragraphs are made of senses this is like this
made of senses this is like this
made of senses this is like this semantic hierarchy which makes it easy
semantic hierarchy which makes it easy
semantic hierarchy which makes it easy to break down a problem into simpler
to break down a problem into simpler
But this is not the case in many domains, and there have been people who've designed neural networks that can actually discover this hierarchy automatically, which could be really useful for tasks where we don't know how to interpret the data.
Something I've worked a bit on is genomics, and we really don't even know how to read a genome. But if a neural network can automatically break it up, figuring out that this part goes together with that part, or that there are connections between here and here, that could actually help a whole lot with all sorts of different kinds of scientific tasks, purely from data.
This is where it gets a little bit computery, but these are things that I'm excited about as a computer scientist. There's this model called Neural Turing Machines, which learns to use a big external memory buffer, which is very cool: you can actually see how the network reads and writes memory in order to copy an input.
There are also ways to implement differentiable data structures: instead of having this black box of arbitrary activations and matrix multiplies, you can actually plug a data structure into a network, and now your network can learn to do things like pushing and popping a stack, or taking things from both ends of a queue. All of these kinds of things could potentially enable all sorts of very cool use cases, such as learning to program.
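The soft push/pop idea is easier to see in code. Below is a minimal toy sketch of a "differentiable" stack in plain Python (my own illustration, not from any particular paper or library): the discrete push and pop operations are replaced by gate values in [0, 1], so the read result varies smoothly with the gates, which is what lets gradients flow through the structure.

```python
# A toy "soft" stack: push/pop are controlled by real-valued gates in [0, 1]
# instead of discrete operations. Names and design here are illustrative only.

class SoftStack:
    def __init__(self):
        self.values = []      # stored items (plain floats here)
        self.strengths = []   # how "present" each item is, in [0, 1]

    def push(self, value, push_gate):
        # A push adds the value with strength equal to the push gate.
        self.values.append(value)
        self.strengths.append(push_gate)

    def pop(self, pop_gate):
        # A soft pop removes `pop_gate` units of strength, top-down.
        remaining = pop_gate
        for i in reversed(range(len(self.strengths))):
            removed = min(self.strengths[i], remaining)
            self.strengths[i] -= removed
            remaining -= removed
            if remaining <= 0:
                break

    def read(self):
        # Read the top ~1 unit of strength as a weighted average.
        total, result = 0.0, 0.0
        for i in reversed(range(len(self.values))):
            w = min(self.strengths[i], 1.0 - total)
            result += w * self.values[i]
            total += w
            if total >= 1.0:
                break
        return result

stack = SoftStack()
stack.push(1.0, 1.0)   # fully push 1.0
stack.push(2.0, 1.0)   # fully push 2.0
print(stack.read())    # reads the top: 2.0
stack.pop(1.0)         # fully pop the top
print(stack.read())    # now reads 1.0
```

With fractional gates (say `pop(0.5)`) the read becomes a blend of stack elements, which is exactly the property that makes the whole thing trainable by gradient descent.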
People have done some work where you can create models that don't just have simple input-output mappings: as an intermediate step in that mapping, they can learn subroutines and play with pointers. That actually makes them a very, very general computing device; it could potentially do all the problems we care about, because a model that can learn subroutines and manipulate pointers could learn abstraction automatically for you.
And by putting these things together, people have been able to do things like learning to actually execute code. The idea is that, given code as a string and targets for that code, that is, what its output should be, you can actually learn an interpreter for that language. This is really exciting to me as a programming-languages guy: maybe I could design a programming language not by implementing it, but by just showing a whole bunch of examples and having the implementation happen automatically for me. Or perhaps I could just write the test cases for the language, and a neural network could generate an efficient implementation of the language for me.
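To make the "code as a string, output as the target" setup concrete, here is a tiny sketch of how such a training set could be generated. Everything in it is made up for illustration (the program template, the helper names); the only idea taken from the talk is that an existing interpreter supplies the ground-truth outputs that a sequence model would then be trained to predict.

```python
# Build (code string, output string) pairs for a toy learning-to-execute task.
# Python's own interpreter provides the ground truth via eval().
import random

def random_program(rng):
    # A fixed template of small arithmetic programs, e.g. "print((12+7)*3)".
    a, b, c = rng.randint(1, 99), rng.randint(1, 99), rng.randint(1, 9)
    return f"print(({a}+{b})*{c})"

def make_pairs(n, seed=0):
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        code = random_program(rng)
        # Evaluate the expression inside print(...) to get the target output.
        expr = code[len("print("):-1]
        target = str(eval(expr))
        pairs.append((code, target))
    return pairs

for code, target in make_pairs(3):
    print(code, "->", target)
```

A sequence-to-sequence model trained on enough pairs like these would, in effect, be learning an interpreter for this tiny language, character by character.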
Something else related to all of this stuff, and this is really early, but I think a lot of people are really excited about it, is neural module networks. Instead of having a single architecture that you play with, you have a library of components, and for every single example you build a custom architecture and output it. For example, take a visual question answering task: you have an image and a question, "where is the dog". Instead of using an arbitrary network that takes in the question and produces the answer, you convert the question itself into a custom neural network, which combines a dog module with a where module and outputs the answer. This kind of thing is very early but really promising.
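The "combine a dog module with a where module" idea can be sketched as ordinary function composition. In this toy (entirely made up for illustration), the modules are plain functions standing in for small trained networks, and a trivial "parser" assembles a custom pipeline for each question; real neural module networks learn both the modules and the assembly.

```python
# Toy neural-module-network composition: a library of modules plus a rule
# that builds a per-question pipeline. All names here are illustrative.

# Toy "image": object name -> (x, y) position.
image = {"dog": (3, 7), "cat": (5, 1)}

def find(obj):
    # Stand-in for an attention module that locates `obj` in the image.
    return image.get(obj)

def where(location):
    # Stand-in for an answer module that verbalizes a location.
    if location is None:
        return "not in the image"
    x, y = location
    return "left" if x < 4 else "right"

def answer(question):
    # Toy "parser": turn the question into a composition of modules,
    # e.g. "where is the dog" -> where(find("dog")).
    words = question.lower().rstrip("?").split()
    obj = words[-1]
    return where(find(obj))

print(answer("where is the dog"))  # composes the dog module with the where module
```

The point is that the network's shape is different for every input: "where is the cat" and "is there a dog" would each compile into a different arrangement of the same shared modules.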
So that's it for the future part. Hopefully you guys are pumped to deep-learn some problems. There's a lot of software to help you; I'm not going to talk about that right now, because there are a lot of tutorials out there and I think the high-level understanding is much more important.
My recommendation is this: if you want to customize a lot of things, Theano and TensorFlow are the best, because they give you the automatic differentiation I was talking about, so you basically never have to worry about the backward pass. And if you just want to use the modules I talked about, plus a few others, Keras can solve that; you can do a lot of these things with Keras.
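"Never worry about the backward pass" is the whole point of automatic differentiation, and the mechanism fits in a few lines. Below is a toy scalar version of reverse-mode autodiff (not Theano's or TensorFlow's actual API, just the underlying idea): each operation records its inputs and local derivatives, and `backward` replays the chain rule for you.

```python
# A minimal reverse-mode automatic differentiation sketch: write only the
# forward computation; gradients are derived by replaying recorded ops.

class Var:
    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        self.parents = parents  # (parent_var, local_gradient) pairs

    def __add__(self, other):
        # d(a+b)/da = 1, d(a+b)/db = 1
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        # d(a*b)/da = b, d(a*b)/db = a
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

    def backward(self, grad=1.0):
        # Chain rule: accumulate, then propagate to parents.
        self.grad += grad
        for parent, local in self.parents:
            parent.backward(grad * local)

x = Var(2.0)
y = Var(3.0)
loss = x * y + x       # forward pass only; no hand-written backward code
loss.backward()
print(x.grad, y.grad)  # d(loss)/dx = y + 1 = 4.0, d(loss)/dy = x = 2.0
```

Real frameworks do the same bookkeeping over tensors, with a proper topological ordering so shared subexpressions are not re-traversed, but the contract is identical: you define the forward computation and ask for gradients.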
If you want to do this, there's a lot more learning to do, and the devil's really in the details. I was super high level with lots of this stuff, but there are so many little things that you need to know: how do you perform the updates in a way that doesn't cause your parameters to grow too large, how do you initialize the parameters so the network doesn't start out as a trivial function, how do you not overfit your training set.
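To give a flavor of those little details, here are sketches of two standard recipes (my own illustrations; the talk only names the problems): clipping the gradient so a single update can't blow parameters up, and scaling initial weights by layer size, in the style of Glorot/Xavier initialization, so the network doesn't start out degenerate.

```python
# Two of the "little details": gradient clipping and size-aware initialization.
import math, random

def clip_gradient(grad, max_norm):
    # Rescale the gradient vector if its L2 norm exceeds max_norm.
    norm = math.sqrt(sum(g * g for g in grad))
    if norm > max_norm:
        grad = [g * max_norm / norm for g in grad]
    return grad

def init_weights(fan_in, fan_out, seed=0):
    # Uniform in [-limit, limit], with limit = sqrt(6 / (fan_in + fan_out)),
    # so activations neither vanish nor explode at the start of training.
    rng = random.Random(seed)
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]

g = clip_gradient([3.0, 4.0], max_norm=1.0)   # norm was 5.0
print(g)                                      # rescaled to norm 1.0, roughly [0.6, 0.8]
W = init_weights(256, 128)
print(max(abs(w) for row in W for w in row))  # stays within the ~0.125 limit
```

Overfitting, the third detail mentioned above, is usually handled with held-out validation data, regularization, or dropout, which are covered at length in the course recommended below.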
There's a lot of resources out there; my favorite one is the Stanford class by Andrej Karpathy, CS231n. It's specifically about convolutional networks, but it's constantly updated with state-of-the-art stuff and it's generally very high quality, so I think it's very approachable for anyone, beginner to very advanced. And if you want to do this, you probably need a GPU, or fifty.
I think that's it for time, so sorry I was rushing at the end. Any questions? Also, I have these slides; which slide should I leave it on? There are some questions here, so let's go.
So one is: how can we avoid autonomous cars picking up human bad habits? That is a very interesting question, and it's very dependent on how the cars are trained. If you train a car to copy humans, which is by far the easiest thing to do, it's not the most correct thing to do, because the most correct thing would be to learn how to drive optimally from scratch. Unfortunately that involves trial and error, and you probably don't want that in self-driving cars, so we can skip that, or use hard-coded rules.
What can happen is that if you train it to learn from humans, it'll mimic those humans. But the idea is this: suppose humans don't make consistent mistakes; different humans make different kinds of mistakes, or the same human only makes mistakes sometimes. Then a neural network can predict the expectation of what a human would do, rather than the worst-case scenario. You can think of humans as an ensemble in this case: if you're predicting what the average of a bunch of humans would do, you can drive better than any single human can. But if humans consistently make the same mistakes, then there's nothing you can do about that, other than get more data.
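The "humans as an ensemble" point can be illustrated with a toy simulation (a made-up statistics demo, not from the talk): if each driver's action is the correct action plus independent noise, the average over many drivers lands much closer to the correct action than a typical individual driver does.

```python
# Averaging out inconsistent human mistakes: a toy ensemble demo.
import random

random.seed(0)
true_action = 0.3                            # the "correct" steering angle
# Each simulated driver is right on average but noisy on any given attempt.
drivers = [true_action + random.gauss(0, 0.1) for _ in range(1000)]

avg_individual_error = sum(abs(d - true_action) for d in drivers) / len(drivers)
ensemble_error = abs(sum(drivers) / len(drivers) - true_action)
print(avg_individual_error, ensemble_error)  # the ensemble average is far closer
```

If instead every driver shared the same bias (a consistent mistake), the average would inherit that bias, which is exactly the failure case described above.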
I think we have time for one more: what do you think about chatbots, and is it possible to build one with only deep learning? Yes, there are actually many startups doing this right now. This seems to be the next wave in startups, like the hot thing right now, where people are trying to use chatbots to do all sorts of things in very specific domains. It has some really nice properties from a business point of view, because your goal is to replace humans who chat, and it's relatively easy to replace them with an algorithm: if you have a bunch of those humans, they generate a bunch of data. So it is very plausible.
It's still hard for chatbots, though; it's kind of like the game-playing problem, in that it's hard for chatbots to keep a memory of what you said. So if you talk about, like, "try opening this menu, and go here and here and here", five sentences later the chatbot might say the same thing again, because neural networks still have memory issues. Cool, I think that's it, so please remember to vote, and let's give a big applause.