00:00:11 Cool, thank you. I'm going to give a bit of background: I have a very math- and computer-heavy background, which is very good for deep learning. On the list of achievements for why you should listen to me: I currently work at a super cool company that uses deep learning to make faster and more accurate medical diagnoses, and in past lives I've won a whole lot of international math competitions, and some programming competitions as well. Today I'm going to chat about what deep learning is and what it can do for you. Feel free to ask questions at any time (I can ask that, right? They can ask, right, Simon?). Cool, so yeah, feel free to ask questions at any time, just shout them out, especially if you think I'm lying to you. So what is
00:00:55 deep learning? I'm going to start with a disclaimer: deep learning is actually pretty complicated, and it's hard to be very general about everything and be correct, so when in doubt I'm going to favor generality. If you're familiar with deep learning already, it's going to sound like I'm lying a lot, but in reality this is just to give the really high-level view of it. Whenever possible, I'm also going to favor shortcuts that might not be a hundred percent correct but should give the correct mental model of how these things work. If you have any questions, feel free to ask. So, from
00:01:26 a super high level, there are a lot of different levels of hierarchy in the ecosystem. There's artificial intelligence, which is a superset of everything; an example would be IBM Watson, which uses lots of hand-coded rules and extremely large amounts of expert manpower, built to do a specific task. There's machine learning, which is a subset of that; an example of this would be Google ad-click prediction, where rather than using tons and tons of hard-coded rules, you start using examples to figure out how to combine some hand-coded statistics to predict the probability of, for example, an ad click.
00:02:03 At a slightly deeper level, you have representation learning, which is sometimes seen as one-layer-deep learning (these levels are sometimes called shallow learning, if you're trying to start a fight). An example of this would be Netflix movie recommendation, where the statistics of what you even know about each movie are learned from data, but you're still learning a simple combination of how these features go together. After a few levels, you get
00:02:33 into deep learning. An example of this would be figuring out diseases from images, where instead of having a layer of manual statistics that are learned and then combined together, you might learn all of these statistics at the same time, in tens, hundreds, or even thousands of steps, which is what some people use nowadays. This is probably not a common view of what deep learning is, but I think the easiest way to see it is this: deep learning is an interface,
00:03:00 and this interface has roughly two methods. The first method is the forward pass, and this is definitely the easy part: given arbitrary input, make arbitrary output. Anyone can do this part; it's really easy. The trick that makes it all work is the backward pass: given a desired change in the output, you want to be able to transform that into a desired change in the input. Once you have these two methods, you can make arbitrarily complex things by chaining them up into a directed acyclic graph. If this sounds too good to be true, that's because of how we defined the forward pass: if you just say 'arbitrary input, arbitrary output', of course you can do anything you want. The hard part is defining the backward pass, because as you make your forward pass more and more complicated (say, some really crazy function), it becomes hard to define how to map changes in the output back into changes in the input. By keeping these pieces simple and combining them together, we get an almost composable language of modules that allows us to do the things we want to do.
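To make the interface concrete, here is a minimal sketch in plain Python. The class name `Scale` and the method names are my own inventions for illustration, not from any particular framework: a module implements a forward pass and a backward pass, and chaining modules just composes them.

```python
class Scale:
    """A tiny module satisfying the two-method interface: output = w * input."""
    def __init__(self, w):
        self.w = w

    def forward(self, x):
        self.x = x                 # cache the input; backward will need it
        return self.w * x

    def backward(self, grad_out):
        # Transform a desired change in the output into a desired change
        # in the input (and, for a parametric module, in the parameter).
        self.grad_w = grad_out * self.x
        return grad_out * self.w

# Chain two modules into a (tiny) directed acyclic graph: y = 3 * (2 * x)
a, b = Scale(2.0), Scale(3.0)
y = b.forward(a.forward(5.0))      # forward pass
dx = a.backward(b.backward(1.0))   # backward pass: dy/dx
```

With x = 5, the forward pass gives y = 30, and running the backward passes in reverse order gives dy/dx = 6, using only local information inside each module.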
00:04:07 So once you have this interface, you can build up from it: you can have a bunch of modules that satisfy the interface. As a side note, a bunch of these modules will be parametric, which means they have parameters, which roughly means they're stateful. Being stateful means the state can change, and it's this change in state that takes the function from something you just cobbled together to something that gets closer and closer to what you want it to do. And once you have a general language for what you want to do, you can start doing the tasks you care about. In deep learning, you always define a
00:04:49 loss or a cost, depending on how you want to name it; this is something you want to minimize, for reasons that I'd happily explain. It always has to be a scalar, so you can't have several costs at the same time: you have to squash everything down into a single thing that you care about. And once you squish everything you care about in the world into a single number, now you can start using deep learning to optimize it. You
00:05:12 create an architecture, which is the function you want to compute: this is how you compose together the modules I talked about. The way you connect them changes the function you get and the kind of representational power it has, and that becomes the hard part. After that, you initialize the parameters, and you train the architecture by repeatedly updating the parameters to minimize the cost: you go forward through the network to get the things you care about, you go backward through the network to change the parameters to be slightly better for your cost, and you repeat this many times until you get a function that you're really, really happy with and that solves whatever problem you want. At the end, you just use that function, just the forward pass. So how do you implement the
00:05:53 backward pass? In general, we almost always use the chain rule, which is really nice because it makes implementing the backward pass easy. How this works: if your output L depends on x through some function f, and you have the partial derivative dL/df, you can get dL/dx by simply multiplying by the partial derivative df/dx. The nice part is that dL/df is obtained from the rest of your network, while df/dx is obtained just from your module, so this allows you to chain things together in a way that only requires local information to get the backward pass, which is very nice. There are theoretical reasons why this is a good way to do it, and perhaps the best part is that some frameworks make it completely automatic: by defining a forward pass, automatic differentiation can figure out the backward pass for you. So it becomes basically as easy as defining arbitrary functions: you really do get the benefit of 'define arbitrary things, return arbitrary things', and as long as all the operations you do are differentiable, you can just make it work like magic and optimize it. This is literally how people do it in practice.
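As a concrete check of that multiplication, here is a tiny numerical sketch. The choice of f as a sigmoid and L = f**2 is mine, purely for illustration:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Module: f(x) = sigmoid(x).  Rest of the network: L = f**2.
x = 0.5
f = sigmoid(x)
dL_df = 2.0 * f            # comes from the rest of the network
df_dx = f * (1.0 - f)      # comes just from the module (sigmoid derivative)
dL_dx = dL_df * df_dx      # chain rule: multiply the two

# Sanity check against a finite-difference approximation.
eps = 1e-6
L = lambda t: sigmoid(t) ** 2
numeric = (L(x + eps) - L(x - eps)) / (2 * eps)
```

The analytic chain-rule product and the finite-difference estimate agree to many decimal places, which is exactly the property automatic differentiation exploits module by module.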
00:07:11 Updating the parameters: these are just minor details to get an understanding of how this works. Once you have your existing parameters, you get your gradient and take a step in the opposite direction of the gradient; the partial derivatives tell us how to change the parameters to increase or decrease the cost we care about. An important word to know, though, one that people always use and that I think makes this sound more complicated than it is (it's a big word), is backpropagation, or backprop for short. It has a longer name, reverse-mode automatic differentiation, which sounds pretty complicated, but it's just the chain rule plus dynamic programming. I just talked about the chain rule, and some people are familiar with dynamic programming; here it is just caching. The idea is that when you
00:08:02 have a computation graph. This is a very simple computation graph: y = c × d, where c = a + b and d = b + 1. You traverse the graph from the top to the bottom, and by doing it from the top to the bottom instead of the bottom to the top, you can cache the intermediates that are used many times in the graph. By caching these intermediates, you get something much more efficient than the naive solution, and it allows you to get gradients that are computable in linear time in the size of your graph: you basically evaluate each node once. That's a really nice property, and it makes it all really efficient.
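On that exact example graph, the reverse pass with caching looks like this (a plain-Python sketch):

```python
# Forward pass through the graph: y = c * d, with c = a + b and d = b + 1
a, b = 2.0, 3.0
c = a + b            # 5.0
d = b + 1            # 4.0
y = c * d            # 20.0

# Reverse pass, top to bottom. dy_dc and dy_dd are computed once and
# cached; that reuse is the "dynamic programming" part.
dy_dc = d                          # d(c*d)/dc
dy_dd = c                          # d(c*d)/dd
dy_da = dy_dc * 1.0                # a only feeds c
dy_db = dy_dc * 1.0 + dy_dd * 1.0  # b feeds both c and d
```

Each node is visited once, so the whole gradient costs about as much as one extra forward pass.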
00:08:40 And that's basically it for the basics. From a high level, deep learning is just composing optimizable subcomponents; optimizable almost always means differentiable; differentiable means you can do backprop; and backprop is just the chain rule plus dynamic programming. Once you get
00:08:56 to practical deep learning, you normally have to combine this with gradient descent, software, and a data set that you care about. There's a very rich space of software, which I'll talk a little bit about later, but these things are solved for you, so you can do deep learning without even knowing how to calculate the gradients yourself.
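Putting the recipe together (define a cost, initialize a parameter, repeatedly go forward and then step against the gradient), a minimal sketch might look like this; the data, learning rate, and iteration count are arbitrary choices of mine:

```python
# Fit y = w * x to data generated with w_true = 2, by gradient descent
# on the squared-error cost.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

w = 0.0                       # initialize the parameter
lr = 0.05                     # learning rate (step size)
for _ in range(200):          # repeat forward + backward many times
    grad = 0.0
    for x, y in zip(xs, ys):
        pred = w * x                  # forward pass
        grad += 2 * (pred - y) * x    # backward pass: d(cost)/dw
    w -= lr * grad                    # step opposite the gradient
```

After enough steps, w converges to 2, the value that minimizes the cost, which is the whole training loop in miniature.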
00:09:18 So while we can do arbitrarily complicated things, there are a few standard modules that are the main workhorse of deep learning today. The goal of this section is to get a high-level understanding of each, since all of them can be incredibly nuanced, but these standard modules will cover almost all of what's happening in papers.
00:09:37 The simplest of them is matrix multiplication. It has many names: the fully connected layer, sometimes shortened to FC; sometimes called dense, because you have lots of connections; the linear layer, because it's a linear transformation; or affine, because sometimes there's a bias. Basically, every time you see a neural network diagram, all of the arrows correspond to a matrix multiplication, so when you see a diagram that looks complicated, that's what it's made of. You can interpret it as a weight from every input to every output: if you have m inputs and n outputs, you have m × n weights to transform your inputs into outputs, and its implementation is literally a matrix multiplication. W in this case is generally a parameter, which means you learn the connections from inputs to outputs.
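As a sketch in numpy (the sizes and the random initialization are arbitrary choices of mine), a fully connected layer with m = 3 inputs and n = 2 outputs is literally one matrix multiply plus an optional bias:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 2                    # m inputs, n outputs
W = rng.normal(size=(m, n))    # m x n weights, one per input-output pair
b = np.zeros(n)                # optional bias: this is the "affine" part

def fully_connected(x):
    return x @ W + b           # literally a matrix multiplication

x = np.ones(m)
out = fully_connected(x)       # n outputs
```

W and b are the parameters here; training consists of updating them by gradient descent.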
00:10:29 On its own this is not powerful enough, so you need at least one more thing: a nonlinearity. The original nonlinearity is called the sigmoid; it's just this function. It has the nice property that it maps the reals into the interval (0, 1), so it can be interpreted as a probability, but that's not as important as it simply being nonlinear. The reason the nonlinearity is important: in this kind of neural network, when you stack the layers back to back, if you had no nonlinearity in the middle, it would just be two matrix multiplies back to back, and you could combine them into a single matrix multiply. So if you had a 100-layer purely linear network of just matrix multiplications, while it looks pretty complicated and does all the work of a real neural network, you could actually flatten it into a single weight matrix, because a composition of linear functions is linear. So the sigmoid was the original nonlinearity; people liked it because it's very similar to what people used before they really understood how machine learning works, which was just a binary threshold (the unit either fired or it didn't) back in the
00:11:35 day. The cool part is that with just those two units, you know how to make a neural network: you take your input, you apply a matrix multiply, you apply a sigmoid, you apply another matrix multiply, and you have one. These are called multi-layer perceptrons, when you only have matrix multiplies and nonlinearities. And the cool part is that there's a theorem that this simple architecture, literally three functions, can approximate arbitrary functions, which means it can solve any problem that you care about.
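That recipe (input, matrix multiply, sigmoid, matrix multiply) is short enough to write out; a numpy sketch with arbitrary sizes of my choosing:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 8))   # input size 4 -> hidden size 8
W2 = rng.normal(size=(8, 3))   # hidden size 8 -> output size 3

def mlp(x):
    # matrix multiply, nonlinearity, matrix multiply: that's the whole MLP
    return sigmoid(x @ W1) @ W2

out = mlp(np.ones(4))
```

Making the hidden size (8 here) larger is the "make the middle big enough" knob the approximation theorem talks about.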
00:12:08 The idea of this theorem is that if you make the middle layer big enough, you can compute basically any function. The downside is that just because it can doesn't mean it will, and a single-hidden-layer multi-layer perceptron often causes more problems than it solves. This is why there was an AI winter in the 90s: these things were kind of terrible. But people have gotten a lot better at it, and now neural networks are cool. As a disclaimer: this would be a multi-layer perceptron, and these are neural networks; everything I'm talking about today is still a neural network, but specifically, when people talk about the multi-layer perceptron, this is what it is. Since then, people have made better
00:12:59 nonlinearities; this is probably the majority of the improvement between 1990 and 2012, unfortunately. You have a kind of smarter nonlinearity: instead of taking that weird squiggly function, you just apply a threshold, so anything negative gets turned into 0. This is the rectified linear unit (ReLU), and it's actually the most popular nonlinearity nowadays; it does incredibly well and has some really nice optimization properties. In particular, wherever it isn't zero it's linear, so it works very well with the chain rule, and it's used almost everywhere nowadays, especially in the middle of a neural network. Then there is the softmax, which you can
00:13:45 think of as converting a bunch of numbers into a discrete probability distribution. The math of it is p_i = exp(x_i) / Σ_j exp(x_j): you exponentiate your inputs and then divide by the sum over the inputs. You can think of the exponentiation as turning all the numbers positive, and the division by the sum as a normalization term. It has some very nice properties; it's used as the final layer for classification problems, and it's used in almost every neural network.
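Both nonlinearities are one-liners; here is a numpy sketch (subtracting the max inside softmax is a standard numerical-stability trick, not something from the talk):

```python
import numpy as np

def relu(x):
    # threshold: anything negative becomes 0
    return np.maximum(x, 0.0)

def softmax(x):
    e = np.exp(x - np.max(x))  # exponentiate: everything becomes positive
    return e / e.sum()         # normalize: a discrete probability distribution

p = softmax(np.array([-1.0, 0.0, 2.0]))
```

The softmax output is all positive and sums to 1, which is why it works as the final classification layer.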
00:14:14 Cool, that was the easy part; this is where it gets complicated. Feel free to ask questions during this; I normally explain it with a whiteboard, and it's normally complicated even with a whiteboard, but I'll try to go through it. A convolution is the main
00:14:31 workhorse for deep learning on images, and deep learning on images is basically where this revolution started, so it's very, very important. It's probably the place where deep learning is the most advanced, so it's a very important primitive, and I think a very cool primitive to understand, because you really realize how beautiful the framework is when you see that this thing sounds pretty complicated, but you can just plug it in without needing to know how it works, when someone has coded it up for you (which is what I do). So this is
00:15:01 a linear operation for 2D images. Once you have a multi-layer perceptron, you have a mapping from every input to every output, but in the case of images your inputs are structured: you have a spatial relationship between your inputs, and a mapping from every input to every output kind of throws that spatial relationship away. So the idea would be: what if, rather than having a connection from every input to every output, the output looked like an image as well, and every output was only locally connected to the inputs it corresponds to? That is insight number one: local connections. Insight number two is that every output is a local function of its input. What if, instead of every output being its own function (which would be the general case), every output was the same
00:15:46 function of its input? What this then becomes is equivalent to a well-known function in computer vision, a convolution. You have a kernel, which you can think of as a local weight matrix; it's often represented as a square in an image-like picture, which means it's capturing that local input. You do an elementwise multiply between all of the weights of the kernel and the local region, you sum up the results (so this is just a dot product), and then you do that at every single location in the input. So it's kind of like tiling your input with the same function, or it can be interpreted as extracting the same features at every location, which is the more common way to interpret it. It's very powerful and very parameter-efficient, because you have a lot of weight sharing, and you can end up having much larger outputs than you could with a normal matrix multiplication. You also don't lose spatial information, which is a very important structure of images. So these are some really nice properties.
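A naive single-channel 2D convolution is just the loop described above; a numpy sketch ignoring padding, stride, and channels (and, as is standard in deep learning, not flipping the kernel):

```python
import numpy as np

def conv2d(image, kernel):
    """Slide the kernel over the image; at every location, take the dot
    product of the kernel with the local patch (no padding, stride 1)."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)  # same weights at every location
    return out

edge = np.array([[1.0, -1.0]])           # a tiny horizontal edge detector
img = np.array([[0.0, 0.0, 1.0, 1.0]])   # a one-row image with a step in it
response = conv2d(img, edge)             # fires only where intensity changes
```

The same two-number kernel is applied at every location, which is the weight sharing that makes convolutions so parameter-efficient.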
00:17:00 As a side effect, you might think this thing is really complicated (how do I take a gradient of it?), but it's actually equivalent to a very constrained matrix multiplication. If you take your input image and unroll it (with a matrix multiply you lose the spatial structure), every output is connected to maybe nine of your inputs, and that is equivalent to the usual diagram with lots of arrows, but with most of the arrows being zero or missing. So this is still completely differentiable and still fits very nicely into the framework, and you can plug it in with all the other nonlinearities.
00:17:47 Cool, it's going to get a little bit harder. Another very fundamental building block is called a recurrent neural network; I don't know why this building block is called a network when everything else is called a layer, but that's just convention. This solves a problem that basically had not been solved in machine learning before: we want functions that take in variable-size input, but they can only take in fixed-size input. This becomes a problem when your function is parametric, like a fully connected layer: if you want a connection from every input to every output but your input size changes, the number of weights you have changes, and that means that if you get a longer example at inference time, you don't know what to do with it. It can also be inefficient, because you might have a really large number of inputs and might not need all the power of having every connection there. A recurrent neural network is a way to solve this problem, and the solution is recursion. So
00:18:49 what you have is an initial state, let's just call it h in this example, and you have a bunch of inputs x, a variable number of them, so you don't really know what this capital T is. You can make a function that takes in a fixed size, and because each x is fixed-size, you can make that function take both h and x. Now you can recurse through the list: the new state is a function of the previous state and the current input, h_t = f(h_{t-1}, x_t), and then you just return the final one. What this allows you to do is: with a fixed function that takes fixed-size input, you can now turn it into a function that takes variable-size input, by applying that function a variable number of times. This is a pretty obvious insight, and you could do it with any kind of machine learning algorithm; you could apply a random forest an arbitrary number of times. But the cool part is that because this function is differentiable, the recursive function is also differentiable, so you can take the derivatives with respect to each of the inputs, and you can even take the derivative with respect to the weight matrices you use at each step (you use the same f at each step), and you get a diagram
00:20:03 that looks kind of like this. You can think of it as applying an FC layer for each input, one that takes the input and the state so far. This diagram might not be very clear, but there are many different diagrams for RNNs, and they're all equally confusing if you're unfamiliar with them. The one on the left is my favorite, because you can think of it as a stateful function, except the state only lasts for the duration of your input; the unrolled version is the version you use if you're taking gradients, and it's equivalent to just passing the gradients through this very long graph.
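The recursion fits in a few lines; a numpy sketch of the simple RNN just described (the tanh and the sizes are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(0)
state_size, input_size = 4, 3
W_h = 0.1 * rng.normal(size=(state_size, state_size))
W_x = 0.1 * rng.normal(size=(input_size, state_size))

def rnn(xs, h):
    # The same fixed-size function, applied once per input:
    # new state = f(previous state, current input)
    for x in xs:
        h = np.tanh(h @ W_h + x @ W_x)
    return h                     # return the final state

h0 = np.zeros(state_size)
seq = [np.ones(input_size)] * 5  # a length-5 sequence; any length works
final = rnn(seq, h0)
```

Because the loop body reuses the same weights, the output state has the same size no matter how long the input sequence is.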
00:20:43 One last complicated slide: long short-term memory units, or LSTMs. These put me in a really hard position, because I can't not talk about them (they are so big), but they're also extremely complicated, and they take more building blocks than I've even explained. There is this great blog post (I think the slides will be published, so you don't have to worry about writing it down), this great, great blog post that tries to explain them, but I'm going to give a high-level intuition, even higher-level than what I've said so far, just so you can understand where it's coming from when I talk about these things being used. The idea is that it's kind of like an RNN; in practice, no one uses the RNN I've just described, which is a very simple function; there are much more complicated versions. An LSTM is an RNN
00:21:30 where the function is just really complicated; this entire thing here is a representation of that function. I'm not going to get into the details, but it involves a lot of different mechanisms in order to make optimization easier, and the idea is that if you design this function (the function applied at each time step) well, it can make the problem much, much easier to optimize, and you can have a much, much more powerful function. The key is that there is a path that is relatively simple (that's what the top path represents, with only a few operations being done to it), which makes it easier to stack these things back to back, and that makes it easier to learn long-term relationships across time.
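For a rough picture of what that complicated function looks like, here is a sketch of one step of a standard LSTM cell. The gate equations are the textbook ones, but the sizes are arbitrary and biases are omitted for brevity; the cell state c is the relatively simple path, touched only by elementwise operations:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, params):
    """One step of a standard LSTM cell (biases omitted)."""
    Wf, Wi, Wo, Wg = params         # one weight matrix per gate
    z = np.concatenate([h, x])      # previous state and current input
    f = sigmoid(z @ Wf)             # forget gate: what to erase from c
    i = sigmoid(z @ Wi)             # input gate: what to write into c
    o = sigmoid(z @ Wo)             # output gate: what to expose as h
    g = np.tanh(z @ Wg)             # candidate values to write
    c = f * c + i * g               # cell state: elementwise operations only
    h = o * np.tanh(c)              # new hidden state
    return h, c

rng = np.random.default_rng(0)
n, m = 4, 3                         # hidden size, input size
params = [0.1 * rng.normal(size=(n + m, n)) for _ in range(4)]
h, c = lstm_step(np.ones(m), np.zeros(n), np.zeros(n), params)
```

Note that the `c = f * c + i * g` line is the simple top path: gradients flowing through it meet only multiplications and additions, never a weight matrix.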
00:22:18 Whoo, okay: that was the complicated part. You now know ninety-five percent of the building blocks that everyone uses for state-of-the-art deep learning; with just these building blocks you could probably do new state-of-the-art things on new domains. So congratulations, you're ready for
00:22:35 the next part. In this part I want to talk about what deep learning is really good at and what you should use it on, and the answer is: a whole lot. I'm going to cover just the rough themes of where deep learning really shines, but there's much, much more to it, which I think is part of the awesomeness, because it all falls under the extremely simple framework I've just described. I don't think you could describe any framework as simple as what I've just done and have it solve this many complicated tasks that were, basically, unsolved before 2012. So, convolutional neural
00:23:09 networks: this is a general architecture, commonly referred to as CNNs. This actually means a network in this case, not just a layer. The idea is that you take your image, you apply a convolution, you apply your ReLU (rectified linear unit), you apply a convolution, you apply a ReLU, and you basically repeat conv-ReLU until you solve all the problems in computer vision. That isn't quite true, since at the end you need to tack on some sort of output layer, and the output layer depends on what kind of problem you're trying to solve.
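Schematically, that conv-ReLU repetition is just a loop; a self-contained numpy sketch with a naive convolution (real networks use many kernels per layer, plus pooling and an output layer):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def conv2d(image, kernel):
    # Naive single-channel convolution, no padding, stride 1.
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(16, 16))             # a toy one-channel "image"
kernels = [rng.normal(size=(3, 3)) for _ in range(3)]

# conv, ReLU, conv, ReLU, conv, ReLU ...
for k in kernels:
    x = relu(conv2d(x, k))
```

Each 3×3 convolution shrinks the 16×16 input by two pixels per side, so three layers leave a 10×10 feature map, and the ReLU guarantees everything in it is non-negative.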
00:23:41 A really old-school task is face recognition: trying to determine whose face this is. This is a really cool task because it makes the representations very visual, and you can see what the network learns over time. You start with the pixels, and at the first layer your filters tend to just match edges and very simple things; convolutions can match edges and other very simple shapes. As you get deeper into the network, you learn more complicated functions of the input: after that, you can start combining edges into corners or blobs, which is still extremely simple, but at the next layer, combining two corners the right way becomes kind of an eye-like shape, or two corners and a blob become more eye-like. So you can build up from edges to corners to object parts and eventually to the objects you care about, and in really, really deep networks you actually have intermediates that are extremely semantic objects. For example, people have made a lot of tools
00:24:43 for visualizing neural networks, where they visualize what the networks learn, and you find, for example, that if you have a neural network that doesn't learn to classify books at all, but learns to classify bookshelves, some of the intermediate features actually become book classifiers, which is really interesting. It can learn a hierarchical representation of your input space, such that these are useful things to combine together in order to make a robust classifier: maybe if you combine three books together with a square, that becomes a bookshelf. These are kind of like the local operations each layer of the network does, and the beauty of it is that it's all learned automatically. You don't need to program in 'bookshelves normally have books, they have square stuff, maybe there are often flowers beside them'; this can all happen from a data set automatically. And these
00:25:37convolutional neural networks are
00:25:39absolutely amazing they just when I
00:25:41wasn’t joking when they save basically
00:25:43all of computer vision right now it all
00:25:46started with imagenet this was in 2012
00:25:50this is when deep learning actually the
00:25:52entire hype train started where you had
00:25:56traditional machine learning solving
00:25:58this very hard very large computer
00:26:00vision data set and it was kind of
00:26:01plateauing over the years and all of a
00:26:03sudden deep learning comes in and it
00:26:05just blows everything away and ever
00:26:08since then everything has been
00:26:10everything in computer vision has been
00:26:12deep learning like nothing can even
00:26:14compare and recently we’ve been even
00:26:16being able to get superhuman results
00:26:18which is pretty impressive because
00:26:21humans are pretty good at seeing things
00:26:23it’s kind of what we’ve evolved to do
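The local filtering operation these networks stack can be sketched in a few lines. A minimal sketch in plain Python, with a made-up toy image, of the convolution step: a small filter slides across the image and responds wherever its pattern (here a vertical edge) appears.

```python
def conv2d(image, kernel):
    """Valid 2D convolution (really cross-correlation, as in most DL libraries)."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    out = []
    for i in range(h - kh + 1):
        row = []
        for j in range(w - kw + 1):
            # Dot product of the filter with the image patch under it.
            s = sum(image[i + di][j + dj] * kernel[di][dj]
                    for di in range(kh) for dj in range(kw))
            row.append(s)
        out.append(row)
    return out

# Toy 5x5 "image" with a bright vertical edge down the middle.
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# A vertical-edge filter: responds where dark meets bright.
edge_filter = [[-1, 0, 1],
               [-1, 0, 1],
               [-1, 0, 1]]

response = conv2d(image, edge_filter)
```

A real convnet learns the filter values instead of hard-coding them, and stacks many such layers so that later filters combine earlier responses (edges into books, books into bookshelves).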
00:26:26 And the same architectures can do all sorts of really interesting structured tasks. Using almost the same architecture, you can break up your input space into what's called a semantic segmentation of all the relevant parts you have. And using basically the same architecture, you can do crazy things like super-resolution, where you take in a low-resolution image and fill in the details. Not only is that useful, it's incredible that you can take the same architecture that takes an image and tells you whether or not there's a dog in it, and use it to take an image and return a new, higher-resolution image. It's basically the same library, the same components; it's just very, very composable, and that's really awesome.
00:27:20 You can also use this to solve really hard medical tasks, tasks that people could not solve before. Here we're detecting and classifying lung cancer in CT scans; these are the kinds of things that I like to work on.
00:27:36 And it's not only limited to vision; there's been a lot of work in language understanding. Something that deep learning is really good at is language modeling. Roughly, this means: how probable is this statement, how much sense does it make in a given language? It might have to do with question and response ("How are you?" "I'm fine"), or with what would be a weird thing to say. "My laptop is squishy" might be a very improbable sentence: a neural network could determine that "squishy" is a very bad adjective for a laptop, so this is a very improbable sentence, whereas "my laptop is hot" would probably be much more likely. This already has some human-like feel to it, because language was designed for humans. And if you can do language understanding, as in determining the probability of any sentence given a context, and you do it perfectly, you can solve basically any task.
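A toy illustration of language modeling, using a made-up five-sentence corpus and simple bigram counts rather than anything like a real neural model, that still ranks the two example sentences the right way:

```python
from collections import Counter

# A tiny made-up corpus standing in for real training text.
corpus = [
    "my laptop is hot", "my laptop is slow", "my coffee is hot",
    "my laptop is hot today", "the pillow is squishy",
]

bigrams = Counter()
unigrams = Counter()
for sentence in corpus:
    words = ["<s>"] + sentence.split()
    for prev, cur in zip(words, words[1:]):
        bigrams[(prev, cur)] += 1
        unigrams[prev] += 1

def prob(sentence, alpha=0.1, vocab=20):
    """Smoothed bigram probability of a sentence under the toy model."""
    words = ["<s>"] + sentence.split()
    p = 1.0
    for prev, cur in zip(words, words[1:]):
        p *= (bigrams[(prev, cur)] + alpha) / (unigrams[prev] + alpha * vocab)
    return p

likely = prob("my laptop is hot")
unlikely = prob("my laptop is squishy")
```

A neural language model plays the same game, but shares statistical strength across similar words and contexts instead of counting exact bigrams.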
00:28:36 This is a really interesting domain for deep learning, because if you look at how language understanding was done before deep learning was around, it was just incredibly simplistic: tons and tons of rules, no robustness across data sets, and you'd have to make custom rules for every language. Now you can use the same tricks for English as you can for Chinese characters as you can for byte code, and that is just pretty incredible.
00:29:02 There have obviously been much more complicated tasks. A pretty popular use of deep learning that people are putting a lot of effort into is end-to-end translation from scratch. The idea is that you use an RNN to compress a sentence in your source language into a vector, like I described in the RNN section, and then you use a different RNN to decode it into the target language. While it's not surprising that you can design a neural network that could plausibly output this, it is quite surprising that it works so well: neural networks have, in the span of a few grad-student months, matched the performance of systems that people spent decades engineering.
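The encoder-decoder shape just described can be sketched without any training at all. This is a minimal plain-Python sketch with tiny, randomly initialized weights, so the "translation" it emits is meaningless; it only shows the structure: one RNN folds the source tokens into a single vector, and a second RNN unrolls from that vector, one target token per step.

```python
import math
import random

random.seed(0)
DIM, VOCAB = 4, 6  # tiny hidden size and vocabulary, just for the sketch

def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def rnn_step(x, h, Wx, Wh):
    # h_new = tanh(Wx @ x + Wh @ h), the classic simple-RNN update.
    return [math.tanh(sum(Wx[k][i] * x[i] for i in range(len(x))) +
                      sum(Wh[k][j] * h[j] for j in range(len(h))))
            for k in range(DIM)]

def one_hot(token):
    v = [0.0] * VOCAB
    v[token] = 1.0
    return v

# Encoder: fold the whole source sentence into one fixed-size vector.
Wx_enc, Wh_enc = rand_matrix(DIM, VOCAB), rand_matrix(DIM, DIM)
source = [2, 4, 1]  # token ids of a source sentence
h = [0.0] * DIM
for tok in source:
    h = rnn_step(one_hot(tok), h, Wx_enc, Wh_enc)
sentence_vector = h[:]  # everything the decoder gets to see

# Decoder: unroll from that vector, greedily picking a target token per step.
Wh_dec, Wout = rand_matrix(DIM, DIM), rand_matrix(VOCAB, DIM)
out_tokens = []
for _ in range(3):
    h = [math.tanh(sum(Wh_dec[k][j] * h[j] for j in range(DIM))) for k in range(DIM)]
    scores = [sum(Wout[v][j] * h[j] for j in range(DIM)) for v in range(VOCAB)]
    out_tokens.append(max(range(VOCAB), key=lambda v: scores[v]))
```

Training would adjust all four weight matrices so that the decoded tokens match reference translations; the surprising empirical result is that this simple recipe, scaled up, works.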
00:29:57 Nowadays, I don't think end-to-end deep learning systems are what's actually deployed for this; people still use a bit of hard-coded stuff, but deep learning is a very important component, and it's only a matter of time. The beauty is that if we get a new task or a new language, it can just automatically work. What if we find some lost language from a thousand years ago, and we have a good amount of its texts: can we actually learn to translate or understand it without any prior knowledge of it? It seems like, purely from data, we can, and that's really cool. We don't need an understanding of something before applying our machine learning models in order to have an understanding afterwards, and that is just really, really awesome.
00:30:50 I've actually been chatting with the people at SETI, the Search for Extraterrestrial Intelligence, and one of the tasks they're working on is trying to understand dolphins. The rationale is that dolphins have language and aliens might have language; if we ever see alien communication, we probably won't understand it, so perhaps we can use dolphins as a proxy for aliens and try to understand them. There are some really cool tasks happening there, and it's not limited to that.
00:31:21 There are also some really cool things being done with art and deep learning. I think companies have started up whose entire business model is creating awesome deep learning art, and they seem to be doing well, from what I've heard. In this case, what you see is a hallucination purely from a convnet trained to do image classification. So a convnet, something that takes an image and tells you what breed of dog is in it or what objects are in it, can, with a few tricks, be used to create this kind of crazy art. This made a pretty big splash; it's very unintuitive that a neural network that wasn't even trained to make art can end up producing this kind of thing.
00:32:06 There have been more popular use cases, such as style transfer. The idea is that you take a neural network, still trained for classification, on the grounds that classification has learned some priors about images, some priors about the natural world. What you do then is say "I want my image to roughly match the feature distribution of this other image", and you get this kind of style transfer, where you can mix these components together. While this is actually a pretty ugly example (there are some good ones, I promise), there are much more complicated things you can do; it's not just taking two images and merging them together.
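"Matching the feature distribution of another image" is often made concrete with Gram matrices of convnet feature maps: channel-by-channel correlations that capture texture while discarding where each feature fired. A toy plain-Python sketch, with hand-made feature maps instead of real convnet activations:

```python
def gram(features):
    """Gram matrix: channel-by-channel correlations, which encode 'style'
    while throwing away where in the image each feature fired."""
    C = len(features)
    P = len(features[0])
    return [[sum(features[i][p] * features[j][p] for p in range(P))
             for j in range(C)] for i in range(C)]

def style_loss(feats_a, feats_b):
    """Squared difference between Gram matrices: small means similar style."""
    ga, gb = gram(feats_a), gram(feats_b)
    return sum((ga[i][j] - gb[i][j]) ** 2
               for i in range(len(ga)) for j in range(len(ga)))

# Toy 2-channel feature maps (channels x positions). The first two have the
# same statistics but different spatial layout; the third is genuinely different.
painting  = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
shuffled  = [[0.0, 1.0, 0.0, 1.0], [1.0, 0.0, 1.0, 0.0]]
different = [[1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0]]
```

Style transfer then optimizes the pixels of an output image so its Gram matrices match the style image while deeper features match the content image.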
00:32:47 You can do things like transforming a perhaps not-super-great drawing, something you could probably do in Paint fairly quickly, into something that looks like an artist made it, which is really awesome. The idea is that you can take these arbitrary doodles and convert them into things that look like paintings. This kind of stuff is great, and I think it's just the beginning of what we can do with neural network art: after basically less than a year of work on this, people are making applications that are already very tangible and very awesome. This is already something that, if I had made it, I would probably hang up in my living room, and it has only been one year of work; imagine what happens in ten years.
00:33:32 I saved the best for last: in terms of art, we can combine our pictures with those of Pokemon, so clearly the future is here. This is one of my crowning achievements, I think, primarily because I've done this with dozens of people and only mine turned out well. I think this is really awesome; there are just so many things to do here and so few people working on it. The sky is really the limit, and it's just really exciting what kinds of things can be created here. There have been other huge achievements.
00:34:11 Game playing has been really big. If anyone saw DeepMind's 500-million-dollar acquisition in 2014: roughly the only paper they had at the time was on learning to play Atari games from pixels, which might be harder than it sounds, because humans come in with priors about how to play a game. They have a prior that this is maybe a ball and that's a paddle and I want to destroy certain things, or that a key opens doors, or that roads are something I want to stay on in a driving game. The neural network isn't given any of these priors; it's literally only given the pixels, and from those images it learns to play at what is, on median, a superhuman level.
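The Atari work uses deep reinforcement learning: a convnet reading pixels, trained with Q-learning. Here is a tabular sketch of the Q-learning core on a made-up five-state corridor, far simpler than anything in the paper, but using the same update rule:

```python
import random

random.seed(1)

# A toy 1-D world: states 0..4, reward only for reaching the right end.
N_STATES = 5
ACTIONS = (-1, +1)  # move left or move right

# Optimistic initialization: starting every value at 1.0 pushes the agent
# to keep trying actions it has not yet been disappointed by.
Q = {(s, a): 1.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    nxt = max(0, min(N_STATES - 1, state + action))
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward

alpha, gamma, epsilon = 0.5, 0.9, 0.1
for _ in range(500):                          # episodes
    s = 0
    for _ in range(20):                       # steps per episode
        if random.random() < epsilon:         # explore occasionally...
            a = random.choice(ACTIONS)
        else:                                 # ...otherwise act greedily
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        nxt, r = step(s, a)
        best_next = max(Q[(nxt, b)] for b in ACTIONS)
        # Q-learning update: nudge the estimate toward the immediate
        # reward plus the discounted best future value.
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = nxt
        if r > 0:
            break

# After training, the greedy policy heads right from every non-terminal state.
policy = {s: max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES)}
```

The Atari result replaces this lookup table with a convnet that estimates Q-values directly from the screen, but the update being performed is the same idea.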
00:34:57 The techniques have been continuing to get better, and very similar tricks were applied in the much more recent result of Google DeepMind's AlphaGo network. That was not that huge of a deal in the West, but talk to people from the East: you tell them about the achievements of deep learning, you mention smart inbox and they're like "that's pretty okay"; you mention image search, "yeah, that's pretty okay"; and then you tell them it also beat the world champion at Go, and they're like "whoa, it beats people at Go? That's amazing." People had predicted that beating top humans at Go was, depending on the expert, 10 to 100 years off, and it just happened; it's already done, humans have lost at Go. As a side effect, AlphaGo also caused more fear over AI safety than any other neural network, I believe.
00:35:58 This is probably a good representation of that. I don't know how well you can see it, but this is an xkcd comic about how hard people used to think these games were, and you can see Go sitting basically last, at the level of "computers still lose to top humans". Not all of these are solved yet, but it is just pretty incredible that Go now is. People have been asking: if it can do this, what can't it do? Because Go is a task that requires a lot of reasoning.
00:36:34 And these kinds of achievements have been transferring into the physical world as well. Google has a farm with a bunch of robots that have learned, on their own, to grasp objects. Robotics control is usually pretty hard, especially when you're trying to make it generalize, and they were able to do it just by throwing the robots into a dark warehouse, letting them train for a while against a cute objective function, and the robots learned to grasp things better than the hand-designed controllers did, which was pretty awesome.
00:37:10 More recently, I think there was a video that came out last week of NVIDIA using just deep learning for self-driving cars. The idea is that with just a single camera in front of your car, the car can learn to drive itself from how other people drove. This is a very interesting result, because Google has been working on self-driving cars, using lidar and SLAM and all of that stuff, for what might be a decade already, and by some measures NVIDIA has caught up to them entirely within, I think, less than a year of investing in this. So deep learning seems to be changing a lot of things, especially these kinds of perception tasks, because research is moving so fast.
00:37:59 I also have to spend some time on things that are not yet practical but may very well soon be. As a disclaimer, I've been traveling this weekend, so I'm not sure whether some of these already belong in the solved category. Generation is a big one; there's tons and tons of stuff happening in generation, so I definitely can't do it justice. There's really cool work on generating images from scratch, and on generating arbitrary other domains from scratch; images are just the most visual, so that's what I have here. But some of the coolest and perhaps most practical examples are conditional generation.
00:38:31 Something I'm really excited about is image-to-text. The idea is that you take an input image, and the output is not a yes or no about whether a dog is in it, but a description of the image. That's an extremely human task, and it would be extremely useful if you could do it right; it seems to open up a whole ton of possibilities. I'm very excited about things like taking in a medical image and outputting a complete report of it, which would be really awesome. And an application people are really excited about in the very short term, I don't know the right way to say it, is for the community with poor eyesight: web pages nowadays have been pretty bad for people with disabilities, and imagine a neural network that could describe an image for you, describe a page for you, tell you what's on the page in a very semantic, summarized way.
00:39:27 There's also a really cool opposite problem: instead of taking in an image and outputting a description, you take in a description and output an image. As a terrible artist, I'm probably a bit more excited about this one, because while I can describe pictures, I can't really draw them, and these results are already much better than I can draw, though that's probably a low bar. In this kind of network you take in a sentence of text, and all of these images are generated by the network, which is pretty incredible. Some of them are not super great, but these birds, I believe, look real; the flowers don't, the purple ones, but they come pretty close: if it were zoomed out enough, I could see them as being pretty real.
00:40:16 And can you imagine a future where, instead of having to spend millions of dollars on a movie, you just type it up and a neural network generates the movie for you? We're quite a way from that, but perhaps not that far away, especially with some focused work, and this could enable all sorts of new forms of creativity that people don't even know about yet.
00:40:39 While language understanding does quite well, there is a deeper language understanding which we can kind of solve on toy tasks but which is much harder for real tasks. Take question answering that requires more complicated reasoning: if you have a story and you ask a complicated question like "where is the football?", you have to go back through the story and figure out where that kind of thing happened, and that's complicated. People are very good at this task; models can solve the simple versions quite well, but they can't do real question answering yet, which is unfortunate, because it's something people really care about. We're not quite there yet, but I also love how awesome this problem sounds: the machines we've spent basically no work on only automatically learn a shallow level of reasoning. That's such a first-world problem.
00:41:33 Alongside language understanding there's also visual understanding, which is also kind of unsolved. There are some awesome data sets that pair images with questions, where the goal is to find the answer. There are models that do pretty okay at this task, but they're still not good, and they still fare significantly worse than people do, so this kind of thing is something we can't do just yet.
00:42:06 While game playing is solved, harder game playing is still an open problem. You might object: harder game playing? My five-year-old brother can play Minecraft, and he almost certainly can't beat the world champion at Go. But "harder" in this case means stateful. It turns out that humans are really good at remembering things, while neural networks have some difficulty with it; the networks people have been using for playing games have been completely stateless. So in a partially observed world like Minecraft, where you only see the one direction you're looking in, if the network looks to the left, it forgets what was on the right. This is something people are still working to solve; it's the same with Doom, and while some work has been done, it's far from a solved problem, and I believe the networks are still subhuman at this task.
00:42:54 There's some really cool work on automatically discovering hierarchical structure. In language the hierarchical structure may seem clear to us, because we use language: words are made of characters, sentences are made of words, paragraphs are made of sentences. This semantic hierarchy makes it easy to break a problem down into simpler problems, but that's not the case in many domains, and people have designed neural networks that can actually discover this hierarchy automatically. That could be really useful for tasks we don't know how to interpret. Something I've worked a bit on is genomics, and we really don't even know how to read genomes properly; but if a neural network can automatically break a genome up, telling us this part goes together with that part, and there are connections between here and here, it could help a whole lot with all sorts of scientific tasks, purely from data.
00:43:45 This is where it gets a little bit computery, but these are things I'm excited about as a computer scientist.
00:43:52 There's this model called the Neural Turing Machine, which learns to use a big memory buffer, which is very cool: you can actually watch how the network reads and writes in order to copy an input. There are also ways to implement differentiable data structures, so instead of having this black box of arbitrary activations and matrix multiplies, you can plug a data structure into a network, and now your network can learn to do things like pushing onto and popping from a stack, or taking from both ends of a queue. This could potentially enable all sorts of very cool use cases, such as learning to program.
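A crude sketch of the differentiable-stack idea: the discrete push/pop choice is replaced by a continuous blend over a fixed-depth stack, so gradients can flow through which operation was taken. This is a simplification of my own for illustration, not the exact construction from the literature:

```python
# A fixed-depth "soft stack". The action weights (p_push, p_pop, p_noop) sum
# to 1 and would normally come from a network's softmax; here they are set
# by hand so the behavior is easy to follow.

DEPTH = 4

def soft_update(stack, value, p_push, p_pop, p_noop):
    pushed = [value] + stack[:-1]   # shift down, new value on top
    popped = stack[1:] + [0.0]      # shift up, bottom padded with zero
    # The result is a weighted blend of the three possible next stacks.
    return [p_push * a + p_pop * b + p_noop * c
            for a, b, c in zip(pushed, popped, stack)]

stack = [0.0] * DEPTH
stack = soft_update(stack, 1.0, 1.0, 0.0, 0.0)   # hard push of 1.0
stack = soft_update(stack, 2.0, 1.0, 0.0, 0.0)   # hard push of 2.0
top_after_pushes = stack[0]

# A "half pop": a blend of popping and doing nothing, something a discrete
# stack cannot express but which is differentiable in the action weights.
stack = soft_update(stack, 0.0, 0.0, 0.5, 0.5)
top_after_half_pop = stack[0]
```

Because every operation is a smooth function of the action weights, a network controlling those weights can be trained end-to-end with ordinary backpropagation.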
00:44:34 People have done work where you can create models that don't just have a simple input-output mapping but, as an intermediate step in that mapping, learn subroutines and play with pointers, and this makes them very general computers: they could potentially handle all the problems we care about, because being able to learn subroutines and pointer manipulation is like learning abstraction automatically. By putting these pieces together, people have been able to do things like learning to execute code: given code as a string, and targets for what that code outputs, you can actually learn an interpreter for the language. This is really exciting to me as a programming-languages guy. Maybe I could design a programming language not by implementing it but by showing a whole bunch of examples and having the implementation happen automatically for me; or perhaps I could just write the test cases for the language, and a neural network could generate an efficient implementation for me.
00:45:37 Something else related to all of this, which is really early but which I think a lot of people are really excited about, is neural module networks. Instead of having a single architecture that you play with, you have a library of components, and for every single example you compose a custom architecture. For example, in the question answering task, if you have an image and the question "where is the dog?", instead of using one arbitrary network that takes in the question and produces the answer, you convert the question into a custom neural network that combines a "dog" module with a "where" module and outputs the answer. This kind of thing is very early but really promising.
00:46:28 So that's it for the future part; hopefully you're pumped to deep-learn some problems. There's a lot of software to help you. I'm not going to talk about it right now, because there are a lot of tutorials out there and I think the high-level understanding is much more important. My recommendation: if you want to customize a lot of things, Theano and TensorFlow are the best, because they give you the automatic differentiation I was talking about, so you basically never have to worry about the backward pass. And if you just want to use the modules I talked about, plus a few others, Keras can do that; you can build a lot of these things with Keras.
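That automatic differentiation is less magic than it sounds. Here is a minimal reverse-mode sketch in plain Python, a toy that is nothing like the actual Theano or TensorFlow internals: each value remembers how it was computed, and backward() replays the chain rule.

```python
class Var:
    """A scalar that records its computation graph for reverse-mode autodiff."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents   # (parent_var, local_gradient) pairs
        self.grad = 0.0

    def __mul__(self, other):
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

    def __add__(self, other):
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def backward(self, upstream=1.0):
        # Accumulate this node's gradient, then pass it along via chain rule.
        self.grad += upstream
        for parent, local in self.parents:
            parent.backward(upstream * local)

x = Var(2.0)
y = x * x + Var(3.0) * x   # y = x^2 + 3x
y.backward()               # dy/dx = 2x + 3 = 7 at x = 2
```

You only ever write the forward computation; the framework walks the recorded graph backward to hand you every gradient, which is exactly why you never have to derive the backward pass by hand.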
00:47:03 If you want to do this, there's a lot more learning to do, and the devil's really in the details. I was super high-level with a lot of this, but there are so many little things you need to know, such as how to perform the updates in a way that doesn't cause your parameters to grow too large, how to initialize the parameters so you don't start at a trivial function, and how to avoid overfitting your training set.
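Two of those details can be made concrete; both snippets are illustrative choices of mine, not prescriptions. The first scales initial weights to the layer size (Glorot/Xavier-style initialization) so activations neither vanish nor explode at the start; the second rescales over-large gradients so an update cannot blow the parameters up.

```python
import math
import random

random.seed(0)

def xavier_init(fan_in, fan_out):
    """Uniform init with limit sqrt(6 / (fan_in + fan_out)), scaled to the
    layer size so early activations stay in a reasonable range."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[random.uniform(-limit, limit) for _ in range(fan_in)]
            for _ in range(fan_out)]

def clip_gradients(grads, max_norm=5.0):
    """If the gradient's norm exceeds max_norm, rescale the whole vector:
    a common guard against parameters growing too large during updates."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm > max_norm:
        grads = [g * max_norm / norm for g in grads]
    return grads

W = xavier_init(256, 128)                              # a 256 -> 128 layer
clipped = clip_gradients([30.0, 40.0], max_norm=5.0)   # norm 50, rescaled
```

Neither trick changes what the network can represent; they just keep the optimization numerically sane, which is much of what those little details are about.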
00:47:28 There are a lot of resources out there; my favorite is the Stanford class by Andrej Karpathy, CS231n. It is specifically on convnets, but it's constantly updated with state-of-the-art material and is generally very high quality, so I think it's approachable for anyone, from beginner to very advanced. And if you want to do this, you'll probably need a GPU, or fifty. I think that's it for time, so sorry I was rushing at the end. Any questions? Also, I have these slides; should I leave them up?
00:48:04 Some questions here, go. One is: how can we avoid autonomous cars picking up humans' bad habits? That is a very interesting question, and it depends a lot on how the cars are trained. Training a car to copy humans is by far the easiest thing to do, but it's not the most correct thing to do: the most correct thing would be to learn to drive optimally from scratch, but that unfortunately involves trial and error, which you probably don't want in self-driving cars, or hard-coded rules. What can happen is that if you train it to learn from humans, it will mimic those humans. But suppose humans don't make consistent mistakes: different humans make different kinds of mistakes, or the same human only makes a given mistake sometimes. Then a neural network can predict the expectation of what a human would do rather than the worst case. You can think of the humans as an ensemble here: if you're predicting what the average of a bunch of humans would do, you can drive better than any individual human. But if humans consistently make the same mistakes, there's nothing you can do about that other than get more data.
00:49:32 I think we have time for one more: what do you think about chatbots, is it possible to build one with only deep learning? Yes; there are actually many startups doing this right now. This seems to be the next wave; chatbot startups are the hot thing right now, with people trying to use chatbots to do all sorts of things in very specific domains. It has some really nice properties from a business point of view, because the goal is to replace humans who chat, and it's very feasible to replace them with an algorithm, because if you have a bunch of those humans, they generate a bunch of training data. So it is very plausible, but it's still hard for chatbots; it's kind of like the game-playing problem, in that it's hard for a chatbot to keep a memory of what you said. If you talk about, say, opening this menu, then go here and here and here, five sentences later the chatbot might say the same thing again, because neural networks still have memory issues.
00:50:33 Cool, I think that's it. Please remember to vote, and let's give a big round of applause. Thank you!