00:00:11 Cool, thank you. I'm going to give a bit of background: I have a very math- and computer-heavy background, which is very good for deep learning. On the list of achievements for why you should listen to me: I currently work at a super cool company that uses deep learning to make faster and more accurate medical diagnoses, and in past lives I've won a whole lot of international math competitions, and some programming competitions as well. Today I'm going to chat about what deep learning is and what it can do for you. Feel free to ask questions at any time (I can ask that, right? They can ask, right, Simon?). Cool, so yeah, feel free to ask questions at any time, just shout them out, especially if you think I'm lying to you. So what is
00:00:55 deep learning? I'm going to start with a disclaimer: deep learning is actually pretty complicated, and it's hard to be very general about everything and be correct, so when in doubt I'm going to favor generality. If you're familiar with deep learning already, it's going to sound like I'm lying a lot, but in reality this is just to give the really high-level view of it. Whenever possible, I'm also going to favor shortcuts that might not be a hundred percent correct but should give the correct mental model of how these things work. If you have any questions, feel free to ask. So, from
00:01:26 a super high level, there are a lot of different levels of hierarchy in the ecosystem. There's artificial intelligence, which is a superset of everything; an example would be IBM Watson, which uses lots of hand-coded rules and extremely large amounts of expert manpower, built to do a specific task. There's machine learning, which is a subset of that; an example of this would be Google ad-click prediction, where rather than using tons and tons of hard-coded rules, you start using examples to figure out how to combine some hand-coded statistics to predict the probability of, for example, an ad click.
00:02:03 At a slightly deeper level, you have representation learning, which is sometimes seen as one-layer-deep learning (these levels are sometimes called shallow learning, if you're trying to start a fight). An example of this would be Netflix movie recommendation, where the statistics of what you even know about each movie are learned from data, but you're still learning a simple combination of how these features go together. After a few levels, you get
00:02:33 into deep learning. An example of this would be figuring out diseases from images, where instead of having a layer of manual statistics that are learned and then combined together, you might learn all of these statistics at the same time, in tens, hundreds, or even thousands of steps, which is what some people use nowadays. This is probably not a common view of what deep learning is, but I think the easiest way to see it is this: deep learning is an interface,
00:03:00 and this interface has roughly two methods. The first method is the forward pass, and this is definitely the easy part: given arbitrary input, make arbitrary output. Anyone can do this part; it's really easy. The trick that makes it all work is the backward pass: given a desired change in the output, you want to be able to transform that into a desired change in the input. Once you have these two methods, you can make arbitrarily complex things by chaining them up into a directed acyclic graph. If this sounds too good to be true, that's because of how we defined the forward pass: if you just say 'arbitrary input, arbitrary output', of course you can do anything you want. The hard part is defining the backward pass, because as you make your forward pass more and more complicated (say, some really crazy function), it becomes hard to define how to map changes in the output back into changes in the input. By keeping these pieces simple and combining them together, we get an almost composable language of modules that allows us to do the things we want to do.
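To make the interface concrete, here is a minimal sketch in plain Python. The class name `Scale` and the method names are my own inventions for illustration, not from any particular framework: a module implements a forward pass and a backward pass, and chaining modules just composes them.

```python
class Scale:
    """A tiny module satisfying the two-method interface: output = w * input."""
    def __init__(self, w):
        self.w = w

    def forward(self, x):
        self.x = x                 # cache the input; backward will need it
        return self.w * x

    def backward(self, grad_out):
        # Transform a desired change in the output into a desired change
        # in the input (and, for a parametric module, in the parameter).
        self.grad_w = grad_out * self.x
        return grad_out * self.w

# Chain two modules into a (tiny) directed acyclic graph: y = 3 * (2 * x)
a, b = Scale(2.0), Scale(3.0)
y = b.forward(a.forward(5.0))      # forward pass
dx = a.backward(b.backward(1.0))   # backward pass: dy/dx
```

With x = 5, the forward pass gives y = 30, and running the backward passes in reverse order gives dy/dx = 6, using only local information inside each module.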
00:04:07 So once you have this interface, you can build up from it: you can have a bunch of modules that satisfy the interface. As a side note, a bunch of these modules will be parametric, which means they have parameters, which roughly means they're stateful. Being stateful means the state can change, and it's this change in state that takes the function from something you just cobbled together to something that gets closer and closer to what you want it to do. And once you have a general language for what you want to do, you can start doing the tasks you care about. In deep learning, you always define a
00:04:49 loss or a cost, depending on how you want to name it; this is something you want to minimize, for reasons that I'd happily explain. It always has to be a scalar, so you can't have several costs at the same time: you have to squash everything down into a single thing that you care about. And once you squish everything you care about in the world into a single number, now you can start using deep learning to optimize it. You
00:05:12 create an architecture, which is the function you want to compute: this is how you compose together the modules I talked about. The way you connect them changes the function you get and the kind of representational power it has, and that becomes the hard part. After that, you initialize the parameters, and you train the architecture by repeatedly updating the parameters to minimize the cost: you go forward through the network to get the things you care about, you go backward through the network to change the parameters to be slightly better for your cost, and you repeat this many times until you get a function that you're really, really happy with and that solves whatever problem you want. At the end, you just use that function, just the forward pass. So how do you implement the
00:05:53 backward pass? In general, we almost always use the chain rule, which is really nice because it makes implementing the backward pass easy. How this works: if your output L depends on x through some function f, and you have the partial derivative dL/df, you can get dL/dx by simply multiplying by the partial derivative df/dx. The nice part is that dL/df is obtained from the rest of your network, while df/dx is obtained just from your module, so this allows you to chain things together in a way that only requires local information to get the backward pass, which is very nice. There are theoretical reasons why this is a good way to do it, and perhaps the best part is that some frameworks make it completely automatic: by defining a forward pass, automatic differentiation can figure out the backward pass for you. So it becomes basically as easy as defining arbitrary functions: you really do get the benefit of 'define arbitrary things, return arbitrary things', and as long as all the operations you do are differentiable, you can just make it work like magic and optimize it. This is literally how people do it in practice.
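As a concrete check of that multiplication, here is a tiny numerical sketch. The choice of f as a sigmoid and L = f**2 is mine, purely for illustration:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Module: f(x) = sigmoid(x).  Rest of the network: L = f**2.
x = 0.5
f = sigmoid(x)
dL_df = 2.0 * f            # comes from the rest of the network
df_dx = f * (1.0 - f)      # comes just from the module (sigmoid derivative)
dL_dx = dL_df * df_dx      # chain rule: multiply the two

# Sanity check against a finite-difference approximation.
eps = 1e-6
L = lambda t: sigmoid(t) ** 2
numeric = (L(x + eps) - L(x - eps)) / (2 * eps)
```

The analytic chain-rule product and the finite-difference estimate agree to many decimal places, which is exactly the property automatic differentiation exploits module by module.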
00:07:11 Updating the parameters: these are just minor details to get an understanding of how this works. Once you have your existing parameters, you get your gradient and take a step in the opposite direction of the gradient; the partial derivatives tell us how to change the parameters to increase or decrease the cost we care about. An important word to know, though, one that people always use and that I think makes this sound more complicated than it is (it's a big word), is backpropagation, or backprop for short. It has a longer name, reverse-mode automatic differentiation, which sounds pretty complicated, but it's just the chain rule plus dynamic programming. I just talked about the chain rule, and some people are familiar with dynamic programming; here it is just caching. The idea is that when you
00:08:02 have a computation graph. This is a very simple computation graph: y = c × d, where c = a + b and d = b + 1. You traverse the graph from the top to the bottom, and by doing it from the top to the bottom instead of the bottom to the top, you can cache the intermediates that are used many times in the graph. By caching these intermediates, you get something much more efficient than the naive solution, and it allows you to get gradients that are computable in linear time in the size of your graph: you basically evaluate each node once. That's a really nice property, and it makes it all really efficient.
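On that exact example graph, the reverse pass with caching looks like this (a plain-Python sketch):

```python
# Forward pass through the graph: y = c * d, with c = a + b and d = b + 1
a, b = 2.0, 3.0
c = a + b            # 5.0
d = b + 1            # 4.0
y = c * d            # 20.0

# Reverse pass, top to bottom. dy_dc and dy_dd are computed once and
# cached; that reuse is the "dynamic programming" part.
dy_dc = d                          # d(c*d)/dc
dy_dd = c                          # d(c*d)/dd
dy_da = dy_dc * 1.0                # a only feeds c
dy_db = dy_dc * 1.0 + dy_dd * 1.0  # b feeds both c and d
```

Each node is visited once, so the whole gradient costs about as much as one extra forward pass.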
00:08:40 And that's basically it for the basics. From a high level, deep learning is just composing optimizable subcomponents; optimizable almost always means differentiable; differentiable means you can do backprop; and backprop is just the chain rule plus dynamic programming. Once you get
00:08:56 to practical deep learning, you normally have to combine this with gradient descent, software, and a data set that you care about. There's a very rich space of software, which I'll talk a little bit about later, but these things are solved for you, so you can do deep learning without even knowing how to calculate the gradients yourself.
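Putting the recipe together (define a cost, initialize a parameter, repeatedly go forward and then step against the gradient), a minimal sketch might look like this; the data, learning rate, and iteration count are arbitrary choices of mine:

```python
# Fit y = w * x to data generated with w_true = 2, by gradient descent
# on the squared-error cost.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

w = 0.0                       # initialize the parameter
lr = 0.05                     # learning rate (step size)
for _ in range(200):          # repeat forward + backward many times
    grad = 0.0
    for x, y in zip(xs, ys):
        pred = w * x                  # forward pass
        grad += 2 * (pred - y) * x    # backward pass: d(cost)/dw
    w -= lr * grad                    # step opposite the gradient
```

After enough steps, w converges to 2, the value that minimizes the cost, which is the whole training loop in miniature.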
00:09:18 So while we can do arbitrarily complicated things, there are a few standard modules that are the main workhorse of deep learning today. The goal of this section is to get a high-level understanding of each, since all of them can be incredibly nuanced, but these standard modules will cover almost all of what's happening in papers.
00:09:37 The simplest of them is matrix multiplication. It has many names: the fully connected layer, sometimes shortened to FC; sometimes called dense, because you have lots of connections; the linear layer, because it's a linear transformation; or affine, because sometimes there's a bias. Basically, every time you see a neural network diagram, all of the arrows correspond to a matrix multiplication, so when you see a diagram that looks complicated, that's what it's made of. You can interpret it as a weight from every input to every output: if you have m inputs and n outputs, you have m × n weights to transform your inputs into outputs, and its implementation is literally a matrix multiplication. W in this case is generally a parameter, which means you learn the connections from inputs to outputs.
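As a sketch in numpy (the sizes and the random initialization are arbitrary choices of mine), a fully connected layer with m = 3 inputs and n = 2 outputs is literally one matrix multiply plus an optional bias:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 2                    # m inputs, n outputs
W = rng.normal(size=(m, n))    # m x n weights, one per input-output pair
b = np.zeros(n)                # optional bias: this is the "affine" part

def fully_connected(x):
    return x @ W + b           # literally a matrix multiplication

x = np.ones(m)
out = fully_connected(x)       # n outputs
```

W and b are the parameters here; training consists of updating them by gradient descent.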
00:10:29 On its own this is not powerful enough, so you need at least one more thing: a nonlinearity. The original nonlinearity is called the sigmoid; it's just this function. It has the nice property that it maps the reals into the interval (0, 1), so it can be interpreted as a probability, but that's not as important as it simply being nonlinear. The reason the nonlinearity is important: in this kind of neural network, when you stack the layers back to back, if you had no nonlinearity in the middle, it would just be two matrix multiplies back to back, and you could combine them into a single matrix multiply. So if you had a 100-layer purely linear network of just matrix multiplications, while it looks pretty complicated and does all the work of a real neural network, you could actually flatten it into a single weight matrix, because a composition of linear functions is linear. So the sigmoid was the original nonlinearity; people liked it because it's very similar to what people used before they really understood how machine learning works, which was just a binary threshold (the unit either fired or it didn't) back in the
00:11:35 day. The cool part is that with just those two units, you know how to make a neural network: you take your input, you apply a matrix multiply, you apply a sigmoid, you apply another matrix multiply, and you have one. These are called multi-layer perceptrons, when you only have matrix multiplies and nonlinearities. And the cool part is that there's a theorem that this simple architecture, literally three functions, can approximate arbitrary functions, which means it can solve any problem that you care about.
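That recipe (input, matrix multiply, sigmoid, matrix multiply) is short enough to write out; a numpy sketch with arbitrary sizes of my choosing:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 8))   # input size 4 -> hidden size 8
W2 = rng.normal(size=(8, 3))   # hidden size 8 -> output size 3

def mlp(x):
    # matrix multiply, nonlinearity, matrix multiply: that's the whole MLP
    return sigmoid(x @ W1) @ W2

out = mlp(np.ones(4))
```

Making the hidden size (8 here) larger is the "make the middle big enough" knob the approximation theorem talks about.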
00:12:08 The idea of this theorem is that if you make the middle layer big enough, you can compute basically any function. The downside is that just because it can doesn't mean it will, and a single-hidden-layer multi-layer perceptron often causes more problems than it solves. This is why there was an AI winter in the 90s: these things were kind of terrible. But people have gotten a lot better at it, and now neural networks are cool. As a disclaimer: this would be a multi-layer perceptron, and these are neural networks; everything I'm talking about today is still a neural network, but specifically, when people talk about the multi-layer perceptron, this is what it is. Since then, people have made better
00:12:59 nonlinearities; this is probably the majority of the improvement between 1990 and 2012, unfortunately. You have a kind of smarter nonlinearity: instead of taking that weird squiggly function, you just apply a threshold, so anything negative gets turned into 0. This is the rectified linear unit (ReLU), and it's actually the most popular nonlinearity nowadays; it does incredibly well and has some really nice optimization properties. In particular, wherever it isn't zero it's linear, so it works very well with the chain rule, and it's used almost everywhere nowadays, especially in the middle of a neural network. Then there is the softmax, which you can
00:13:45 think of as converting a bunch of numbers into a discrete probability distribution. The math of it is p_i = exp(x_i) / Σ_j exp(x_j): you exponentiate your inputs and then divide by the sum over the inputs. You can think of the exponentiation as turning all the numbers positive, and the division by the sum as a normalization term. It has some very nice properties; it's used as the final layer for classification problems, and it's used in almost every neural network.
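Both nonlinearities are one-liners; here is a numpy sketch (subtracting the max inside softmax is a standard numerical-stability trick, not something from the talk):

```python
import numpy as np

def relu(x):
    # threshold: anything negative becomes 0
    return np.maximum(x, 0.0)

def softmax(x):
    e = np.exp(x - np.max(x))  # exponentiate: everything becomes positive
    return e / e.sum()         # normalize: a discrete probability distribution

p = softmax(np.array([-1.0, 0.0, 2.0]))
```

The softmax output is all positive and sums to 1, which is why it works as the final classification layer.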
00:14:14 Cool, that was the easy part; this is where it gets complicated. Feel free to ask questions during this; I normally explain it with a whiteboard, and it's normally complicated even with a whiteboard, but I'll try to go through it. A convolution is the main
00:14:31 workhorse for deep learning on images, and deep learning on images is basically where this revolution started, so it's very, very important. It's probably the place where deep learning is the most advanced, so it's a very important primitive, and I think a very cool primitive to understand, because you really realize how beautiful the framework is when you see that this thing sounds pretty complicated, but you can just plug it in without needing to know how it works, when someone has coded it up for you (which is what I do). So this is
00:15:01 a linear operation for 2D images. Once you have a multi-layer perceptron, you have a mapping from every input to every output, but in the case of images your inputs are structured: you have a spatial relationship between your inputs, and a mapping from every input to every output kind of throws that spatial relationship away. So the idea would be: what if, rather than having a connection from every input to every output, the output looked like an image as well, and every output was only locally connected to the inputs it corresponds to? That is insight number one: local connections. Insight number two is that every output is a local function of its input. What if, instead of every output being its own function (which would be the general case), every output was the same
00:15:46 function of its input? What this then becomes is equivalent to a well-known function in computer vision, a convolution. You have a kernel, which you can think of as a local weight matrix; it's often represented as a square in an image-like picture, which means it's capturing that local input. You do an elementwise multiply between all of the weights of the kernel and the local region, you sum up the results (so this is just a dot product), and then you do that at every single location in the input. So it's kind of like tiling your input with the same function, or it can be interpreted as extracting the same features at every location, which is the more common way to interpret it. It's very powerful and very parameter-efficient, because you have a lot of weight sharing, and you can end up having much larger outputs than you could with a normal matrix multiplication. You also don't lose spatial information, which is a very important structure of images. So these are some really nice properties.
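A naive single-channel 2D convolution is just the loop described above; a numpy sketch ignoring padding, stride, and channels (and, as is standard in deep learning, not flipping the kernel):

```python
import numpy as np

def conv2d(image, kernel):
    """Slide the kernel over the image; at every location, take the dot
    product of the kernel with the local patch (no padding, stride 1)."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)  # same weights at every location
    return out

edge = np.array([[1.0, -1.0]])           # a tiny horizontal edge detector
img = np.array([[0.0, 0.0, 1.0, 1.0]])   # a one-row image with a step in it
response = conv2d(img, edge)             # fires only where intensity changes
```

The same two-number kernel is applied at every location, which is the weight sharing that makes convolutions so parameter-efficient.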
00:17:00 As a side effect, you might think this thing is really complicated (how do I take a gradient of it?), but it's actually equivalent to a very constrained matrix multiplication. If you take your input image and unroll it (with a matrix multiply you lose the spatial structure), every output is connected to maybe nine of your inputs, and that is equivalent to the usual diagram with lots of arrows, but with most of the arrows being zero or missing. So this is still completely differentiable and still fits very nicely into the framework, and you can plug it in with all the other nonlinearities.
00:17:47 Cool, it's going to get a little bit harder. Another very fundamental building block is called a recurrent neural network; I don't know why this building block is called a network when everything else is called a layer, but that's just convention. This solves a problem that basically had not been solved in machine learning before: we want functions that take in variable-size input, but they can only take in fixed-size input. This becomes a problem when your function is parametric, like a fully connected layer: if you want a connection from every input to every output but your input size changes, the number of weights you have changes, and that means that if you get a longer example at inference time, you don't know what to do with it. It can also be inefficient, because you might have a really large number of inputs and might not need all the power of having every connection there. A recurrent neural network is a way to solve this problem, and the solution is recursion. So
00:18:49 what you have is an initial state, let's just call it h in this example, and you have a bunch of inputs x, a variable number of them, so you don't really know what this capital T is. You can make a function that takes in a fixed size, and because each x is fixed-size, you can make that function take both h and x. Now you can recurse through the list: the new state is a function of the previous state and the current input, h_t = f(h_{t-1}, x_t), and then you just return the final one. What this allows you to do is: with a fixed function that takes fixed-size input, you can now turn it into a function that takes variable-size input, by applying that function a variable number of times. This is a pretty obvious insight, and you could do it with any kind of machine learning algorithm; you could apply a random forest an arbitrary number of times. But the cool part is that because this function is differentiable, the recursive function is also differentiable, so you can take the derivatives with respect to each of the inputs, and you can even take the derivative with respect to the weight matrices you use at each step (you use the same f at each step), and you get a diagram
00:20:03 that looks kind of like this. You can think of it as applying an FC layer for each input, one that takes the input and the state so far. This diagram might not be very clear, but there are many different diagrams for RNNs, and they're all equally confusing if you're unfamiliar with them. The one on the left is my favorite, because you can think of it as a stateful function, except the state only lasts for the duration of your input; the unrolled version is the version you use if you're taking gradients, and it's equivalent to just passing the gradients through this very long graph.
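The recursion fits in a few lines; a numpy sketch of the simple RNN just described (the tanh and the sizes are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(0)
state_size, input_size = 4, 3
W_h = 0.1 * rng.normal(size=(state_size, state_size))
W_x = 0.1 * rng.normal(size=(input_size, state_size))

def rnn(xs, h):
    # The same fixed-size function, applied once per input:
    # new state = f(previous state, current input)
    for x in xs:
        h = np.tanh(h @ W_h + x @ W_x)
    return h                     # return the final state

h0 = np.zeros(state_size)
seq = [np.ones(input_size)] * 5  # a length-5 sequence; any length works
final = rnn(seq, h0)
```

Because the loop body reuses the same weights, the output state has the same size no matter how long the input sequence is.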
00:20:43 One last complicated slide: long short-term memory units, or LSTMs. These put me in a really hard position, because I can't not talk about them (they are so big), but they're also extremely complicated, and they take more building blocks than I've even explained. There is this great blog post (I think the slides will be published, so you don't have to worry about writing it down), this great, great blog post that tries to explain them, but I'm going to give a high-level intuition, even higher-level than what I've said so far, just so you can understand where it's coming from when I talk about these things being used. The idea is that it's kind of like an RNN; in practice, no one uses the RNN I've just described, which is a very simple function; there are much more complicated versions. An LSTM is an RNN
00:21:30 where the function is just really complicated; this entire thing here is a representation of that function. I'm not going to get into the details, but it involves a lot of different mechanisms in order to make optimization easier, and the idea is that if you design this function (the function applied at each time step) well, it can make the problem much, much easier to optimize, and you can have a much, much more powerful function. The key is that there is a path that is relatively simple (that's what the top path represents, with only a few operations being done to it), which makes it easier to stack these things back to back, and that makes it easier to learn long-term relationships across time.
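For a rough picture of what that complicated function looks like, here is a sketch of one step of a standard LSTM cell. The gate equations are the textbook ones, but the sizes are arbitrary and biases are omitted for brevity; the cell state c is the relatively simple path, touched only by elementwise operations:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, params):
    """One step of a standard LSTM cell (biases omitted)."""
    Wf, Wi, Wo, Wg = params         # one weight matrix per gate
    z = np.concatenate([h, x])      # previous state and current input
    f = sigmoid(z @ Wf)             # forget gate: what to erase from c
    i = sigmoid(z @ Wi)             # input gate: what to write into c
    o = sigmoid(z @ Wo)             # output gate: what to expose as h
    g = np.tanh(z @ Wg)             # candidate values to write
    c = f * c + i * g               # cell state: elementwise operations only
    h = o * np.tanh(c)              # new hidden state
    return h, c

rng = np.random.default_rng(0)
n, m = 4, 3                         # hidden size, input size
params = [0.1 * rng.normal(size=(n + m, n)) for _ in range(4)]
h, c = lstm_step(np.ones(m), np.zeros(n), np.zeros(n), params)
```

Note that the `c = f * c + i * g` line is the simple top path: gradients flowing through it meet only multiplications and additions, never a weight matrix.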
00:22:18 Whoo, okay: that was the complicated part. You now know ninety-five percent of the building blocks that everyone uses for state-of-the-art deep learning; with just these building blocks you could probably do new state-of-the-art things on new domains. So congratulations, you're ready for
00:22:35 the next part. In this part I want to talk about what deep learning is really good at and what you should use it on, and the answer is: a whole lot. I'm going to cover just the rough themes of where deep learning really shines, but there's much, much more to it, which I think is part of the awesomeness, because it all falls under the extremely simple framework I've just described. I don't think you could describe any framework as simple as what I've just done and have it solve this many complicated tasks that were, basically, unsolved before 2012. So, convolutional neural
00:23:09 networks: this is a general architecture, commonly referred to as CNNs. This actually means a network in this case, not just a layer. The idea is that you take your image, you apply a convolution, you apply your ReLU (rectified linear unit), you apply a convolution, you apply a ReLU, and you basically repeat conv-ReLU until you solve all the problems in computer vision. That isn't quite true, since at the end you need to tack on some sort of output layer, and the output layer depends on what kind of problem you're trying to solve.
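Schematically, that conv-ReLU repetition is just a loop; a self-contained numpy sketch with a naive convolution (real networks use many kernels per layer, plus pooling and an output layer):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def conv2d(image, kernel):
    # Naive single-channel convolution, no padding, stride 1.
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(16, 16))             # a toy one-channel "image"
kernels = [rng.normal(size=(3, 3)) for _ in range(3)]

# conv, ReLU, conv, ReLU, conv, ReLU ...
for k in kernels:
    x = relu(conv2d(x, k))
```

Each 3×3 convolution shrinks the 16×16 input by two pixels per side, so three layers leave a 10×10 feature map, and the ReLU guarantees everything in it is non-negative.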
00:23:41 A really old-school task is face recognition: trying to determine whose face this is. This is a really cool task because it makes the representations very visual, and you can see what the network learns over time. You start with the pixels, and at the first layer your filters tend to just match edges and very simple things; convolutions can match edges and other very simple shapes. As you get deeper into the network, you learn more complicated functions of the input: after that, you can start combining edges into corners or blobs, which is still extremely simple, but at the next layer, combining two corners the right way becomes kind of an eye-like shape, or two corners and a blob become more eye-like. So you can build up from edges to corners to object parts and eventually to the objects you care about, and in really, really deep networks you actually have intermediates that are extremely semantic objects. For example, people have made a lot of tools
00:24:43 for visualizing neural networks, where they visualize what the networks learn, and you find, for example, that if you have a neural network that doesn't learn to classify books at all, but learns to classify bookshelves, some of the intermediate features actually become book classifiers, which is really interesting. It can learn a hierarchical representation of your input space, such that these are useful things to combine together in order to make a robust classifier: maybe if you combine three books together with a square, that becomes a bookshelf. These are kind of like the local operations each layer of the network does, and the beauty of it is that it's all learned automatically. You don't need to program in 'bookshelves normally have books, they have square stuff, maybe there are often flowers beside them'; this can all happen from a data set automatically. And these
00:25:37convolutional neural networks are
00:25:39absolutely amazing they just when I
00:25:41wasn’t joking when they save basically
00:25:43all of computer vision right now it all
00:25:46started with imagenet this was in 2012
00:25:50this is when deep learning actually the
00:25:52entire hype train started where you had
00:25:56traditional machine learning solving
00:25:58this very hard very large computer
00:26:00vision data set and it was kind of
00:26:01plateauing over the years and all of a
00:26:03sudden deep learning comes in and it
00:26:05just blows everything away and ever
00:26:08since then everything has been
00:26:10everything in computer vision has been
00:26:12deep learning like nothing can even
00:26:14compare and recently we’ve been even
00:26:16being able to get superhuman results
00:26:18which is pretty impressive because
00:26:21humans are pretty good at seeing things
00:26:23it’s kind of what we’ve evolved to do
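The local filtering operation these networks stack can be sketched in a few lines. A minimal sketch in plain Python, with a made-up toy image, of the convolution step: a small filter slides across the image and responds wherever its pattern (here a vertical edge) appears.

```python
def conv2d(image, kernel):
    """Valid 2D convolution (really cross-correlation, as in most DL libraries)."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    out = []
    for i in range(h - kh + 1):
        row = []
        for j in range(w - kw + 1):
            # Dot product of the filter with the image patch under it.
            s = sum(image[i + di][j + dj] * kernel[di][dj]
                    for di in range(kh) for dj in range(kw))
            row.append(s)
        out.append(row)
    return out

# Toy 5x5 "image" with a bright vertical edge down the middle.
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# A vertical-edge filter: responds where dark meets bright.
edge_filter = [[-1, 0, 1],
               [-1, 0, 1],
               [-1, 0, 1]]

response = conv2d(image, edge_filter)
```

A real convnet learns the filter values instead of hard-coding them, and stacks many such layers so that later filters combine earlier responses (edges into books, books into bookshelves).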
00:26:26 And the same architectures can do all sorts of really interesting structured tasks. Using almost the same architecture, you can break up your input space into what's called a semantic segmentation of all the relevant parts you have. And using basically the same architecture, you can do crazy things like super-resolution, where you take in a low-resolution image and fill in the details. Not only is that useful, it's incredible that you can take the same architecture that takes an image and tells you whether or not there's a dog in it, and use it to take an image and return a new, higher-resolution image. It's basically the same library, the same components; it's just very, very composable, and that's really awesome.
00:27:20 You can also use this to solve really hard medical tasks, tasks that people could not solve before. Here we're detecting and classifying lung cancer in CT scans; these are the kinds of things that I like to work on.
00:27:36 And it's not only limited to vision; there's been a lot of work in language understanding. Something that deep learning is really good at is language modeling. Roughly, this means: how probable is this statement, how much sense does it make in a given language? It might have to do with question and response ("How are you?" "I'm fine"), or with what would be a weird thing to say. "My laptop is squishy" might be a very improbable sentence: a neural network could determine that "squishy" is a very bad adjective for a laptop, so this is a very improbable sentence, whereas "my laptop is hot" would probably be much more likely. This already has some human-like feel to it, because language was designed for humans. And if you can do language understanding, as in determining the probability of any sentence given a context, and you do it perfectly, you can solve basically any task.
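A toy illustration of language modeling, using a made-up five-sentence corpus and simple bigram counts rather than anything like a real neural model, that still ranks the two example sentences the right way:

```python
from collections import Counter

# A tiny made-up corpus standing in for real training text.
corpus = [
    "my laptop is hot", "my laptop is slow", "my coffee is hot",
    "my laptop is hot today", "the pillow is squishy",
]

bigrams = Counter()
unigrams = Counter()
for sentence in corpus:
    words = ["<s>"] + sentence.split()
    for prev, cur in zip(words, words[1:]):
        bigrams[(prev, cur)] += 1
        unigrams[prev] += 1

def prob(sentence, alpha=0.1, vocab=20):
    """Smoothed bigram probability of a sentence under the toy model."""
    words = ["<s>"] + sentence.split()
    p = 1.0
    for prev, cur in zip(words, words[1:]):
        p *= (bigrams[(prev, cur)] + alpha) / (unigrams[prev] + alpha * vocab)
    return p

likely = prob("my laptop is hot")
unlikely = prob("my laptop is squishy")
```

A neural language model plays the same game, but shares statistical strength across similar words and contexts instead of counting exact bigrams.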
00:28:36 This is a really interesting domain for deep learning, because if you look at how language understanding was done before deep learning was around, it was just incredibly simplistic: tons and tons of rules, no robustness across data sets, and you'd have to make custom rules for every language. Now you can use the same tricks for English as you can for Chinese characters as you can for byte code, and that is just pretty incredible.
00:29:02 There have obviously been much more complicated tasks. A pretty popular use of deep learning that people are putting a lot of effort into is end-to-end translation from scratch. The idea is that you use an RNN to compress a sentence in your source language into a vector, like I described in the RNN section, and then you use a different RNN to decode it into the target language. While it's not surprising that you can design a neural network that could plausibly output this, it is quite surprising that it works so well: neural networks have, in the span of a few grad-student months, matched the performance of systems that people spent decades engineering.
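The encoder-decoder shape just described can be sketched without any training at all. This is a minimal plain-Python sketch with tiny, randomly initialized weights, so the "translation" it emits is meaningless; it only shows the structure: one RNN folds the source tokens into a single vector, and a second RNN unrolls from that vector, one target token per step.

```python
import math
import random

random.seed(0)
DIM, VOCAB = 4, 6  # tiny hidden size and vocabulary, just for the sketch

def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def rnn_step(x, h, Wx, Wh):
    # h_new = tanh(Wx @ x + Wh @ h), the classic simple-RNN update.
    return [math.tanh(sum(Wx[k][i] * x[i] for i in range(len(x))) +
                      sum(Wh[k][j] * h[j] for j in range(len(h))))
            for k in range(DIM)]

def one_hot(token):
    v = [0.0] * VOCAB
    v[token] = 1.0
    return v

# Encoder: fold the whole source sentence into one fixed-size vector.
Wx_enc, Wh_enc = rand_matrix(DIM, VOCAB), rand_matrix(DIM, DIM)
source = [2, 4, 1]  # token ids of a source sentence
h = [0.0] * DIM
for tok in source:
    h = rnn_step(one_hot(tok), h, Wx_enc, Wh_enc)
sentence_vector = h[:]  # everything the decoder gets to see

# Decoder: unroll from that vector, greedily picking a target token per step.
Wh_dec, Wout = rand_matrix(DIM, DIM), rand_matrix(VOCAB, DIM)
out_tokens = []
for _ in range(3):
    h = [math.tanh(sum(Wh_dec[k][j] * h[j] for j in range(DIM))) for k in range(DIM)]
    scores = [sum(Wout[v][j] * h[j] for j in range(DIM)) for v in range(VOCAB)]
    out_tokens.append(max(range(VOCAB), key=lambda v: scores[v]))
```

Training would adjust all four weight matrices so that the decoded tokens match reference translations; the surprising empirical result is that this simple recipe, scaled up, works.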
00:29:57 Nowadays, I don't think end-to-end deep learning systems are what's actually deployed for this; people still use a bit of hard-coded stuff, but deep learning is a very important component, and it's only a matter of time. The beauty is that if we get a new task or a new language, it can just automatically work. What if we find some lost language from a thousand years ago, and we have a good amount of its texts: can we actually learn to translate or understand it without any prior knowledge of it? It seems like, purely from data, we can, and that's really cool. We don't need an understanding of something before applying our machine learning models in order to have an understanding afterwards, and that is just really, really awesome.
00:30:50 I've actually been chatting with the people at SETI, the Search for Extraterrestrial Intelligence, and one of the tasks they're working on is trying to understand dolphins. The rationale is that dolphins have language and aliens might have language; if we ever see alien communication, we probably won't understand it, so perhaps we can use dolphins as a proxy for aliens and try to understand them. There are some really cool tasks happening there, and it's not limited to that.
00:31:21 There are also some really cool things being done with art and deep learning. I think companies have started up whose entire business model is creating awesome deep learning art, and they seem to be doing well, from what I've heard. In this case, what you see is a hallucination purely from a convnet trained to do image classification. So a convnet, something that takes an image and tells you what breed of dog is in it or what objects are in it, can, with a few tricks, be used to create this kind of crazy art. This made a pretty big splash; it's very unintuitive that a neural network that wasn't even trained to make art can end up producing this kind of thing.
00:32:06 There have been more popular use cases, such as style transfer. The idea is that you take a neural network, still trained for classification, on the grounds that classification has learned some priors about images, some priors about the natural world. What you do then is say "I want my image to roughly match the feature distribution of this other image", and you get this kind of style transfer, where you can mix these components together. While this is actually a pretty ugly example (there are some good ones, I promise), there are much more complicated things you can do; it's not just taking two images and merging them together.
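"Matching the feature distribution of another image" is often made concrete with Gram matrices of convnet feature maps: channel-by-channel correlations that capture texture while discarding where each feature fired. A toy plain-Python sketch, with hand-made feature maps instead of real convnet activations:

```python
def gram(features):
    """Gram matrix: channel-by-channel correlations, which encode 'style'
    while throwing away where in the image each feature fired."""
    C = len(features)
    P = len(features[0])
    return [[sum(features[i][p] * features[j][p] for p in range(P))
             for j in range(C)] for i in range(C)]

def style_loss(feats_a, feats_b):
    """Squared difference between Gram matrices: small means similar style."""
    ga, gb = gram(feats_a), gram(feats_b)
    return sum((ga[i][j] - gb[i][j]) ** 2
               for i in range(len(ga)) for j in range(len(ga)))

# Toy 2-channel feature maps (channels x positions). The first two have the
# same statistics but different spatial layout; the third is genuinely different.
painting  = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
shuffled  = [[0.0, 1.0, 0.0, 1.0], [1.0, 0.0, 1.0, 0.0]]
different = [[1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0]]
```

Style transfer then optimizes the pixels of an output image so its Gram matrices match the style image while deeper features match the content image.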
00:32:47 You can do things like transforming a perhaps not-super-great drawing, something you could probably do in Paint fairly quickly, into something that looks like an artist made it, which is really awesome. The idea is that you can take these arbitrary doodles and convert them into things that look like paintings. This kind of stuff is great, and I think it's just the beginning of what we can do with neural network art: after basically less than a year of work on this, people are making applications that are already very tangible and very awesome. This is already something that, if I had made it, I would probably hang up in my living room, and it has only been one year of work; imagine what happens in ten years.
00:33:32 I saved the best for last: in terms of art, we can combine our pictures with those of Pokemon, so clearly the future is here. This is one of my crowning achievements, I think, primarily because I've done this with dozens of people and only mine turned out well. I think this is really awesome; there are just so many things to do here and so few people working on it. The sky is really the limit, and it's just really exciting what kinds of things can be created here. There have been other huge achievements.
00:34:11 Game playing has been really big. If anyone saw DeepMind's 500-million-dollar acquisition in 2014: roughly the only paper they had at the time was on learning to play Atari games from pixels, which might be harder than it sounds, because humans come in with priors about how to play a game. They have a prior that this is maybe a ball and that's a paddle and I want to destroy certain things, or that a key opens doors, or that roads are something I want to stay on in a driving game. The neural network isn't given any of these priors; it's literally only given the pixels, and from those images it learns to play at what is, on median, a superhuman level.
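The Atari work uses deep reinforcement learning: a convnet reading pixels, trained with Q-learning. Here is a tabular sketch of the Q-learning core on a made-up five-state corridor, far simpler than anything in the paper, but using the same update rule:

```python
import random

random.seed(1)

# A toy 1-D world: states 0..4, reward only for reaching the right end.
N_STATES = 5
ACTIONS = (-1, +1)  # move left or move right

# Optimistic initialization: starting every value at 1.0 pushes the agent
# to keep trying actions it has not yet been disappointed by.
Q = {(s, a): 1.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    nxt = max(0, min(N_STATES - 1, state + action))
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward

alpha, gamma, epsilon = 0.5, 0.9, 0.1
for _ in range(500):                          # episodes
    s = 0
    for _ in range(20):                       # steps per episode
        if random.random() < epsilon:         # explore occasionally...
            a = random.choice(ACTIONS)
        else:                                 # ...otherwise act greedily
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        nxt, r = step(s, a)
        best_next = max(Q[(nxt, b)] for b in ACTIONS)
        # Q-learning update: nudge the estimate toward the immediate
        # reward plus the discounted best future value.
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = nxt
        if r > 0:
            break

# After training, the greedy policy heads right from every non-terminal state.
policy = {s: max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES)}
```

The Atari result replaces this lookup table with a convnet that estimates Q-values directly from the screen, but the update being performed is the same idea.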
00:34:57 The techniques have been continuing to get better, and very similar tricks were applied in the much more recent result of Google DeepMind's AlphaGo network. That was not that huge of a deal in the West, but talk to people from the East: you tell them about the achievements of deep learning, you mention smart inbox and they're like "that's pretty okay"; you mention image search, "yeah, that's pretty okay"; and then you tell them it also beat the world champion at Go, and they're like "whoa, it beats people at Go? That's amazing." People had predicted that beating top humans at Go was, depending on the expert, 10 to 100 years off, and it just happened; it's already done, humans have lost at Go. As a side effect, AlphaGo also caused more fear over AI safety than any other neural network, I believe.
00:35:58 This is probably a good representation of that. I don't know how well you can see it, but this is an xkcd comic about how hard people used to think these games were, and you can see Go sitting basically last, at the level of "computers still lose to top humans". Not all of these are solved yet, but it is just pretty incredible that Go now is. People have been asking: if it can do this, what can't it do? Because Go is a task that requires a lot of reasoning.
00:36:34 And these kinds of achievements have been transferring into the physical world as well. Google has a farm with a bunch of robots that have learned, on their own, to grasp objects. Robotics control is usually pretty hard, especially when you're trying to make it generalize, and they were able to do it just by throwing the robots into a dark warehouse, letting them train for a while against a cute objective function, and the robots learned to grasp things better than the hand-designed controllers did, which was pretty awesome.
00:37:10 More recently, I think there was a video that came out last week of NVIDIA using just deep learning for self-driving cars. The idea is that with just a single camera in front of your car, the car can learn to drive itself from how other people drove. This is a very interesting result, because Google has been working on self-driving cars, using lidar and SLAM and all of that stuff, for what might be a decade already, and by some measures NVIDIA has caught up to them entirely within, I think, less than a year of investing in this. So deep learning seems to be changing a lot of things, especially these kinds of perception tasks, because research is moving so fast.
00:37:59 I also have to spend some time on things that are not yet practical but may very well soon be. As a disclaimer, I've been traveling this weekend, so I'm not sure whether some of these already belong in the solved category. Generation is a big one; there's tons and tons of stuff happening in generation, so I definitely can't do it justice. There's really cool work on generating images from scratch, and on generating arbitrary other domains from scratch; images are just the most visual, so that's what I have here. But some of the coolest and perhaps most practical examples are conditional generation.
00:38:31 Something I'm really excited about is image-to-text. The idea is that you take an input image, and the output is not a yes or no about whether a dog is in it, but a description of the image. That's an extremely human task, and it would be extremely useful if you could do it right; it seems to open up a whole ton of possibilities. I'm very excited about things like taking in a medical image and outputting a complete report of it, which would be really awesome. And an application people are really excited about in the very short term, I don't know the right way to say it, is for the community with poor eyesight: web pages nowadays have been pretty bad for people with disabilities, and imagine a neural network that could describe an image for you, describe a page for you, tell you what's on the page in a very semantic, summarized way.
00:39:27 There's also a really cool opposite problem: instead of taking in an image and outputting a description, you take in a description and output an image. As a terrible artist, I'm probably a bit more excited about this one, because while I can describe pictures, I can't really draw them, and these results are already much better than I can draw, though that's probably a low bar. In this kind of network you take in a sentence of text, and all of these images are generated by the network, which is pretty incredible. Some of them are not super great, but these birds, I believe, look real; the flowers don't, the purple ones, but they come pretty close: if it were zoomed out enough, I could see them as being pretty real.
00:40:16 And can you imagine a future where, instead of having to spend millions of dollars on a movie, you just type it up and a neural network generates the movie for you? We're quite a way from that, but perhaps not that far away, especially with some focused work, and this could enable all sorts of new forms of creativity that people don't even know about yet.
00:40:39 While language understanding does quite well, there is a deeper language understanding which we can kind of solve on toy tasks but which is much harder for real tasks. Take question answering that requires more complicated reasoning: if you have a story and you ask a complicated question like "where is the football?", you have to go back through the story and figure out where that kind of thing happened, and that's complicated. People are very good at this task; models can solve the simple versions quite well, but they can't do real question answering yet, which is unfortunate, because it's something people really care about. We're not quite there yet, but I also love how awesome this problem sounds: the machines we've spent basically no work on only automatically learn a shallow level of reasoning. That's such a first-world problem.
00:41:33 Alongside language understanding there's also visual understanding, which is also kind of unsolved. There are some awesome data sets that pair images with questions, where the goal is to find the answer. There are models that do pretty okay at this task, but they're still not good, and they still fare significantly worse than people do, so this kind of thing is something we can't do just yet.
00:42:06 While game playing is solved, harder game playing is still an open problem. You might object: harder game playing? My five-year-old brother can play Minecraft, and he almost certainly can't beat the world champion at Go. But "harder" in this case means stateful. It turns out that humans are really good at remembering things, while neural networks have some difficulty with it; the networks people have been using for playing games have been completely stateless. So in a partially observed world like Minecraft, where you only see the one direction you're looking in, if the network looks to the left, it forgets what was on the right. This is something people are still working to solve; it's the same with Doom, and while some work has been done, it's far from a solved problem, and I believe the networks are still subhuman at this task.
00:42:54 There's some really cool work on automatically discovering hierarchical structure. In language the hierarchical structure may seem clear to us, because we use language: words are made of characters, sentences are made of words, paragraphs are made of sentences. This semantic hierarchy makes it easy to break a problem down into simpler problems, but that's not the case in many domains, and people have designed neural networks that can actually discover this hierarchy automatically. That could be really useful for tasks we don't know how to interpret. Something I've worked a bit on is genomics, and we really don't even know how to read genomes properly; but if a neural network can automatically break a genome up, telling us this part goes together with that part, and there are connections between here and here, it could help a whole lot with all sorts of scientific tasks, purely from data.
00:43:45 This is where it gets a little bit computery, but these are things I'm excited about as a computer scientist.
00:43:52 There's this model called the Neural Turing Machine, which learns to use a big memory buffer, which is very cool: you can actually watch how the network reads and writes in order to copy an input. There are also ways to implement differentiable data structures, so instead of having this black box of arbitrary activations and matrix multiplies, you can plug a data structure into a network, and now your network can learn to do things like pushing onto and popping from a stack, or taking from both ends of a queue. This could potentially enable all sorts of very cool use cases, such as learning to program.
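A crude sketch of the differentiable-stack idea: the discrete push/pop choice is replaced by a continuous blend over a fixed-depth stack, so gradients can flow through which operation was taken. This is a simplification of my own for illustration, not the exact construction from the literature:

```python
# A fixed-depth "soft stack". The action weights (p_push, p_pop, p_noop) sum
# to 1 and would normally come from a network's softmax; here they are set
# by hand so the behavior is easy to follow.

DEPTH = 4

def soft_update(stack, value, p_push, p_pop, p_noop):
    pushed = [value] + stack[:-1]   # shift down, new value on top
    popped = stack[1:] + [0.0]      # shift up, bottom padded with zero
    # The result is a weighted blend of the three possible next stacks.
    return [p_push * a + p_pop * b + p_noop * c
            for a, b, c in zip(pushed, popped, stack)]

stack = [0.0] * DEPTH
stack = soft_update(stack, 1.0, 1.0, 0.0, 0.0)   # hard push of 1.0
stack = soft_update(stack, 2.0, 1.0, 0.0, 0.0)   # hard push of 2.0
top_after_pushes = stack[0]

# A "half pop": a blend of popping and doing nothing, something a discrete
# stack cannot express but which is differentiable in the action weights.
stack = soft_update(stack, 0.0, 0.0, 0.5, 0.5)
top_after_half_pop = stack[0]
```

Because every operation is a smooth function of the action weights, a network controlling those weights can be trained end-to-end with ordinary backpropagation.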
00:44:34 People have done work where you can create models that don't just have a simple input-output mapping but, as an intermediate step in that mapping, learn subroutines and play with pointers, and this makes them very general computers: they could potentially handle all the problems we care about, because being able to learn subroutines and pointer manipulation is like learning abstraction automatically. By putting these pieces together, people have been able to do things like learning to execute code: given code as a string, and targets for what that code outputs, you can actually learn an interpreter for the language. This is really exciting to me as a programming-languages guy. Maybe I could design a programming language not by implementing it but by showing a whole bunch of examples and having the implementation happen automatically for me; or perhaps I could just write the test cases for the language, and a neural network could generate an efficient implementation for me.
00:45:37 Something else related to all of this, which is really early but which I think a lot of people are really excited about, is neural module networks. Instead of having a single architecture that you play with, you have a library of components, and for every single example you compose a custom architecture. For example, in the question answering task, if you have an image and the question "where is the dog?", instead of using one arbitrary network that takes in the question and produces the answer, you convert the question into a custom neural network that combines a "dog" module with a "where" module and outputs the answer. This kind of thing is very early but really promising.
00:46:28 So that's it for the future part; hopefully you're pumped to deep-learn some problems. There's a lot of software to help you. I'm not going to talk about it right now, because there are a lot of tutorials out there and I think the high-level understanding is much more important. My recommendation: if you want to customize a lot of things, Theano and TensorFlow are the best, because they give you the automatic differentiation I was talking about, so you basically never have to worry about the backward pass. And if you just want to use the modules I talked about, plus a few others, Keras can do that; you can build a lot of these things with Keras.
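That automatic differentiation is less magic than it sounds. Here is a minimal reverse-mode sketch in plain Python, a toy that is nothing like the actual Theano or TensorFlow internals: each value remembers how it was computed, and backward() replays the chain rule.

```python
class Var:
    """A scalar that records its computation graph for reverse-mode autodiff."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents   # (parent_var, local_gradient) pairs
        self.grad = 0.0

    def __mul__(self, other):
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

    def __add__(self, other):
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def backward(self, upstream=1.0):
        # Accumulate this node's gradient, then pass it along via chain rule.
        self.grad += upstream
        for parent, local in self.parents:
            parent.backward(upstream * local)

x = Var(2.0)
y = x * x + Var(3.0) * x   # y = x^2 + 3x
y.backward()               # dy/dx = 2x + 3 = 7 at x = 2
```

You only ever write the forward computation; the framework walks the recorded graph backward to hand you every gradient, which is exactly why you never have to derive the backward pass by hand.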
00:47:03 If you want to do this, there's a lot more learning to do, and the devil's really in the details. I was super high-level with a lot of this, but there are so many little things you need to know, such as how to perform the updates in a way that doesn't cause your parameters to grow too large, how to initialize the parameters so you don't start at a trivial function, and how to avoid overfitting your training set.
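Two of those details can be made concrete; both snippets are illustrative choices of mine, not prescriptions. The first scales initial weights to the layer size (Glorot/Xavier-style initialization) so activations neither vanish nor explode at the start; the second rescales over-large gradients so an update cannot blow the parameters up.

```python
import math
import random

random.seed(0)

def xavier_init(fan_in, fan_out):
    """Uniform init with limit sqrt(6 / (fan_in + fan_out)), scaled to the
    layer size so early activations stay in a reasonable range."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[random.uniform(-limit, limit) for _ in range(fan_in)]
            for _ in range(fan_out)]

def clip_gradients(grads, max_norm=5.0):
    """If the gradient's norm exceeds max_norm, rescale the whole vector:
    a common guard against parameters growing too large during updates."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm > max_norm:
        grads = [g * max_norm / norm for g in grads]
    return grads

W = xavier_init(256, 128)                              # a 256 -> 128 layer
clipped = clip_gradients([30.0, 40.0], max_norm=5.0)   # norm 50, rescaled
```

Neither trick changes what the network can represent; they just keep the optimization numerically sane, which is much of what those little details are about.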
00:47:28 There are a lot of resources out there; my favorite is the Stanford class by Andrej Karpathy, CS231n. It is specifically on convnets, but it's constantly updated with state-of-the-art material and is generally very high quality, so I think it's approachable for anyone, from beginner to very advanced. And if you want to do this, you'll probably need a GPU, or fifty. I think that's it for time, so sorry I was rushing at the end. Any questions? Also, I have these slides; should I leave them up?
00:48:04 Some questions here, go. One is: how can we avoid autonomous cars picking up humans' bad habits? That is a very interesting question, and it depends a lot on how the cars are trained. Training a car to copy humans is by far the easiest thing to do, but it's not the most correct thing to do: the most correct thing would be to learn to drive optimally from scratch, but that unfortunately involves trial and error, which you probably don't want in self-driving cars, or hard-coded rules. What can happen is that if you train it to learn from humans, it will mimic those humans. But suppose humans don't make consistent mistakes: different humans make different kinds of mistakes, or the same human only makes a given mistake sometimes. Then a neural network can predict the expectation of what a human would do rather than the worst case. You can think of the humans as an ensemble here: if you're predicting what the average of a bunch of humans would do, you can drive better than any individual human. But if humans consistently make the same mistakes, there's nothing you can do about that other than get more data.
00:49:32 I think we have time for one more: what do you think about chatbots, is it possible to build one with only deep learning? Yes; there are actually many startups doing this right now. This seems to be the next wave; chatbot startups are the hot thing right now, with people trying to use chatbots to do all sorts of things in very specific domains. It has some really nice properties from a business point of view, because the goal is to replace humans who chat, and it's very feasible to replace them with an algorithm, because if you have a bunch of those humans, they generate a bunch of training data. So it is very plausible, but it's still hard for chatbots; it's kind of like the game-playing problem, in that it's hard for a chatbot to keep a memory of what you said. If you talk about, say, opening this menu, then go here and here and here, five sentences later the chatbot might say the same thing again, because neural networks still have memory issues.
00:50:33 Cool, I think that's it. Please remember to vote, and let's give a big round of applause. Thank you!