Hello everyone, welcome. This is "Modern Fraud Detection and Prevention Using Deep Learning". That title was submitted quite a long time ago, so I'd say the talk is now probably a bit more about machine learning in general. We had a good talk earlier in the day introducing some of the concepts behind machine learning, and I'm hoping to build on them. This talk is going to be a bit more technical. There's no maths, which you'll be glad to hear, and there's also no code; I've tried to explain myself using diagrams and pictures wherever I can. But it is a much more technical talk, so hopefully you can get your teeth into it. We've got the usual slides at the front saying please rate and engage.

My name's Phil, I'm with Trifork, but we're in Trifork Leeds, so we're quite distinct from the Danish mothership. I'm actually a software engineer in my professional life; machine learning is a bit more of a hobby. I'm currently working on an Apache Mesos framework for Elasticsearch. If you'd like to talk more about any of the subjects I'm about to discuss, please see me, or see some of my colleagues listed at the bottom there. I'm going to skip the marketing slides.
We're split into three or four topics. The final one, the architectures, is more about how we would do this in production, in real life. It's interesting, but it's not really the core of my talk, so I'm going to go through the first three sections, and if we have time we might do the fourth; but I'll probably end up speaking for too long and drop that section. I'm going to introduce the reasons why we want to provide some new tools and techniques to apply to fraud, to try to make the case to the business users as to why you should pick up some of these ideas and start to run with them. I'm then going to introduce the topic of machine learning. You've probably had quite a bit of exposure already, but if you haven't, that's the section that really explains what's going on and why. I've also got quite a lot of demos. Some of the demos are quite simple and very general, just to explain the concepts, but the rest are all focused on fraud prevention, focused on finance, and specifically mortgages.
OK, so let's crack on. In order to do any of this work, we need to persuade some people to give us some money, and there's no better reason for people to give us money than other money being at risk. We've got some UK-specific facts here. In the UK, financial crime is defined as follows (I can't even read that screen, so I'll have to read from here, sorry): fraud is an act of deception intended for personal gain or to cause a loss to another party. All of these facts and figures are specific to the UK, but they're applicable to pretty much every country in the world: anybody that's trying to do wrong, to do harm for their own financial gain, is committing fraud.

We've got UK mortgage fraud listed there. In 2014, 1.2 million properties were bought and sold in the UK, and 83 in every 10,000 of those applications were fraudulent. So that's not quite 1%: 0.83%. And when we say fraud in that context, it's not necessarily people being hugely devious. It ranges from the small scale, where somebody's maybe telling a few fibs about their employment history or how much they earn, all the way up to huge international fraud.
In 2013 there was a story of two guys who had invented a whole series of companies: they invented estate agents, invented surveyors, invented property businesses and builders, and they had supposedly bought a huge tract of land on which they were going to build lots of new houses. They invented, or stole, the identities of other people to take out mortgages on those respective houses, so there were tens to hundreds of mortgage applications all going in for houses that hadn't been built yet. As it turned out, they just took that money, paid off the original debt on the land, so they owned the land outright, and then just legged it. They completely invented a village, took out loads of mortgages based upon it, and then ran off. How can that even happen? The total cost, when it finally came to it, was about 53 million pounds. They did eventually get caught, but they very nearly got away with it, because it was just so embarrassing: the mortgage company was so embarrassed to admit this had happened that it almost never got reported. So fraud does get to quite a large scale, and this actually equates to approximately 1 billion pounds' worth of fraudulent applications, which is a huge number.
Interestingly, though, mortgage fraud is not actually the worst case of fraud in the UK; the worst is current account fraud. Traditionally, what people would do is steal somebody's information, open a standard current account of some sort with a traditional bank, which you can do quite easily in the UK, then use the overdraft or other facilities to withdraw some money, and then run off. That actually constitutes the most fraud in the UK, but we're talking a little bit about mortgages today.

And finally we've got UK retail fraud. Much of the business in the UK is actually made up of small-to-medium-sized enterprises; the big guys make up a significant part of the market, but not a huge part. Small-to-medium-sized businesses are estimated to be losing eighteen billion pounds every year to fraudulent transactions. That's when somebody goes online and buys some clothes, or some food, or shopping of some kind on a credit card, and then maybe cancels the credit card as soon as they've placed the order. The guys on the retail side end up shipping all of this stuff, only to find that the person doesn't exist, or the card was stolen, or something like that, and that adds up to a huge amount as well.
Another reason why businesses might want to look at some of these ideas is legislation. At one end of the spectrum there are people actually doing wrong to your business, which you might want to protect yourself against; but there are also legal requirements that need to be put in place in order to comply. In 2017 there's new anti-money-laundering legislation coming in within the EU, so it applies to all EU countries. It extends money-laundering rules that are already in place, but the main change is that the out-of-scope limit has dropped to a thousand euros, where previously it was fifteen thousand euros. This applies to businesses that are handling financial transactions: it applies to banks, obviously, and to financial institutions, credit agencies and the like; it also applies to legal services and estate services, and it applies to gambling services. Basically, anybody that's handling and moving money around has to comply with this legislation. What it says is that for any transaction of over a thousand euros, the business needs to prove to the authorities that they're doing their due diligence, to establish that the person is (a) not being fraudulent and (b) not using the money for nefarious means, like terrorism. Finally, they're required to submit their information to a central registry. There are obviously privacy concerns there, and it's a bit unclear how that's actually going to be implemented.
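As an illustration of the kind of check this implies (my own sketch, not from the talk; the thousand-euro threshold is the only figure taken from the legislation as described above, and the function and record shape are invented):

```python
# Sketch: flag transactions that fall inside the new AML scope.
# The 1000-euro threshold comes from the talk; everything else
# (function name, record shape) is illustrative.

AML_THRESHOLD_EUR = 1000


def needs_due_diligence(transaction):
    """Return True if a transaction must be backed by due-diligence checks."""
    return transaction["amount_eur"] > AML_THRESHOLD_EUR


transactions = [
    {"id": "t1", "amount_eur": 250},
    {"id": "t2", "amount_eur": 15000},
    {"id": "t3", "amount_eur": 1001},
]

flagged = [t["id"] for t in transactions if needs_due_diligence(t)]
print(flagged)  # t2 and t3 exceed the 1000-euro limit
```

Under the old fifteen-thousand-euro limit only `t2` would have been in scope; dropping the threshold pulls far more everyday transactions into the due-diligence net.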
So there are direct financial reasons why you might want to do this, and there are also legal reasons. How do we do it at the moment? Well, if a traditional company were to go to a software house and ask for some software to do this, they would probably come up with some combination of these four general ideas.

First, we've got the origination-based techniques. Most countries have a law that requires financial services to prove that they're talking to the real person; that's what origination is. One thing I get really, really annoyed about is banks in the UK: they've got this awful technique of using automated phone systems to try to prove you are who you say you are. You go through the whole series of "please type in your ID number", "please type in your address", "please type in your password", please do this, please do that, and that takes about three and a half minutes. Then, as soon as you finally speak to a real person, which is all you wanted to do in the first place, they ask all the same questions again. It turns out they do this because these businesses aren't quite sure that the automated method really is proof enough that the person is who they say they are. It does my head in. And some less security-conscious outfits, such as insurance agencies and others not necessarily as interested in protecting security, can use some really quite dodgy methods. I've had cases where people have asked me just for my date of birth, or just for my postcode, and those are completely insecure. Your date of birth is basically a password you were given at birth: you can't change it, it's fixed, and you have to live with it, so it's the worst password that could ever exist.
The next group of technologies is rules-based. These are static rules, usually provided by analysts, saying things like "no transaction may be bigger than X", or "you can't have more than so many transactions within a certain period of time". They're OK, and they catch a reasonable amount of fraud, usually the accidental types and the not-so-intelligent fraudsters who try something silly. But they also catch all the good guys as well: like when you're abroad and your card's always declined the first time because they think it's fraudulent, or when you're trying to buy a car from a guy who takes cash, and you try to pull 1,500 pounds out of the cash machine and you can't, because it's against their static rules.
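To make the idea concrete, a static rule set can be sketched as a handful of hard-coded predicates (my own illustration, with invented thresholds; the talk itself shows no code):

```python
# Sketch of analyst-provided static rules. The thresholds and the
# transaction shape are invented for illustration.

MAX_SINGLE_AMOUNT = 1000      # "no transaction bigger than X"
MAX_TXNS_PER_DAY = 5          # "not too many transactions in a period"


def breaks_static_rules(amount, txns_today):
    """Return a list of rule names this transaction would trip."""
    tripped = []
    if amount > MAX_SINGLE_AMOUNT:
        tripped.append("amount_too_large")
    if txns_today >= MAX_TXNS_PER_DAY:
        tripped.append("too_many_transactions")
    return tripped


# The car-buying scenario from the talk: a legitimate 1500-pound
# withdrawal trips the rule just like a fraudulent one would.
print(breaks_static_rules(1500, 1))  # ['amount_too_large']
print(breaks_static_rules(200, 1))   # []
```

The weakness described above is visible here: the rule cannot tell the honest car buyer from a fraudster, since both trip `amount_too_large`.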
Next, credit checks. Lots of agencies will gladly accept your money to provide you with a number, and that's it. These numbers are supposed to represent the creditworthiness, or the risk, that a person presents to your business, and there's certainly an argument for using them; how accurate they are is another question.
Finally, aggregation and monitoring. This is more of a reactive type of solution, where analysts are provided with the data, and they perform some query, ask a question, and try to act on the result. For example, you might have some guys who find a pattern, say one particular cash machine giving out a large amount of money, and an analyst will go and check it out. So those are the types of things that exist in the wild at the moment. Now I'm going to start talking about machine learning, and how we can use machine learning to improve some of those technologies and try to remove some of the bias, the redundancy, or the error from them.
OK. Following on from our excellent presentation this morning (I forget the first name, Ms. Pitt, sorry if you're here), where she was talking about how we learn: I also have a couple of slides, but mine are a bit more basic. I'd like to introduce my daughter. She's 18 months old and she's currently going through this process of learning, and it's really fascinating to watch how she does it, because there are lots of parallels between this and the state of machine learning algorithms at the moment. If we can understand how we learn, it actually helps us to write better algorithms, and it helps us to understand the algorithms as well.

This is my daughter with her mother, my wife, making some yummy Rice Krispie chocolate square things. In the top picture she's doing exactly what mum told her: please take the Rice Krispies and put them in the baskets, and then we can eat them later on. But somewhere along the line she decided to perform a test. She decided: if I put this thing in my mouth, is it going to be good or is it going to be bad? So she put it in her mouth, and it was good, and from then on she completely ignored any instructions, because she'd learned that eating chocolate with Rice Krispies was a good thing. That's a very simple example of how children learn, and of how algorithms learn in general: you provide them with some tests, with some input, and they evaluate that input and decide on some outcome.
It takes time, however. She's 18 months old, and she's still pretty hopeless: she's struggling to put sentences together; when she walks she falls flat on her face; she gets the spatula, misses her mouth and hits her eye. It does take time for this to happen, and this applies to algorithms as well: they take time to learn.

We've got this great game that she loves, which is flash cards, and this is an example of how she gets things wrong. I mean, she's very good; I don't want to give you the impression that I'm a bad father calling her rubbish. No, she's very good, but in some cases she does get it wrong. The first example on the left there is a door; however, she thinks it's a house. She thinks it's a house because it's got four walls, and it's got these features in the middle, squares, which kind of look like windows. What she hasn't learned yet is that a house also needs a triangle on the top. So this is an example of a misuse of features: the features are there, but she's misusing them and coming to the wrong conclusion. The second one she calls a chicken, because she doesn't quite understand the concept of a bird; I think she struggles to understand classes of things. She's quite happy to learn that this thing is definitely a bird, and that thing is definitely a teddy, and that thing is definitely mummy and that thing is definitely daddy, but she struggles with generalizing, so to her that's a chicken. That's OK; it's just an example of a misclassification. And then finally we've got the third picture, and apparently that's a tiger. Now, when I show her this card she kind of looks at me, as if to say "I'm not sure what that is", and then I look at the card and go: I'm not sure what that is either. Sometimes she goes for a cat, sometimes something else; I don't even know what it is. It looks like something ran over it; it's basically a cat that's been run over. And that's a great example of just bad data. In real life you will get data like that, and there's a big cleaning step required to weed out this bad data, because otherwise you will come to the wrong result.
Just to prove that it's not just her age, I've got an example for all of you. Take a look at this picture; I'm just going to watch you for a second. For all the programmers out there, this is like the human equivalent of a stack overflow. What you start doing is trying to focus on the eyes, but then you realize she's got eyes in a different place, so you jump across; then you realize the mouth is in the wrong place, so you jump again, and you're up and down and up and down, and if you stare at it long enough you start to feel sick. All this is proving is that you've learnt some specific things over time. You have decades' worth of experience saying what a face should look like, and when it doesn't look like that, you don't quite know how to process it, and we get it wrong. No human is infallible; we're all completely fallible.
OK, moving on to the more technical topics. Machine learning comprises four-ish distinct components, all trying to do slightly different things.

The first is dimensionality reduction. When we think of data, it has a number of dimensions, and by a dimension I basically mean a single point of information. If you imagine a 10-by-10 grayscale picture, that has a hundred dimensions: a hundred pixels, each representing a distinct piece of data. With images that's OK, but for many other types of data it's really hard to visualize what's going on, so you have to compress that space down into two or three dimensions in order to actually see it. That's the act of dimensionality reduction.

Then we've got clustering, where we're trying to group data points together into distinct classes. Often we don't know which class a point should belong to, but we may at least know how many classes there are. We've got classification, which is linked to clustering, but asks more exactly: where do I put the line to say that's class A and that's class B? And finally regression, which is trying to predict a value based upon previous inputs.
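To make the 10-by-10 example concrete, here is a deliberately crude reduction of a 100-dimensional image down to 2 numbers (my own sketch; real techniques such as PCA learn the projection from the data rather than hand-picking it):

```python
# A 10x10 grayscale image is a point in 100-dimensional space:
# one dimension per pixel. Here we hand-craft a reduction to just
# 2 dimensions (mean brightness of the left and right halves),
# purely to illustrate the idea of compressing the space.

def reduce_to_2d(image):
    """image: 10 rows of 10 pixel values. Returns (left_mean, right_mean)."""
    left = [px for row in image for px in row[:5]]
    right = [px for row in image for px in row[5:]]
    return (sum(left) / len(left), sum(right) / len(right))


# A toy image: dark on the left half, bright on the right half.
image = [[0.0] * 5 + [1.0] * 5 for _ in range(10)]

point = reduce_to_2d(image)
print(point)  # (0.0, 1.0) -- 100 dimensions squeezed into 2
```

Once every image is reduced to a pair of numbers like this, it can be plotted on an ordinary 2-D chart, which is exactly the point of dimensionality reduction.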
We've also got different types of learning. Training is the key thing that's really enabled deep learning to come to the forefront: the new training techniques that have been developed are so much more powerful than they were in the past. Training can be split into supervised and unsupervised learning. Supervised learning is where you have an expected result, so the data is labelled: you say this raw data is supposed to belong to class A, this is supposed to be the number one, or this person is fraudulent. The algorithm is then trained, and the parameters of the algorithm are tuned to try to produce that same result; the measure of performance for the algorithm is the true result compared against the predicted result. Then, when you use this in real life, with new data coming in, you use those pre-learnt weights and predict an output from them.

For unsupervised learning you've got no labelled results, so you don't know exactly which class the data is supposed to belong to. To train the algorithms, you need to decide what's going to provide you with a measure of how well the algorithm is doing. Some of them decide whether data points are close together or far apart, so there's a measure of distance between data points; but there may be other criteria as well, and you can provide your own. We're talking about customized cost functions, to decide whether your output should be labelled class 1 or class 2, if something like that is important. In the real world, though, most data is semi-supervised: you usually start off with some labelled data, and usually a lot more that is unlabelled. You can combine the two: maybe use the labelled data to start to bring out some of the clusters, and then apply the unlabelled data to really fill in the pattern a bit more.
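Here is the smallest possible version of the unsupervised case described above: a 1-D k-means, whose only "teacher" is a distance measure (my own sketch; the data, the starting centroids and k=2 are invented for illustration):

```python
# Minimal 1-D k-means: an unsupervised algorithm guided purely by
# distance between data points, as described in the talk.

def kmeans_1d(points, centroids, iterations=10):
    for _ in range(iterations):
        # Assign each point to its nearest centroid (the distance measure).
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its assigned points.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids


points = [0.8, 1.0, 1.2, 7.8, 8.0, 8.2]         # two obvious groups
print(kmeans_1d(points, centroids=[0.0, 10.0]))  # centroids settle near 1.0 and 8.0
```

No point is ever labelled; the two groups emerge purely because points close together get pulled towards the same centroid.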
So let's talk about some specific algorithms. I'm going to talk about two; everybody's got their own favourite algorithm. This first one is called a decision tree. There are various different types of decision tree, but we're going to stick to the simple one for now. They can be used for classification and regression, and the idea is that they predict a target class or value based upon some very simple decision rules: is it less than 10 or bigger than 10, is it labelled A or labelled B? The example we've got there on the right is quite morbid, actually: this is a decision tree that's been learned from the data in the Titanic manifest, and it predicts whether you would have survived if you'd been on the Titanic. The first question it asks is: is the sex male? If yes, it goes down one side of the tree, on the left; if no, it goes down the right side. So if you were female, you had a pretty good chance, 0.73, a 73% chance of surviving, and that branch represents 36% of the entire population on board the Titanic. Whereas if you were male, and you were above the age of 9.5, then unfortunately there's a fairly big chance you were going to die: 61% of all males above 9.5 died. You can see that you can go down the tree and make a decision based upon these rules, and the idea of the algorithm is to train these parameters, these rules, these decision points, to optimally make the right decision.
So it's conceptually quite simple, and it can handle categorical data, which is great, because some algorithms can't. Decision trees specifically can overfit quite badly, but there are lots of methods for using decision trees in different ways to prevent the overfitting, so don't worry about that too much. Decision trees are usually one of the simplest approaches, and sometimes effective enough to solve the problem.
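A learned tree is, in the end, just nested threshold rules. The Titanic example above can be written out by hand like this (the structure and the 9.5-year split come from the slide as described in the talk; the code itself is my illustration, not a trained model):

```python
# The Titanic decision tree from the slide, written out as the
# nested rules it amounts to. A real tree learner would discover
# the questions and thresholds from the data; here they are
# transcribed from the talk for illustration.

def titanic_prediction(sex, age):
    """Return 'survived' or 'died' for a passenger, per the slide's tree."""
    if sex != "male":
        return "survived"          # females: ~73% survival on this branch
    if age > 9.5:
        return "died"              # males over 9.5: ~61% died
    return "survived"              # young boys fall through to survival


print(titanic_prediction("female", 30))  # survived
print(titanic_prediction("male", 40))    # died
```

Training a decision tree is the process of choosing those questions ("is the sex male?") and thresholds (9.5) automatically, so that each branch splits the data as cleanly as possible.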
The next algorithm, the one currently surrounded by a lot of hype, is deep learning. Deep learning is really good because, if you remember those classes of algorithm from the start, it actually does all of them: the dimensionality reduction, the classification, the regression and the clustering. It can do all of it; it's the holy grail of algorithms. No other algorithm can actually do all the same things. The idea is that it's trying to model the learning process in our brain: basically, it aims to model the neurons and the synapses in your brain to do similar sorts of tasks. It's simplified somewhat, but that's the general idea. The hope here is that if we can produce a model of our brain, then we can write algorithms to perform things that our brain does quite easily, like recognition and classification.

So, the pros and cons. Again, it's very versatile and can be used for lots of different tasks. The key improvement, really, is that it begins to remove the requirement for feature engineering. With all of the other algorithms, your algorithm will live or die by what features you give it as input: you need to work really hard to say "this is the most important feature, I'm going to keep that and use it, but those ones are completely redundant, I'm going to remove them", and that takes a significant amount of time. Deep learning has the ability, internally, during the training stage, to effectively remove parameters or keep parameters purely based upon how well they fit the data, how well the training process goes. So it removes the bias that comes from removing or adding data when you're not sure whether it should be there.

As for the cons, I suppose there are a couple. The biggest is that it can be hard to visualize: as soon as you start getting into neural network sizes that are quite deep, it can be quite hard to visualize and conceptualize. I'm hopefully going to prove that wrong in a little bit, but that's problem number one. Problem number two: it can be quite computationally expensive, but that's true for a lot of these algorithms, really.
So how do they actually work? They work primarily by trying to conceptualize things. The idea is that neural networks act like a hierarchy of concepts, and the whole goal is to take those images, take your data, and produce a concept: something that accurately describes what was provided at the input. We've got a couple of the concepts on the left there: a street, an animal and a person. But you can see that the bottom two, the person and the animal, are actually linked by another concept: they're both animals, it's just that one of them is human. The great thing about the layering of concepts is that you can start to tag things that are similar, but not quite the same, based upon your training data.

To be more specific, this is an example of how you would go about conceptualizing an image. Each pixel within the image, that's the dashed lines there, is passed into the input of our deep learning network, and it starts to build concepts around those pixels. The first layer might decide that there's part of a tyre, or part of a rim, or an end plate, something like that: usually very small, discrete, local things within the image. The next layer might build on that and form the concept of a tyre, or a front wing, or a rear wing, and then finally we get to the classification, which in this case is an F1 car. But you can imagine that if you then showed the algorithm a normal car, it could reuse some of those concepts: normal cars still have wheels, they still have cockpits or bodies, though they probably don't have wings. Well, maybe in Leeds; I don't know about Denmark. You can reuse some of these concepts, and that shows the applicability not just to problems it's already seen, but also to future problems it hasn't seen.
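To ground this, here is the smallest possible version of the building block being described: one artificial neuron, trained by nudging its weights towards the right answers. A deep network stacks many layers of these units; the dataset (the OR function) and all the names here are my own illustration, not from the talk.

```python
import math

# One artificial neuron: a weighted sum of inputs pushed through a
# sigmoid "activation". Deep networks are layers of these units;
# this sketch trains a single one on the OR function.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Truth table for OR: inputs -> expected output.
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]

w = [0.0, 0.0]   # one weight per input
b = 0.0          # bias
lr = 1.0         # learning rate

for _ in range(2000):                 # training: repeat over the data
    for x, target in data:
        pred = sigmoid(w[0] * x[0] + w[1] * x[1] + b)
        err = target - pred           # how wrong were we?
        # Nudge each weight in the direction that reduces the error.
        w[0] += lr * err * x[0]
        w[1] += lr * err * x[1]
        b += lr * err

predictions = [round(sigmoid(w[0] * x[0] + w[1] * x[1] + b)) for x, _ in data]
print(predictions)  # the neuron has learned the OR table: [0, 1, 1, 1]
```

A single neuron can only draw one straight dividing line; the layering described above is what lets stacked neurons combine such lines into the hierarchy of concepts (rim, tyre, car) from the slide.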
00:26:55finish this section off really just
00:26:56machine learning in the news or deep
00:26:57learning in them in the news the the one
00:27:00I really like that’s accessible to
00:27:02anybody really is the Google the new
00:27:04Google Translate app that takes pictures
00:27:06of signs or text in a different language
00:27:08and it translates that text but the real
00:27:11the cool USP of the whole thing is that
00:27:14it actually takes the image and replaces
00:27:17the image with the correct text in your
00:27:19language so here we’ve got a Russian
00:27:21sign and it’s replaced it with the
00:27:24English here actually I say he says
00:27:26access the city but according to my
00:27:29friend who who speaks Russian it
00:27:31actually means exit to village so not
00:27:34access to City exit to village but it’s
00:27:36not quite as grandiose if we showed if
00:27:38Google showed us science and exit to
00:27:39village so it’s probably why they
00:27:41changed it and then we’ve got the the
00:27:44images at the bottom and this is a new
00:27:46chip developed by IBM it’s been a few
00:27:48years in the making actually but
00:27:50effectively it’s a a deep learning
00:27:53neural network type infrastructure
00:27:56inside a chip so obviously you’ve got
00:27:58cores and you’re used to cores
00:28:00imagine the cores parallelized massively
00:28:03so instead of having you know one core
00:28:05we’ve got tens of thousands in this case
00:28:07it’s actually a million there’s a
00:28:08million neurons in this chip so it’s
00:28:10able to do a million parallel tasks all
00:28:13at the same time and when we go through
00:28:17some of the examples in a minute
00:28:18we’re going to be talking about
00:28:20image sizes like 10 by 10 so 100
00:28:23input pixels that go down to maybe
00:28:262 outputs 2 dimensions on the
00:28:29output so that’s kind of nothing in
00:28:32comparison to what this could do and
00:28:34this is actually in hardware as well so
00:28:36it’s super fast super low power and
00:28:38should produce some really interesting
00:28:40applications ok so just to solidify
00:28:45how deep learning works I’m going to
00:28:48take you through an example which is a
00:28:54classification
00:28:55of some numbers here so the
00:28:58idea of this task is to recognize some
00:29:02handwritten digits and to classify them
00:29:04as a number from 0 to 9 so it’s a really
00:29:07classic machine learning example
00:29:09but it’s really great to use
00:29:11as an example because it’s very
00:29:14easy to understand very very easy for
00:29:15everybody to understand it’s just trying
00:29:17to recognize what that number is and the
00:29:20first thing we notice when we start
00:29:21looking at the data so the first step in
00:29:23any in any data analysis job is to have
00:29:25a look at the data and the first thing
00:29:27we notice is that if you
00:29:29look at that top left number there
00:29:31I’m not completely sure whether
00:29:35that’s a 5 or a 3 and this
00:29:39immediately brings problems because this
00:29:41data is actually labeled so every one of
00:29:43these examples you’ll see so each each
00:29:46number is an example you can see that
00:29:48it’s been inverted so maybe
00:29:49somebody wrote in pen on white paper and
00:29:52it’s been inverted and then reduced to
00:29:56a fixed pixel size and then centered as
00:29:58well and the first thing that we can see
00:30:00is we’re already not sure whether that’s
00:30:02a 3 or a 5 and so somebody’s gone
00:30:04through and labeled this data as being a
00:30:063 or a 5 but I’m not convinced that
00:30:08that’s actually correct so we’re giving
00:30:10our algorithm potentially dodgy data
00:30:13already so bear in mind whenever
00:30:15you’re trying to train on data that
00:30:17your labeled data might not be right in
00:30:19the first place because it’s
00:30:20usually labeled by humans so
00:30:25what we then do with each example is we
00:30:27feed it into an input layer so I’m
00:30:29trying to stay away from the term neural
00:30:32network although I’ve mentioned it a
00:30:33couple of times because that it’s been
00:30:35around since the 80s it sounds
00:30:38complicated but it’s really not all a
00:30:39neural network is is you have a node where
00:30:42some data goes in and then you have
00:30:45links to a subset of nodes in the next layer and
00:30:48those links all have weights
00:30:50it’s as simple as that all we do is
00:30:52we alter the weights within the
00:30:55network in order to perform a task so
00:30:58I’ll try and refrain from using that
00:31:00terminology so our input layer is
00:31:02usually the same size as the size of the
00:31:05data so here we’ve got made maybe 10 by
00:31:0710 pixels so we’ve got 100 inputs
00:31:09have one input for each pixel we then
00:31:15pass that data through to what’s known
00:31:17as a hidden layer and we call it a hidden
00:31:19layer basically because it’s not
00:31:20an input or an output it’s something in
00:31:22the middle it’s not directly observable
00:31:25and the way in which they’re connected
00:31:27is with a weight and during the training
00:31:30process those weights could be
00:31:32completely removed by setting them to zero
00:31:34or completely kept by setting
00:31:36them to one and that’s all the training
00:31:38process is doing so what’s really great
00:31:46at this point is that those weights
00:31:48combine in the next layer
00:31:51so the
00:31:54weights that have been learned for
00:31:57one particular neuron in the hidden
00:31:59layer can actually be treated as a
00:32:00feature this is the beginnings
00:32:02of a concept so it’s saying that
00:32:05that one neuron that one item in the
00:32:08hidden layer there has
00:32:12certain weights on each of the input
00:32:14pixels so if we were to make
00:32:18that the output layer we could
00:32:20imagine that if that was the output
00:32:22layer for the number one the weights
00:32:24would represent a shape that looks
00:32:26something like the number one generally
00:32:29you have multiple
00:32:30hidden layers so you’re trying to get
00:32:31the algorithm to learn these small steps
00:32:33these small increments of concept and
00:32:38what we can actually do is say that
00:32:40for that one hidden layer we can go
00:32:42back and say what does the input layer
00:32:43have to look like in order to fully
00:32:45activate that one neuron and only that
00:32:47one neuron so this is an example of that
00:32:50hidden feature layer here and it might
00:32:53look a bit abstract but you can just
00:32:56about start to make out that it’s
00:32:57starting to learn these kind of ghostly
00:33:00images of numbers in there and that’s
00:33:02because it’s starting to learn some of
00:33:03these concepts if you were to use a
00:33:05number of hidden layers and say you know
00:33:07don’t try and learn the number all
00:33:09in one go it might come up with features
00:33:11that are like edges maybe it could learn
00:33:14the edge of the stick of a 7 or maybe it
00:33:16can start to learn some curves of a nine
00:33:18or something like that and these are the
00:33:20hidden features that are in the middle
00:33:21of all these these networks
00:33:24so then finally we would produce an
00:33:26output layer which usually amounts to
00:33:29the number of possible classifications
00:33:32that we want to make so for our output
00:33:35layer we would have 10 we would have 0
00:33:37to 9 and each one of those nodes would
00:33:39represent a number and at the output
00:33:43layer if we were to actually put one of
00:33:44these examples in you’d never get 100%
00:33:48you always get a spread of values we were talking
00:33:52earlier about how they’re not
00:33:53deterministic but they are
00:33:56deterministic in the sense that they
00:33:57have fixed weights so you can follow the
00:33:59path of those weights through the data
00:34:00however we’re never quite sure like
00:34:03going back to that previous example
00:34:05we’re never quite sure whether it’s a 5
00:34:07or a 3 so the algorithm
00:34:09will probably decide I’m 50 percent
00:34:12sure that it’s a 5 but there’s a 40%
00:34:14chance it could be a 3 so of all the
00:34:17numbers that are generated
00:34:19the classification is made by picking
00:34:22the highest of those numbers so in this
00:34:23case we’d say that the 5 is the
00:34:26classification for this example because
00:34:28that had the highest value at the output
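The forward pass just described, 100 input pixels feeding a hidden layer of weighted links and 10 outputs whose highest value gives the classification, can be sketched in a few lines of NumPy. This is a minimal editorial illustration with made-up random weights, not the trained network from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

# A 10x10 image flattened to 100 inputs, as in the talk's example.
x = rng.random(100)

# Randomly initialised weights; training would adjust these values
# (towards zero to remove a link, towards one to keep it).
w_hidden = rng.normal(size=(100, 30))   # input -> hidden layer
w_output = rng.normal(size=(30, 10))    # hidden -> 10 output classes

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

hidden = sigmoid(x @ w_hidden)          # hidden-layer activations
scores = hidden @ w_output              # one score per digit 0-9

# Softmax turns scores into the "50% sure it's a 5" style numbers...
probs = np.exp(scores - scores.max())
probs /= probs.sum()

# ...and the classification is simply the highest of those numbers.
prediction = int(np.argmax(probs))
print(prediction, probs[prediction])
```

In a real network the weights would be learned, each one nudged up or down during training to reduce the classification error.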
00:34:32but what’s really cool as well is that
00:34:35rather than tell
00:34:39it to classify the objects by only
00:34:41having 10 outputs we can actually
00:34:44produce the same number of outputs as
00:34:46inputs and ask the algorithm to please
00:34:49try and reconstruct the image based upon
00:34:51its hidden concepts and
00:34:54representations so what we can do here
00:34:56is given a certain input please
00:35:00reproduce that input and then we could
00:35:02do some comparison to see how well it’s
00:35:04performed so this is an example of what
00:35:07a reconstruction actually looks like and
00:35:09if I just flick backwards or forwards
00:35:11between what was real what was the real
00:35:14input and what was the learned concepts
00:35:16about that you can kind of see that the
00:35:18learned concepts are kind of like a
00:35:19drunk blurred version of the real number
00:35:22and that’s because they’re kind of
00:35:24learning what the most likely
00:35:27look is for that particular number
00:35:29and what’s really interesting is in the
00:35:32real data we weren’t sure
00:35:34whether that’s a 3 or a 5 but if you look at
00:35:36the drunk version
00:35:37it actually looks a little bit more like
00:35:40a five and this is saying that the
00:35:41algorithm has decided well it’s
00:35:43probably been labeled as a five
00:35:45so the algorithm has learnt
00:35:47those features as a five so when you try
00:35:49and reconstruct it it looks more like a
00:35:51five and then finally we talked about
00:35:55dimensionality reduction so what we can
00:35:57do is take that high dimensional output
00:36:00so in this case we have ten discrete
00:36:03classes from zero to nine and we can
00:36:05flatten them into a smaller space we don’t have
00:36:07ten dimensions to plot all our data so
00:36:09we can’t plot the 50% for the
00:36:11five the thirty percent for the four the
00:36:13twenty percent for the three and so
00:36:15on all on a graph because we
00:36:16don’t have that many dimensions so what
00:36:18we can do is flatten all of that into
00:36:20two dimensions and this is what this
00:36:21process is here and what it shows you is
00:36:24how well the data are clustering
00:36:27together so we can see if I stand
00:36:30very close to my screen I can see that
00:36:32the number sevens at the bottom are
00:36:34quite well clustered the number
00:36:36eights are okay in the top left but then
00:36:39we’ve also got some very strange
00:36:41features so let’s take the five and
00:36:43three example you see the fives in
00:36:45orange in the middle they’re pretty well
00:36:47mixed with the threes and that’s kind of
00:36:51because there must be quite a lot of
00:36:52examples that look like a five or look
00:36:54like a three so they’re quite well mixed
00:36:56so that means to actually perform the
00:36:58classification the algorithm is going to
00:36:59have to work really hard to
00:37:01pull those apart so this is what
00:37:04you would generally do on the output
00:37:06you would try and visualize
00:37:08the data in such a way that we as humans
00:37:11can understand it that could be
00:37:12in 2D or in 3D okay
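That flattening of a high-dimensional output into a 2-D plot can be sketched with a plain principal component analysis. The talk's plots come from a learned embedding, and tools like t-SNE are also common here, so this hand-rolled PCA on made-up score vectors is just to show the idea:

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend output of the network: 200 examples, each a vector of
# 10 class scores (one per digit) -- too many dimensions to plot.
scores = rng.random((200, 10))

def pca_2d(data):
    """Flatten high-dimensional points into 2 dimensions with PCA."""
    centered = data - data.mean(axis=0)
    # SVD gives the directions of greatest variance, in order.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T          # project onto the top 2 axes

points_2d = pca_2d(scores)
print(points_2d.shape)   # each example is now an (x, y) you can plot
```

Clusters that sit close together in the flattened plot (like the mixed fives and threes) are exactly the examples the classifier will have to work hardest to pull apart.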
00:37:18that section introduced you
00:37:20to deep learning and some of the ideas
00:37:22and some of the terminology so when I
00:37:24come to some of the financial demos
00:37:27this should be much easier to
00:37:30understand so the first example is a
00:37:35traditional example using a rules-based
00:37:39approach and in this case we’ve been a
00:37:42little bit fancy we used a graph
00:37:43database typically graph
00:37:45databases aren’t used as much as we’d
00:37:48like but they do perform really well
00:37:51in a fraud-based scenario so just
00:37:54quickly to recap if you don’t know a graph
00:37:56database is another NoSQL database
00:37:59but its power really is in the description
00:38:02of the data the data can only ever be
00:38:04either a node or a relationship a node
00:38:07is like a thing or a noun whereas a
00:38:09relationship is a link or
00:38:12a verb that
00:38:14basically connects two concepts together
00:38:16and the key selling point really is that
00:38:21sometimes you’ve got data that is just
00:38:22better described in a graph like
00:38:24structure so for example when we’re
00:38:26talking about fraud and and finance and
00:38:29stuff
00:38:29you’ve got the concepts of people and
00:38:31accounts and those people and accounts
00:38:33are all linked to different things
00:38:34they’re linked to an address a link to a
00:38:35current account and so on so for example
00:38:42we’ve got the
00:38:46traditional social media use case where
00:38:48we’ve got Bob who’s friends
00:38:50with Jane we’ve got a chair contained
00:38:50within a room Jane bought a book and so
00:38:53on but the real power is that once
00:38:56you’ve modeled it in this way you can
00:38:58perform complex queries that you
00:39:00wouldn’t be able to do in a traditional
00:39:02relational database so
00:39:05to go back to the social media
00:39:06example again when you wanted to do
00:39:08who is friends with my friend you’d have
00:39:10to do some crazy joins in your SQL in
00:39:13order to get that to work with a graph
00:39:14database you can just
00:39:16hop through the graph it makes it really
00:39:18fast so in a fraud situation
00:39:24we might model our data to something
00:39:26like this we might have an account
00:39:27holder in the middle and they have
00:39:28relationships with phone numbers or
00:39:30national insurance numbers things like
00:39:32that and then we can perform queries on
00:39:34that if we would like to but when you
00:39:37start viewing that in detail and
00:39:38actually viewing how these connections
00:39:40are connecting things together
00:39:41interesting patterns start to come out
00:39:43and especially if you’re visualizing it
00:39:44in this way as well it’s much easier to
00:39:46visualize data in this way than it is in
00:39:48a table for example so in this example
00:39:51we’ve got three account holders in red
00:39:53I think they’re red yep they’re red and
00:39:55they’re linked in various different ways
00:39:57we’ve got all three of them are sharing
00:39:59the same address so that could be dodgy I
00:40:02actually had a person in another talk
00:40:03excuse me
00:40:05who I was suggesting to that all three
00:40:07people sharing the same address
00:40:08could be dodgy and she was like no
00:40:10no no when thousands of people are
00:40:12sharing the same address then it’s dodgy
00:40:14three is fine don’t worry about it so
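A toy, in-memory version of that shared-address rule might look like the sketch below. The names, addresses, and threshold are all invented for illustration; in practice a graph database would express this as a query over the account-holder nodes instead:

```python
from collections import defaultdict

# Toy data: account holders and the address each one is registered at.
accounts = {
    "alice": "1 High St",
    "bob": "1 High St",
    "carol": "1 High St",
    "dave": "7 Low Rd",
}

# "Thousands sharing one address is dodgy, three is fine."
SUSPICIOUS_SHARERS = 1000

holders_by_address = defaultdict(list)
for holder, address in accounts.items():
    holders_by_address[address].append(holder)

flagged = {addr: holders for addr, holders in holders_by_address.items()
           if len(holders) >= SUSPICIOUS_SHARERS}

print(flagged)   # {} -- three people at one address is below the threshold
```

The ring-shaped patterns described next are where a graph model earns its keep: they fall out of hopping along relationships rather than counting rows.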
00:40:16I’m like okay so but we could set up a
00:40:18rule there to say you know how many
00:40:21people are using the same address and
00:40:22you could do that in the traditional
00:40:23database but where the power really
00:40:25comes in is when you start linking these
00:40:27these things together and searching for
00:40:29these larger rings and groups within the
00:40:31data so if we imagine that two
00:40:35people aren’t directly sharing the same national
00:40:37insurance number for example which is
00:40:38illegal in the UK maybe there’s a third
00:40:41party which is linking these national
00:40:43insurance numbers together so you
00:40:45actually start to form these rings
00:40:46within the data which are kind of
00:40:48not natural there shouldn’t really be
00:40:49rings in the data and graph databases
00:40:52are really good at viewing and spotting
00:40:54these rings so that’s the kind of
00:40:56technology that would exist in the wild
00:40:58today if we were asked to perform a
00:41:01job like this but where we’re really
00:41:05interested in is bringing some machine
00:41:07learning techniques to some of these
00:41:09ideas so the first idea I had was quite
00:41:14a typical one really and
00:41:16that’s why I did it because it was quite
00:41:18easy to do but basically if we could use
00:41:21vocal fingerprints for authentication it
00:41:24would help for a couple of main
00:41:26reasons really it would save the user a
00:41:30significant amount of time the user
00:41:31experience would be
00:41:34hugely improved not having to wait on
00:41:37the phone for 20 minutes just because
00:41:38some stupid automated system took you to
00:41:40the wrong place so if we can use a
00:41:43person’s voice as a form of
00:41:45authentication then we’ll be
00:41:49able to save time save
00:41:51machines and save the
00:41:54manpower of people on the other end of the
00:41:55phone so to do this what we’d have to do
00:41:58is to record the customer’s voice
00:42:01we then pre-process the data in some way
00:42:03to clean it up and put it in a format
00:42:05that’s that’s capable of being put into
00:42:08an algorithm in this case we would train
00:42:11a deep learning model but it could be
00:42:12any algorithm and then we’d store that
00:42:14fingerprint for future verification in
00:42:16the online scenario so once you’re
00:42:18set up the user would come on you’d
00:42:20record their voice again maybe against
00:42:22a preset phrase maybe against a new
00:42:24phrase and then you’d compare that
00:42:26result with the fingerprint and that would
00:42:28prove whether that person is
00:42:30really who they say they are so this is
00:42:34the pre-processing stage in action so
00:42:37this is a bit of signal processing which
00:42:39is converting the time-domain signal of
00:42:42the audio file into
00:42:45the frequency domain so what you’re
00:42:47seeing there is a plot of the frequency
00:42:49components versus time red is strong
00:42:52and the green-blue color is weak
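That time-to-frequency conversion can be sketched with a short-time FFT in NumPy. The talk's recordings aren't available, so a synthetic tone with a silent gap stands in for speech here, and the frame size and sample rate are illustrative choices:

```python
import numpy as np

fs = 8000                                   # assumed sample rate in Hz
t = np.arange(0, 1.0, 1 / fs)

# Stand-in for a recorded voice: a 440 Hz tone with a silent pause,
# so the output shows a gap just like pauses between spoken words.
audio = np.sin(2 * np.pi * 440 * t)
audio[3000:5000] = 0.0

# Chop the signal into short frames and FFT each one: the result is
# a grid of frequency strength versus time (a simple spectrogram).
frame = 256
frames = audio[: len(audio) // frame * frame].reshape(-1, frame)
spec = np.abs(np.fft.rfft(frames, axis=1))   # rows: time, cols: freq
freqs = np.fft.rfftfreq(frame, d=1 / fs)

peak_hz = freqs[spec.sum(axis=0).argmax()]   # strongest frequency bin
print(peak_hz)                               # lands near the 440 Hz tone
```

Each row of `spec` is one moment in time, which is exactly the kind of grid the red/green-blue plot in the slide is drawn from.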
00:42:54so it’s saying that you know you can see
00:42:57there the gaps in between the data
00:42:59they’re a kind of where that paused to
00:43:01say the words and I think if we’re if it
00:43:04works yeah so this is some example data
00:43:09that I used in my training and this is
00:43:14three examples of three people saying
00:43:17the same phrase don’t ask me what that
00:43:22phrase actually means I don’t know
00:43:24but anyway you can hear
00:43:27that those three voices sounded
00:43:29sometimes a little bit different and in
00:43:31that last example completely different
00:43:33and what we’re trying to do is to
00:43:35make the deep learning model treat them the same
00:43:37okay so once we’ve put it into our deep
00:43:42learning model we’ve done the training
00:43:44and we’ve produced an output our output
00:43:46in this case is between these three
00:43:48different people so you could have three
00:43:50outputs and then again we’ve compressed
00:43:52that we’ve squashed that onto the
00:43:53screen into two dimensions and this is a
00:43:56plot that shows how close all of those
00:43:59voices were to each other so we’ve got a
00:44:01couple of different points in there and
00:44:02the different colors there Bob
00:44:05Steve and Dave correspond to the
00:44:07three different examples the three
00:44:09different people giving the examples
00:44:10and each individual point is a
00:44:13specific phrase that they said so we had
00:44:15ten different phrases that they said
00:44:18and you can see that all of these
00:44:20examples are clustering together quite
00:44:21well so if we then took the
00:44:26same people but using a different
00:44:28spoken example so not the same examples
00:44:31how would that perform on
00:44:32new data so here we go again the
00:44:39top line now in the results was
00:44:42the raw output of those
00:44:44three neurons for that file and
00:44:46it’s saying that one of the
00:44:48neurons had 0.98
00:44:50another 0.1 and another 0.1 as well and
00:44:53that’s saying that you know
00:44:55we’re pretty sure 98 percent sure
00:44:57that that was definitely Bob
00:45:01there you go 97 percent chance that was
00:45:03Steve there 96 percent it was Dave so
00:45:10that was that example quite a simple
00:45:14example in the sense that it only used a
00:45:16very small data set but it’s you know
00:45:18it’s instructive and it kind of points
00:45:23towards things that we could do in the
00:45:25future given much more data I mean like
00:45:27every phone call we pick up these days
00:45:28there’s always a we are recording your
00:45:30voice for verification training purposes
00:45:32so there must be huge vast databases of
00:45:35people’s voices out there ok so next
00:45:39example decision trees so this is an
00:45:42example of a decision tree like we showed
00:45:43earlier on and this is predicting
00:45:46mortgage default so amazingly two banks
00:45:50sorry two mortgage providers in the
00:45:53US went bust as usual of course and
00:45:56were bailed out by the US taxpayer so
00:45:59they’re owned by the US government
00:46:01Freddie Mac and Fannie Mae and as
00:46:04part of
00:46:07their reprisal basically a slap on the
00:46:09wrist the government forced them to
00:46:11release lots of their data to the public
00:46:13and amazingly they publicized a
00:46:17whole data set of mortgage applications
00:46:19and also historical accounts of what
00:46:21happened to those mortgage applications
00:46:22so the data told us
00:46:26whether that person then defaulted in
00:46:27the future so the task here is
00:46:32oh dear I’m running over
00:46:35time I’ll have to speed up given some data is
00:46:38it possible to predict whether that
00:46:39person’s going to default so the first
00:46:42the first problem is the whole data
00:46:45cleaning problem like we saw in
00:46:46the previous talk the vast majority
00:46:49of time is spent cleaning data
00:46:51I’m gonna skip over that so if we were
00:46:54to flatten all of the data that was
00:46:56recorded into an image before we put
00:46:58it through the algorithm this is kind of
00:46:59what it looks like it’s very
00:47:00intermingled and mixed can’t quite
00:47:03understand what’s going on so a decision
00:47:05tree is learning all of these rules
00:47:08and based upon the outcome of those
00:47:10rules it says either yes the person defaulted
00:47:12or no they didn’t default so we had
00:47:15approximately 20,000 samples total 50-50
00:47:18split a random forest classifier which is
00:47:22a type of decision tree algorithm but
00:47:25better it does not overfit as much and only 11
00:47:30input features so the main problem here
00:47:32is I don’t actually think we’ve got
00:47:33enough data to do a really good job but
00:47:35we’ll see what we can do and the one
00:47:38great thing about decision trees is that
00:47:40they actually give you a measure of
00:47:42importance for all of those variables so
00:47:45here we’ve got the variables that were
00:47:47inputted to the algorithm at the bottom
00:47:49and it shows their respective importance
00:47:53of those variables on there on the
00:47:56left-hand side so you can see actually
00:47:57the credit score is in second place so
00:48:00I’m not sure that the credit reference
00:48:01agencies would be too happy that you
00:48:03know they could only explain 0.25 of the
00:48:06data so 25% of the data could only be
00:48:09explained by the credit score alone so
00:48:13not not a great result for them and
00:48:14actually the most important measure was
00:48:17the HPI at origination which was the house
00:48:19price index at origination for that local
00:48:21area so this is saying that a person who
00:48:24took out a mortgage in a very local area
00:48:26it’s very dependent on the prices within
00:48:28that area as to whether they’re going to
00:48:30default or not and this is kind of
00:48:31typical really in the US you can see
00:48:33vast tracts of places like
00:48:35Detroit where as soon as some of
00:48:37the industry left everybody lost their
00:48:39jobs the whole area’s house prices then
00:48:41crashed and then people couldn’t afford
00:48:43to move because they couldn’t sell so
00:48:47that’s kind of why that’s so important
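A sketch of that random-forest-with-importances workflow, on synthetic data shaped like the mortgage set (roughly 20,000 samples, 11 input features). Feature 0 is deliberately made predictive here, standing in for something like the local house price index; the real data and fitted model from the talk are not reproduced:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Stand-in for the mortgage data: 20,000 samples, 11 features,
# roughly a 50-50 split between defaulted (1) and not (0).
# Only feature 0 genuinely drives the outcome; the rest are noise.
X = rng.normal(size=(20_000, 11))
y = (X[:, 0] + 0.5 * rng.normal(size=20_000) > 0).astype(int)

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X, y)

# Random forests give a measure of importance for every variable,
# which is what the bar chart in the slide is showing.
importances = model.feature_importances_
top = int(np.argmax(importances))
print(top, round(float(importances[top]), 2))
```

On real data the same `feature_importances_` readout is what ranks the house price index above the credit score.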
00:48:50interesting result and then final
00:48:52example I’m having to move rather
00:48:54quickly here because I’ve only got two
00:48:55minutes left but is it possible to take
00:48:58that data
00:49:00and try and see whether there’s
00:49:02something strange going on within
00:49:03the data so basically this is an
00:49:05unlabeled example we’re not telling it
00:49:07what to learn here so how do we do that
00:49:10well there’s a deep learning technique
00:49:12called an autoencoder which basically
00:49:16takes the inputs and restricts the
00:49:18number of hidden neurons to only a few
00:49:20concepts it’s saying you’ve really got to
00:49:22pick and choose what data you use and
00:49:24generate some concepts that are
00:49:26quite strict and then we try and
00:49:28reproduce the output again and we’re
00:49:30comparing the output against the input
00:49:32as a measure of how well we’ve done
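A minimal autoencoder along those lines, using scikit-learn's MLPRegressor as a stand-in for the deep learning framework used in the talk. The data are synthetic: 11 features (like the mortgage set) secretly driven by 2 underlying factors, so a 2-neuron bottleneck can plausibly reconstruct them:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Stand-in data: 500 samples, 11 features, generated from just
# 2 hidden factors plus a little noise.
factors = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 11))
X = factors @ mixing + 0.05 * rng.normal(size=(500, 11))

# An autoencoder squeezes the 11 inputs through a tiny hidden layer
# (2 neurons here) and tries to reproduce the input at the output.
autoencoder = MLPRegressor(hidden_layer_sizes=(2,),
                           activation="identity",
                           max_iter=2000, random_state=0)
autoencoder.fit(X, X)            # note: the target is the input itself

# Reconstruction error measures how well it has done; unusually large
# errors on individual rows are the candidates an analyst would probe.
errors = ((autoencoder.predict(X) - X) ** 2).mean(axis=1)
print(round(float(errors.mean()), 3))
```

The 2-neuron hidden layer here is also exactly the thing you can plot in 2-D to view the learned concepts, as the talk does next.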
00:49:35so with those restrictions in the
00:49:37middle maybe only two neurons you know
00:49:39yes and no something like that is it
00:49:41possible to reconstruct the data so we
00:49:45can do that so there’s the same data as
00:49:47before it’s a different random
00:49:49sample so it might look slightly
00:49:50different we’ve got an input layer a
00:49:52number of hidden layers that are
00:49:54compressing the data down into fewer
00:49:55and fewer neurons and then we’re
00:49:57reconstructing again back to the input
00:50:00layer and doing a comparison to see how
00:50:01well we did but what we can do then is
00:50:04plot in 2D or 3D one of those
00:50:07hidden layers to actually view those
00:50:08concepts and what we’ve learnt and
00:50:10finally this is the result of that
00:50:12process and the left-hand side we’ve got
00:50:14a 2d representation and you can start to
00:50:17see there’s actually some structure
00:50:22within that data so generally you
00:50:24can see that the people that defaulted
00:50:26are shown on that graph on
00:50:28the left-hand side and the people that
00:50:28didn’t default on the right-hand side
00:50:29and within there if you look on the
00:50:31right-hand side there’s a couple of
00:50:33orange dots and that’s saying that the
00:50:35vast majority of people in there didn’t
00:50:37default but one or two people did now an
00:50:39analyst might start to ask why so it
00:50:42could be something quite innocent you
00:50:43know maybe the person lost his
00:50:45high-powered job went to prison
00:50:47something like that but it’s kind of
00:50:49indicative that something else is going
00:50:51on and this is where the analyst would
00:50:52come in and start investigating that
00:50:54data so these are completely unlabeled
00:50:56and the algorithm has absolutely no idea
00:50:59what it means
00:51:00and it still takes a human to do some
00:51:02analysis and to do some investigation to
00:51:04figure out what has happened but these
00:51:07kinds of tools lead the analysts in the
00:51:09right direction as opposed to just
00:51:12taking a random sample
00:51:12and then finally on the right hand side
00:51:14we’ve got a 3d representation of the
00:51:15same data and this is where it becomes
00:51:17really really powerful you can imagine
00:51:19like if you could get that graph and you
00:51:21can like look into it and move it
00:51:24and turn it around and you can start to
00:51:26see clusters in 3d space and that’s when
00:51:28it starts to become immersive it
00:51:31takes a certain
00:51:32amount of time for any analyst to
00:51:34analyze data but given enough time they
00:51:36will be able to learn to see patterns
00:51:38within that data which will help them to
00:51:41investigate things that they haven’t
00:51:43seen before and I think I better stop
00:51:45there because I’ve completely run out of
00:51:46time so thank you very much for
00:51:47listening