Press "Enter" to skip to content

GOTO 2015 • Modern Fraud Prevention using Deep Learning • Phil Winder


00:00:10hello everyone. Welcome, this is modern

00:00:14fraud detection and prevention using deep

00:00:16learning that title was submitted quite

00:00:19a long time ago so I’d say actually the

00:00:22talk is probably bit more about machine

00:00:24learning in general now we had a good

00:00:28talk earlier on in the day introducing

00:00:30some of the concepts behind machine

00:00:32learning and I’m hoping to sort of build

00:00:35on them really so this talk is going to

00:00:37be a bit more technical there’s there’s

00:00:40no maths which you’ll be glad to hear

00:00:42there’s there’s also no code I’ve tried

00:00:44to you know explain myself using

00:00:45diagrams and pictures wherever I can but

00:00:48it is a much more technical talk so

00:00:50hopefully you can get your teeth

00:00:52into it we’ve got the usual slides at

00:00:54the front saying please rate and engage

00:00:57so yeah my name’s Phil I’m with Trifork

00:01:00but we’re in Trifork Leeds so we’re

00:01:02quite distinct from the Danish

00:01:04mothership yeah I actually am a software

00:01:10engineer in my in my professional life

00:01:12machine learning is a just a bit more of

00:01:15a hobby I’m currently working on an

00:01:16Apache Mesos framework for

00:01:18Elasticsearch yeah if you’d

00:01:21like to talk more about any of the

00:01:22subjects that I’m about to discuss then

00:01:25please see me or I’ll see some of my

00:01:27colleagues listed at the bottom there

00:01:28I’m gonna skip the marketing slides

00:01:31because you won’t want those and we’re

00:01:35split into three or four topics the

00:01:39final one the architectures is

00:01:42more about how we would do this in

00:01:44production how we would do this in real

00:01:46life it’s interesting but it’s not

00:01:49really the core thing of my talk so

00:01:51I’m going to go through the first three

00:01:53sections and if we have time we might do

00:01:55the fourth but I’ll probably end up

00:01:57speaking for too long and I’ll probably

00:01:58drop that section I’m going to introduce

00:02:02the the reasons why we want to provide

00:02:05some new tools and techniques to apply

00:02:07to fraud to try and make the case to the

00:02:10business users as to why you should pick up

00:02:12on some of these ideas and start

00:02:13to run with them I’m gonna then

00:02:15introduce the topic of machine learning

00:02:17and you’ve probably had quite a bit of

00:02:20experience already but if you haven’t

00:02:21that that’ll be the section that really

00:02:23explains what’s going on and and why it

00:02:25happens and I’ve also got quite a lot of

00:02:28demos as well some of the demos are

00:02:30quite simple and very general just to

00:02:32explain the concepts but the rest of the

00:02:35demos are all focused towards fraud

00:02:37prevention focus towards finance and

00:02:38specifically mortgages okay so let’s

00:02:43crack on so in order to to do any of

00:02:47this work we need to persuade some

00:02:49people to give us some money and there’s

00:02:51no better reason to get people to give

00:02:54us some money if there’s other money at

00:02:56risk in the UK we’ve got some UK

00:02:59specific facts here in the UK financial

00:03:01crime is defined as I can’t even read

00:03:04that screen so I have to read from here

00:03:05sorry fraud is an act of deception

00:03:07intended for personal gain or to cause a

00:03:10loss to another party so all of these

00:03:12facts and figures are specific to the UK

00:03:15but they’re applicable to pretty

00:03:17much every country in the world anybody

00:03:20that’s trying to do wrong to do harm for

00:03:23their own financial gain is considered

00:03:26fraud we’ve got a UK mortgage fraud

00:03:29listed there in 2014 1.2 million

00:03:33properties were bought and sold in the UK and

00:03:3983 in every 10,000 of those applications

00:03:42were fraudulent so that’s not quite 1%,

00:03:460.83% and when we say

00:03:50fraud in that that aspect it’s not

00:03:52necessarily people being like hugely

00:03:55devious we’re going from the small scale

00:03:57where somebody’s maybe telling a few

00:03:58fibs about their employment history or

00:04:00how much they earn all the way up to

00:04:02huge huge you know international fraud

00:04:05in 2013 there was a story of two guys

00:04:08who had invented a whole series of

00:04:12companies that invented estate agents

00:04:15that invented surveyors they’ve invented

00:04:17property businesses and builders and

00:04:20they had supposedly bought a huge tract

00:04:23of land which they were going to build

00:04:24you know lots of new houses on and they

00:04:27invented or stole the identities of

00:04:29other people to take out mortgages on

00:04:31those respective houses so it turns out

00:04:34there were tens you know tens to

00:04:36hundreds of mortgage applications all

00:04:38going in for houses that hadn’t been

00:04:40built yet

00:04:40but as it turned out they just took that

00:04:42money paid off the original debt on the

00:04:45land so they owned the

00:04:46land and then they just

00:04:47ran off they completely invented a

00:04:49village took out loads of mortgages based

00:04:52upon that and then ran off how can

00:04:55that even happen so the total cost finally

00:04:57came to about 53 million pounds

00:04:59that they managed to run away with and they

00:05:01did finally get caught but they very

00:05:03nearly got away with it because it was

00:05:05just so embarrassing

00:05:06you know the mortgage company was so

00:05:08embarrassed to say that this had

00:05:09happened it almost never even got caught

00:05:11so it does it does get to quite a large

00:05:14scale and this this actually equates to

00:05:17approximately 1 billion pounds worth of

00:05:20fraudulent applications so it’s a huge

00:05:22huge number but interestingly it’s

00:05:26not actually the worst case of fraud in

00:05:28the UK the worst is actually

00:05:30current account fraud so traditionally

00:05:33what would what people would do is to

00:05:35steal somebody’s information open a

00:05:37standard bank account current account

00:05:38some sort from from a traditional bank

00:05:42which you can do quite easily in the UK

00:05:43and then use the overdraft or use some

00:05:47facilities to actually withdraw some

00:05:48money and then run off

00:05:51so that actually constitutes the most

00:05:53fraud in the UK but we’re talking a

00:05:56little bit about mortgages today and

00:05:59finally we’ve got UK retail fraud

00:06:03much of the business in the UK is

00:06:05actually made up of small to medium

00:06:07sized enterprise

00:06:09the big guys make a

00:06:12significant part of the market but

00:06:13not a huge part small to medium-sized

00:06:16businesses are estimated to be losing

00:06:18eighteen billion pounds every year to

00:06:21fraudulent transactions so that’s when

00:06:23somebody goes online buys some clothes or

00:06:25some food or some shopping of

00:06:27some kind on a credit card and then

00:06:30maybe they cancel the credit card as soon

00:06:32as they place the order so the guys on

00:06:35the retail side are having to ship all of

00:06:36this stuff only to find that the person

00:06:38you know doesn’t exist or the

00:06:40card was stolen or stuff like that and that

00:06:43amounts to a huge amount as well another

00:06:46reason why businesses might want to look

00:06:49at some of these ideas is

00:06:51legislation so we’ve got one end of the

00:06:53spectrum where there’s people actually

00:06:54doing wrong to their businesses you

00:06:56might want to try and protect yourself

00:06:57but also there’s legislation, legal

00:06:59requirements that need to be put in

00:07:00place in order to comply moreover in

00:07:052017 there’s new anti money laundering

00:07:07legislation coming in within the EU so

00:07:09it applies to all EU countries it’s

00:07:12extending extending money laundering

00:07:14rules that are already in place but the

00:07:17main changes are that the out of scope

00:07:20limit has dropped to a thousand euros

00:07:22the previously it was fifteen thousand

00:07:24euros and this applies to businesses

00:07:28that are handling financial transactions

00:07:30so it applies to banks obviously

00:07:33financial institutions credit agencies

00:07:35stuff like that it also applies to legal

00:07:38services and estate services it also

00:07:42applies to gambling services

00:07:43basically anybody that’s handling and

00:07:45moving money around has to comply with

00:07:47this legislation and what this is saying

00:07:50is that anybody that has a transaction

00:07:52of over a thousand euros they need to

00:07:54prove to the authorities that they’re

00:07:56doing their due diligence to prove

00:07:59that that person is (a) not being

00:08:01fraudulent and (b) not using the money

00:08:04for nefarious means like terrorism or

00:08:06something like that and finally then

00:08:08they’re they’re required to submit their

00:08:11information to a central registry of

00:08:13information and there are

00:08:17obviously privacy concerns

00:08:19there but it’s a bit unclear how

00:08:21that’s actually going to be implemented

00:08:22so there are direct

00:08:27financial reasons why you

00:08:28might want to do it and also legal

00:08:29reasons so how do we do it at the moment

00:08:33well if a traditional company were to

00:08:36go to a software house and ask for

00:08:39some software to do this they would

00:08:40probably come up with some

00:08:43combination of these four general ideas

00:08:45we’ve got the origination based

00:08:47technique so most countries have a law

00:08:50that requires financial services to

00:08:53prove that

00:08:54they’re talking to the real person

00:08:56that’s what

00:08:58origination is one thing I get

00:09:03really really annoyed about is banks in

00:09:05the UK they’ve got this awful technique

00:09:07of using automated phone systems to try

00:09:10and prove you are who you say you are so

00:09:12you go through the whole series of you

00:09:14know please type in your ID number please

00:09:16type in your address please type in your

00:09:17password please do this please do that

00:09:19and that takes about three and a half

00:09:20minutes and then as soon as you finally

00:09:22speak to a real person which is all you

00:09:24wanted to do in the first place

00:09:25as soon as you get to speak to a

00:09:26real person they ask all the same

00:09:28questions again and it turns out they do

00:09:30this because these businesses aren’t

00:09:32quite sure that the automated method

00:09:34really is proof enough so the person

00:09:38has to go through it all again which

00:09:39really does my head in and some

00:09:44maybe less security-conscious organisations such as

00:09:47insurance agencies and people that are

00:09:49not necessarily as interested in

00:09:51protecting security they can use some

00:09:53really quite dodgy methods like I’ve had

00:09:55some cases where people have asked me

00:09:57just for my date of birth or just for my

00:09:59postcode or something like that and

00:10:01they’re completely not secure your date

00:10:02of birth is basically a password you

00:10:04were given at birth you can’t change it

00:10:06it’s fixed and you have to live with it

00:10:08so it’s the worst password that can

00:10:10ever exist the next group of

00:10:14technologies are rules based so these

00:10:17are static rules that are usually

00:10:18provided by analysts saying that you

00:10:20know no transaction must be bigger than

00:10:22X or you can’t have so many transactions

00:10:25within a certain period of time

00:10:26something like that and they’re and

00:10:29they’re great and they’re okay and they

00:10:31catch a reasonable amount of fraud it’s

00:10:34usually the accidental types and

00:10:38basically the not so intelligent

00:10:41fraudsters who try and do something

00:10:43silly like this but they also catch

00:10:47all the good guys as well like like when

00:10:49you’re abroad your card’s always declined

00:10:51the first time because they think it’s

00:10:53fraudulent or you know you’re trying to buy

00:10:55a new car from a guy and he you know

00:10:59takes cash and you try and pull out 1500

00:11:01pounds out of the cash machine you can’t

00:11:03do it because it’s you know it’s against

00:11:05their static rules
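
(As a rough illustration of the static, analyst-written rules just described, a minimal sketch in Python might look like the following; the field names and thresholds are invented for the example, not taken from any real system.)

    from datetime import timedelta

    MAX_AMOUNT = 1500          # "no transaction must be bigger than X"
    MAX_TX_PER_DAY = 10        # "not too many transactions in a period"

    def flag_transaction(tx, recent_txs):
        """Return the names of any static rules this transaction trips."""
        flags = []
        if tx["amount"] > MAX_AMOUNT:
            flags.append("amount_over_limit")
        last_day = [t for t in recent_txs
                    if tx["timestamp"] - t["timestamp"] < timedelta(days=1)]
        if len(last_day) >= MAX_TX_PER_DAY:
            flags.append("too_many_transactions")
        return flags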

00:11:08credit checks lots of agencies will gladly accept your

00:11:10money to provide you with a number

00:11:12that’s it and these numbers are supposed

00:11:16to represent the creditworthiness or the

00:11:19risk that that person provides to your

00:11:23business and there is certainly a case

00:11:26there’s an argument to use them how

00:11:29accurate they are is another question

00:11:34aggregation and monitoring so this is

00:11:36more of a reactive type of solution

00:11:39where analysts would be provided with

00:11:41the data and they you know perform some

00:11:45query or ask a question and try and do

00:11:47something based upon that so for example

00:11:49you can have some guys that find a

00:11:52pattern between you know one cash

00:11:54machine for example gave up a large

00:11:56amount of money so an analyst will then

00:11:58check it out so those are the types

00:12:01of things that exist in the wild at the

00:12:03moment but now I’m going to start

00:12:06talking about machine learning and how

00:12:07we can use machine learning to improve

00:12:10some of those technologies and try and

00:12:13remove some of the bias or the

00:12:14redundancy or the error out of those

00:12:16technologies okay so following on from

00:12:22our excellent presentation this morning

00:12:25I forgot the first name miss Pitt sorry

00:12:28if you’re here she was talking about how we

00:12:31learn I also have a couple of slides

00:12:33but it’s not it’s it’s a bit more basic

00:12:36I like to introduce my my daughter here

00:12:38she’s she’s 18 months old and she’s

00:12:41currently going through this process of

00:12:43learning and it’s really fascinating to

00:12:45watch how she does this because there’s

00:12:47there’s lots of parallels between this

00:12:48and the state-of-the-art machine learning

00:12:50algorithms at the moment and if we can

00:12:52understand how how we learn it actually

00:12:55helps us to write better algorithms and

00:12:56it helps you to understand the

00:12:57algorithms as well so this is my

00:13:00daughter with her mother my wife

00:13:02making some yummy Rice Krispie

00:13:05chocolate square things and in the top

00:13:08picture there she’s doing exactly what

00:13:09mom told her please take the rice

00:13:11krispies and put them in some baskets

00:13:13and then we can eat them later on but

00:13:15somewhere along the line she decided to

00:13:17perform some tests

00:13:18she decided if I put this thing in my

00:13:20mouth

00:13:21is it gonna be good or is it gonna be bad

00:13:23so she put it in her mouth and it was

00:13:24good

00:13:25so she completely ignored any

00:13:27instructions from then on because

00:13:28she’d learned that eating chocolate with

00:13:30Rice Krispies was a good thing so that’s

00:13:32a very simple example of how children

00:13:35learn and how algorithms learn in

00:13:37general you you provide them with some

00:13:39tests with some input and then they

00:13:41evaluate that input and decide on some

00:13:43outcome

00:13:46it takes time however she’s 18

00:13:49months and she’s still pretty stupid you

00:13:52know she’s struggling to

00:13:54put sentences together when

00:13:56she walks she falls flat on her face she

00:13:59gets spatulas and misses her mouth and

00:14:00hits her eye it’s not

00:14:02good so it does take time for this to

00:14:04happen this applies to algorithms as

00:14:06well it takes time to learn we’ve got

00:14:09this great game that she loves which are

00:14:11index cards and this is an example of

00:14:13how she gets things wrong I mean she’s

00:14:15she’s very good yeah she’s really good

00:14:17I don’t want to give you the impression that I’m

00:14:19a bad father saying she’s rubbish

00:14:21and we should get rid of her no she’s very good

00:14:22but in some cases she does get it wrong

00:14:24the first example on the left there is a

00:14:27door however she thinks it’s a house and

00:14:30she thinks it’s a house because it’s got

00:14:31four walls and it’s got these features

00:14:33in the middle which are like squares

00:14:34which kind of look like windows but what

00:14:37she hasn’t learned yet is that a house

00:14:38actually needs a triangle on the top and

00:14:40so this is a this is an example of a

00:14:43misuse of features so there are features

00:14:45there but she’s misusing them to come to

00:14:47the wrong conclusion the second one she

00:14:50calls this a chicken because she doesn’t

00:14:52quite understand the concept of a bird I

00:14:54think she struggles to

00:14:57understand classes of things she’s quite

00:14:59happy to learn that that thing is

00:15:00definitely a bird and that thing is

00:15:02definitely a teddy and that thing is

00:15:03definitely mommy and that thing is

00:15:05definitely dad when he’s around but she struggles

00:15:10with classes of things so that’s a chicken

00:15:11and that’s okay but that’s just an

00:15:13example of a misclassification and then

00:15:16finally we’ve got the third picture and

00:15:17apparently that’s a tiger now

00:15:22when I show her this card she kind of looks

00:15:24at me and goes I’m not sure what it is

00:15:28and then I look at the card and go

00:15:29I’m not sure what that is either I think

00:15:32sometimes she goes for a cat sometimes

00:15:34she goes for a tiger

00:15:35sometimes I don’t know I don’t

00:15:37even know what it is it looks like

00:15:38something sort of ran over it it’s like

00:15:41a cat that’s been run over basically and

00:15:44that’s a great example of just bad data

00:15:46so in real life you will get that data

00:15:48and there’s a big cleaning method that’s

00:15:49required to try and prevent you from

00:15:51getting this bad data because you will

00:15:52come to the wrong result so just to

00:15:55prove that it’s not just her age I’ve

00:15:58got an example for all of you so take a

00:16:00look at this picture and I’m just going

00:16:02to watch you for a second right so so

00:16:09for all the programmers out there this

00:16:10is like a human equivalent of like a

00:16:13stack overflow so what you start doing

00:16:15is you try and focus in on their eyes

00:16:17but then you realize that she’s got eyes

00:16:19in a different place so you kind of jump

00:16:20across and then you realize the mouth is

00:16:22in the wrong place so you jump again and

00:16:23you’re up and down and up and down and

00:16:25if you stare at it long enough you start

00:16:26to feel sick but all

00:16:30this is proving is that you’ve learnt

00:16:32some specific things over time you have

00:16:34you know decades’ worth of experience to

00:16:36say what a face should look like

00:16:38and when it doesn’t look like that you

00:16:39don’t quite know how to process it and

00:16:42we can get it wrong, you know, humans are

00:16:45completely infallible, fallible sorry,

00:16:48wrong choice of words, they’re

00:16:50completely fallible ok so moving on to

00:16:54the more technical topics here machine

00:16:56learning comprises a four-ish sort of

00:17:00distinct components they’re all trying

00:17:01to do slightly separate different things

00:17:03the first item is dimensionality

00:17:06reduction so when we think of data it

00:17:08has a number of dimensions and by

00:17:10dimensions I basically mean like a

00:17:13single point of information so if you

00:17:16imagine a 10 by 10 grayscale picture

00:17:19that has like a hundred dimensions a

00:17:22hundred pixels in there which all

00:17:23represent a distinct piece of data the

00:17:26problem with that is that with images

00:17:28it’s ok but for many other types of data

00:17:31it’s really hard to try and visualize

00:17:32what’s going on so you’ve got to

00:17:33compress that space down into two or

00:17:36three dimensions in order to actually

00:17:38see what’s going on so that’s the act of

00:17:40dimensionality reduction we’ve got

00:17:43clustering where we’re trying to assign

00:17:46an output to a certain class

00:17:49quite often we know what class it should

00:17:51belong to or at least we should know how

00:17:54many classes there are at least so

00:17:56clustering is the process of trying to

00:17:57group things together into distinct

00:17:59classes we’ve got classification which

00:18:01is linked to clustering where that’s

00:18:03more asking the question exactly where

00:18:06do I put the line to say that’s Class A

00:18:08and that’s Class B and finally

00:18:11regression which is trying to predict a

00:18:13value based upon their previous inputs

00:18:16we’ve also got different types of

00:18:18learning as well learning is the key

00:18:20thing that’s this really enabled deep

00:18:22learning to to come to the forefront is

00:18:23that the new training techniques that

00:18:25have been developed are so much more

00:18:27powerful than they were in the past

00:18:29training can be split into supervised

00:18:31and unsupervised learning supervised

00:18:33learning is where you have an expected

00:18:36result so it’s a it’s labeled so you say

00:18:39that this raw data is supposed to belong

00:18:41to Class A this is supposed to be the

00:18:43number one or this person is fraudulent

00:18:48the algorithm is then trained the

00:18:51parameters of the algorithm are then

00:18:52tuned to try and produce that same

00:18:56result and the measure of

00:18:59performance for that algorithm is

00:19:01the true result compared to the

00:19:03predicted result and then when you were

00:19:07to use this in in real life if you had

00:19:09new data coming in then you would use

00:19:10those pre learnt weights and you would

00:19:13predict an output based upon that for

00:19:17unsupervised

00:19:17you’ve got no results so you don’t know

00:19:19exactly what class it’s supposed to

00:19:21belong to when algorithms are trained you

00:19:25need to decide on what’s going to

00:19:28provide you with a measure of how good

00:19:31your algorithm is so some

00:19:33of them decide whether data are close

00:19:36or far away so there’s this measure of

00:19:38distance between data there are also

00:19:42maybe other reasons why you want to do

00:19:44it as well and you can provide your own

00:19:45we’re talking about

00:19:47customized or personalized

00:19:51cost functions to decide whether your

00:19:54output is going to be labeled as class 1

00:19:56or class 2 if something is important but

00:19:59in the real in the real world most data

00:20:01is usually semi-supervised

00:20:02you usually start off with some labeled

00:20:05data and usually a lot more that is

00:20:08unlabeled so you can kind of combine

00:20:10these two things together to maybe you

00:20:12can use the labeled stuff to start to

00:20:14bring out some of the clusters and then

00:20:16apply the unlabeled data to you know

00:20:18really fill in the pattern a bit more
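
(A tiny sketch of the supervised versus unsupervised split on made-up two-dimensional data: the supervised model is given the labels, while KMeans is only told how many clusters to look for. This is illustrative only, not one of the talk’s demos.)

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
    y = np.array([0] * 50 + [1] * 50)      # labels, e.g. genuine vs fraudulent

    supervised = LogisticRegression().fit(X, y)             # learns from the labels
    unsupervised = KMeans(n_clusters=2, n_init=10).fit(X)   # groups by distance only

    print(supervised.predict(X[:3]))        # predicted labels
    print(unsupervised.labels_[:3])         # cluster ids, meaning assigned by us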

00:20:23so let’s talk about some specific

00:20:24algorithms I’m going to talk about two

00:20:27every guy’s got his own favorite

00:20:30algorithm this first one is called a

00:20:34decision tree and there’s various

00:20:35different types of decision tree but

00:20:37we’re going to stick to the simple one

00:20:38for now and they can be used for

00:20:40classification and regression and the

00:20:43idea is that they predict the

00:20:46target value of a class or a value

00:20:49or something based upon some very simple

00:20:51decision rules so is it less than 10 or

00:20:54bigger than 10 is it is it labeled a or

00:20:57labeled B the example we’ve got there on

00:21:01the right is quite morbid actually this

00:21:03is a decision tree that’s been learned

00:21:05from the data provided from the Titanic

00:21:08manifests and this is predicting whether

00:21:11you’re going to survive if you were on

00:21:12the Titanic or not so the first question

00:21:15it asks is is the sex male so if it was

00:21:20yes then it goes down to one side of the

00:21:22tree on the Left if it was no it goes

00:21:23down the right side of the tree so if

00:21:25you were female you had a pretty good

00:21:28chance of 0.73 so 73% chance of

00:21:32surviving and that represents 36% of the

00:21:35entire population on the Titanic whereas

00:21:38if you were male and you were

00:21:40above 9.5 years old then you’ve got a fairly big

00:21:43chance that you’re going to die

00:21:45unfortunately 61% of all males above 9.5

00:21:48died and you can see that you can go

00:21:52down the tree and you can make a

00:21:53decision based upon these rules so the

00:21:55idea of the algorithm is to train these

00:21:57parameters these rules these decision

00:21:59points to optimally make the right

00:22:02decision

00:22:03so it’s conceptually quite simple it can

00:22:06handle categorical data which is great

00:22:08because some algorithms can’t but

00:22:10decision trees specifically can

00:22:13overfit quite badly but there are lots

00:22:15of methods

00:22:15to use decision trees in a different

00:22:18way to prevent the overfitting so don’t

00:22:19worry about that too much and decision

00:22:21trees are usually one of the simplest

00:22:24and sometimes effective enough to solve

00:22:28a problem
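
(A minimal sketch of fitting a tree like the Titanic one above with scikit-learn; titanic.csv and the column names here are placeholders for whatever copy of the manifest data you have, so treat them as assumptions.)

    import pandas as pd
    from sklearn.tree import DecisionTreeClassifier, export_text

    df = pd.read_csv("titanic.csv")                  # hypothetical local copy
    X = pd.DataFrame({
        "is_male": (df["sex"] == "male").astype(int),
        "age": df["age"].fillna(df["age"].median()),
        "sibsp": df["sibsp"],                        # siblings/spouses aboard
    })
    y = df["survived"]

    tree = DecisionTreeClassifier(max_depth=3)       # shallow, so the rules stay readable
    tree.fit(X, y)
    print(export_text(tree, feature_names=list(X.columns)))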

00:22:33the next algorithm and one that’s surrounded by lots of hype at the moment

00:22:35is deep learning so deep learning is

00:22:39it’s really good because you remember

00:22:42those classes of types of algorithms at

00:22:45the start it actually does all of

00:22:47them it does the dimensionality

00:22:48reduction the classification the

00:22:50regression and the clustering it can

00:22:51do all of it it’s the holy grail of

00:22:53algorithms no other algorithm can

00:22:55actually do all the same things the idea

00:22:59is that it’s actually trying to model

00:23:00our learning process in our brain

00:23:03basically it aims to model the neurons

00:23:05and the synapses in your brain to do

00:23:07similar sorts of tasks it’s

00:23:10simplified somewhat but that’s

00:23:12the general idea so the hope here is

00:23:14that if we can produce a model of

00:23:16our brain then we can write

00:23:18algorithms to perform things that our

00:23:21brain can do quite easily like

00:23:22recognition classification things like

00:23:24that so the pros and cons again it’s

00:23:29very versatile can be used for lots of

00:23:31different tasks

00:23:32the key improvement really is that it

00:23:36begins to remove the requirement of

00:23:39feature engineering so with all of the

00:23:41other algorithms your algorithm will

00:23:43live or die based upon what features you

00:23:46give the input you need to work really

00:23:48hard with other algorithms to to say

00:23:50that this is the most important feature

00:23:51I’m going to keep that and use that but

00:23:53those are the ones are completely

00:23:54redundant I’m going to remove them and

00:23:56that takes a significant amount of time

00:23:58with deep learning it has the ability of

00:24:01internally during the training stage of

00:24:03either completely removing parameters or

00:24:06completely keeping parameters purely

00:24:09based upon how well it fits the data how

00:24:11well the training process goes so it

00:24:14removes the bias that comes from

00:24:15removing data or adding data that you’re

00:24:17not sure it should be there or not the

00:24:21main con actually I suppose

00:24:24there’s a couple of cons the biggest one

00:24:25is it can be hard to visualize as soon

00:24:27as you start getting into

00:24:29neural network sizes that are quite deep

00:24:32it can be quite hard to visualize and

00:24:34conceptualize I’m hopefully going to try

00:24:36and prove that wrong in a little bit but

00:24:38um that’s that’s the problem number one

00:24:41and problem number two can be quite

00:24:42computationally expensive but that’s

00:24:44that’s true for kind of lots of these

00:24:46algorithms really so how do they

00:24:49actually work well it works

00:24:52primarily by trying to conceptualize

00:24:54things so there’s this idea that that

00:24:58neural networks are acting like a

00:25:00hierarchy of concepts and the

00:25:05whole goal really is to take those

00:25:06images or take your data and produce a

00:25:09concept something that accurately

00:25:11describes what is provided at the input

00:25:13so we’ve got the couple of the concepts

00:25:16on the left there we’ve got a street an

00:25:18animal and a person but you can see that

00:25:20the two bottom ones

00:25:21the person and

00:25:24the animal they’re actually linked

00:25:25by another concept you know they’re both

00:25:28animals it’s just one of them’s human so

00:25:30the great thing about the layering

00:25:33concept is that you can actually start

00:25:36to tag things that are similar but not

00:25:39quite the same based upon your training

00:25:40data so to be more specific this is

00:25:44an example of how you would go about

00:25:49conceptualizing an image so each pixel

00:25:53within the image that’s the dashed lines

00:25:55there that would be passed into the

00:25:57input of our deep learning and it would

00:25:59start to reduce concepts around those

00:26:01pixels so the first layer might decide

00:26:04that there’s you know part of a tire

00:26:06or part of a rim or an end plate or

00:26:08something like that usually very small

00:26:10discrete kind of local things within the

00:26:13image the next layer might start to

00:26:15build on that concept and build a

00:26:17concept of a tire or a front wing or a

00:26:19rear wing and then finally we get to the

00:26:21classification and in this case it’s an F1

00:26:24car but you can imagine that if you then

00:26:27showed the algorithm a normal car it

00:26:30could reuse some of those concepts they

00:26:32all they still have wheels they still

00:26:34have you know cockpits or bodies

00:26:36they probably don’t have wings I don’t

00:26:38know maybe in Leeds I don’t know

00:26:40about Denmark

00:26:42but you can reuse some of these concepts

00:26:45and that kind of shows the applicability

00:26:47to not just not just problems that it’s

00:26:49already seen but also future problems

00:26:51that it hasn’t seen and so just to

00:26:55finish this section off really just

00:26:56machine learning in the news or deep

00:26:57learning in the news the one

00:27:00I really like that’s accessible to

00:27:02anybody really is the Google the new

00:27:04Google Translate app that takes pictures

00:27:06of signs or text in a different language

00:27:08and it translates that text but the real

00:27:11the cool USP of the whole thing is that

00:27:14it actually takes the image and replaces

00:27:17the image with the correct text in your

00:27:19language so here we’ve got a Russian

00:27:21sign and it’s replaced it with the

00:27:24English here actually it says

00:27:26access the city but according to my

00:27:29friend who speaks Russian it

00:27:31actually means exit to village so not

00:27:34access to city but exit to village and it’s

00:27:36not quite as grandiose if

00:27:38Google showed us a sign saying exit to

00:27:39village so that’s probably why they

00:27:41changed it and then we’ve got the the

00:27:44images at the bottom and this is a new

00:27:46chip developed by IBM it’s been a few

00:27:48years in the making actually but

00:27:50effectively it’s a a deep learning

00:27:53neural network type infrastructure

00:27:56inside a chip so obviously you’ve got

00:27:58the cause and you used to the cause

00:28:00imagine the cause parallelized massively

00:28:03so instead of having you know one call

00:28:05we’ve got tens of thousands in this case

00:28:07is actually a million there’s a a

00:28:08million neurons in this chip so it’s

00:28:10able to do a million parallel tasks all

00:28:13at the same time and when we go through

00:28:17some of the examples in in a minute

00:28:18we’re going to be talking about like

00:28:20image sizes like 10 by 10 so 100

00:28:23input pixels that go down to maybe

00:28:262 outputs 2 dimensions on the

00:28:29output so that’s kind of nothing in

00:28:32comparison to what this could do and

00:28:34this is actually in hardware as well so

00:28:36it’s super fast super low power and

00:28:38should produce some really interesting

00:28:40applications ok so just to solidify

00:28:45how deep learning works I’m going to

00:28:48take you through an example which is a

00:28:54description

00:28:55of some numbers here so the

00:28:58idea of this task is to recognize some

00:29:02handwritten digits and to classify them

00:29:04as a number from 0 to 9 so it’s a really

00:29:07classic machine learning example

00:29:09but it’s really great to use

00:29:11as an example because it’s very

00:29:14easy to understand very very easy for

00:29:15everybody to understand it’s just trying

00:29:17to recognize what that number is and the

00:29:20first thing we notice when we start

00:29:21looking at the data so the first step in

00:29:23any in any data analysis job is to have

00:29:25a look at the data and the first thing

00:29:27we notice is that if you actually if you

00:29:29look at that top left number there

00:29:31I’m not completely sure whether

00:29:35that’s a 5 or a 3 and this

00:29:39immediately brings problems because this

00:29:41data is actually labeled so every one of

00:29:43these examples you’ll see so each each

00:29:46number is an example you can see that

00:29:48it’s been inverted from maybe

00:29:49somebody writing pen on white paper and

00:29:52it’s been inverted and then reduced to

00:29:56a fixed pixel size and then centered as

00:29:58well and the first thing that we can see

00:30:00is we’re already not sure whether that’s

00:30:02a 3 or a 5 and so somebody’s gone

00:30:04through and labeled this data as being a

00:30:063 or a 5 but I’m not convinced that

00:30:08that’s actually correct so we’re giving

00:30:10our algorithm potentially dodgy data

00:30:13already so bear in mind whenever

00:30:15you’re trying to train on data that

00:30:17your labeled data might not be right in

00:30:19the first place because it’s

00:30:20usually labeled by humans so

00:30:25what we then do with each example is we

00:30:27feed it into an input layer so I’m

00:30:29trying to stay away from the term neural

00:30:32network although I’ve mentioned it a

00:30:33couple of times because that it’s been

00:30:35around since the 80s but it it sounds

00:30:38complicated but it’s really not all a

00:30:39neural network is is you have a node where

00:30:42some data goes in and then you have

00:30:45links to the next subset of nodes and

00:30:48those links all have weights

00:30:50it’s as simple as that all we do is

00:30:52we alter the weights within the

00:30:55network in order to perform a task so

00:30:58I’ll try and refrain from using that

00:31:00terminology so our input layer is

00:31:02usually the same size as the size of the

00:31:05data so here we’ve got maybe 10 by

00:31:0710 pixels so we’ve got 100 inputs

00:31:09one input for each pixel we then

00:31:15pass that data through to what’s known

00:31:17as a hidden layer and we call it hidden

00:31:19layer a bit basically because it’s not

00:31:20an input or an output it’s something in

00:31:22the middle it’s not directly observable

00:31:25and the way in which they’re connected

00:31:27is with a weight and during the training

00:31:30process those weights could be you know

00:31:32completely removed by setting it to zero

00:31:34or you know completely kept by setting

00:31:36it to one and that’s all the training

00:31:38process is doing so what’s really great

00:31:46at this point is that those weights

00:31:48actually they combine in the next layer

00:31:51so you might have learnt that the

00:31:54weights that have been learned for that

00:31:57one particular neuron in the hidden

00:31:59layer can actually be treated as like a

00:32:00feature this is this is the beginnings

00:32:02of a concept so it’s saying that given

00:32:05that one neuron that one item in the

00:32:08hidden layer there that has that has

00:32:12certain weights on each of the input

00:32:14pixels so if we if if we were to make

00:32:18that the output layer there we could

00:32:20imagine that if that was the the output

00:32:22layer for the number one the weights

00:32:24would represent a shape that looks

00:32:26something like the number one generally

00:32:29in hidden layers you have multiple

00:32:30hidden layers so you’re trying to get

00:32:31the algorithm to learn these small steps

00:32:33these small increments of of concept and

00:32:38what we can actually do is to say that

00:32:40for for that one hidden layer we can go

00:32:42back and say what does the input layer

00:32:43have to look like in order to fully

00:32:45activate that one neuron and only that

00:32:47one neuron so this is an example of that

00:32:50hidden feature layer here and it might

00:32:53look a bit abstract but you you can just

00:32:56about start to make out that it’s

00:32:57starting to learn this kind of ghostly

00:33:00images of numbers in there and that’s

00:33:02because it’s starting to learn some of

00:33:03these concepts if you were to use a

00:33:05number of hidden layers and say you know

00:33:07don’t try and learn the number all

00:33:09in one go it might come up with features

00:33:11that are like edges maybe it could learn

00:33:14the edge of the stick of a 7 or maybe it

00:33:16could start to learn some curves of a nine

00:33:18or something like that and these are the

00:33:20hidden features that are in the middle

00:33:21of all these these networks

00:33:24so then finally we would produce an

00:33:26output layer which usually amounts to

00:33:29the number of possible classifications

00:33:32that we want to make so for our output

00:33:35layer we would have 10 we would have 0

00:33:37to 9 and each one of those nodes would

00:33:39represent a number and at the output

00:33:43layer if we were to actually put one of

00:33:44these examples in you’d never get 100%

00:33:48you always get this we were talking

00:33:52earlier about how they’re not

00:33:53deterministic but they kind of are

00:33:56deterministic in the sense that they

00:33:57have fixed weight so you can follow the

00:33:59path of those weights through the data

00:34:00however we’re never quite sure like

00:34:03going back to that previous example

00:34:05we’re never quite sure whether it’s a 5

00:34:07or a 3 so we’re going to the algorithm

00:34:09will probably decide that I’m 50 percent

00:34:12sure that it’s a 5 but there’s a 40%

00:34:14chance there could be a 3 so all of the

00:34:17numbers that are generated basically the

00:34:19the classification is made by picking

00:34:22the highest of those numbers so in this

00:34:23case would say that the 5 is the

00:34:26classification for this example because

00:34:28that had the highest value at the output
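
(A rough sketch of the digit-classification setup just described, using scikit-learn’s small built-in 8x8 digits set: 64 input pixels, one hidden layer, 10 output classes, and the prediction taken as the highest-scoring class. The layer size and other settings are arbitrary choices for illustration.)

    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier

    X, y = load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # 64 inputs -> one hidden layer of 32 units -> 10 outputs
    clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
    clf.fit(X_train, y_train)

    probs = clf.predict_proba(X_test[:1])[0]   # never 100%: a score per class
    print(probs.argmax(), probs.max())         # pick the highest-scoring class
    print("test accuracy:", clf.score(X_test, y_test))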

00:34:32but what’s really cool as well is that

00:34:35we can actually rather than try and tell

00:34:39it to classify the objects by only

00:34:41having 10 outputs we can actually

00:34:44produce the same number of outputs and

00:34:46inputs and say ask the algorithm please

00:34:49try and reconstruct the image based upon

00:34:51your hidden you know concepts and

00:34:54representations so what we can do here

00:34:56is given a certain output please

00:35:00reproduce that input and then we could

00:35:02do some comparison to see how well it’s

00:35:04performed so this is an example of what

00:35:07a reconstruction actually looks like and

00:35:09if I just flick backwards or forwards

00:35:11between what was real what was the real

00:35:14input and what was the learned concepts

00:35:16about that you can kind of see that the

00:35:18learned concepts are kind of like a

00:35:19drunk blurred version of the real number

00:35:22and that’s because they’re kind of

00:35:24learning they did what the most likely

00:35:27look is for that particular number and

00:35:29and what’s really interesting is in the

00:35:32real data we weren’t sure

00:35:34whether that was a 3 or a 5 but if you look at

00:35:36the drunk version

00:35:37it actually looks a little bit more like

00:35:40a five and this is because

00:35:41it’s

00:35:43probably been labeled as a five

00:35:45so the algorithm has learnt

00:35:47those features as a five so when you try

00:35:49and reconstruct it it looks more like a

00:35:51five and then finally we talked about

00:35:55dimensionality reduction so what we can

00:35:57do is take that high dimensional output

00:36:00so in this case we have ten discrete

00:36:03classes from zero to nine and we can

00:36:05flatten them into space so we don’t have

00:36:07ten dimensions to plot all our data so

00:36:09we can’t we can’t plot the 50% of the

00:36:11five to thirty percent of the for the

00:36:13twenty percent of the three and so on

00:36:15and so on all on a graph because we

00:36:16don’t have that many dimensions so what

00:36:18we can do is flatten all of that into

00:36:20two dimensions and this is what this

00:36:21process is here and what it shows you is

00:36:24how well the data are clustering

00:36:27together so if I stand

00:36:30very close to my screen I can see that

00:36:32the number Seven’s at the bottom are

00:36:34quite well clustered there the number of

00:36:36eights are okay in the top left but then

00:36:39we’ve also got some very strange

00:36:41features like so let’s take the five and

00:36:43a three example you see the fives in the

00:36:45orange in the middle they’re pretty well

00:36:47mixed with the three and that’s kind of

00:36:51because there must be quite a lot of

00:36:52examples that look like a five or look

00:36:54like a three so they’re quite well mixed

00:36:56so that means to actually perform the

00:36:58classification the algorithm is gonna

00:36:59have to work really hard to try and you

00:37:01know pull those apart so this is what

00:37:04you would generally do on the output is

00:37:06you would you would try and visualize

00:37:08the data in such a way that we as humans

00:37:11can understand it that could be

00:37:12in 2d or in 3d
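
(A minimal sketch of that flattening step: squash the high-dimensional digit data down to two dimensions so the clusters can be plotted and inspected. t-SNE is one common choice here; PCA would work too. Again this uses the small scikit-learn digits set rather than the data from the talk.)

    import matplotlib.pyplot as plt
    from sklearn.datasets import load_digits
    from sklearn.manifold import TSNE

    X, y = load_digits(return_X_y=True)
    X_2d = TSNE(n_components=2, random_state=0).fit_transform(X)

    plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y, cmap="tab10", s=5)
    plt.colorbar(label="digit")
    plt.show()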

00:37:18okay so hopefully that section kind of introduced you

00:37:20to deep learning and some of the ideas

00:37:22and some of the terminology so when I

00:37:24come to some of the financial demos

00:37:27there this should be much easier to

00:37:30understand so first example is a

00:37:35traditional example using a rules-based

00:37:39approach and in this case we’ve been a

00:37:42little bit fancy we’re using a graph

00:37:43database typically graph

00:37:45databases aren’t used as much as we’d

00:37:48like but they do perform really well

00:37:50in

00:37:51a fraud based scenario so just

00:37:54quickly recap if you don’t know a graph

00:37:56database is another NoSQL database

00:37:59but its power really is the description

00:38:02of the data so the data can only ever be

00:38:04either a node or a relationship a node

00:38:07is like a thing or a noun whereas a

00:38:09relationship is is a link or a

00:38:12relationship or a or a verb that

00:38:14basically connects two concepts together

00:38:16and the key selling point really is that

00:38:21sometimes you’ve got data that is just

00:38:22better described in a graph like

00:38:24structure so for example when we’re

00:38:26talking about fraud and and finance and

00:38:29stuff

00:38:29you’ve got the concepts of people and

00:38:31accounts and those people and accounts

00:38:33are all linked to different things

00:38:34they’re linked to an address a link to a

00:38:35current account and so on so for example

00:38:40we’ve got the

00:38:42traditional social media use case where

00:38:46we’ve got Bob is friends

00:38:48with Jane we’ve got a chair contained

00:38:50within a room Jane bought a book and so

00:38:53on but the real power is that once

00:38:56you’ve modeled it in this way you can

00:38:58perform complex queries that you

00:39:00wouldn’t be able to do in a traditional

00:39:02relational database so when you wanted

00:39:05to do so to go back to the social media

00:39:06example again when you wanted to do like

00:39:08who is friends with my friend you have

00:39:10to do some crazy joins with your SQL in

00:39:13order to get that to work with a graph

00:39:14database you can just

00:39:16hop through the graph it makes it really

00:39:18really fast so in their fraud situation

00:39:24we might model our data to something

00:39:26like this we might have an account

00:39:27holder in the middle and they have

00:39:28relationships with phone numbers or

00:39:30national insurance numbers things like

00:39:32that and then we can perform queries on

00:39:34that if we would like to but when you

00:39:37start viewing that in detail and

00:39:38actually viewing how these connections

00:39:40are connecting things together

00:39:41interesting patterns start to come out

00:39:43and especially if you’re visualizing it

00:39:44in this way as well it’s much easier to

00:39:46visualize data in this way than it is in

00:39:48a table for example so in this example

00:39:51we’ve got three account holders in red

00:39:53are they red yep they’re red and

00:39:55they’re linked in various different ways

00:39:57we’ve got all three of them are sharing

00:39:59the same address so who could be dodgy I

00:40:02actually had a person in another talk

00:40:03excuse me

00:40:05that III was suggesting that all three

00:40:07people sharing the same address that

00:40:08could be dodgy and and she was like no

00:40:10no no no when thousands of people are

00:40:12sharing the same address then it’s dodgy

00:40:14three is fine don’t worry about it so

00:40:16I’m like okay so but we could set up a

00:40:18rule there to say you know how many

00:40:21people are using the same address and

00:40:22you could do that in the traditional

00:40:23database but where the power really

00:40:25comes in is when you start linking these

00:40:27these things together and searching for

00:40:29these larger rings and groups within the

00:40:31data so if we imagine that directly two

00:40:35people aren’t sharing the same national

00:40:37insurance number for example which is

00:40:38illegal in the UK maybe there’s a third

00:40:41party which is linking these National

00:40:43Insurance numbers together so you

00:40:45actually start to form these rings

00:40:46within the data which are kind of not

00:40:48natural there shouldn’t really be

00:40:49rings in the data and graph databases

00:40:52are really good at viewing and spotting

00:40:54these rings so that’s the kind of

00:40:56technology that would exist in the wild

00:40:58today if we were asked to perform a

00:41:01job like this
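
(A sketch of the shared-attribute ring idea using networkx in place of a real graph database; the holders, addresses and national insurance numbers here are invented, and the threshold is arbitrary, but the shape of the query is the point.)

    import networkx as nx

    G = nx.Graph()
    G.add_edges_from([                      # illustrative data only
        ("holder:alice", "address:1 High St"),
        ("holder:bob",   "address:1 High St"),
        ("holder:bob",   "ni:AB123456C"),
        ("holder:carol", "ni:AB123456C"),
        ("holder:dave",  "address:7 Low Rd"),
    ])

    # A "ring" shows up as several account holders pulled into the same
    # connected component through shared addresses, numbers and so on.
    for component in nx.connected_components(G):
        holders = sorted(n for n in component if n.startswith("holder:"))
        if len(holders) > 2:                # arbitrary cut-off, as discussed above
            print("possible ring:", holders)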

00:41:05but where we’re really interested in is bringing some machine

00:41:07learning techniques to some of these

00:41:09ideas so the first idea I had was quite

00:41:14a typical one really and that’s

00:41:16why I did it because it was quite

00:41:18easy to do but basically if we could use

00:41:21vocal fingerprints for origination it

00:41:24would solve the main

00:41:26problems really it would save the user a

00:41:30significant amount of time the user

00:41:31experience would you know be

00:41:34hugely improved not having to wait on

00:41:37the phone for 20 minutes just because

00:41:38some stupid automated system took you to

00:41:40the wrong place so if we can use their

00:41:43person’s voice as a form of

00:41:45authentication origination then we’ll be

00:41:49able to save time be able to save

00:41:51machine time and be able to save the

00:41:54manpower of people on the other end of the

00:41:55phone so to do this what we’d have to do

00:41:58is to record the customers voice

00:42:01we then pre-process the data in some way

00:42:03to clean it up and put it in a format

00:42:05that’s that’s capable of being put into

00:42:08an algorithm in this case we would train

00:42:11a deep learning model but it could be

00:42:12any algorithm and then we’d store that

00:42:14fingerprint for future verification in

00:42:16the online scenario so once you’ve got

00:42:18set up the user would come on you’d

00:42:20rerecord his voice again maybe against

00:42:22the preset phrase maybe against new

00:42:24phrase and then you’d compare that

00:42:26result of the fingerprint and that would

00:42:28prove whether that person is you know

00:42:30really who they say they are so this is

00:42:34the pre-processing stage in action so

00:42:37this is a bit of signal processing which

00:42:39is converting the time signal of

00:42:42the audio file into

00:42:45the frequency domain so what you’re

00:42:47seeing there is a plot of the frequency

00:42:49components versus time so red is strong

00:42:52and the green blue color is weak

00:42:54so it’s saying that you know you can see

00:42:57there the gaps in between the data

00:42:59they’re kind of where they paused to

00:43:01say the words
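
(A rough sketch of that pre-processing step: load a recording and turn it into a frequency-versus-time plot, a spectrogram, that can then be fed to the model. The file name "voice.wav" and the window size are placeholders.)

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.io import wavfile
    from scipy.signal import spectrogram

    rate, audio = wavfile.read("voice.wav")      # hypothetical recording
    if audio.ndim > 1:                           # mix stereo down to mono
        audio = audio.mean(axis=1)

    freqs, times, power = spectrogram(audio, fs=rate, nperseg=512)
    plt.pcolormesh(times, freqs, 10 * np.log10(power + 1e-10))  # dB scale
    plt.xlabel("time (s)")
    plt.ylabel("frequency (Hz)")
    plt.show()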

00:43:04and I think if it works yeah so this is some example data

00:43:09that I used in my learning and this is

00:43:14three examples of three people saying

00:43:17the same phrase don’t ask me what that

00:43:22phrase actually means I don’t know

00:43:24but anyway you can tell

00:43:27yourselves that those three voices sounded

00:43:29sometimes a little bit different but in

00:43:31that last example completely different

00:43:33and what we’re trying to do is to

00:43:35make the deep learning think the same

00:43:37okay so once we’ve put it into our deep

00:43:42learning model we’ve done the training

00:43:44and we’ve produced an output our output

00:43:46in this case is between these three

00:43:48different people so you could have three

00:43:50outputs and then again we’ve compressed

00:43:52that we’ve squashed that onto the

00:43:53screen into two dimensions and this is a

00:43:56plot that shows how close all of those

00:43:59voices were so we’ve got a

00:44:01couple of different points in there and

00:44:02the the different colors there – Bob

00:44:05Steve and Dave they correspond to the

00:44:07three different examples the three

00:44:09different people giving the example

00:44:10sorry and each individual point is a

00:44:13specific phrase that they said so we had

00:44:15ten ten different phrases that they said

00:44:18and you can see that all of these

00:44:20examples are clustering together quite

00:44:21well so if we then took the

00:44:26same people but using a different

00:44:28spoken example so not the same examples

00:44:31how would that perform on

00:44:32new data so I think we go again so the

00:44:39top line now in the results that was

00:44:42the raw result the raw output of those

00:44:44three neurons for that file and

00:44:46it’s saying that one of the neurons

00:44:48had 0.98

00:44:50and the others 0.01 and 0.01 as well and

00:44:53that’s saying that you know Bob

00:44:55definitely pretty sure 98 percent sure

00:44:57that that was definitely Bob

00:45:01there you go 97 percent chance that was

00:45:03Steve there 96 percent it was Dave so

00:45:10that was that example quite a simple

00:45:14example in the sense that it only used a

00:45:16very small data set but it’s you know

00:45:18it’s instructive and it kind of points

00:45:23towards things that we could do in the

00:45:25future given much more data I mean like

00:45:27every phone call we pick up these days

00:45:28there’s always a ‘we are recording your

00:45:30voice for verification and training purposes’ message

00:45:32so there must be huge vast databases of

00:45:35people’s voices out there ok so next

00:45:39example decision trees so this is an

00:45:42example of a decision tree as we showed

00:45:43earlier on and this is predicting

00:45:46mortgage default so amazingly two banks,

00:45:50sorry, two mortgage providers in the

00:45:53US went bust as usual of course and

00:45:56were bailed out by the US taxpayer so they’re

00:45:59owned by the US government

00:46:01Freddie Mac and Fannie Mae and as

00:46:04part of their, I don’t know, as part of

00:46:07their reprisal basically a slap on the

00:46:09wrist the government forced them to

00:46:11release lots of their data to the public

00:46:13and amazingly they they publicized a

00:46:17whole data set of mortgage applications

00:46:19and also historical accounts of what

00:46:21happened to those mortgage applications

00:46:22so you could say the data told us

00:46:26whether that person then defaulted in

00:46:27the future so the task here is given

00:46:32some data, oh dear I’m running over

00:46:35time I’ll have to speed up, given some data is

00:46:38it possible to predict whether that

00:46:39person’s going to default so the first

00:46:42problem is the whole data

00:46:45cleaning problem like we saw in

00:46:46the previous talk the vast majority

00:46:49of time is spent cleaning data

00:46:51I’m gonna skip over that so if we were

00:46:54to flatten all of the data that was

00:46:56recovered into an image before we put

00:46:58it through the algorithm this is kind of

00:46:59what it looks like it’s very

00:47:00intermingled and mixed you can’t quite

00:47:03understand what’s going on so a decision

00:47:05tree is learning all of these rules

00:47:08and based upon the outcome of those

00:47:10rules it’s either yes the person defaulted or

00:47:12no they didn’t default so we had

00:47:15approximately 20,000 samples total 50-50

00:47:18split a random forest classifier so it’s

00:47:22a type of decision tree algorithm but it’s

00:47:25better it does not overfit as much only 11

00:47:30input features so the main problem here

00:47:32is I don’t actually think we’ve got

00:47:33enough data to do a really good job but

00:47:35we’ll see what we can do and the one

00:47:38great thing about decision trees is that

00:47:40they actually give you a measure of

00:47:42importance for all of those variables so

00:47:45here we’ve got the variables that were

00:47:47inputted to the algorithm at the bottom

00:47:49and it shows their respective importance

00:47:53of those variables on there on the

00:47:56left-hand side so you can see actually

00:47:57the credit score is in second place so

00:48:00I’m not sure that the credit reference

00:48:01agencies would be too happy that you

00:48:03know they could only explain 0.25 of the

00:48:06data so 25% of the data could only be

00:48:09explained by the credit score alone so

00:48:13not not a great result for them and

00:48:14actually the most important measure was

00:48:17the HPI at origination which was the house

00:48:19price index at origination for that local

00:48:21area so this is saying that a person who

00:48:24took out a mortgage in a very local area

00:48:26it’s very dependent on the prices within

00:48:28that area as to whether they’re going to

00:48:30default or not and this is kind of

00:48:31typical really in the US you can see

00:48:33vast tracts of places like

00:48:35Detroit where you know as soon as some of

00:48:37the jobs left everybody just lost their

00:48:39jobs and the whole area’s house prices then

00:48:41crashed and then people couldn’t afford

00:48:43to move because they couldn’t sell so

00:48:47that’s kind of why that’s so important

00:48:50interesting result
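
(A sketch of the mortgage-default setup: a random forest on a cleaned extract of the loan data, then reading off the feature importances. The file name and column names are stand-ins, not the actual Freddie Mac or Fannie Mae schema.)

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("loans_cleaned.csv")            # hypothetical cleaned extract
    features = ["credit_score", "hpi_at_origination", "ltv", "dti",
                "interest_rate", "loan_amount"]      # the talk used 11 features
    X, y = df[features], df["defaulted"]
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    forest = RandomForestClassifier(n_estimators=200, random_state=0)
    forest.fit(X_tr, y_tr)
    print("test accuracy:", forest.score(X_te, y_te))

    for name, importance in sorted(zip(features, forest.feature_importances_),
                                   key=lambda pair: -pair[1]):
        print(f"{name:>20s}  {importance:.2f}")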

00:48:52and then the final example I’m having to move rather

00:48:54quickly here because I’ve only got two

00:48:55minutes left but is it possible to take

00:48:58that data

00:49:00and try and see whether there’s

00:49:02something strange going on within

00:49:03the data so basically this is an

00:49:05unlabeled example we’re not telling it

00:49:07what to learn here so how do we do that

00:49:10well there’s a deep learning technique

00:49:12called an autoencoder which basically

00:49:16takes the inputs and restricts the

00:49:18number of hidden neurons to only a few

00:49:20concepts it’s saying you’ve really got to

00:49:22pick and choose what data you use and

00:49:24generate some concepts that are really

00:49:26quite strict and then we try and

00:49:28reproduce the output again and we’re

00:49:30comparing the output against the input

00:49:32as a measure of how well we’ll have done

00:49:35so basically those restrictions in the

00:49:37middle maybe only two neurons you know

00:49:39yes and no something like that is it

00:49:41possible to reconstruct the data
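
(A minimal sketch of the autoencoder idea: train a network to reproduce its own input through a narrow bottleneck, then rank records by how badly they reconstruct. MLPRegressor stands in for a proper deep-learning library, and the file name is the same hypothetical extract as before.)

    import numpy as np
    import pandas as pd
    from sklearn.neural_network import MLPRegressor
    from sklearn.preprocessing import StandardScaler

    df = pd.read_csv("loans_cleaned.csv")                 # hypothetical extract
    X = StandardScaler().fit_transform(df.drop(columns=["defaulted"]).to_numpy())

    # encoder 16 -> bottleneck of 2 -> decoder 16, trained to reproduce X from X
    auto = MLPRegressor(hidden_layer_sizes=(16, 2, 16), max_iter=2000,
                        random_state=0)
    auto.fit(X, X)

    errors = np.mean((auto.predict(X) - X) ** 2, axis=1)  # reconstruction error
    print("most unusual records:", np.argsort(errors)[-5:])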

00:49:45so we can do that there’s the same data as

00:49:47before though it’s a slightly different random

00:49:49sample so it might look slightly

00:49:50different we’ve got an input layer a

00:49:52number of hidden layers that are

00:49:54compressing the data down into smaller

00:49:55and smaller neurons and then we’re

00:49:57reconstructing again back to the input

00:50:00layer and doing a comparison to see how

00:50:01well we did but what we can do then is

00:50:04plot in two or three D one of those

00:50:07hidden layers to actually view those

00:50:08concepts and what we’ve learnt and

00:50:10finally this is the result of that

00:50:12process and the left-hand side we’ve got

00:50:14a 2d representation and you can start to

00:50:17see there’s actually some structure

00:50:18within that data so most generally you

00:50:22can see that the people that defaulted

00:50:24are on that graph on

00:50:26the left-hand side and the people that

00:50:28didn’t default on the right-hand side

00:50:29and within there if you look on the

00:50:31right-hand side there’s a couple of

00:50:33orange dots and that’s saying that the

00:50:35vast majority of people in there didn’t

00:50:37default but one or two people did now an

00:50:39analyst might start to ask why so it

00:50:42could be something quite innocent you

00:50:43know maybe the person lost his

00:50:45high-powered job went to prison

00:50:47something like that but it’s kind of

00:50:49indicative that something else is going

00:50:51on and this is where the analyst would

00:50:52come in and start investigating that

00:50:54data so these are completely unlabeled

00:50:56and the algorithm has absolutely no idea

00:50:59what it means

00:51:00and it still takes a human to do some

00:51:02analysis and to do some investigation to

00:51:04figure out what has happened but these

00:51:07kinds of tools lead the analysts in the

00:51:09right direction as opposed to just

00:51:11taking a random sample

00:51:12and then finally on the right hand side

00:51:14we’ve got a 3d representation of the

00:51:15same data and this is where it becomes

00:51:17really really powerful you can imagine

00:51:19like if you could get that graph and you

00:51:21can like look into it and and move it

00:51:24and turn it around and you can start to

00:51:26see clusters in 3d space and that’s when

00:51:28it starts to become immersive and

00:51:31it takes a certain

00:51:32amount of time for any analyst to

00:51:34analyze data but given enough time they

00:51:36will be able to learn to see patterns

00:51:38within that data which will help them to

00:51:41investigate things that they haven’t

00:51:43seen before and I think I better stop

00:51:45there because I’ve completely run out of

00:51:46time so thank you very much for

00:51:47listening
