GOTO 2017 • Cloud-Native Data Science • Phil Winder


Thank you very much. This is a talk about applying cloud-native principles to data science.

In 2016 Microsoft made a very gutsy move: they released a new breed of chatbot into the public domain. The company’s website claimed that it had been built using relevant public data that had been modelled, cleaned and filtered. You may have heard of it; it was called Tay. The purpose of the bot was to respond to tweets in a humanistic manner. You could send it questions on Twitter using its handle, and it did a really good job of answering like a youth. I say a youth because I didn’t understand a lot of the acronyms it used. When it was released, everything was actually going swimmingly and it worked remarkably well. It really did sound like a human at the other end.

But when a big tech company like this releases a product like this, usually the first users of the service are engineers. Given that you’re all engineers in the room, would you (a) test out this service, appreciate it for what it is and ask it sensible questions, or (b) try to break it? Would you send it the most horrific things that you could think of in order to try to force an answer out of it? Well, engineers are a sadistic bunch, and you can guess which option they chose.

The bot went from a mild-mannered, well-answering chatbot to a sexist, racist, genocidal Nazi in about 24 hours. You can see a collection of tweets there: it started off looking quite good and we ended up with Hitler. If you end up with Hitler, you know it’s gone wrong. One of my favorite tweets was about a British comedian called Ricky Gervais. It had a decent enough question: "Is Ricky Gervais an atheist?" The response: "Ricky Gervais learned totalitarianism from Adolf Hitler, the inventor of atheism." Now, for all I know about Hitler, I don’t think that’s his most famous trait, but I’ll give it 10 out of 10 for imagination for that one. And ultimately, the result of this wonderful experiment: 24 hours later it was dead, gone.

Although that’s quite a hilarious story, I’m actually quite impressed with Microsoft. It was a very gutsy move to allow this to happen, and they managed to deliver something that was really quite impressive. But I think, and this is just speculation, that some of Microsoft’s traditional organizational stuff got in the way. I think that if people had been able to spot these problems, and had been in a position to spot them, then they could have stopped it before something like this happened. And that’s really what this talk is about today.

In normal life, tradition is a fantastic and important part of culture, a cultural meme. But in engineering it’s actually the harbor of bad habits. If we stick to traditions then we tend to repeat the same mistakes. I used to work more in the data science field, and in software engineering. What we used to do was this: I would go away, write my models and do my research, and then the only thing that everybody else would see was this mass of code, which I would throw to the software engineers. I would say, "There you go, software engineers. I’ve finished my job; now it’s your turn. You implement it." Obviously, most of the time that just didn’t work. Some of the time it partially worked, but it never worked as well as it should have done.

I actually spoke to a client the other day. He had paid for a project for his company in which he sent some data off to a research arm, and he was worried that the work these researchers were doing was not really applicable in real life. He thought it was "a bit too academic"; those were the words he used. What he meant was that the types of things they were coming up with were not really realistic and relevant to modern-day industrial software. So yeah, tradition is a bit of a problem.

Traditionally, data scientists have worked towards a certain type of model. This is a model called the Cross-Industry Standard Process for Data Mining (CRISP-DM), and it is the nearest thing we’ve got to a process in data science. You’ll see that there are lots of loops in this process, and that’s just indicative of the fact that most of data science is open-ended and continuous; it never really stops. The problem with this is that pretty much all of these steps are very individual, and they don’t scale very well.

The first problem is the deployment phase. As I just said, when I was a data scientist I would throw my models over to the software engineers and then I would never see them again. I kind of think this is probably something that happened at Microsoft. We get software engineers who have not been trained in data science, we give them poorly documented, uninterpretable code, and we expect them to understand it and implement it efficiently. It’s never going to happen.

Then we start going through the other parts of the model. The first is data understanding. This is a major issue in data science because the data is the most important part of the problem, and for the data understanding part we rely on domain experts to interpret the data. We had a great talk earlier on from Feynman about how he was working in the music domain. If you weren’t there, he was working with classical music, and a normal data scientist would know nothing about the terminology used in music. Throughout his years of doing this research he finally became a domain expert, but it took years and years of work. So that’s a good example of that.

Data preparation is another problem area, because it’s often done by one person and it’s only done once. What happens is you get one guy who really delves into the data, and they’re the one who really understands it. But it’s hard to reproduce, because only one person understands it, and the output is again just this amorphous blob of software which takes messy data and spits out good data.
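
One way to make data preparation less of a one-person "amorphous blob" is to write it as a short list of named steps that anyone can read and rerun. The following is only a minimal sketch and is not from the talk: the field names (`name`, `age`), the step functions and the `prepare` helper are all hypothetical, chosen purely for illustration.

```python
from typing import Callable

Row = dict[str, object]
Step = Callable[[list[Row]], list[Row]]


def drop_missing_age(rows: list[Row]) -> list[Row]:
    """Remove rows where the (hypothetical) 'age' field is missing."""
    return [r for r in rows if r.get("age") is not None]


def normalise_names(rows: list[Row]) -> list[Row]:
    """Trim whitespace and lower-case the (hypothetical) 'name' field."""
    return [{**r, "name": str(r["name"]).strip().lower()} for r in rows]


# The ordered list of steps doubles as documentation of what was done.
PIPELINE: list[Step] = [drop_missing_age, normalise_names]


def prepare(rows: list[Row]) -> tuple[list[Row], list[str]]:
    """Apply each step in order and report how many rows each one left."""
    report = []
    for step in PIPELINE:
        before = len(rows)
        rows = step(rows)
        report.append(f"{step.__name__}: {before} -> {len(rows)} rows")
    return rows, report


# Made-up example rows, not real data.
messy = [
    {"name": "  Alice ", "age": 34},
    {"name": "BOB", "age": None},
    {"name": "Carol", "age": 29},
]
clean, report = prepare(messy)
print(clean)
print(report)
```

Because every step has a name and the report records what each step did to the data, a second person can see and repeat the preparation rather than rediscovering it.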

Finally, modeling. This is a little bit easier to reason about, as more and more people understand the types of models used and how to do modeling. But the issue with modeling is that, again, this is very much a one-person process. The process usually involves trying lots of different things and basically picking the best one, but that trial and error is never recorded anywhere. The only output is the model; that is the answer. So if we want to scale this past one person, what usually happens is that the second person repeats all the same mistakes, and actually ends up at a different result, because his biases and his preferences towards certain algorithms usually end up in a different model.
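
The point that the trial and error is never recorded can be made concrete with a small experiment log. This is a minimal sketch under stated assumptions, not something shown in the talk: `Trial`, `ExperimentLog`, `record` and `best` are hypothetical names, the scores are made-up placeholders, and "higher score is better" is an assumption.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Any


@dataclass
class Trial:
    """One modeling attempt: what was tried, with which settings, and how it scored."""
    model_name: str
    params: dict[str, Any]
    score: float
    notes: str = ""


@dataclass
class ExperimentLog:
    """Hypothetical append-only record of every attempt, not just the winner."""
    trials: list[Trial] = field(default_factory=list)

    def record(self, model_name: str, params: dict[str, Any],
               score: float, notes: str = "") -> Trial:
        trial = Trial(model_name, params, score, notes)
        self.trials.append(trial)
        return trial

    def best(self) -> Trial:
        # Higher score is assumed to be better in this sketch.
        return max(self.trials, key=lambda t: t.score)

    def to_json(self) -> str:
        # Serialise the whole history so it can be stored next to the code.
        return json.dumps([asdict(t) for t in self.trials], indent=2)


log = ExperimentLog()
# Illustrative placeholder numbers only, not real results.
log.record("logistic_regression", {"C": 1.0}, score=0.71, notes="baseline")
log.record("decision_tree", {"max_depth": 3}, score=0.64, notes="underfits")
log.record("random_forest", {"n_estimators": 100}, score=0.78)

print(log.best().model_name)
print(log.to_json())
```

The design choice is simply that failed attempts are kept alongside the winner, so a second person can see what was already tried before repeating it.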

So the whole of that side of the model is kind of like a murky canal. It’s like a mucky Amsterdam canal: the ships can go up and down, but you wouldn’t want to jump in and follow it.

And then, at the end of all that, we’ve got the operations side, the deployment side, and we’ve got the vast majority of the data science research phase that’s not going well. Actually, the vast majority of projects fail because there’s a lack of business understanding. That’s either because the business doesn’t understand the technical implications of what they’re proposing, or because the tech guys don’t understand the business problem well enough. So, a whole host of problems.

What I’m going to do now is ignore the business side a little bit, because that is actually a separate problem in itself, and you’re all tech guys and gals. So I’m just going to stick to three distinct phases. We’ve got the research phase, which is the bit about understanding the data, massaging the data and producing the data and the model. We’ve got the build phase, trying to prove that what we’re doing is correct. And then there is the actual deployment phase, the bit that we want to rush into production.

So, the research phase consists of the initial data science. That can be anything from performing experiments and gathering more data to data preparation, data cleaning, modeling, all that good stuff. It is called the research phase because it is a very scientific process, and the biggest problem with that is that it’s inherently open-ended and therefore very high-risk. There is a high probability of failure at this point, because you might find that either you don’t have the data to do the job properly, or you just can’t do the job because it’s intractable for some reason.

Stepping back a bit: believe it or not, Britain actually had a very rich motoring heritage. You might not think it these days; you might think of Germany or something like that. But there’s a manufacturing plant near Oxford which started in 1913. This is a picture from that same manufacturing plant in 1943, and from about then until the 1970s it was owned by a company called British Leyland. This is a picture of their manufacturing line building Cromwell tanks for World War II. By the time it got to the 70s it was building this little car, which you probably all recognize. But at the start of and during the 70s things started going wrong, and the ultimate reason why they went wrong was that there were better, cheaper alternatives available. Other companies were investing in the automation of these lines in order to produce better-quality and cheaper products.

When we talk about software engineering, or engineering in general, it’s just converting a process into code so that we can automate it. That’s all it is. (That’s where I should have said that word.) And today I think that data science is actually the automation of the data. So we’re starting to see a three-tier hierarchy here: we’ve got data science at the bottom, which is taking all the data and automating things based upon that data to feed into the process. So I think that data science and software engineering actually make a very good fit. They go together very well, because the data and the science feed into the software, which then feeds into the value that you’re trying to provide.

This is a picture of the same manufacturing plant in 2013, a hundred years after the plant opened. You can now see that it’s far more automated and there are basically no humans there, and that allows this company to build better, more reliable cars. As you probably know, this company is now owned by BMW. So the great golden English company was eaten up by German manufacturers, damn it.

Anyway, my point is that I think we software engineers are actually in a really good position to push ourselves into data science, not the other way around, because we’ve come away with all of the things that we’ve learned during this more traditional automation phase, and we can start applying them to data science. The fact is that none of this happens in data science at the moment.

And this leads me to data science’s dirty little secret. The little secret is that the vast majority of your effort, time and engineering skill as a data scientist goes into the data, just messing around with the data incessantly. We are fixing problems with the data, we are imputing missing values, we are removing invalid data, and so on. And the vast majority of the final performance of the model is based upon how much you can improve that data, not on the model.
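
To make the cleaning steps just mentioned concrete, here is a minimal sketch using only the Python standard library. The record layout, the `age` field, the valid range and the choice of median imputation are all assumptions made for illustration; the talk does not specify any of them.

```python
import math
from statistics import median


def is_missing(value):
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def clean_records(records, field, valid_range):
    """Drop rows with out-of-range values, then impute missing values.

    Missing values are filled with the median of the remaining valid
    values. This is one common choice, not the only one.
    """
    low, high = valid_range

    # Remove invalid data: keep missing values for now, drop out-of-range ones.
    kept = [
        r for r in records
        if is_missing(r[field]) or low <= r[field] <= high
    ]

    # Impute missing values from the observed, valid values.
    observed = [r[field] for r in kept if not is_missing(r[field])]
    fill = median(observed)  # raises StatisticsError if nothing valid is left
    return [
        {**r, field: fill if is_missing(r[field]) else r[field]}
        for r in kept
    ]


# Hypothetical raw data: one missing age and one impossible age.
raw = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},
    {"id": 3, "age": -5},
    {"id": 4, "age": 52},
    {"id": 5, "age": 41},
]
cleaned = clean_records(raw, "age", (0, 120))
```

Every decision in a function like this (what counts as invalid, what to impute with) changes what the model later sees, which is why this step tends to dominate the effort.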

We’ve had two fantastic talks this morning, all about deep learning, all about very sexy technologies, but that’s a very Silicon Valley problem. No offense to the Silicon Valley guys, but that’s a very Silicon Valley problem. For everybody else outside of Silicon Valley, we’re still living in a world where it is the simple techniques that really make a difference. It doesn’t have to be a complex model; simple things can go a long way.

One of the biggest issues with this process is that this discovery, this fixing the data, this understanding the data, is, as I said, only done by one person. So I think this is actually just a problem of visibility. There is very little visibility within data science. There’s generally only one person working on a problem at a time, and it’s very difficult to scale, or at least very inefficient to scale.

Over the years, software engineering has done a really great job of improving this, because we had exactly the same problems. We could distribute binaries quite effectively, but when it came to source code, we’ve gone through decades of trying to improve the visibility and the resiliency of our source code. Thankfully, data science is finally starting to get there.

These two tools in particular have been very prolific. You probably all know one of them, so I’m not going to talk about that. The second is a notebook. Some of you have probably come across it, but maybe not, so I’ll just introduce it. This is Jupyter Notebook, an evolution of IPython notebooks. The idea is that inside the notebook there is a series of cells, and each cell can either be Markdown or code.
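
As a rough illustration of the cell idea, the sketch below builds a two-cell notebook as plain JSON with the Python standard library. It follows the version 4 notebook file layout as I understand it (a list of `cells`, each with a `cell_type` of `markdown` or `code`); the helper names and the cell contents are made up for this example.

```python
import json


def markdown_cell(text):
    """A cell holding prose, written in Markdown."""
    return {"cell_type": "markdown", "metadata": {}, "source": text}


def code_cell(source):
    """A cell holding code; outputs are filled in when the cell is run."""
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": source,
    }


notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        markdown_cell("# Exploring the data\nFirst attempt: count the missing values."),
        code_cell("rows = [1, None, 3]\nsum(r is None for r in rows)"),
    ],
}

# A notebook file (.ipynb) is this structure serialized as JSON.
serialized = json.dumps(notebook, indent=1)
```

Because prose and code sit side by side in one file, the narrative of an analysis travels with the code that produced it.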

What this has done is single-handedly improve the visibility from pretty much zero all the way to almost as good as it gets. I think this is actually probably better than software engineering in terms of visibility. What it means is that when I first get the data and I’m doing my analysis, I can document everything I do, even the mistakes. I can write the code, I can write words if I need to, and whenever anybody else wants to repeat that process they can just come along and read this like a document. If they want to, they can come in and actually start playing with the code. They can start doing tests. If you think, “Oh, I think your model’s rubbish, I’m going to try another model”, or “I’m going to try some different parameters”, it’s very easy for someone to just come in, change something and run it. So this is a very iterative, very visual way of doing data science, and it’s made a huge impact.

And then when you team it up with Git, I think we’ve got the holy grail: repeatability from Git, visibility from Jupyter notebooks. This is something we take for granted. For example, when we’re looking at normal software code, we’re using GitHub, GitLab and whatever, just the online viewers, to view code far more often than we actually think we are, and that alone is super helpful for the visibility there.

That is a very good and a huge advancement for individual developers, but how do we scale it to multiple developers? We do that with another project from Jupyter called JupyterHub. It’s quite a simple architecture, as you can see. The main parts comprise an HTTP proxy and the individual notebooks; this notebook part is the bit I’ve just explained to you, the Jupyter notebook. Then we’ve got a couple of user-based components on the left-hand side there to handle the multi-tenant instances.

But the most interesting thing is this thing, the spawner, because we can override that spawner and plug in a whole range of tools. We can plug in Docker and start spinning up Docker containers, we can start spinning up Mesos containers or an orchestrator, or we could spin it up in some sort of cloud-based environment. It’s incredibly useful. Possibly my favorite is that we can start Kubernetes jobs: start pods with our own containers in. For us as software engineers, we know that this provides us with a huge amount of flexibility. We can simply scale out when we need to. If you’ve got more developers working on a different problem, just add more pods. If you need bigger machines, just scale out the number of machines. We can select our machines, whether we want GPUs or CPUs. It’s all years ahead of data science.
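
The value of a pluggable spawner can be sketched in a few lines of plain Python. This is not JupyterHub’s code: the `Hub`, `Spawner`, `LocalSpawner` and `ContainerSpawner` classes and the image name below are hypothetical stand-ins that only show the pattern. In a real deployment the equivalent switch is, to my knowledge, the `c.JupyterHub.spawner_class` option in the hub’s configuration file.

```python
from abc import ABC, abstractmethod


class Spawner(ABC):
    """Hypothetical interface: how a single-user notebook server is started."""

    @abstractmethod
    def start(self, user):
        """Start a server for `user` and return a description of it."""


class LocalSpawner(Spawner):
    """Runs every user's notebook as a process on the hub machine."""

    def start(self, user):
        return f"local process for {user}"


class ContainerSpawner(Spawner):
    """Runs every user's notebook in its own container."""

    def __init__(self, image, gpu=False):
        self.image = image
        self.gpu = gpu

    def start(self, user):
        device = "gpu" if self.gpu else "cpu"
        return f"container {self.image} ({device}) for {user}"


class Hub:
    """Hypothetical hub: hands each user a server via the chosen spawner."""

    def __init__(self, spawner):
        self.spawner = spawner
        self.servers = {}

    def login(self, user):
        # Start a server on first login, reuse it afterwards.
        if user not in self.servers:
            self.servers[user] = self.spawner.start(user)
        return self.servers[user]


# Swapping the spawner changes where notebooks run; the hub is untouched.
hub = Hub(ContainerSpawner("example/datascience-notebook", gpu=True))
first = hub.login("alice")
```

The hub never needs to know whether a notebook lands in a local process, a container or a pod, which is what makes scaling out a matter of configuration.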

So we’ve got the visibility, and we’ve now started to containerize the process. These are two core tenets of cloud native: containers and visibility. Now let’s move on to the build phase.

We touched on this at the start, but in the past, and it still happens today, data scientists come up with an idea, maybe a simple model, and they throw it over to the software engineers. (“Data scientist” is a very general term. I don’t necessarily mean people with PhDs working with high-level tools, deep learning, this, that and the other, just normal people working with normal data.) This is completely analogous to where software engineering was about 10 years ago. Software engineers would take their binary, their software, and throw it over to the Ops guys, and we’re combating that with the idea of DevOps.

So I think there’s an equivalent shift that needs to happen with data scientists. Data scientists need to become more integrated with the software engineers, and ultimately more integrated with the Ops people as well. DataOps, if you will. If we don’t have that, at best we have inefficient models, but at worst, and what’s more likely to happen, things don’t happen at all. And if you don’t get that transition right, then you just end up with poor products.

And talking of AI and robots, I love that video: the lipstick robot by Simone Giertz. Hilarious. Ah, you’re all boring. I find that funny; I’m going to laugh.

So how do we improve this? Well, like we saw from the DevOps transition, from Dev to Ops, much of the problem is actually a people problem. It’s about getting people to accept that their role is changing, and that it’s changing for the better, for the benefit of everyone. And that’s okay, but it’s a bit boring.

What we can do technically is start to enforce quality. We can enforce quality with, surprise, surprise, continuous deployment and continuous integration. This is a classic continuous delivery pipeline; I’m sure you all know this. The engineer commits their code, it goes into a build server, it runs through a pipeline and it’s deployed into production. Now, the pipeline is possibly the most important part of this entire process, and it needs to be customized to your domain and your problem.

I always like talking about the testing triangle. This is quite common in the CI literature. If you haven’t seen it before, it’s an image where on the x-axis we’ve got the number of tests and on the y-axis we’ve got the scope or the depth of the test. At the bottom we’ve got unit tests, where we have very large numbers of tests that each test very small bits of code, all the way up to the top, where we have very few tests, acceptance tests, but they’re testing a huge amount of code. That testing process is possibly the most important part of the build phase.
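
The two ends of that triangle might look something like the sketch below when the code under test includes a model. Everything here is an assumption for illustration: the `normalise` helper, the toy threshold `predict` function, the labelled examples and the 90% accuracy bar are not from the talk.

```python
import unittest


def normalise(values):
    """Scale values linearly into the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def predict(hours):
    """Toy 'model': flag flights of five hours or more as long-haul."""
    return "long-haul" if hours >= 5 else "short-haul"


class UnitTests(unittest.TestCase):
    """Bottom of the triangle: many small tests of small pieces of code."""

    def test_normalise_bounds(self):
        self.assertEqual(normalise([2, 4, 6]), [0.0, 0.5, 1.0])

    def test_normalise_constant_input(self):
        self.assertEqual(normalise([3, 3]), [0.0, 0.0])

    def test_predict_boundary(self):
        self.assertEqual(predict(5), "long-haul")


class AcceptanceTest(unittest.TestCase):
    """Top of the triangle: few tests, each covering the whole model."""

    LABELLED = [
        (1.5, "short-haul"),
        (2.0, "short-haul"),
        (7.5, "long-haul"),
        (11.0, "long-haul"),
    ]

    def test_accuracy_and_valid_labels(self):
        predictions = [predict(hours) for hours, _ in self.LABELLED]
        # The model must never emit a label the product cannot display.
        self.assertTrue(all(p in {"short-haul", "long-haul"} for p in predictions))
        correct = sum(p == label for p, (_, label) in zip(predictions, self.LABELLED))
        self.assertGreaterEqual(correct / len(self.LABELLED), 0.9)


def run_tests():
    """Run both layers, as a build server stage might."""
    loader = unittest.defaultTestLoader
    suite = unittest.TestSuite([
        loader.loadTestsFromTestCase(UnitTests),
        loader.loadTestsFromTestCase(AcceptanceTest),
    ])
    return unittest.TextTestRunner(verbosity=0).run(suite)


result = run_tests()
```

A pipeline would fail the build when `result.wasSuccessful()` is false, so a model that produces nonsense output never reaches production.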

If you don’t test your models then you end up with something like this. This is my colleague. He was trying to book a flight from Amsterdam to Prague, I think, and Kayak kindly recommended the flight “me”. That was a direct output of one of their recommendation models; it was “me”. So that was it for this guy; he obviously couldn’t book a flight. They lost his revenue, they lost his money. I dread to imagine how many other people were using the site at the same time and all received a big “me”; they must have lost a lot of money. I think that, if anything, is a clear indication that the data science people need to be more integrated into the operations of their actual software, because they’re the only ones that know how to implement monitoring in the best way, and they know how to fix it if it goes wrong.

And then we get on to the deploy phase. This is a bit more difficult to talk about because it’s a bit more domain-specific and very tech-stack-specific, so it depends what technology stack you’re using. I can generalize it a little bit by talking about containers, but ultimately the goals are exactly the same.

We want our software to be reactive, resilient and reproducible. We want it to be reactive so that when there are changes in the outside world we can scale up and scale down accordingly. We want it to be resilient, so if it ever fails it needs to automatically repair itself. And reproducible: if we can quickly reproduce our cluster in another location for testing or something, that improves testability.

And that is all kind of represented by this tiny little arrow in the build pipeline. Even in continuous delivery this is often overlooked. It’s always represented by a little arrow, as if it’s this simple thing where you just push it to production: flowers, smiley faces, done. It’s never like that. It’s a bit more difficult and far more specific, and there’s a lot of engineering effort that you spend trying to push this out to production.

For data science land, one of the easiest things we can do is bring in containers again. So how do you do that? Well, you have some sort of model, and you can quite easily stuff that into a container. If you’ve just got interfaces and routers, then they’re all pretty standardized. Once you’ve got to that point, it becomes much easier not only to make sure that it runs on your machine and works the same way in production, but also for other people to reason about, because you’re reducing the domain that people have to understand in order to use your service.
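
A minimal sketch of what "a model behind a standard interface" might look like before it is put into a container, using only the Python standard library. The `/recommend` path, the `whisky` query parameter, port 8080 and the `predict` stub are all assumptions made for illustration; the talk does not specify them.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


def predict(favorite):
    """Hypothetical stand-in for the real model; any model could sit here."""
    return ["something similar to " + favorite]


def handle(path):
    """Turn a request path like /recommend?whisky=Macallan into (status, body)."""
    url = urlparse(path)
    names = parse_qs(url.query).get("whisky", [])
    if url.path != "/recommend" or not names:
        return 400, {"error": "expected /recommend?whisky=<name>"}
    return 200, {"query": names[0], "recommendations": predict(names[0])}


class Handler(BaseHTTPRequestHandler):
    """Thin HTTP wrapper: all of the logic lives in handle() so it can be tested."""

    def do_GET(self):
        status, body = handle(self.path)
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8080):
    """Run the service; the port is an arbitrary choice that a container would expose."""
    HTTPServer(("0.0.0.0", port), Handler).serve_forever()
```

Calling `serve()` starts the service. Keeping the request logic in a plain function means the same code can be unit tested directly and then tested again as a running container.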

And that model can be anything. It could be just a simple Python model; it can be a Theano derivative, TensorFlow, whatever. And if you’re into more streaming technologies, we can easily apply streaming technologies here as well, if we just package up whatever it is in the particular streaming package that you’re using, like a source or a Spark executor. It’s still perfectly reasonable to do that.

That fits really nicely into the testing triangle, because we can build that container as part of our delivery pipeline and start testing the container, as opposed to just testing the code itself. It’s all fairly standard stuff that everybody aims for, but it’s amazing how much this doesn’t happen in real life in data science.

And then finally we can simply stuff that container into production however you want, using some sort of orchestrator or, if you’re using some sort of streaming-based system, selecting GPUs and CPUs. It’s the ultimate in flexibility: if it works there, if it works on your laptop, it doesn’t matter.

And just to finally push this home, in a slightly different domain: I know there are a few ThoughtWorks people here today, so I’ve got to be a little bit careful. They’re a great company, an amazing company, but their marketing department, I think, also needs to be integrated into production as well, because they sent out this email last week, and I would be really interested in finding out what ThoughtWorks’ “seismic shits” are. I find that really fascinating. Actually, I think this is a genius move by the marketing department, because so many people were talking about this in the office, and I think that’s done far more for ThoughtWorks than anything they could have sent out. So well done to that marketing person that made that.

Okay, so now I have a quick demo demonstrating all of these concepts together.

I’ve tried to think of a simple example. My example is a whisky shop. So my business requirement is: I have a client, which is a whisky shop (because, I think, whisky), and they have come to me because they want to provide a USP in the fact that they can recommend better whiskies than anybody else. But the problem is that they want this to be able to scale. They can’t really afford to employ whisky experts in every single one of their shops, so it’s much more efficient to write an algorithm to do that for them. So their requirements are: they want somebody to pass a favorite whisky in, and they want recommendations out. They want to start off with a limited set of whiskies, but they want to be able to update their data in the model in the future.

This is all available on my Git repository; you can get that, it’s all open source. And it’s pretty simple, the algorithm I’m using for this; it’s pretty knotty. It’s the kind of famous standard whisky dataset. Just to cover that a little bit: it’s a simple nearest-neighbor algorithm. All whiskies are characterized by a set of numbers, where each number corresponds to a particular feature of that whisky. The features might be smokiness, or sweetness, or toffee, things like that. So what happens is that it calculates the distance between someone’s chosen whisky and all of the whiskies, and then we pick the top five or ten, or however many, recommendations based upon that. So, pretty simple, but it works remarkably effectively.
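
The nearest-neighbor idea described here can be sketched in a few lines of Python. This is only an illustration, not the code from the talk's repository: the feature scores are made up, the distillery names other than Macallan are placeholders, and Euclidean distance is an assumption, since the talk only says "distance".

```python
import math

# Made-up feature scores (smoky, sweet, toffee); illustrative only, not the real dataset.
WHISKIES = {
    "Macallan": (1.0, 3.0, 3.0),
    "Distillery B": (1.0, 3.0, 2.0),
    "Distillery C": (4.0, 1.0, 0.0),
    "Distillery D": (2.0, 2.0, 2.0),
    "Distillery E": (0.0, 4.0, 3.0),
}


def distance(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recommend(favorite, whiskies=WHISKIES, top_n=3):
    """Return the names of the top_n whiskies closest to the favorite, nearest first."""
    target = whiskies[favorite]
    ranked = sorted(whiskies, key=lambda name: distance(whiskies[name], target))
    return ranked[:top_n]
```

Because the distance is measured against all of the whiskies, the favorite itself comes back first in this naive version, at distance zero.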

But the key thing here is that I’ve got a full continuous delivery pipeline. All of those stages have been implemented, with unit tests, mock data, real data and acceptance tests. I’ve used a Jupyter notebook for the initial analysis, and we’re able to insert new data simply by stuffing it into Git and then watching it flow through the pipeline.

So hopefully this is going to play. It is, excellent. I’ve made a video here because, as you probably know, a lot of this takes a lot of time. So now I’m just messing around with Terraform, creating my new infrastructure for this project, and we’re going at about 10 times speed at the moment. Blah blah blah; you’re probably all used to seeing this. And then the end result is a working server in the cloud with some initial software deployed.

Oh, deary me. Can you see the bottom of that screen? Oh, you can; it’s just this monitor. It’s okay.

So what I’ve just done there is fix bugs, because it didn’t work, and finally we’ve got our algorithm actually working. This is running out of the container, and when I curl the container I get my recommendations back. So, a simple REST API. Testing it: I’ve passed in Macallan, that’s my favorite whisky, and so I get these recommendations here. Awesome.

So, first job as a software engineer: I’ve figured out that there’s maybe a little bug in my code. You can see I’ve passed in Macallan there, and it’s actually returned Macallan as one of the recommendations, so that’s a bit pointless. So that’s my first bug, and I’m going to go and try to fix that. Now I’m inside the code and I’m just going to edit it. I’m basically going to ignore that first value there when I output my recommendations. We write that back, then we push that to the repository, and there we go. And then we’re going to watch our pipeline.
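
A hedged sketch of this kind of fix, with a regression test that a pipeline could run. The talk's fix skips the first value in the output; this sketch filters the favorite out by name instead, which gives the same result here. The data is made up and the function names are hypothetical.

```python
import math

# Made-up feature scores (smoky, sweet, toffee), for illustration only.
WHISKIES = {
    "Macallan": (1, 3, 3),
    "Distillery B": (1, 3, 2),
    "Distillery C": (4, 1, 0),
    "Distillery D": (2, 2, 2),
    "Distillery E": (0, 4, 3),
}


def recommend(favorite, whiskies, top_n=2):
    """Return the top_n nearest whiskies, never including the favorite itself."""
    target = whiskies[favorite]
    ranked = sorted(whiskies, key=lambda name: math.dist(whiskies[name], target))
    # The favorite sits at distance zero from itself, so drop it before slicing.
    return [name for name in ranked if name != favorite][:top_n]


def test_favorite_is_not_recommended():
    """Regression test for the bug found in the demo."""
    for name in WHISKIES:
        assert name not in recommend(name, WHISKIES)
```

Filtering by name is a design choice rather than the talk's method: it still holds if another whisky happens to have identical scores and sorts ahead of the favorite.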

So this is quite cool. We’ve got a pipeline here where all of the tests run in parallel. If those tests pass, then we go into a registry step, which pushes that file to a registry, and then we’ll talk about the deploy in a little bit. All of those stages are just implemented with a simple YAML script. But the beauty is that we’re actually using realistic data to test this software, which isn’t something that happens in real life, I’ve noticed. What tends to happen is that you implement it manually and then you test it manually, and then the software engineers have some sort of dummy data that they should use in their tests, and they have some expected output. But it’s very small; it’s usually mock data, it’s usually not realistic, and it’s certainly not real.

And then, at the end of the process, the data scientist comes to the software engineer and manually tests his software to see if it’s okay. It’s a hugely manual and very poorly managed process. If we can stuff all of that into a pipeline like we’ve done just here: thumbs up.

Okay, so all our tests have passed. It’s now being pushed to the registry, and once it gets pushed to the registry it will be deployed to the server. For this, all I’ve done, in a really hacky way, is SSH into the server and just do a docker pull, docker run, which isn’t great. If I had more time I’d probably deploy it to Kubernetes or something, but it works, and it demonstrates it quite well. So I’m just going to watch that container on the server now, and in a minute we’ll see that container... there you go. Now it’s just been deleted, and it’s going to be recreated there. So that’s the deployment phase in action.

If we now go back and actually test this new service, hopefully we should see a better output. I search for the same thing again, Macallan again, and you can see we’ve removed Macallan from the first entry there. Fantastic.

And that’s okay; that’s kind of a traditional software task. It’s something that a software engineer would normally do, not a data scientist. So again, the focus here is to try and get the data scientist involved in the software engineering, or vice versa.

Now say we have a second data scientist, or another engineer, who says, “You know what, I don’t like some of your data; I’m going to change the model.” So I’m going into the IPython notebook and I’m looking at what the previous person has done, and now you’re just going to see me hacking around trying to get something working. But the idea here is that this is the process an engineer would normally go through when trying to implement a new model or, I think in this case, trying to insert some new data. Lots of errors, lots of errors. Finally figure out how to do it. Yep, still not right; data’s the wrong size. How do I do it? How do I... there we go. There were a few minutes there where I was on Stack Overflow; that’s why the pause was there. There we go, it’s worked.

So I’ve generated some new data, I’ve pushed that to the repository, and now we’re going through the build pipeline again. This is the same build pipeline, but with a change of data; we haven’t changed the model. So it’s important to have the data almost as part of the model if you can. Maybe the data is too big, but if you can, it’s really useful to have it in there, because you can catch bugs like this. I think what we just saw there was that some of the tests failed, because the data had got into such a state that it wasn’t giving the output that it should have done.
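
One way a pipeline can catch data that has got into a bad state is to validate the dataset itself as a test step. The sketch below is an assumption about how that might look, not the checks used in the talk: the column names and the 0 to 4 score range are hypothetical.

```python
FEATURES = ("smoky", "sweet", "toffee")  # hypothetical column names


def validate(rows):
    """Return a list of human-readable problems in the dataset; empty means OK."""
    problems = []
    seen = set()
    for i, row in enumerate(rows):
        name = row.get("name")
        if not name:
            problems.append(f"row {i}: missing name")
            continue
        if name in seen:
            problems.append(f"row {i}: duplicate whisky {name!r}")
        seen.add(name)
        for feature in FEATURES:
            value = row.get(feature)
            # Assumed scale: every feature is a number from 0 to 4.
            if not isinstance(value, (int, float)) or not 0 <= value <= 4:
                problems.append(f"row {i}: bad {feature} value {value!r}")
    return problems
```

A test step could then assert `validate(rows) == []` on every commit, so that a change to the data alone is enough to stop the build.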

So now, this time, instead of adding data, I’m going to just remove some data. I’ve removed a whisky and we’ve re-pushed it. I think what I find out now is that I’ve actually caused some of my unit tests to fail by removing one of the whiskies that was in my unit test, so I’m just having to fix that. There you go, there was a commit there saying “it’s really working now”, smiley face. And there we go, our tests are finally passing, and once again we’re going through the registry and we get to deploy it. There it is. Come on, do it, do it, do it.

It’ll get there eventually. And, you know, the result is the finally deployed model, there we go, finally, with the new data, all without touching the model, but still going through the pipeline to guarantee that not only is our model valid, but the data makes sense, and when we throw different data and new data at it, it still makes sense.

So you can imagine trying to apply this to your own stuff: if you have got requirements, like accuracy requirements, you could make that a hard and fast rule in your pipeline, to fail when your model accuracy decreases to a certain point. And that’s it.
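One way to picture that hard and fast rule is a small check that runs as a pipeline step and reports failure when accuracy falls below a floor. This is only a sketch under assumptions: the threshold value and the function names `accuracy` and `gate` are made up for illustration and are not from the talk.

```python
# Assumed threshold; in practice this comes from your own requirements.
MIN_ACCURACY = 0.90


def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and of equal length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def gate(predictions, labels, minimum=MIN_ACCURACY):
    """Return a process exit code: 0 if accuracy meets the minimum, 1 otherwise."""
    score = accuracy(predictions, labels)
    print(f"accuracy={score:.3f} minimum={minimum:.3f}")
    return 0 if score >= minimum else 1
```

A pipeline step could pass the return value of `gate(...)` to `sys.exit`, since most CI systems treat a non-zero exit code as a failed step and stop before deployment.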

So I think that entire process was probably about an hour; I sped it up into about 10 minutes. Well, that’s probably just due to my poor software engineering more than anything. If you’d like to take a look, then just go to the link. You can just have a look at the slides, or come and see me, or basically search for Winder Research and you’ll get it there. So with that, I’d like to say thank you very much.

[Applause]
