Thank you very much. So this is a talk about applying cloud native principles to data science.
In 2016, Microsoft made a very gutsy move: they released a new breed of chatbot into the public domain. The company’s website claimed that it had been built using relevant public data that had been modelled, cleaned and filtered. You may have heard of it: it was called Tay. The purpose of the bot was to respond to tweets in a humanistic manner. You could send it questions on Twitter using its handle, and it did a really good job of answering like a youth. I say “a youth” because I didn’t understand a lot of the acronyms it used. When it was released, everything was actually going swimmingly, and it worked remarkably well. It really did sound like a human [inaudible].
But when a big tech company like this releases a product like this, the first users of the service are usually engineers. And given that you’re all engineers in the room: would you (a) test out this service, appreciate it for what it is and ask it sensible questions, or (b) try to break it? Would you send it the most horrific things you could think of in order to force it to give you the answer? Well, engineers are a sadistic bunch, and you can guess which option they chose. The bot went from a mild-mannered, well-answering chatbot to a sexist, racist, genocidal Nazi in about 24 hours.
You’ve got a collection of tweets there, where you can see it started off looking quite good, and we ended up with Hitler. If you end up with Hitler, you know it’s gone wrong. One of my favorite tweets was about a British comedian called Ricky Gervais. It was asked a decent-enough question: “Is Ricky Gervais an atheist?” The response: “ricky gervais learned totalitarianism from adolf hitler, the inventor of atheism”. Now, for all I know about Hitler, I don’t think that’s his most famous trait, but I’ll give it 10 out of 10 for imagination on that one. And ultimately, the result of this wonderful experiment: 24 hours later, it was dead. Gone.
And although that’s quite a hilarious story, I’m actually quite impressed with Microsoft. It was a very gutsy move to allow this to happen, and they managed to deliver something that was really quite impressive. But I think, and this is just speculation, that some of Microsoft’s traditional organizational structures got in the way. If people had been in a position to spot these problems, they could have stopped it before something like this happened. And that’s really what this talk is about today.
So, in normal life, tradition is a fantastic and important part of culture, a cultural meme, but in engineering it’s actually the harbor of bad habits. If we stick to traditions, we tend to repeat the same mistakes.
I used to work in the data science field and in software engineering, and what we used to do was this: I would go away, write my models and do my research, and then the only thing everybody else would see was this massive chunk of code, which I would throw to the software engineers, saying, “There you go, software engineers, I’ve finished my job. Now it’s your turn: you implement it.” And obviously, most of the time that just didn’t work. Some of the time it partially worked, but it never worked as well as it should have.
I actually spoke to a client the other day who was worried about a project he had paid for. His company had sent some data off to a research arm, and he was worried that the work these researchers were doing was “not really applicable in real life”; those were the words he used. He thought it was a bit too academic. What he meant was that the things they were coming up with weren’t really realistic or relevant to modern-day industrial software.
So yeah, tradition is a bit of a problem. Traditionally, data scientists have worked towards a certain type of model. This one is called the Cross-Industry Standard Process for Data Mining (CRISP-DM), and it is the nearest thing we’ve got to a process in data science. You’ll see that there are lots of loops in this process, which is just indicative of the fact that most data science is open-ended and continuous; it never really stops. The problem is that pretty much all of these steps are very individual, and they don’t scale very well.
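To make the loops concrete, here is a minimal sketch that encodes the CRISP-DM phases as a directed graph. The phase names and arrows are an assumption based on how the CRISP-DM reference diagram is usually drawn, including the outer arrow from deployment back to business understanding. The check confirms that every phase can lead back to itself, which is what “it never really stops” means in graph terms.

```python
from collections import deque

# Arrows between CRISP-DM phases, as commonly drawn in the reference diagram
# (an assumption for illustration; the talk only shows the figure).
# "deployment" -> "business_understanding" stands for the outer cycle.
CRISP_DM = {
    "business_understanding": ["data_understanding"],
    "data_understanding": ["business_understanding", "data_preparation"],
    "data_preparation": ["modeling"],
    "modeling": ["data_preparation", "evaluation"],
    "evaluation": ["business_understanding", "deployment"],
    "deployment": ["business_understanding"],
}


def reachable(graph, start):
    """Return every phase reachable from `start` by following one or more arrows."""
    seen = set()
    queue = deque(graph[start])
    while queue:
        phase = queue.popleft()
        if phase not in seen:
            seen.add(phase)
            queue.extend(graph[phase])
    return seen


def phases_on_a_loop(graph):
    """A phase lies on a loop if you can leave it and eventually come back to it."""
    return {phase for phase in graph if phase in reachable(graph, phase)}


if __name__ == "__main__":
    print(sorted(phases_on_a_loop(CRISP_DM)))
```

Removing the outer arrow (setting `deployment` to an empty list) leaves deployment as the only phase that is not on a loop, which matches the “throw it over the wall and never see it again” problem described next.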
So the first problem is the deployment phase. As I just said, when I was a data scientist I would throw my models over to the software engineers and then never see them again. I think this is probably something that happened at Microsoft. We take software engineers who have not been trained in data science, give them poorly documented, uninterpretable code, and expect them to understand it and implement it efficiently. It’s never going to happen.
Then we go through the other parts of the model. The first is data understanding. This is a major issue in data science because the data is the most important part of the problem, and in the data understanding phase we rely on domain experts to interpret the data. We had a great talk earlier from Feynman about his work in the music domain. If you weren’t there, he was working with classical music, and a typical data scientist would know nothing about the terminology used in music. Throughout his years of doing this research he finally became a domain expert, but it took years and years of work, so that’s a good example of the problem.
Data preparation is another problem area, because it’s often done by one person and it’s only done once. What happens is you get one person who really delves into the data and is the only one who really understands it. That makes it hard to reproduce, and the output is, again, just an amorphous blob of software that takes messy data and spits out good data.
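One way to make data preparation less of a one-person, one-off job is to write the cleaning rules down as a deterministic function that anyone can re-run. A minimal sketch, assuming hypothetical records with `id`, `country` and `amount` fields and three invented cleaning rules:

```python
import math

# Hypothetical raw records; field names and values are invented for illustration.
RAW = [
    {"id": "1", "country": " NL ", "amount": "12.50"},
    {"id": "2", "country": "nl", "amount": ""},
    {"id": "3", "country": "UK", "amount": "7"},
    {"id": "3", "country": "UK", "amount": "7"},  # exact duplicate
]


def clean(records):
    """Turn messy records into clean ones using explicit, reviewable rules.

    Because every rule lives in code, a second person can re-run it on the
    same input and get the same output, instead of relying on one person's
    memory of what they did by hand.
    """
    seen_ids = set()
    cleaned = []
    for rec in records:
        # Rule 1 (hypothetical): drop records whose amount is missing or not a number.
        try:
            amount = float(rec["amount"])
        except ValueError:
            continue
        if math.isnan(amount):
            continue
        # Rule 2 (hypothetical): keep only the first valid record for each id.
        if rec["id"] in seen_ids:
            continue
        seen_ids.add(rec["id"])
        # Rule 3 (hypothetical): normalize country codes to trimmed upper case.
        cleaned.append({
            "id": int(rec["id"]),
            "country": rec["country"].strip().upper(),
            "amount": amount,
        })
    return cleaned


if __name__ == "__main__":
    print(clean(RAW))
```

The rules themselves are placeholders; the point of the design is that the “messy data in, good data out” step becomes something that can be read, reviewed and re-run rather than a one-off.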
And finally, modeling. This is a little easier to reason about as more and more people understand the types of models used and how to do modeling. But the issue with modeling is that, again, it’s very much a one-person process, because it usually involves trying lots of different things and basically picking the best one. That trial and error is never recorded anywhere; the only output is the model, and that is the answer. So if we want to scale this past one person, what usually happens is that the second person repeats all the same mistakes and actually ends up at a different result, because his biases and his preferences for particular algorithms usually lead to a different model.
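The missing record of trial and error can be kept with very little effort. A minimal sketch, assuming a hypothetical toy series and a moving-average predictor standing in for a real model: every candidate is scored and logged, and the best one is picked from the log rather than being the only thing that survives.

```python
import json

# Hypothetical toy series standing in for real, prepared data.
SERIES = [10, 12, 11, 13, 12, 14, 13, 15]


def moving_average_mae(series, window):
    """Mean absolute error when each point is predicted by the mean of the previous `window` points."""
    errors = []
    for i in range(window, len(series)):
        prediction = sum(series[i - window:i]) / window
        errors.append(abs(series[i] - prediction))
    return sum(errors) / len(errors)


def search_with_log(series, windows):
    """Try every candidate window, keeping a record of all trials, not only the winner."""
    trials = []
    for w in windows:
        trials.append({"model": "moving_average", "window": w,
                       "mae": moving_average_mae(series, w)})
    best = min(trials, key=lambda t: t["mae"])
    return best, trials


if __name__ == "__main__":
    best, trials = search_with_log(SERIES, [1, 2, 3, 4])
    for trial in trials:
        print(json.dumps(trial))  # one JSON line per trial: an append-only experiment log
    print("best:", json.dumps(best))
```

Because the log keeps the losing candidates too, a second person can see what was already tried instead of repeating it.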
So that whole side of the model is like a murky canal, a mucky Amsterdam canal: the ships can go up and down it, but you wouldn’t want to jump in and follow them.
And at the end of all that, we’ve got the operations side, the deployment side, and the vast majority of the data science research phase isn’t going well. Actually, the vast majority of projects fail because of a lack of business understanding: either the business doesn’t understand the technical implications of what it is proposing, or the tech people don’t understand the business problem well enough. So, a whole host of problems.
So what I’m going to do now is ignore the business side a little, because that is a separate problem in itself, and you’re all tech guys and gals. I’m just going to stick to three distinct phases. We’ve got the research phase, which is the bit about understanding the data, massaging the data and producing the model. We’ve got the build phase: trying to prove that what we’re doing is correct. And then the actual deployment phase, the bit that we want to rush into production.
So yeah, the research phase consists of the initial data science. That can be anything from performing experiments, gathering more data, data preparation and cleaning, to modeling, all that good stuff. It’s called the research phase because it is a very scientific process, and the biggest problem with that is that it’s inherently open-ended and therefore very high-risk. There is a high probability of failure at this point, because you might find either that you don’t have the data to do the job properly, or that you just can’t do the job because it’s intractable for some reason.
So, stepping back a bit: believe it or not, Britain actually had a very rich motoring heritage. You might not think it these days; you might think of Germany or somewhere like that. But there’s a manufacturing plant near Oxford which started in 1913. This is a picture from that same plant in 1943, and from about then until the 1970s it was owned by a company called British Leyland. This is a picture of their manufacturing line building Cromwell tanks for World War II. By the time it got to the ’70s, it was building this little car you probably all recognize. But at the start of and during the ’70s, things started going wrong, and the ultimate reason was that better, cheaper alternatives were available. Other companies were investing in the automation of their lines in order to produce better-quality and cheaper products.
And when we talk about software engineering, or engineering in general, it’s just converting a process into code so that we can automate it. That’s all it is (that’s where I should have said that word). And today I think that data science is actually the automation of the data. So we’re starting to see a three-tier hierarchy here: we’ve got data science at the bottom, which takes all the data and automates things based on that data to feed into the process. So I think data science and software engineering actually make a very good fit. They go together very well, because the data and the science feed into the software, which then feeds into the value you’re trying to provide.
And this is a picture of the same manufacturing plant in 2013, a hundred years after it opened. You can see that it’s now far more automated, with basically no humans there, and that allows the company to build better, more reliable cars. As you probably know, this company is now owned by BMW, so the great golden English company was eaten up by German manufacturers. Damn it!
Yeah, anyway, my point is that I think software engineers are actually in a really good position to push ourselves into data science, not the other way around, because we’ve come away with everything we’ve learned during this more traditional automation phase, and we can start applying it to data science. The fact is that, at the moment, none of this happens in data science.

And this leads me to data science’s dirty little secret. The secret is that the vast majority of your effort, time and engineering skill as a data scientist goes into the data, just messing around with the data incessantly. We are fixing problems with the data, we are imputing missing values, we are removing invalid data, and so on and so on. And the vast majority of the final performance of the model depends on how much you can improve that data, not on the model.
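
As a rough illustration of the cleaning work just described, the sketch below drops invalid rows and imputes missing values in plain Python. The record layout, the `age`/`income` fields and the validity rule are hypothetical, and median imputation is only one common choice; it is not presented as the speaker’s method.

```python
from statistics import median

# Hypothetical raw records: None marks a missing value, a negative age is invalid.
RAW = [
    {"age": 34, "income": 52000},
    {"age": None, "income": 61000},
    {"age": -5, "income": 48000},
    {"age": 45, "income": None},
    {"age": 29, "income": 39000},
]


def is_valid(record):
    """Illustrative validity rule: any value that is present must be in range."""
    age, income = record["age"], record["income"]
    return (age is None or 0 <= age <= 120) and (income is None or income >= 0)


def clean(records, fields=("age", "income")):
    """Drop invalid rows, then fill each missing field with the median of the observed values."""
    rows = [dict(r) for r in records if is_valid(r)]  # copy so the input is not mutated
    for field in fields:
        observed = [r[field] for r in rows if r[field] is not None]
        if not observed:
            continue  # nothing to impute from; leave the gaps as None
        fill = median(observed)
        for r in rows:
            if r[field] is None:
                r[field] = fill
    return rows


cleaned = clean(RAW)
# The row with a missing age now has age 34, the median of 34, 45 and 29.
```
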
We’ve had two great, fantastic talks this morning, all about deep learning and very sexy technologies, but that’s a very Silicon Valley problem. No offense to the Silicon Valley guys, but for everybody else outside of Silicon Valley, we’re still living in a world where it is the simple techniques that really make a difference. It doesn’t have to be a complex model; simple things can go a long way.

One of the biggest issues with this process is that this discovery, fixing the data and understanding the data, is, as I said, only done by one person. So I think this is actually just a problem of visibility. There is very little visibility within data science: there’s generally only one person working on a problem at a time.
And that’s very difficult to scale, or at least very inefficient to scale. Over the years, software engineering has done a really great job of improving this, because we had exactly the same problems. We could distribute binaries quite effectively, but when it came to source code, we’ve gone through decades of trying to improve the visibility and the resiliency of our source code. And thankfully, data science is finally starting to get there. These two tools in particular have been very prolific. You probably all know one of them, so I’m not going to talk about that, but the second is a notebook. Some of you have probably come across it, but maybe not, so I’ll just introduce it.

This is Jupyter Notebook, an evolution of IPython notebooks. The idea is that inside the notebook there is a series of cells, and each cell can be either Markdown or code. What this has done is single-handedly improve visibility from pretty much zero all the way to almost as good as it gets; I think this is probably better than software engineering in terms of visibility. What it means is that when I first get data and I’m doing my analysis, I can document everything I do, even the mistakes. I can write the code, I can write words if I need to, and whenever anybody else wants to repeat that process, they can just come along and read it like a document. If they want to, they can come in and start playing with the code and doing tests. If they think, “Oh, I think your model’s rubbish, I’m going to try another model,” or “I’m going to try some different parameters,” it’s very easy for someone to just come in, change something and run it. So this is a very iterative, very visual way of doing data science, and it’s made a huge impact.
And then, when you team it up with Git, I think we’ve got the holy grail: repeatability from Git and visibility from Jupyter notebooks. This is something we shouldn’t take for granted. For example, when we’re looking at normal software code, we use GitHub, GitLab or whatever, just the online viewers, to view code far more often than we think we do, and that alone is super helpful for visibility.
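
Part of why notebooks pair well with Git is that a notebook file is just JSON: in the nbformat v4 format, a top-level `cells` list holds cells whose `cell_type` is `"markdown"` or `"code"`, and code cells also store their `outputs` and `execution_count`. Since those outputs change on every run, a common practice (the idea behind tools such as nbstripout) is to clear them before committing so diffs show only real changes. The sketch below builds a tiny notebook by hand with the standard library and strips its outputs; it is illustrative and does not validate against the full nbformat schema.

```python
import json


def make_notebook(cells):
    """Build a minimal nbformat-v4 style notebook dict from (cell_type, source) pairs."""
    nb_cells = []
    for cell_type, source in cells:
        cell = {"cell_type": cell_type, "metadata": {}, "source": source}
        if cell_type == "code":
            # Code cells also record when they were run and what they printed.
            cell["execution_count"] = None
            cell["outputs"] = []
        nb_cells.append(cell)
    return {"nbformat": 4, "nbformat_minor": 4, "metadata": {}, "cells": nb_cells}


def strip_outputs(nb):
    """Return a copy with code-cell outputs and execution counts cleared, for cleaner Git diffs."""
    clean = json.loads(json.dumps(nb))  # deep copy via a JSON round-trip
    for cell in clean["cells"]:
        if cell["cell_type"] == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return clean


nb = make_notebook([
    ("markdown", "## Cleaning step\nDropped rows with a negative age."),
    ("code", "df = df[df.age >= 0]"),
])
# Simulate having executed the code cell once.
nb["cells"][1]["execution_count"] = 3
nb["cells"][1]["outputs"] = [{"output_type": "stream", "name": "stdout", "text": "done\n"}]

# What you would commit: stable, output-free JSON.
committed = json.dumps(strip_outputs(nb), indent=1, sort_keys=True)
```
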
So that is a very good and huge advancement for individual developers, but how do we scale it to multiple developers? We do that with another project from Jupyter called JupyterHub, and it’s quite a simple architecture. As you can see, the main parts comprise an HTTP proxy and the individual notebooks; this notebook part is the Jupyter notebook I’ve just explained to you. Then we’ve got a couple of user-related components on the left-hand side to handle the multi-tenant instances. But the most interesting thing is the spawner, because we can override that spawner and plug in a whole range of tools. We can plug in Docker and start spinning up Docker containers, we can start spinning up Mesos containers with an orchestrator, or we could spin it up in some sort of cloud-based environment. It’s incredibly useful. Possibly my favorite is that we can start Kubernetes jobs, starting pods with our own containers in them. As software engineers, we know that this gives us a huge amount of flexibility. We can simply scale out when we need to: if you’ve got more developers working on different problems, just add more pods; if you need bigger machines, just scale out the number of machines. We can select our machines depending on whether we want GPUs or CPUs. It’s all years ahead of data science.
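
The spawner override described above lives in JupyterHub’s `jupyterhub_config.py`. The fragment below is a sketch that assumes the KubeSpawner package (`jupyterhub-kubespawner`) is installed and the hub runs inside a Kubernetes cluster; the image name, resource sizes and the GPU profile are illustrative assumptions, not values from the talk. It is a config fragment rather than a standalone script, because JupyterHub supplies `get_config()` when it loads the file.

```python
# jupyterhub_config.py (fragment; get_config() is provided by JupyterHub at load time)
c = get_config()  # noqa: F821

# Replace the default local-process spawner with one that starts a Kubernetes pod per user.
c.JupyterHub.spawner_class = "kubespawner.KubeSpawner"

# Illustrative single-user image (assumption): any image that runs a Jupyter server works.
c.KubeSpawner.image = "quay.io/jupyter/scipy-notebook:latest"

# Per-user resource limits (illustrative sizes).
c.KubeSpawner.cpu_limit = 2
c.KubeSpawner.mem_limit = "4G"

# Let users choose CPU or GPU machines at spawn time (illustrative profiles).
c.KubeSpawner.profile_list = [
    {"display_name": "CPU (default)", "default": True},
    {
        "display_name": "GPU",
        "kubespawner_override": {
            # Assumes GPU nodes that expose the NVIDIA device-plugin resource.
            "extra_resource_limits": {"nvidia.com/gpu": "1"},
        },
    },
]
```
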
So we’ve got the visibility, and we’ve now started to containerize the process. Those are two core tenets of cloud native: containers and visibility.

Now let’s move on to the build phase. We touched on this at the start: in the past, and it still happens today, data scientists would come up with an idea, maybe a simple model, and throw it over to the software engineers. And “data scientist” is a very general term here; I don’t necessarily mean people with PhDs working with high-level tools like deep learning, just normal people working with normal data. This is completely analogous to where software engineering was about 10 years ago: software engineers would take their binary, their software, and throw it over to the ops guys, and we’re combating that with the idea of DevOps.
So I think there’s an equivalent shift that needs to happen with data scientists. Data scientists need to become more integrated with the software engineers, and ultimately with the ops people as well: DataOps, if you will. If we don’t have that, at best we have inefficient models, but at worst, and what’s more likely to happen, things don’t happen at all. If you don’t get that transition right, then you just end up with our products. And talking of AI and robots, I love that video, the lipstick robot. Simone Giertz is hilarious. Ah, you’re all boring. I find that funny; I’m going to laugh.

So how do we improve this? As we saw with the DevOps transition, much of the problem is actually a people problem. It’s about getting people to accept that their role is changing, and that it’s changing for the better, for the benefit of everyone. That’s okay, but it’s a bit boring. What we can do technically is start to enforce quality, and we can enforce quality with, surprise surprise, continuous deployment and continuous integration. This is a classic continuous delivery pipeline; I’m sure you all know this. The engineer commits his code, it goes into a build server, runs through the pipeline and is deployed into production. Now, the pipeline is possibly the most important part of this entire process, and it needs to be customized to your domain and your problem.
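
To make “customized to your domain” concrete for models, here is a hedged sketch of a pipeline runner: stages run in order, and the first failure stops everything after it, so a model that misses its quality bar is never deployed. The stage names, the context keys and the accuracy threshold are hypothetical; in practice these stages would be defined in your CI server’s own configuration.

```python
from typing import Callable, Dict, List, Tuple

# A stage takes a shared context dict and returns (passed, message).
StageFn = Callable[[Dict], Tuple[bool, str]]


def unit_tests(ctx):
    # Placeholder: a real stage would invoke your test runner here.
    return True, "unit tests passed"


def evaluate_model(ctx):
    # Domain-specific gate: refuse to ship a model that scores below an agreed baseline.
    ok = ctx["accuracy"] >= ctx["min_accuracy"]
    return ok, f"accuracy {ctx['accuracy']:.2f} (minimum {ctx['min_accuracy']:.2f})"


def deploy(ctx):
    ctx["deployed"] = True  # stand-in for the real release step
    return True, "deployed to production"


def run_pipeline(stages: List[Tuple[str, StageFn]], ctx: Dict) -> List[str]:
    """Run stages in order and stop at the first failure, so a bad model never reaches deploy."""
    log = []
    for name, stage in stages:
        passed, message = stage(ctx)
        log.append(f"{name}: {'PASS' if passed else 'FAIL'} ({message})")
        if not passed:
            break
    return log


PIPELINE = [("unit", unit_tests), ("evaluate", evaluate_model), ("deploy", deploy)]

good = {"accuracy": 0.91, "min_accuracy": 0.85}
bad = {"accuracy": 0.70, "min_accuracy": 0.85}
good_log = run_pipeline(PIPELINE, good)
bad_log = run_pipeline(PIPELINE, bad)
```
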
I always like talking about the testing triangle, which is quite common in the CI literature. If you haven’t seen it before, it’s an image where the x-axis is the number of tests and the y-axis is the scope, or depth, of the tests. At the bottom we’ve got unit tests: very large numbers of them, each testing very small bits of code. All the way up at the top we have very few tests, the acceptance tests, but they’re testing a huge amount of code. And that testing process is possibly the most important part of the build phase.
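
A minimal sketch, using Python’s built-in `unittest`, of the two ends of the triangle applied to a model: several narrow unit tests on one feature function at the bottom, and a single broad acceptance-style test over the whole prediction path at the top. The `normalise` and `predict` functions and the toy threshold model are hypothetical stand-ins.

```python
import unittest


def normalise(x, lo, hi):
    """Scale x into [0, 1] given the training range [lo, hi] (hypothetical feature step)."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    return min(max((x - lo) / (hi - lo), 0.0), 1.0)


def predict(raw_value):
    """Toy end-to-end model: normalise, then apply a fixed threshold (hypothetical)."""
    score = normalise(raw_value, lo=0.0, hi=100.0)
    return "high" if score >= 0.5 else "low"


class UnitTests(unittest.TestCase):
    # Bottom of the triangle: many narrow, fast tests of a single function.
    def test_midpoint(self):
        self.assertAlmostEqual(normalise(50, 0, 100), 0.5)

    def test_clips_out_of_range(self):
        self.assertEqual(normalise(150, 0, 100), 1.0)
        self.assertEqual(normalise(-10, 0, 100), 0.0)

    def test_rejects_bad_range(self):
        with self.assertRaises(ValueError):
            normalise(1, 5, 5)


class AcceptanceTest(unittest.TestCase):
    # Top of the triangle: few, broad tests of the whole prediction path.
    def test_outputs_are_always_valid_labels(self):
        for value in range(-50, 151, 10):
            self.assertIn(predict(value), {"high", "low"})
```

Saved as a module, it runs with `python -m unittest <module_name>`.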
If you don’t test your models, then you end up with something like this. This is my colleague. He was trying to book a flight from Amsterdam to Prague, I think, and Kayak kindly recommended the flight “me”. That was a direct output of one of their recommendation models: it was “me”. So this guy obviously couldn’t book a flight; they lost his revenue, they lost his money. I dread to imagine how many other people were using the site at the same time and all received “me”; they must have lost a lot of money. If anything, I think that is a clear indication that the data science people need to be more integrated into the operations of their actual software, because they’re the only ones who know how best to implement monitoring, and they know how to fix it if it goes wrong.
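
The Kayak case is the kind of failure a cheap output check can catch before it reaches users. Below is a hedged sketch of such a guard: it keeps only recommendations that match a set of known destination codes, falls back to a safe default list otherwise, and returns the rejected items so they can be monitored. The codes, the fallback list and `guard_recommendations` are hypothetical; this does not describe Kayak’s actual system.

```python
# Hypothetical reference data: destinations the site can actually sell.
KNOWN_DESTINATIONS = {"PRG", "AMS", "LHR", "CDG", "BER"}
FALLBACK = ["LHR", "CDG"]  # illustrative non-personalised default


def guard_recommendations(model_output, known=KNOWN_DESTINATIONS, fallback=FALLBACK):
    """Keep only valid destination codes; fall back to a safe default if none survive.

    Returns (recommendations, rejected) so rejected outputs can be counted and alerted on.
    """
    valid, rejected = [], []
    for item in model_output:
        code = item.strip().upper() if isinstance(item, str) else None
        if code in known:
            valid.append(code)
        else:
            rejected.append(item)
    if not valid:
        return list(fallback), rejected
    return valid, rejected


recs, rejected = guard_recommendations(["me"])
# recs == ["LHR", "CDG"], rejected == ["me"]: the user sees a sane default and "me" gets logged.
```
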
And then we get on to the deploy phase. This is a bit more difficult to talk about because it’s more domain-specific; it’s very tech-stack-specific, so it depends on what technology stack you’re using. But I can generalize it a little by talking about containers, because ultimately the goals are exactly the same. We want our software to be reactive, resilient and reproducible. We want it to be reactive so that when there are changes in the outside world, we can scale up and scale down accordingly. We want it to be resilient, so that if it ever fails, it automatically repairs itself. And we want it to be reproducible: if we can quickly reproduce our cluster in another location for testing or something, that improves testability.
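
One concrete form of “reactive” on a container platform is autoscaling. The Kubernetes Horizontal Pod Autoscaler documents its core rule as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), and it skips scaling when that ratio is within a tolerance of 1.0 (0.1 by default). The sketch below reproduces only that arithmetic plus min/max clamping; stabilisation windows and missing-metric handling are deliberately left out, so treat it as an illustration of the formula rather than of the controller.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Replica count from the HPA-style rule ceil(current * currentMetric / targetMetric).

    Scaling is skipped when the ratio is within `tolerance` of 1.0, and the result
    is clamped to [min_replicas, max_replicas]. Stabilisation windows are not modelled.
    """
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    proposed = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, proposed))


# 4 pods averaging 90% CPU against a 60% target: scale up to 6.
print(desired_replicas(4, 90, 60))  # → 6
```
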
That kind of represents this tiny little arrow in the build pipeline, and even in continuous delivery this is often overlooked. It’s always represented by a little arrow, as if it’s this simple thing where you just push it to production: flowers, smiley faces, done. And it’s never like that. It’s a bit more difficult and far more specific, and you spend a lot of engineering effort trying to push this out to production.
In data science land, one of the easiest things we can do is bring in containers again. So how do you do that? Well, you have some sort of model, and you can quite easily stuff that into a container. If you’ve just got interfaces and routers, then they’re all pretty standardized. Once you’ve got to that point, it becomes much easier not only to make sure it runs on your machine and works the same way in production, but also for other people to reason about it, because you’re reducing the domain that people have to understand in order to use your service.
And that model can be anything: it could be just a simple Python model, it could be a Theano derivative, TensorFlow, whatever. And if you’re into more streaming technologies, we can easily apply those here as well. If we just package up whatever it is in the particular streaming package that you’re using, like a source or a Spark executor, it’s still perfectly reasonable to do that.
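The following is a minimal sketch of "stuff the model into a container behind a standard interface". It uses only Python's standard-library WSGI tools. Several names here are assumptions for illustration and are not taken from the talk's repository:

- the `/recommend?whisky=` route
- the JSON response shape
- `make_app` and `placeholder_model`

Any model that can be wrapped as a function from a whisky name to a list of names can sit behind this interface.

```python
import json
from typing import Callable, List
from urllib.parse import parse_qs
from wsgiref.simple_server import make_server

# Hypothetical contract: a model is any callable from a whisky name to a list of names.
Predict = Callable[[str], List[str]]


def make_app(predict: Predict):
    """Wrap any model behind one small HTTP interface:

    GET /recommend?whisky=<name>  ->  200 {"recommendations": [...]}
    """
    def app(environ, start_response):
        headers = [("Content-Type", "application/json")]
        if environ.get("PATH_INFO") != "/recommend":
            start_response("404 Not Found", headers)
            return [b'{"error": "not found"}']
        query = parse_qs(environ.get("QUERY_STRING", ""))
        name = query.get("whisky", [""])[0]
        body = json.dumps({"recommendations": predict(name)}).encode("utf-8")
        start_response("200 OK", headers + [("Content-Length", str(len(body)))])
        return [body]

    return app


def placeholder_model(name: str) -> List[str]:
    # Stand-in for a real model (plain Python, TensorFlow, ...); same signature either way.
    return [f"something like {name}"]


if __name__ == "__main__":
    # Bind to all interfaces so the port can be published from inside a container.
    make_server("0.0.0.0", 8080, make_app(placeholder_model)).serve_forever()
```

The model only has to satisfy the `Predict` signature. Swapping the placeholder for a real model therefore does not change how the container is called or tested.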
And that fits really nicely into the testing triangle, because we can build that container as part of our delivery pipeline and start testing the container, as opposed to just testing the code itself. It’s all fairly standard stuff that everybody aims for, but it’s amazing how often this doesn’t happen in real life in data science.
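Testing the container rather than just the code can be as small as a smoke test that the pipeline runs against the freshly built image. The sketch below is illustrative only and makes these assumptions:

- the image serves `GET /recommend?whisky=<name>` on a published port;
- the endpoint returns `{"recommendations": [...]}`;
- the URL, route and retry settings shown are placeholders, not the talk's actual pipeline.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import List


def smoke_test(base_url: str, whisky: str, retries: int = 10, delay: float = 1.0) -> List[str]:
    """Call a running container's /recommend endpoint and sanity-check the reply.

    Retries while the container may still be starting; raises AssertionError on a bad
    payload and RuntimeError if the service never answers.
    """
    url = f"{base_url}/recommend?" + urllib.parse.urlencode({"whisky": whisky})
    last_error = None
    for _ in range(retries):
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                payload = json.loads(resp.read().decode("utf-8"))
            break
        except OSError as exc:
            # URLError, HTTPError, refused connections and timeouts are all OSErrors.
            last_error = exc
            time.sleep(delay)
    else:
        raise RuntimeError(f"no healthy response from {url}: {last_error}")
    recs = payload.get("recommendations")
    assert isinstance(recs, list) and recs, f"expected a non-empty list, got {recs!r}"
    assert whisky not in recs, f"{whisky} was recommended to itself"
    return recs


if __name__ == "__main__":
    # Assumes the image was started with something like `docker run -p 8080:8080 <image>`.
    print(smoke_test("http://localhost:8080", "Macallan"))
```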
And then finally, we can simply stuff that container into production however you want: using some sort of orchestrator or, if you’re using some sort of streaming-based system, selecting GPUs and CPUs. It’s the ultimate in flexibility: if it works on your laptop, it doesn’t matter where it runs.

And just to finally push home one of the... this is a slightly different domain, and I know there are a few ThoughtWorks people here today, so I’ve got to be a little bit careful. They are a great company, an amazing company, but I think their marketing department also needs to be integrated into production, because they sent out this email last week. I would be really interested in finding out what the ThoughtWorks “seismic shits” are. I find that really fascinating, actually. I think this is a genius move by the marketing department, because so many people were talking about it in the office, and I think it’s done far more for ThoughtWorks than anything they could have sent out. So well done to the marketing person that made that.
Okay, so now I have a quick demo demonstrating all of these concepts together.
I’ve tried to think of a simple example, and my example is a whisky shop. So my business requirement is: I have a client, which is a whisky shop, because I think whisky... and they have come to me because they want to provide a USP in the fact that they can recommend better whiskies than anybody else. But the problem is they want this to be able to scale. They can’t really afford to employ whisky experts in every single one of their shops, so it’s much more efficient to write an algorithm to do that for them. So their requirements are that somebody can pass a favorite whisky in and get recommendations out. They want to start off with a limited set of whiskies, but want to be able to update the data in the model in the future.

This is all available on my Git repository; you can get it from there, it’s all open source, and it’s pretty simple. The algorithm I’m using for this is pretty naive, and it uses the kind of famous, standard whisky dataset. Just to cover that a little bit: it’s a simple nearest neighbor algorithm. All whiskies are characterized by a set of numbers, where each number corresponds to a particular feature of that whisky, so the features might be smokiness or sweetness, toffee, things like that. What happens is that it calculates the distance between someone’s chosen whisky and all of the whiskies, and then we pick the top five or ten or however many recommendations based upon that. So it’s pretty simple, but it works remarkably effectively.
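The nearest neighbor idea described above fits in a few lines. This is a sketch under stated assumptions, not the code from the talk's repository:

- The three features and their scores are made up for illustration; the real dataset has its own feature columns.
- Plain Euclidean distance is assumed as the metric.

```python
import math
from typing import Dict, List, Sequence

# Made-up flavor scores (smoky, sweet, toffee) for four hypothetical whiskies.
# Illustrative only; these are not rows from the real whisky dataset.
PROFILES: Dict[str, Sequence[float]] = {
    "A": (4, 1, 0),
    "B": (3, 1, 1),
    "C": (0, 4, 3),
    "D": (1, 3, 3),
}


def nearest(profiles: Dict[str, Sequence[float]], favorite: str, k: int = 5) -> List[str]:
    """Rank every whisky by Euclidean distance to `favorite` and return the k closest."""
    target = profiles[favorite]
    ranked = sorted(profiles, key=lambda name: math.dist(profiles[name], target))
    return ranked[:k]


if __name__ == "__main__":
    print(nearest(PROFILES, "A", k=2))  # ['A', 'B']
```

As written, the favorite itself always comes back first, because its distance to itself is zero. A recommender built this way has to exclude the query explicitly.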
But the key thing here is that I’ve got a full continuous delivery pipeline, so all of those stages have been implemented, with unit tests, mock data, real data and acceptance tests. I’ve used a Jupyter notebook for the initial analysis, and we’re able to insert new data simply by stuffing it into Git and then watching it flow through the pipeline.

So hopefully this is going to play. It is, excellent. I’ve made a video here because, as you probably know, a lot of this takes a lot of time. So now I’m just messing around with Terraform, creating my new infrastructure for this project, and we’re going at about 10 times speed at the moment. Blah, blah, blah; you’re probably all used to seeing this. And then the end result is a working server in the cloud with some initial software deployed. Oh dear me, can you see the bottom of that screen? Oh, you can, it’s just this monitor. It’s okay.

So what I’ve just done there is fix bugs, because it didn’t work, and finally we’ve got our algorithm actually working. This is running out of the container, and when I curl the container I get my recommendations back. So it’s a simple REST API test: I passed in Macallan, that’s my favorite whisky, and I get these recommendations here. Awesome.
So, first job as a software engineer: I’ve figured out that there’s maybe a little bug in my code. You see, I’ve passed in Macallan there, and it’s actually returned Macallan as one of the recommendations, which is a bit pointless. So that’s my first bug, and I’m going to go and try and fix it. Now I’m just inside the code, and I’m going to edit it to basically ignore that first value when I output my recommendations. We’re going to write that back, then push it to the repository, and there we go.
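The fix in the demo ignores the first ranked value. A slightly more defensive variant filters the query whisky out by name and pins the behavior with a unit test, so the bug cannot quietly return. This is sketched here as an alternative, not the repository's actual change. The whisky names and scores in the test are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def recommend(profiles: Dict[str, Sequence[float]], favorite: str, k: int = 5) -> List[str]:
    """Nearest-neighbor recommendations that never include `favorite` itself."""
    target = profiles[favorite]
    # Filter by name instead of dropping the first ranked item: if another whisky
    # ties with the favorite at distance zero, dropping the first could remove it
    # and leave the favorite in the list.
    candidates = [name for name in profiles if name != favorite]
    candidates.sort(key=lambda name: math.dist(profiles[name], target))
    return candidates[:k]


def test_favorite_is_never_recommended():
    # Hypothetical profiles: "twin" has exactly the same scores as "favorite".
    profiles = {"twin": (1, 3, 2), "favorite": (1, 3, 2), "other": (4, 0, 0)}
    recs = recommend(profiles, "favorite", k=2)
    assert "favorite" not in recs
    assert recs == ["twin", "other"]


if __name__ == "__main__":
    test_favorite_is_never_recommended()
    print("ok")
```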
And then we’re going to watch our pipeline. This is quite cool: we’ve got a pipeline here where all of the tests run in parallel. If those tests pass, then we go into a registry step, which pushes the image to a registry, and we’ll talk about the deploy in a little bit. All of those stages are just implemented with a simple YAML script. But the beauty is that we’re actually using realistic data to test this software, which, from what I’ve noticed, isn’t something that happens in real life.

What tends to happen is that you implement it manually and then you test it manually. Then the software engineers have some sort of dummy data that they use in their tests, and they have some expected output, but it’s very small: it’s usually mock data, it’s usually not realistic, and it’s certainly not real. And then at the end of the process, the data scientist comes to the software engineer and manually tests the software to see if it’s okay. It’s a hugely manual and very poorly managed process. If we can stuff all of that into a pipeline, like we’ve done just here: thumbs up.
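Testing against the real data, rather than a handful of mock rows, can be as simple as this: run every whisky in the dataset through the recommender and check invariants that must always hold. The sketch makes these assumptions:

- the data is a CSV with a header row and `name,feature1,feature2,...` columns;
- the recommender is passed in as a parameter;
- the file layout and function names are hypothetical.

```python
import csv
from typing import Callable, Dict, List, Sequence

Profiles = Dict[str, Sequence[float]]
Recommender = Callable[[Profiles, str, int], List[str]]


def load_profiles(path: str) -> Profiles:
    """Read an assumed `name,feature1,feature2,...` CSV (with a header) into a dict."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        return {row[0]: tuple(float(v) for v in row[1:]) for row in reader if row}


def check_against_dataset(path: str, recommend: Recommender, k: int = 5) -> int:
    """Run every whisky in the real dataset through `recommend`; return how many were checked."""
    profiles = load_profiles(path)
    for name in profiles:
        recs = recommend(profiles, name, k)
        assert name not in recs, f"{name} was recommended to itself"
        assert len(recs) == min(k, len(profiles) - 1), f"wrong number of results for {name}"
        assert set(recs) <= set(profiles), f"unknown whisky in {recs}"
        assert len(set(recs)) == len(recs), f"duplicate recommendations for {name}"
    return len(profiles)
```

These checks only hold if the dataset has more than one whisky and the recommender is given the full table. Both fit the scenario in the talk.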
Okay, so all our tests have passed, and it’s now being pushed to the registry; once it gets pushed to the registry, it will be deployed to the server. For this, all I’ve done, in a really hacky way, is SSH into the server and just do a docker pull and docker run, which isn’t great. If I had more time I’d probably deploy it to Kubernetes or something, but it works.
And it demonstrates the idea quite well. So I’m just going to watch that container on the server now, and in a minute we’ll see that container... there you go. Now it’s just been deleted, and it’s going to be recreated there, so that’s the deployment phase in action. If we now go back and actually test this new service, hopefully we should see a better output. I search for the same thing again, and you can see we’ve removed Macallan from the first entry there. Fantastic.
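The "really hacky" deploy step described above can be sketched in Python: SSH into the server, pull the new image, and replace the running container. This is illustrative only:

- The host, image name, container name and port are placeholders.
- The `docker rm -f` / `docker run -d` sequence is an assumption about how the replacement is done.
- The container is assumed to listen on the same port it publishes.

An orchestrator such as Kubernetes would replace all of this.

```python
import shlex
import subprocess
from typing import List


def redeploy_command(host: str, image: str, name: str = "recommender", port: int = 8080) -> List[str]:
    """Build one `ssh` invocation that pulls `image` and replaces the running container."""
    img, ctr = shlex.quote(image), shlex.quote(name)
    remote = " && ".join([
        f"docker pull {img}",
        f"(docker rm -f {ctr} || true)",  # tolerate the very first deploy, when nothing is running yet
        f"docker run -d --name {ctr} -p {port}:{port} {img}",  # assumes the app listens on `port`
    ])
    return ["ssh", host, remote]


def redeploy(host: str, image: str, dry_run: bool = True) -> List[str]:
    """Print the command (dry run) or execute it, failing loudly if any remote step fails."""
    cmd = redeploy_command(host, image)
    if dry_run:
        print(shlex.join(cmd))
    else:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit
    return cmd


if __name__ == "__main__":
    # Placeholder host and image; point these at your own server and registry.
    redeploy("deploy@server.example.com", "registry.example.com/whisky-recommender:latest")
```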
And that’s okay, and that’s kind of a traditional software task, but it’s something that a software engineer would normally do, not a data scientist. So again, the focus here is to try and get the data scientist involved in the software engineering, or vice versa.

Now suppose we have a second data scientist or another engineer who says: you know what, I don’t like some of your data, I’m going to change the model. So I’m going into the IPython notebook, and I’m looking at what the previous person has done, and now you’re just going to see me hacking around trying to get something working. But the idea here is that this is the process an engineer would normally go through when he’s trying to implement a new model or, in this case I think, trying to insert some new data. Lots of errors, lots of errors; finally figure out how to do it. Yep, still not right, the data’s wrong. Sighs. How do I do it, how do I... there we go. There are a few minutes there where I was on Stack Overflow; that’s why the pause was there. There we go, it’s worked.
So I’ve generated some new data, I’ve pushed that to the repository, and now we’re going through the build pipeline again. This is the same build pipeline, but with the change of data; we haven’t changed the model this time. So it’s important to have the data almost as part of the model if you can. Maybe the data is too big, but if you can, it’s really useful to have it in there, because you can catch bugs like this. I think what we just saw there was some of the tests failing because the data had got into such a state that it wasn’t giving the output that it should have.
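Since the failures above came from the data rather than the code, a cheap extra pipeline stage is to validate the dataset before the model is built. The following is a minimal sketch. The expected feature count and the 0 to 4 score range are assumptions about the dataset, so adjust them to your own columns.

```python
from typing import Dict, List, Sequence


def validate_profiles(profiles: Dict[str, Sequence[float]], n_features: int,
                      lo: float = 0.0, hi: float = 4.0) -> List[str]:
    """Return human-readable problems with the dataset; an empty list means it looks sane.

    The default 0-4 range is an assumption about how the features are scored.
    """
    problems: List[str] = []
    if not profiles:
        problems.append("dataset is empty")
    for name, features in profiles.items():
        if not name.strip():
            problems.append("whisky with a blank name")
        if len(features) != n_features:
            problems.append(f"{name}: expected {n_features} features, got {len(features)}")
        # NaN fails both comparisons, so missing values are reported here too.
        out_of_range = [v for v in features if not lo <= v <= hi]
        if out_of_range:
            problems.append(f"{name}: values out of range {out_of_range}")
    return problems
```

A pipeline step could call this on the freshly committed data and fail the build whenever the returned list is non-empty. Because it returns every problem at once, one failed build shows the whole picture rather than just the first bad row.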
So now, this time, instead of adding data, I’m going to just remove some data. I’ve removed the whisky and pushed it again. What I find out now is that I’ve actually caused some of my unit tests to fail, by removing one of the whiskies that was in my unit test, so I’m just having to fix that. There you go, there’s a commit saying “it’s really working now”, smiley face. And there we go, our tests are finally passing, and once again we’re going through the registry and we get to deploy it. There it is. Come on, do it, do it, do it. It’ll get there eventually.

And the result is the finally deployed model with the new data. It was all done without touching the model, but still going through the pipeline. That guarantees not only that our model is valid, but also that the data makes sense, and that it still makes sense when we throw different data and new data at it.

So you can imagine applying this to your own stuff: if you have got requirements like accuracy requirements, you could make that a hard and fast rule in your pipeline, to fail when your model accuracy decreases to a certain point.
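A hard accuracy rule like that can be a tiny script whose exit code fails the pipeline stage. This is a sketch only:

- `accuracy_gate`, the threshold and the placeholder labels are hypothetical.
- In practice the predictions would come from evaluating the freshly built model on a held-out set.
- Exact-match accuracy is used just to keep the example small; a recommender might use a different metric, such as a hit rate.

```python
import sys
from typing import Sequence


def accuracy(predicted: Sequence, expected: Sequence) -> float:
    """Fraction of predictions that exactly match the expected labels."""
    if not expected or len(predicted) != len(expected):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(p == e for p, e in zip(predicted, expected)) / len(expected)


def accuracy_gate(predicted: Sequence, expected: Sequence, threshold: float) -> int:
    """Return a process exit code: 0 if accuracy meets the threshold, 1 otherwise."""
    score = accuracy(predicted, expected)
    print(f"accuracy={score:.3f} threshold={threshold:.3f}")
    return 0 if score >= threshold else 1


if __name__ == "__main__":
    # Placeholder labels; a real stage would evaluate the freshly built model on held-out data.
    sys.exit(accuracy_gate(["a", "b", "b"], ["a", "b", "c"], threshold=0.9))
```

CI systems treat a non-zero exit code as a failed step, so the build stops before the registry and deploy stages whenever the score drops below the threshold.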
And that’s it. I think that entire process was probably about an hour, and I sped it up into about 10 minutes, though that’s probably just due to my poor software engineering more than anything. If you’d like to take a look, just go to the link and have a look at the slides, or come and see me, or basically search for Winder Research and you’ll find it there. So with that, I’d like to say thank you very much.
you very much just [Applause]