00:00:06[Music]
00:00:09okay I guess we’ll start thanks very
00:00:13much for deciding to be in this talk
00:00:16rather than something else I’m Gojko and
00:00:19what I’ll be talking about is how do
00:00:22serverless deployments impact how we design
00:00:27and deploy our systems I think one of
00:00:30the really interesting things that is
00:00:32happening at the moment is this whole
00:00:36buzzword is exploding and lots of people
00:00:40are approaching it from a technical
00:00:41perspective is it
00:00:44stateless is it functions is it platform
00:00:47as a service and as an industry we like
00:00:49dealing with technical stuff that’s what
00:00:51we tend to do but I think that the
00:00:56whole serverless thing whatever you
00:00:58decide to call it is much much much more
00:01:00important to look at from a financial
00:01:03perspective so the way we are
00:01:07approaching things in terms of designing
00:01:10and deploying applications things we
00:01:12take for granted today have been built
00:01:15based on our experiences in the last 20
00:01:1930 years deploying systems and many of
00:01:24the constraints that existed in the last
00:01:2820 30 years no longer apply to
00:01:31deploying with AWS Lambda which means
00:01:34that many things that we now take for
00:01:37granted as best practices just are
00:01:39solutions for constraints that no longer
00:01:41apply and that was really really
00:01:44interesting for me kind of when we
00:01:46started getting our head around lambda
00:01:48deployments and that’s kind of what I
00:01:49want to talk about what are the things
00:01:51that we now think are best practices but
00:01:54are actually just solutions for problems
00:01:56that are no longer applicable and kind
00:01:59of in terms of what I’m gonna talk about
00:02:03just as an example I develop a
00:02:08collaboration app that helps people do
00:02:12mind maps online and in February 2016
00:02:15we started migrating from Heroku to AWS
00:02:18Lambda it
00:02:19took us about one year to move
00:02:21everything because we did it gradually
00:02:23during that year we increased them no
00:02:28kind of decreased the hosting costs by
00:02:30about 1/2 while at the same time
00:02:34adding a bunch of new services and our
00:02:36number of active users increased by 50%
00:02:38in the same period so kind of all in all
00:02:41my estimate is that we saved around 66%
00:02:45or 2/3 on our hosting costs and that’s
00:02:49really really interesting if you look at
00:02:51it from a perspective running a small
00:02:52business
00:02:54so after kind of I started publishing on
00:02:58this a guy called Robert Chatley got in
00:03:00touch with me he wanted to do a
00:03:01scientific paper on this and together we
00:03:04wrote a proper scientific paper Martin
00:03:06talked about science papers yesterday so
00:03:08you know all my professors at the
00:03:10university are finally gonna be proud of
00:03:12me and know that all the alcohol kind of
00:03:14was worth something at the end and you
00:03:17can kind of download this it’s
00:03:19called the kind of economic and
00:03:22architectural impact of serverless it’s
00:03:24horrible but that’s okay so you can get
00:03:26a lot more on the numbers that I talked
00:03:27about there but I realized as we were
00:03:29doing research for this that actually
00:03:31our results were not even that
00:03:33interesting because we talked to people
00:03:36that saved something like 99% on their
00:03:39hosting costs by moving away from other
00:03:42platforms to Lambda
00:03:45Heroku is reasonably cost efficient
00:03:48anyway and moving from older
00:03:51generations of cloud hosting or moving
00:03:53from on-premise hosting to lambda has an
00:03:55even bigger potential to kind of do
00:03:57stuff so the key thing there I I think
00:04:01that is really important to consider is
00:04:03kind of the way lambda is priced and
00:04:06my experience is mostly with AWS lambda
00:04:08although pretty much all the other cloud
00:04:11providers are copying features and
00:04:13models now so the way AWS Lambda
00:04:15is priced kind of fundamentally changes
00:04:17the incentives for deployments
00:04:20fundamentally changes the incentives for
00:04:22good architecture and I think Martin
00:04:24talked yesterday about engineering
00:04:26and how that’s working within
00:04:27constraints
00:04:28I think cost is one of the major
00:04:31constraints we have to work in
00:04:33and the pricing model kind of
00:04:37fundamentally changes that which is really important
00:04:38so kind of in terms of the pricing Lambda
00:04:41prices stuff per request and per 100
00:04:46millisecond increment of processing time for
00:04:50a given memory size so these two
00:04:53things are really really important to
00:04:55consider because they change how we pay
00:05:00for for kind of what we’re using and
00:05:03they start charging things not in terms
00:05:07of reserved capacity but in terms of
00:05:10actual usage on a hundred millisecond
00:05:12increments so it doesn’t matter whether
00:05:15you have five VMs 500 VMs if you run
00:05:18three boxes or 5000 boxes to process
00:05:22something all that matters is how many
00:05:23requests came in and how long they took
00:05:27to execute under what memory conditions
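As a rough illustration of the pricing model described above, the following sketch estimates a bill from the number of requests, their duration and the configured memory. The price constants and the `lambda_cost` name are assumptions made for illustration (roughly the public list prices around the time of the talk, not figures quoted by the speaker), and the free tier is ignored.

```python
import math

# Assumed, illustrative prices in USD; check current pricing before relying on them.
PRICE_PER_REQUEST = 0.20 / 1_000_000   # per invocation
PRICE_PER_GB_SECOND = 0.00001667       # per GB-second of execution


def lambda_cost(requests, avg_duration_ms, memory_mb):
    """Estimate cost from request count, average duration and memory size."""
    # Duration is rounded up to the next 100 ms increment, as described in the talk.
    billed_ms = math.ceil(avg_duration_ms / 100) * 100
    gb_seconds = requests * (billed_ms / 1000) * (memory_mb / 1024)
    return requests * PRICE_PER_REQUEST + gb_seconds * PRICE_PER_GB_SECOND


# One million 100 ms requests at 1024 MB, under the assumed prices.
print(round(lambda_cost(1_000_000, 100, 1024), 3))
```

Note that the number of machines never appears in the calculation: only requests, time and memory do.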
00:05:29so this whole buzzword of serverless is
00:05:33horrible because of course there are
00:05:34servers out there and things like that
00:05:36but somebody came up with a really nice
00:05:39definition on Twitter the other day
00:05:40saying that kind of the thing is
00:05:42serverless
00:05:43if you’re only paying for actual usage
00:05:45if you’re not paying for what you have
00:05:48to plan as a reserved capacity so kind
00:05:51of historically that’s not been like
00:05:53that historically good architecture
00:05:56optimized for reserved resources and I
00:06:01you know 20 years ago in my previous
00:06:04life I worked on trading platforms that
00:06:06were deployed on immortal hardware it was
00:06:09never supposed to die the storage cost
00:06:11more than my house the processors were
00:06:14insanely expensive and everything was
00:06:16duplicated replicated replicated because
00:06:20it’s never ever ever supposed to die but
00:06:23once you have a machine like that you
00:06:26optimize for using what you’ve bought
00:06:27you kind of you bundle stuff onto it you
00:06:30put everything you can to run there and
00:06:33you’re very very careful not to exceed
00:06:35the capacity of that because if you do
00:06:37then you know adding a couple of more
00:06:39processors or adding a bit more storage
00:06:41requires
00:06:42an insane amount of cost so then 2006
00:06:47Amazon kind of came out with the idea
00:06:50that you can get a virtual machine
00:06:51running in about five or ten minutes
00:06:53which was completely insane at the time
00:06:57I worked for a big telecoms provider
00:06:58where it took them nine months to
00:07:01provision a virtual machine internally
00:07:03and you know now comes Amazon you can
00:07:07get a VM in about ten minutes and
00:07:09that was amazing but kind of it
00:07:11did not fundamentally change how we are
00:07:13thinking about deployments because you
00:07:16got five virtual machines you’re paying
00:07:18for five virtual machines so you’re
00:07:19gonna bundle everything into those five
00:07:21virtual machines and we teach people
00:07:24good software design like decoupling
00:07:26isolation and and all those brilliant
00:07:29software design practices and then
00:07:31because you have five virtual machines
00:07:33then you put your payment services and
00:07:34your logging and your monitoring system all
00:07:38on the five virtual machines where they
00:07:40start interacting with each other and we
00:07:43had this problem where we were kind of deploying
00:07:45lots of different payment services to
00:07:46reserved VMs one of the VMs did not clean
00:07:50up the temp space correctly no one of
00:07:54the payment services did not clean up
00:07:56after itself and over time it filled
00:07:58up the temp space Linux starts really
00:08:00really misbehaving when you fill up the
00:08:01temp space so that kind of VM started
00:08:04going crazy all the payment services
00:08:06on that machine went down although they
00:08:08were designed to be decoupled and
00:08:09isolated and everything and when
00:08:11that machine went down there was a
00:08:12cascade all you know the remaining four
00:08:15machines started filling up their temp
00:08:17space very very quickly and everything
00:08:19kind of imploded
00:08:20so um although you know we teach people
00:08:23about good design and decoupling we end
00:08:26up deploying stuff to save cash because
00:08:28we need to reserve capacity and the next
00:08:31generation of cloud deployments Google
00:08:34App Engine Heroku and things like that
00:08:35they moved a lot of the responsibilities
00:08:39over to the cloud providers like
00:08:41provisioning like monitoring and things
00:08:44like that but the whole thing remained
00:08:46you were paying for dynos on Heroku for
00:08:48example our app was deployed on
00:08:50Heroku
00:08:51we have about 20 or 30 different
00:08:53exporters into different formats and
00:08:56if I’m going to run a primary and a
00:08:58failover for each of these exporters on
00:09:00separate isolated VMs plus for some
00:09:03exporters I need a lot more capacity and
00:09:05for others for some I want five or six
00:09:06VMs that means that for 30 exporters I
00:09:11need a hundred VMs to run on if I really
00:09:13want to make it reliable and isolated
00:09:17but some of those exporters like
00:09:19markdown I wrote for myself nobody else
00:09:21uses that some of those exporters like
00:09:25PDF I use it all the time so you know
00:09:27I’ll put the markdown exporter on the
00:09:29same block of machines as the PDF I’m
00:09:32not going to create a separate block of
00:09:33machines to save money and you know of
00:09:35course we made a stupid mistake and some
00:09:38of those things started interfering with
00:09:39each other because they were on the same
00:09:40machine now technically we have a
00:09:43solution for this and we had a solution
00:09:44for this it’s containers it’s isolation
00:09:46it’s things like that but as a you know
00:09:48average company it’s very very difficult
00:09:50to dedicate the resources to manage
00:09:54everything there so what lambda does
00:09:56is Lambda kind of provides that as a
00:09:57service so we start moving away from
00:10:00reserved capacity to utilized capacity it
00:10:04doesn’t matter how many VMs you need
00:10:06Amazon is going to scale this and
00:10:08charge you for how much you’ve used
00:10:09and that’s kind of the request based
00:10:11pricing that’s really really interesting
00:10:13so what that means is that there’s no
00:10:15more financial incentive for me to put
00:10:17the markdown and the PDF exporter on the
00:10:19same VM even worse because they have
00:10:24different memory needs markdown doesn’t
00:10:26need any memory PDF needs a lot because
00:10:28it’s using Ghostscript and Ghostscript is a
00:10:30memory hog putting those two things on
00:10:32the same VM I will end up paying
00:10:34for markdown at a high memory watermark
00:10:36that’s a lot more than I need if I
00:10:38separate them so the financial
00:10:39incentives here are actually for people
00:10:42to unbundle the apps into all these small
00:10:44isolated modules it’s really really
00:10:46interesting because historically it
00:10:48was completely different so one of the
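The incentive to unbundle can be shown with a small calculation. This is a sketch under assumed numbers: the workloads, memory sizes and the price constant are hypothetical and only illustrate why a light task bundled with a memory-hungry one ends up billed at the higher memory setting.

```python
PRICE_PER_GB_SECOND = 0.00001667  # assumed illustrative price in USD


def compute_cost(requests, billed_ms, memory_mb):
    """Execution cost only; the per-request fee is the same either way."""
    return requests * (billed_ms / 1000) * (memory_mb / 1024) * PRICE_PER_GB_SECOND


# Hypothetical workloads: many light Markdown exports, fewer heavy PDF exports.
markdown = {"requests": 100_000, "billed_ms": 100, "needs_mb": 128}
pdf = {"requests": 10_000, "billed_ms": 2000, "needs_mb": 1024}
tasks = (markdown, pdf)

# Bundled: one function, so every request runs at the highest memory setting.
shared_mb = max(t["needs_mb"] for t in tasks)
bundled = sum(compute_cost(t["requests"], t["billed_ms"], shared_mb) for t in tasks)

# Separate: each function is configured with only the memory it needs.
separate = sum(compute_cost(t["requests"], t["billed_ms"], t["needs_mb"]) for t in tasks)

print(f"bundled: {bundled:.4f} USD, separate: {separate:.4f} USD")
```

With these made-up numbers the separated deployment is cheaper, because the Markdown requests no longer pay for memory only the PDF exporter needs.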
00:10:51things that I think kind of people
00:10:54should think about more and it’s not
00:10:56microservices it’s not kind of functions
00:10:58you can buzzword this any way
00:10:59you want but I think what we started
00:11:02thinking about is really
00:11:03decoupling different tasks that have
00:11:06different memory constraints different
00:11:07CPU constraints different deployment
00:11:10needs and thinking about well you know
00:11:13if it doesn’t matter how many VMs this
00:11:15runs on I don’t care about it being
00:11:18separately deployed and that’s very very
00:11:20liberating and it’s completely opposite
00:11:23of what we’ve been taught to do a lot of
00:11:26kind of practices like blue-green
00:11:28deployments practices like kind of
00:11:30concentrated monitoring and things like
00:11:31that are no longer that problematic
00:11:35and that opens up some really
00:11:38interesting possibilities for example
00:11:40moving from Heroku to lambda we’ve kind
00:11:42of deleted a bunch of code that was
00:11:45dealing with kind of you know figuring
00:11:47out what’s going on on a particular VM
00:11:49figuring out how these things interact
00:11:50do they interact with each other so
00:11:52a lot of this infrastructure code is
00:11:54gone so the next thing that kind of I
00:11:58think historically happened and no
00:11:59longer applies is generally good
00:12:01architecture because of the whole idea
00:12:03of you know reserved stuff optimized for
00:12:07failovers and what that meant was really
00:12:11complex state management you wanted to
00:12:13have a warm failover machine or a couple
00:12:16of warm failover machines that can take
00:12:18the load in you know very quickly if
00:12:21something fails so data needs to be
00:12:23replicated cache needs to be replicated
00:12:25data needs to be synchronized across
00:12:28replicated machines then there’s kind of
00:12:31cache invalidation policies people need
00:12:32to care about this there’s a lot of
00:12:34really complex code to do to make this
00:12:38work really really well but from a
00:12:41perspective of paying per request kind
00:12:45of failover machines are not doing
00:12:47anything so kind of unless they’re receiving
00:12:48requests you’re not paying for them and
00:12:50then the big question becomes do you
00:12:52actually need to plan for failover like
00:12:54that and another really interesting
00:12:56constraint that I think kind of is
00:12:58changing with the whole lambda
00:12:59architecture is kind of the time to
00:13:02recover that was historically a big
00:13:06driver for a lot of architectural
00:13:07practices is no longer that important
00:13:10and that opens up some really
00:13:12interesting possibilities what I think
00:13:13is becoming a lot more important is the
00:13:15time to start
00:13:17so when we talk about the time to start
00:13:20I’ll just show you something very
00:13:22quickly let’s see if the gods of the
00:13:23Internet help us here so ok so if I do
00:13:37[Music]
00:13:39and I do constable let’s be nice
00:14:17so this is kind of a a web api that on
00:14:21any request to slash hello will reply
00:14:23with hi there it’s not that
00:14:27complicated
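The code shown on screen is not captured in the transcript, and the demo uses a JavaScript deployment tool. As a stand-in, here is a minimal sketch of an equivalent handler in Python, assuming the event and response shapes of an API Gateway Lambda proxy integration; the `handler` name is just a convention chosen for this example.

```python
def handler(event, context):
    """Reply to GET /hello with 'hi there'; anything else gets a 404."""
    headers = {"Content-Type": "text/plain"}
    if event.get("httpMethod") == "GET" and event.get("path") == "/hello":
        return {"statusCode": 200, "headers": headers, "body": "hi there"}
    return {"statusCode": 404, "headers": headers, "body": "not found"}


# Local check without deploying anything.
print(handler({"httpMethod": "GET", "path": "/hello"}, None)["body"])
```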
00:14:28and if I’ve not made a stupid mistake we
00:14:30should be able to do something like so
00:14:43I’m using an open-source tool for
00:14:46deployments here that actually we’ve
00:14:48open sourced while building it for
00:14:51MindMup the first one that we created
00:14:53was something like 30 lines of
00:14:55JavaScript code and about 250 lines of
00:14:57shell scripts to deploy and then I
00:14:59realized that kind of the risk is no
00:15:01longer in the code the risk is in the
00:15:02deployment so we wanted to have properly
00:15:05unit tested deployment and configuration
00:15:06and and we kind of just packaged up all
00:15:09the scripts so this thing is now on on a
00:15:11URL on Amazon and if I grab this URL so
00:15:20I’ve created a slash hello so if I do
00:15:22curl hello I get hi there
00:15:27so now this is a kind of thing that’s
00:15:29running on one VM but what’s really
00:15:32interesting about this is that if all of
00:15:36you start attacking this thing now it
00:15:38will perfectly handle the load if
00:15:39nobody’s using it I don’t have to pay
00:15:41for anything if we have a massive
00:15:44massive spike now but then kind of you
00:15:46know the spike goes away that’s perfectly
00:15:47fine as well it will scale up and scale
00:15:49down plus with this I get a bunch of
00:15:52things automatically like monitoring
00:15:54logging security provisioning all the
00:15:58stuff that kind of people want to deal
00:15:59with in operations so I’ve created this
00:16:01thing called go to Copenhagen so if I go
00:16:04to my lambdas now
00:16:12come on internet okay so we have a knob
00:16:20that’s not the right account
00:16:41good good good
00:16:42so let’s see Amazon four one three six
00:16:47so for now you’ve seen my code come on
00:16:54come on so if I go to lambda now I have
00:17:07no idea what they’ve changed here so
00:17:09this is the okay you can see how
00:17:17often I use this I normally use the command
00:17:19line so what I want to say is kind
00:17:23of with this you get a bunch of
00:17:24operations things for free immediately
00:17:26and why is this suffering bustle and
00:17:29things that doesn’t matter
00:17:30let’s functions functions functions
00:17:35functions page to go to Copenhagen okay
00:17:42so here it is and then I have kind of
00:17:44lots of stuff around monitoring I have
00:17:47my kind of errors throttles I have logs
00:17:51in cloud watch I have pretty much
00:17:52everything I need from an Operations
00:17:54perspective already provided to me so
00:17:57there’s this whole buzzword kind of that
00:18:00lambda is moving from DevOps to NoOps
00:18:02and Simon Wardley who’s a researcher in
00:18:05the UK said kind of if your company’s not
00:18:07done DevOps yet just don’t bother skip a
00:18:09whole generation of things and move to
00:18:13something like this and it’s easy
00:18:14fair enough it’s not completely killing
00:18:16DevOps but as I said you know you have
00:18:18all the operations stuff already I have
00:18:20a log I have kind of when it started
00:18:22when it ended how much memory it used
00:18:24and this is amazingly good for
00:18:27optimizing stuff if you look at kind of
00:18:30the typical architecture and and and
00:18:32stuff like that we normally had figuring
00:18:35out how much money a particular function
00:18:38cost you to run was almost impossible
00:18:39people were optimizing based on a gut
00:18:42feel they were optimizing whole
00:18:44applications now I know exactly for this
00:18:46particular task how much money I’m
00:18:47spending so I can decide do I need to
00:18:50kind of invest in optimizing that or not
00:18:52so that’s really really really
00:18:53interesting and kind of in terms of time
00:18:55to start here’s some here’s some
00:18:57empirical numbers that I got these are
00:19:00not confirmed by AWS AWS
00:19:02doesn’t publish any numbers about this
00:19:03but these are kind of my numbers that we
00:19:06got generally so for a completely new
00:19:09instance of a deployed app so if you
00:19:13start hitting this stuff now and this
00:19:14instance that’s running there can’t handle the
00:19:16load with JavaScript
00:19:18it usually takes less than one second to
00:19:21get the new instance if you want to do a
00:19:23new deployment a new version like I’ve
00:19:25just done it takes about three or four
00:19:26seconds to kind of set everything up so
00:19:29that’s the infrastructure stuff now
00:19:32what’s left is how long our app takes to
00:19:36connect to the database load up the
00:19:38cache load up the data and things like that
00:19:39I think kind of unlike the previous
00:19:42architectures where we were optimizing
00:19:44for quick failover what we need to start
00:19:45thinking about is optimizing for quick
00:19:47start lazy loading everything kind of
00:19:50making sure that we don’t add a lot of
00:19:53this stuff because if we can then
00:19:54this stuff just kind of starts and dies
00:19:56on its own now there’s a whole buzzword
00:19:58here where people are talking about how
00:20:00we need to optimize we need to write
00:20:03this stuff in a stateless way because
00:20:08there’s a big confusion of buzzwords and
00:20:12some people talk about serverless function
00:20:15as a service then functional
00:20:16programming and functional programming
00:20:17is stateless and you know people
00:20:19throw buzzwords around lambdas are not
00:20:22stateless don’t let anybody tell you
00:20:23that lambdas are a stateful container
00:20:27you just have no idea how many containers
00:20:29you’re running once the container starts
00:20:32you have no control over when is it
00:20:33going to stop and whether the next
00:20:35request from the same user is going to
00:20:37hit the same container or a completely
00:20:38different container so rather than
00:20:40developing for stateless what I think we
00:20:43should be thinking about is developing
00:20:44kind of for share nothing designing for
00:20:47share nothing when the VM loads it’s
00:20:49perfectly fine to start caching things
00:20:51in that VM that are not user specific
00:20:53that’s how we save a lot of time on
00:20:56processing requests that’s how we save a
00:20:57lot of cash because you know that there
00:21:00will be a VM running it’s just that you
00:21:02mustn’t ever cache anything that is
00:21:04specific to that particular request the
00:21:06state of a user session is not there the
00:21:09state of a user session needs to be
00:21:10somewhere else and we’ll talk about
00:21:13quite an interesting option where to
00:21:15push the user state later but kind of
00:21:19it’s it’s not stateless people that have
00:21:22never kind of deployed any of this talk
00:21:24about lambdas being stateless they’re
00:21:26not and and I think you lose a lot of
00:21:28money by treating them like stateless so
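The share-nothing idea described above can be sketched as follows. This is a minimal illustration, not code from the talk: `_load_reference_data`, the `formats` payload and the `user_id` field are hypothetical. Data that is the same for every user is loaded lazily and kept for the lifetime of the container, while anything user-specific travels with the request or lives in an external store.

```python
_reference_data = None  # shared by every request this container serves
load_count = 0          # only here to make the caching visible


def _load_reference_data():
    """Stand-in for a slow load of data that is the same for all users."""
    global load_count
    load_count += 1
    return {"formats": ["pdf", "markdown"]}


def get_reference_data():
    """Load lazily on first use, then reuse while the container lives."""
    global _reference_data
    if _reference_data is None:
        _reference_data = _load_reference_data()
    return _reference_data


def handler(event, context):
    formats = get_reference_data()["formats"]
    # User-specific state arrives with the request (or comes from an external
    # session store); it is never kept in module-level variables.
    return {"user": event["user_id"], "formats": formats}


first = handler({"user_id": "alice"}, None)
second = handler({"user_id": "bob"}, None)
print(first["user"], second["user"], load_count)
```

Two requests from different users reuse the same cached data, so the expensive load runs once per container rather than once per request.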
00:21:30the next thing that I think kind of
00:21:32historically happened is it was really
00:21:35really difficult to replicate production
00:21:37it was you know people had a production
00:21:41system with expensive storage or even if
00:21:44it’s on the cloud with good VMs and then
00:21:46that cost a lot so you wouldn’t want to
00:21:49pay the same amount for your staging you
00:21:51didn’t want to pay the same amount for
00:21:52the testing so typically kind of
00:21:55performance tests unless you are LMAX
00:21:57would run on a kind of a smaller copy of
00:22:01data on a smaller copy of the production
00:22:04and things like that they were not
00:22:05really relevant they were not reliable
00:22:07and generally that made it really really
00:22:09difficult to claim anything about
00:22:12production usage and production
00:22:14performance based on the testing system
00:22:16now kind of I have a couple of friends
00:22:22from Australia and they claim
00:22:24I don’t know if this is true or not
00:22:26that kind of Facebook is
00:22:27continuously broken in Australia because
00:22:30Facebook is really good with kind of
00:22:32running business experiments on all
00:22:33their users and kind of Australia’s the
00:22:36market that’s big enough to be
00:22:38statistically significant it’s English
00:22:40so they don’t have to waste a lot of
00:22:41time preparing software for that but
00:22:43Facebook doesn’t really care too much
00:22:45about Australians they’re laid-back
00:22:46people anyway so they can do experiments
00:22:49in Australia and then when they figure
00:22:51what works what doesn’t work they can
00:22:52translate that to the US now when you
00:22:55think about something like that that’s
00:22:57incredibly useful but up until five
00:23:00years ago or up until three years ago
00:23:01that was only available to you if you’re Google
00:23:04or Facebook running experiments like
00:23:07that is hugely expensive you need a
00:23:09complete copy of production for
00:23:11something like that and you need the
00:23:14relevant copy of production you need to
00:23:16synchronize the data between these
00:23:17things one of the best
00:23:19examples of how important that is is a
00:23:22story called 40 shades of blue that I
00:23:25absolutely love because it’s one of
00:23:27those rare things where you can read
00:23:29both sides of the story online forty
00:23:32shades of blue led to kind of that the
00:23:34head of design at Google wanted to
00:23:37change the color of the links on the
00:23:39homepage for ads and the developers
00:23:42challenged him a bit
00:23:44and he wanted to change it to a
00:23:46particular blue color overnight they
00:23:48ran 40 different colors of blue and
00:23:51measured how much people are clicking on
00:23:54that and then as a result of that
00:23:56depending whose side you read this guy
00:23:58either quit or was fired because the
00:24:02difference was something like 250
00:24:03million dollars between the current
00:24:06color and his color if you expand it to
00:24:08a whole whole year and 100% of the users
00:24:10so kind of things like that are
00:24:14incredibly important to test but for
00:24:18most people out there it was almost
00:24:20impossible to do it because production
00:24:22copy cost so much now when we started
00:24:24doing lambda I had this brilliant best
00:24:26idea in the world and I wanted to
00:24:29integrate the Wikidata graph of knowledge
00:24:33with our app so that when you press a
00:24:35question mark it automatically opens up
00:24:37related terms it was going to be amazing
00:24:38it was going to be brilliant it’s going
00:24:40to be fantastic and and my business
00:24:42partner said no that’s a shitty idea
00:24:43nobody’s going to use that let’s not
00:24:47waste time doing that so you know I kind
00:24:49of generally don’t like to ask for
00:24:52permission I like to ask for forgiveness
00:24:53so I decided I’m going to do that anyway
00:24:55and he said I know you’re going to do it
00:24:56anyway do not touch the
00:24:58production code because if you put this
00:25:00in we’re going to have people you know
00:25:02screaming that we can’t support the
00:25:04right performance we’re gonna be bottlenecked
00:25:05here we’re gonna and I said okay
00:25:07and then I realized kind of with lambdas
00:25:10you don’t pay for stuff if people
00:25:12don’t use them so I kind of spent two
00:25:15days knocked up a quick lambda version
00:25:18that was extended with this deployed a
00:25:20completely separate copy of our
00:25:23production and sent 20% of our users
00:25:26there and you know lo and behold I
00:25:29proved that it was a shitty idea nobody
00:25:31wants to use it so we deleted that code
00:25:34without disturbing production so but
00:25:39this is this is some you know something
00:25:41that traditionally we’d get into a fight
00:25:43who’s right who’s wrong guy would do it
00:25:45anyway and then that code would stay in
00:25:46the production and kind of you know cost
00:25:48us more to maintain it was amazing
00:25:50because we could run a quick experiment
00:25:51so because we’re paying for requests
00:25:53we’re not paying for reserved capacity it
00:25:56was exactly the same amount of money to
00:25:57send 80% of users to one version 20% to
00:26:00another or to send everybody to the same
00:26:03version or to have a version for
00:26:04Copenhagen or to have a version for
00:26:06go-to or to have a version for anything
00:26:08you want and actually kind of what this
00:26:10starts getting us to think about is
00:26:12moving away from thinking about
00:26:13production to thinking about multiple
00:26:15versions and multi versioning in lambda
00:26:17is incredibly well done I you know for
00:26:22my sins like everybody else have done
00:26:24an ORM framework a logging framework
00:26:25and a data multi-versioning framework
00:26:27everybody’s done that when they’re young
00:26:28otherwise you don’t you know you don’t
00:26:30get to call yourself a developer and
00:26:32then you end up kind of suffering
00:26:34through maintaining that for 10
00:26:36years but that’s okay so a data multi
00:26:41versioning request multi versioning
00:26:42service multi versioning is really
00:26:43really difficult to do well at scale and
00:26:45Amazon probably have done it for their
00:26:47own needs and they’ve just exposed it in
00:26:49Lambda so every lambda function gets a
00:26:51numerical version every time you deploy
00:26:53it and you can direct the calls for a
00:26:58particular function either to the latest
00:27:01deployed version or to a particular
00:27:04numerical deployment or you can assign
00:27:06aliases to a numerical deployment like
00:27:08production testing staging so when we
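The talk does not say how users were routed to a given version, so the following is only a sketch of one way to do it. The alias table, version numbers and function names are hypothetical. It maps each user to a stable bucket and sends a configurable percentage to an experimental alias, which mirrors the 80/20 split mentioned earlier.

```python
import hashlib

# Hypothetical alias table: alias name -> numeric function version.
ALIASES = {"production": 12, "experiment": 13}


def bucket(user_id):
    """Map a user id to a stable number in the range 0..99."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def version_for(user_id, experiment_percent=20):
    """Pick the version a user should be served, by alias."""
    alias = "experiment" if bucket(user_id) < experiment_percent else "production"
    return ALIASES[alias]


# The same user always lands on the same version.
print(version_for("alice") == version_for("alice"))
```

Hashing the user id, rather than picking at random per request, keeps each user on one version for the duration of the experiment.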
00:27:10started doing this we started really
00:27:12kind of doing oh I have a production
00:27:14version a testing version and staging
00:27:15version and then we realized well I’ll
00:27:18just have a version to test this feature
00:27:19oh I’ll have a version to kind of you
00:27:21know when we’re doing something
00:27:22experimental and 5% of our users really
00:27:24really need this feature but it’s not
00:27:27ready for everybody else we can deploy a
00:27:29version for them and give it to them and
00:27:30test it on them early and the multi
00:27:33versioning is incredibly well done so I
00:27:35think this is a really interesting thing
00:27:36to start thinking about kind of how do
00:27:38we bundle our tasks because it has a
00:27:40massive impact on how we design how we
00:27:43deploy if we are going to design for a
00:27:45multi version universe where you know at
00:27:48the same time several versions of a
00:27:50wrapper running and different things are
00:27:51communicating with even I think the
00:27:52whole domain driven design concept of
00:27:54aggregates becomes incredibly more
00:27:56important because we want to make sure
00:27:59that kind of you know all the data for a
00:28:01particular version of this object
00:28:02travels together and because we kind
00:28:06of don’t want to clog the communication
00:28:08we want to make sure
00:28:09aggregates are actually relatively
00:28:10minimal so that we can push them around
00:28:13and we can load them quickly but the
00:28:17move to Lambda has gotten us thinking
00:28:20a lot more about what are our actual
00:28:21aggregates where are the aggregate
00:28:23boundaries what needs to be in a
00:28:26particular version consistent with
00:28:28itself what can just kind of be
00:28:29different versions and and be okay so I
00:28:32think that’s a completely interesting
00:28:34kind of phenomenon that I’ve not really
00:28:35seen in my code before that and because
00:28:38we’ve designed for multi versioning now
00:28:40up front as we’re migrating we can do
00:28:42lots of crazy things and the stuff that
00:28:44was traditionally available to you
00:28:46know companies that make billions and
00:28:48billions of dollars we can do now and
00:28:50we’re a two-person team and that that’s
00:28:53I think amazing as a kind of for what we
00:28:56can do from the platform so the next
00:29:00really interesting thing driven by the
00:29:03pricing model of AWS lambda is that
00:29:06different services charge for different
00:29:08things so you can save quite a lot of
00:29:10money by playing arbitrage with AWS
00:29:13against AWS that’s amazing
00:29:15and remember kind of lambda as a
00:29:18processing service charges for the
00:29:20number of requests and time so if you can
00:29:23delegate work from lambda to stuff that
00:29:26does not charge for the number of
00:29:28requests and time you can save a lot of
00:29:32cash for example kind of API gateway
00:29:35charges for the bytes being transferred
00:29:38and the number of requests at the same time
00:29:41kind of s3 that’s the storage system
00:29:43just charges for transfer it doesn’t
00:29:45care about the number of requests so one
00:29:49example of that is that on Heroku
00:29:52and before we started really thinking
00:29:53about this we would let people upload
00:29:58files for export to a server where the
00:30:02server would communicate with the
00:30:04storage and then the server would save
00:30:07stuff to storage it would run the
00:30:08converter it would kind of send it
00:30:10back from the storage to the user what
00:30:12that means is that during that whole
00:30:13time the server is busy we’re paying for
00:30:16the server now if you’re uploading a
00:30:19hundred megabyte file the transfer from
00:30:21you to
00:30:22Amazon and the transfer from Amazon to
00:30:25you actually take most of the time
00:30:26the conversion time is relatively quick
00:30:28so we’re paying kind of for this
00:30:30operation for a long time whereas because
00:30:33S3 only charges
00:30:35for transfer if we can get people to
00:30:36upload directly to S3 and download
00:30:38directly from S3 we have reduced our
00:30:42server costs by a significant amount
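To put rough numbers on that argument, here is a minimal sketch of the arithmetic. The two prices are assumed list prices for a request-plus-time billing model and may not match current AWS pricing; the durations and memory size are hypothetical values chosen only for illustration.

```python
# Assumed prices for a service billed per request and per GB-second.
# These are illustrative; check the current price list before relying on them.
PRICE_PER_REQUEST = 0.20 / 1_000_000
PRICE_PER_GB_SECOND = 0.00001667


def invocation_cost(seconds: float, memory_gb: float = 0.5) -> float:
    """Cost of one invocation that stays busy for `seconds` at `memory_gb`."""
    return PRICE_PER_REQUEST + seconds * memory_gb * PRICE_PER_GB_SECOND


# Hypothetical timings: 60 s upload, 2 s conversion, 60 s download.
# Proxied: the time-billed process stays busy for the whole transfer.
proxied = invocation_cost(60 + 2 + 60)

# Direct: one short call to sign the upload, one call to run the conversion.
# The transfer itself happens between the browser and storage.
direct = invocation_cost(0.02) + invocation_cost(2)

print(f"proxied: ${proxied:.6f} per export")
print(f"direct:  ${direct:.6f} per export")
print(f"ratio:   {proxied / direct:.0f}x")
```

Whatever the exact prices, the ratio is driven almost entirely by how long the time-billed component sits waiting on the network.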
00:30:45some other services that are interesting
00:30:47to consider like Cognito the Amazon
00:30:49authentication and session service only
00:30:50charges for the number of users it
00:30:53doesn’t charge for the number of
00:30:54requests those users make it doesn’t
00:30:55charge for the capacity of the sessions
00:30:58it only charges for the number of users
00:31:00so moving session state into Cognito
00:31:02is a really really interesting way of
00:31:05kind of playing arbitrage with Amazon
00:31:06versus Amazon so as another
00:31:11example that I’ll show you later there’s also
00:31:12this thing called the IoT gateway that
00:31:14we abuse massively the IoT gateway is
00:31:17designed to get low-power devices to
00:31:19talk to each other but
00:31:23we’re building real-time
00:31:25collaboration directly through the IoT
00:31:26gateway and because it only
00:31:28charges for the number of messages it
00:31:30doesn’t charge for processing time it
00:31:32doesn’t charge for data transfer again
00:31:34you can arbitrage things nicely so kind
00:31:36of we started moving a lot more from
00:31:40thinking about applications and
00:31:43managed apps like Heroku or Google
00:31:46App Engine to thinking about the glue between
00:31:47different platform services for me
00:31:50Lambda is mostly about what’s missing
00:31:53from Amazon’s platform and how do I
00:31:55glue those things together and
00:31:57what’s the real
00:31:59business value of my code because it’s
00:32:02unlikely that developing a queue
00:32:04interface system is where I can provide
00:32:06the most value I can provide value
00:32:08developing the small bits of
00:32:10processing for those queue messages and
00:32:12that’s really really interesting so kind
00:32:14of just as an example this is
00:32:17how to get people to use S3 directly
00:32:21from a browser so on a server we have
00:32:23something like this the S3 here is the Amazon
00:32:26SDK so you get a request to upload a
00:32:30file you populate it with you know where
00:32:32it needs to go you limit the file size
00:32:34you
00:32:35provide the security stuff and then
00:32:37you get a signature and then return
00:32:39that to the browser that takes you know
00:32:4120 milliseconds then the browser spends
00:32:44ten minutes uploading the big file to S3
00:32:46where you’re just paying for transfer
00:32:48then this thing goes very
00:32:50quickly you go back and people download
00:32:53it directly from S3 using a signed URL
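The signing step described here can be sketched as follows. This is not the code from the slide: it is a standard-library Python illustration of an S3 browser POST policy signed with Signature Version 4, and the function name, bucket, key prefix and credentials are all hypothetical. In practice an AWS SDK helper would normally produce these fields.

```python
import base64
import datetime
import hashlib
import hmac
import json


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def presign_s3_post(access_key, secret_key, region, bucket, key_prefix,
                    max_bytes, now, expires_in=600):
    """Return the form fields a browser needs to POST a file straight to S3.

    `now` is a UTC datetime; all argument values are supplied by the caller.
    """
    date = now.strftime("%Y%m%d")
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    credential = f"{access_key}/{date}/{region}/s3/aws4_request"
    expiry = now + datetime.timedelta(seconds=expires_in)
    policy = {
        "expiration": expiry.strftime("%Y-%m-%dT%H:%M:%S.000Z"),
        "conditions": [
            {"bucket": bucket},                        # where it needs to go
            ["starts-with", "$key", key_prefix],
            ["content-length-range", 0, max_bytes],    # limit the file size
            {"x-amz-algorithm": "AWS4-HMAC-SHA256"},
            {"x-amz-credential": credential},
            {"x-amz-date": amz_date},
        ],
    }
    policy_b64 = base64.b64encode(
        json.dumps(policy).encode("utf-8")).decode("ascii")
    # Derive the SigV4 signing key: date, then region, service, terminator.
    signing_key = _hmac(("AWS4" + secret_key).encode("utf-8"), date)
    for part in (region, "s3", "aws4_request"):
        signing_key = _hmac(signing_key, part)
    signature = hmac.new(signing_key, policy_b64.encode("ascii"),
                         hashlib.sha256).hexdigest()
    return {
        "key": key_prefix + "${filename}",
        "policy": policy_b64,
        "x-amz-algorithm": "AWS4-HMAC-SHA256",
        "x-amz-credential": credential,
        "x-amz-date": amz_date,
        "x-amz-signature": signature,
    }


# Hypothetical values, for illustration only.
fields = presign_s3_post("AKIDEXAMPLE", "not-a-real-secret", "us-east-1",
                         "example-bucket", "uploads/", 100 * 1024 * 1024,
                         datetime.datetime(2016, 10, 1, 12, 0, 0))
print(sorted(fields))
```

The server only computes a few HMACs and returns the fields; the browser then submits them together with the file in a multipart form posted to the bucket, so the long transfer never touches the time-billed function.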
00:32:56Amazon being Amazon of course there are
00:32:58five different ways of authorizing
00:33:00requests
00:33:01there’s signed URLs there’s the SigV4
00:33:05signing that they do there
00:33:06is giving people access through Cognito so
00:33:09there are lots and lots of ways you can
00:33:11get people to directly access one of
00:33:13these front-end services without going
00:33:16through a traditional server so kind of
00:33:19to really benefit from that
00:33:21financially I think what we started
00:33:23thinking about a lot more is giving the
00:33:25platform the roles that were
00:33:26traditionally associated with the server
00:33:28process things like being the
00:33:31gatekeeper things like being the
00:33:32orchestrator things like keeping
00:33:34session storage if you push that away
00:33:38from Lambda that’s kind of the
00:33:41processor to the other parts of the
00:33:43platform you can save a lot of cash now
00:33:46here’s why I think this is so insane in
00:33:54September our app had something like
00:33:57400,000 active users it’s not you
00:34:01know Google but it’s not Mickey Mouse
00:34:03either and just so that I’m not cheating
00:34:06this is the live Amazon page so if I go to
00:34:14my billing dashboard and I look at my
00:34:19costs for where is it the September bill
00:34:24bill details September so for four
00:34:29hundred thousand active users in
00:34:30September we have paid 53 cents
00:34:38for Lambda now beat that with your
00:34:41hosting costs and you know of
00:34:45course there are some other
00:34:47services like we paid four dollars for
00:34:49data transfer and then we paid something
00:34:52for API Gateway and something for Dynamo
00:34:54and things like that but all in all
00:34:56the bill was a hundred bucks for you
00:34:59know four hundred thousand active users
00:35:01that are kind of collaborating in real
00:35:03time now this is insane
00:35:05completely insane if you look at the
00:35:07equivalent stuff I was doing ten
00:35:11years ago that this is completely
00:35:13completely insane and you know add up
00:35:17all the multi versioning and everything
00:35:20else they provide for almost not free
00:35:23but included in the price that’s that’s
00:35:26why I think this is gorgeous and
00:35:29you know fantastic in so many ways so
00:35:30another thing that started
00:35:35happening here as we started thinking
00:35:37more and more about arbitraging
00:35:39different services against each other is that we
00:35:40realized that what has been ingrained in
00:35:44my head for the last 30 years is do not
00:35:48trust users to talk to back-end
00:35:50resources like users are not allowed to
00:35:53talk to storage directly users are never
00:35:57ever ever allowed to connect to your
00:35:58database directly they have to go
00:36:01through a gatekeeper they have to go
00:36:03through a server because on the server
00:36:05we discard invalid requests we validate
00:36:08stuff we you know everything before the
00:36:10server we don’t trust everything after
00:36:12the server we trust that’s how you
00:36:14know the whole WebLogic thing
00:36:17evolved we have application servers we
00:36:19have storage we have clients and I think
00:36:23kind of especially if you start to use
00:36:25the platform on Amazon none of these
00:36:28things are actually physically back-end
00:36:30resources anymore S3 is available over
00:36:33HTTP Dynamo that’s the database is
00:36:35available over HTTP the fact that we are
00:36:39not letting users talk to it directly
00:36:42doesn’t mean it’s not available if
00:36:44somebody kind of guesses the name
00:36:46it’s there Amazon is making it available
00:36:48and because of that kind of Amazon is
00:36:51actually implementing really really good
00:36:54request level authorization policies
00:36:56every single request going from Lambda to
00:37:01a database is authorized because they
00:37:03have no idea if your Lambda is talking
00:37:05to the database or if somebody else’s Lambda
00:37:06is talking to the database and it’s
00:37:08not your database anyway it’s their
00:37:09database and things like
00:37:14this are really really interesting from
00:37:15a perspective of thinking about well you
00:37:17know if it’s authorized per request
00:37:22what’s the damage in actually kind of
00:37:25authorizing clients to go there so
00:37:27Amazon gives you three or four different
00:37:29mechanisms for authorizing these
00:37:31requests including saying
00:37:33that this user is only allowed to write
00:37:35to this particular key in the database
00:37:37and only allowed to read from these keys
00:37:39in the database hierarchically or this
00:37:41user is only allowed to read from this
00:37:43part of the queue and post to these
00:37:45parts of the queue and the same
00:37:48authorization policies exactly the same
00:37:50apply if you talk to it from the client
00:37:52as if you go through the server
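As an illustration of the kind of per-request rule described here, the sketch below builds an IAM policy document that lets a signed-in user touch only the database items whose partition key equals their own Cognito identity ID. The table ARN and function name are hypothetical; the `dynamodb:LeadingKeys` condition key and the `${cognito-identity.amazonaws.com:sub}` policy variable are the IAM mechanisms for this kind of fine-grained access, but treat the exact action list as an assumption to adapt.

```python
import json


def per_user_table_policy(table_arn: str) -> dict:
    """Policy allowing each identity to read and write only its own items."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "dynamodb:GetItem",
                    "dynamodb:Query",
                    "dynamodb:PutItem",
                    "dynamodb:UpdateItem",
                ],
                "Resource": [table_arn],
                "Condition": {
                    # The partition key of every item touched must equal
                    # the caller's Cognito identity ID.
                    "ForAllValues:StringEquals": {
                        "dynamodb:LeadingKeys": [
                            "${cognito-identity.amazonaws.com:sub}"
                        ]
                    }
                },
            }
        ],
    }


# Hypothetical table ARN, for illustration only.
policy = per_user_table_policy(
    "arn:aws:dynamodb:us-east-1:123456789012:table/UserState")
print(json.dumps(policy, indent=2))
```

A policy like this would be attached to the role that authenticated identities assume, so the same rule is evaluated whether the request comes from a browser or from a function.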
00:37:54and then I realized well you know we’re
00:37:57just introducing latency by putting a
00:37:58server in the middle we’re just paying
00:38:00more and we started kind of using this
00:38:03like mad my brain still does not allow
00:38:07me to let users connect to the
00:38:09database because I’ve been a server-side
00:38:12developer for 20 years and it’s just
00:38:14wrong
00:38:15but we’re letting people talk to the
00:38:18storage directly we’re letting people
00:38:19talk to the queues and
00:38:21things like that so you know maybe two years
00:38:23from now I’ll come to a talk
00:38:24and say no no everybody’s you know
00:38:25talking to the database and all our data
00:38:27is stolen and it’s horrible or you know
00:38:30but generally if you think about it it’s
00:38:32not your database it’s Amazon’s database
00:38:33and it’s available using HTTP so it’s
00:38:38not back-end it’s content or it’s
00:38:41middleware of some kind so we
00:38:44started moving away from
00:38:46three-tier models to smart clients not
00:38:48dumb terminals smart terminals where
00:38:51things connect directly so here’s a URL
00:38:55and this is a
00:38:58a prototype we’ve developed for a chat
00:39:00app that works on all browsers and works
00:39:04on all mobile devices and things like
00:39:06that and
00:39:09the source code is on GitHub you can
00:39:11look it up afterwards so kind of connect
00:39:24to this from your mobile phone or
00:39:26something like that and then you can
00:39:27kind of log in as a guest what that
00:39:30means is that we are now getting an
00:39:32authorization ID from Cognito I can make
00:39:36this login with the username and
00:39:37password log in through Google login
00:39:39through Facebook log in through many
00:39:41many different ways do two-factor
00:39:42authentication for a conference kind of
00:39:44show it’s you know open and then once I
00:39:49have this I can actually use their API
00:39:51directly from a browser to talk to any
00:39:54resource I want that I’m allowed to talk
00:39:55to so in this case we are talking
00:39:57directly to if you cannot get into the
00:40:00server listing just pump it up we are
00:40:02talking to the IOT gateway the IOT
00:40:08gateway is designed to kind of exchange
00:40:09messages between low-power devices but
00:40:11it’s actually allowing you to have a
00:40:15WebSocket interface as well how cool is
00:40:17that so you can use WebSockets on demand
00:40:20managed and pay $5 per million
00:40:26messages so peanuts and they’re
00:40:30completely managed so you know we can
00:40:33get a million people connecting to this
00:40:35now or we can get five people connecting
00:40:38to this and it’s it’s all done
00:40:40operationally the source code is 30
00:40:42lines of code so it is just completely
00:40:45insane what we can do now with this
00:40:47stuff and how much it costs and I you
00:40:50know ten years ago I remember kind of
00:40:52evaluating lots of different push
00:40:53mechanisms where they were doing
00:40:55degrading to Flash they were doing
00:40:57long polling they were doing all these
00:40:59amazing things and I think for the best
00:41:00one we chose they were asking for
00:41:02something like a hundred thousand quid a
00:41:04month I can get this now for five
00:41:08dollars per million messages if
00:41:10nobody’s using it
00:41:11I don’t pay if people are using it I’m
00:41:14probably making money off it so you know
00:41:15perfectly fine to pay but there’s no
00:41:18upfront cost there’s no monthly
00:41:20maintenance cost this is insane
00:41:22it’s completely insane I think
00:41:24this thing changes how we approach what
00:41:27we reserve what we what we kind of do
00:41:30and I said you just format see a nice ah
00:41:36good good good good good
00:41:38so um so yeah I said that is you know
00:41:42from here you can just go to the
00:41:44link and you can get the source
00:41:46code for this so kind of in that respect
00:41:50what I want to say is good engineering
00:41:55good architecture is driven by
00:41:57constraints cost is one of the key
00:41:59constraints we have to deal with and
00:42:01deploying on lambda fundamentally
00:42:03changes the cost structure so lots of
00:42:06stuff that you know has evolved as best
00:42:08practices over the last 20 years kind of
00:42:12no longer apply and our challenge as a
00:42:15community over the next five to ten years is
00:42:16going to be to figure out what are you
00:42:19know just the shackles of the old world
00:42:21that we’re running with I mean you
00:42:23know I can talk about what’s wrong with
00:42:27Lambda for you know days on end and this
00:42:29is not a silver bullet it doesn’t solve
00:42:31all the problems but I think it’s a
00:42:33really really interesting perspective if
00:42:35you can you know run four hundred
00:42:38thousand users and pay fifty-three cents
00:42:40for the whole thing it’s just insane and
00:42:43I think kind of the financial incentives
00:42:45of that are going to get pretty much
00:42:47everything that can run in lambda to run
00:42:49in lambda over the next five years and
00:42:52that’s why you know although most people
00:42:55here I assume are not really deploying
00:42:57things in production on Lambda yet it’s
00:42:59worthwhile to start investigating that
00:43:01and in particular running cheap
00:43:03experiments is amazing if you need to
00:43:06run a kind of cheap stupid experiment
00:43:08and you don’t know if it’s going to work
00:43:09out or not this is brilliant for that
00:43:10and then even if you do an
00:43:12on-premise deployment later then you
00:43:14know what to integrate and and and what
00:43:16to throw away so kind of I think one of
00:43:21the key things that was you know a mind
00:43:24shift for us is
00:43:25to start letting clients connect to
00:43:27back-end resources because there are no
00:43:28back-end resources anymore and what that
00:43:30means is that all of a sudden your app
00:43:32is not really running just on fifty or
00:43:36five or five hundred virtual machines
00:43:38it’s running on four hundred thousand
00:43:40client processes as well because you can
00:43:41push if you let people talk to back-end
00:43:44resources you can push orchestration you
00:43:46can push state to the client and all the
00:43:51stuff that kind of is difficult to
00:43:52manage in the distributed architectures
00:43:54push it to the client’s machine where the
00:43:56client is a single client having a
00:43:58single state which simplifies things
00:44:00significantly so
00:44:03that’s why we pay so little for Lambda
00:44:04our app does not run on the VMs what
00:44:07runs on the VMs is the glue between
00:44:08different back-end services our app
00:44:11actually runs on four hundred thousand
00:44:14client processes that we do not have to
00:44:15pay for and that’s kind of the really
00:44:19really interesting mind shift here so
00:44:21here are two URLs for more info
00:44:25this first one is my blog where I post a
00:44:29lot about this stuff because I’m
00:44:30incredibly excited about it the second
00:44:33one is the open source tool for
00:44:34deployment that I’ve shown you that kind
00:44:36of simplifies stuff if you do it in
00:44:38JavaScript that’s pretty much it thank
00:44:41you very much I hope I kind of tickled
00:44:43your imagination at least a bit
00:44:45and if anybody’s posted any questions I
00:44:47guess we can talk about that now do we
00:44:50have any questions where’s the I can
00:44:53read it aloud okay lovely how do you feel
00:44:56about lock-in that’s a
00:45:00really interesting question and
00:45:02lock-in is problematic on several
00:45:05levels lots of people talk about lock-in
00:45:08in terms of lock-in with libraries
00:45:10lock-in with code if you use Oracle then
00:45:12you know you use Oracle’s libraries with
00:45:14lambda because the platform calls you
00:45:16not the other way around you’re actually
00:45:18not locked into the lambda API at all
00:45:20moving away from
00:45:22there would be trivial the big problem
00:45:25is you are locked into the platform if you
00:45:27really want to get the benefits of
00:45:28Lambda then you’re letting clients talk
00:45:29to storage directly letting clients talk
00:45:31to the database directly and that’s
00:45:33where the lock-in happens now for us
00:45:36we’ve decided
00:45:38commercially that really going for
00:45:41Amazon and using Amazon for
00:45:42everything is a good commercial decision
00:45:46it gives us the risk of Amazon not
00:45:48working but in my experience
00:45:49they’re pretty solid and they’re much
00:45:51much better at doing the ops than I can so
00:45:55I I know that there are some tools like
00:45:57the serverless framework that allow you
00:45:59to deploy to multiple clouds and do kind
00:46:02of this hybrid thing but I guess the big
00:46:04problem then like you know doing a
00:46:06database-independent deployment is you
00:46:09get to use the lowest common
00:46:10denominator of the whole thing and
00:46:11you’re never really using the platform
00:46:13for what it is so that’s a
00:46:16commercial decision that I guess
00:46:17everybody needs to make on their own but
00:46:19there’s definitely things like you know
00:46:20ports and adapters or hexagonal
00:46:22architecture and things where you can
00:46:24design stuff so that if you do actually
00:46:26decide to move on to a different cloud
00:46:27provider you know you can
00:46:30I’ve never worked with a company where
00:46:32the whole investment in being able to
00:46:35move from the primary database paid off
00:46:37because they never moved away from the
00:46:39primary database so it’s a commercial
00:46:41decision there’s only two of us building
00:46:43this thing so I’d rather spend time
00:46:45delivering successful features than
00:46:47building kind of an abstract software
00:46:49system but you know if you have five
00:46:51kind of developers why not keep them
00:46:54busy so this is locked
00:46:59can I get Apple Pay on your
00:47:01phone as well when you unlock it okay so
00:47:06we have how do you keep the state and
00:47:09session so we tend to push a lot to the
00:47:11clients directly and we tend to keep
00:47:13the state either in Cognito or in the
00:47:15user’s browser Cognito does automatic
00:47:17synchronization across devices we don’t
00:47:21tend to use that a lot because our
00:47:22state is typically kind of the document
00:47:24you’re working on and we don’t need to
00:47:28keep a lot of that but Cognito is a
00:47:29pretty good way of synchronizing
00:47:32stuff across devices using Dynamo or
00:47:35something like that to keep kind of the
00:47:36state per user is also pretty good
00:47:38because you can configure Dynamo to
00:47:41allow users to write only to a
00:47:43particular key or a sub key
00:47:46so you can write only to your own state
00:47:48and read only from your own state for
00:47:50example
00:47:51that would be a possibility mm-hmm what if
00:47:58I’m DDoS’d so I don’t know
00:48:02that’s never happened to us
00:48:04Lambda and API Gateway allow you to
00:48:07throttle things you can just configure
00:48:08throttling so you can say that you know
00:48:11I want to run up to a thousand
00:48:13concurrent functions of this or up to I
00:48:16think the limit by default is a thousand
00:48:18per function but then you can increase
00:48:21it or with API Gateway you can do
00:48:24throttling based on an API key an
00:48:26authorization key or generally
00:48:29on an API endpoint so you can configure
00:48:31throttling and you can with the
00:48:34monitoring they have and things like
00:48:35that spot if you are being DDoS’d
00:48:37again my assumption is that Amazon will
00:48:42protect against DDoS much much better
00:48:44than I can code that I don’t know kind
00:48:47of about anybody in the audience whether
00:48:49you feel you can build a better DDoS
00:48:51defense system than Amazon but certainly
00:48:54you can configure it to get an
00:48:57early warning and then figure out what
00:49:00to do from there so you don’t have to
00:49:02spend millions and millions and millions
00:49:04if you get DDoS’d at the moment the
00:49:12biggest disadvantage is that Lambda
00:49:14functions are limited to about five
00:49:17minutes of run time
00:49:18so anything that takes longer than five
00:49:20minutes you need to split into multiple
00:49:22executions which means you can’t keep an
00:49:24open socket we were trying to develop
00:49:27something that talks to the Twitter APIs
00:49:28and Twitter doesn’t have a push API or
00:49:33you know if you’re a mere mortal you cannot
00:49:36get the push API you need to connect a
00:49:38socket and get them to kind of stream
00:49:41stuff to you and with lambda that’s not
00:49:43I mean it’s possible but you need to
00:49:45kind of reload it every five minutes and
00:49:46then disconnect and restore the state so
00:49:48for something like that I would still
00:49:50use ECS another kind of reasonable
00:49:53disadvantage for many people is that
00:49:55there’s virtually no SLA for Lambda
00:49:58or not virtually there’s actually no SLA
00:50:02they don’t offer
00:50:04any SLAs yet so our experience is that
00:50:08us-east-1 the region
00:50:13that is overloaded with everything
00:50:14because that’s the first one that
00:50:15started occasionally gets kind of
00:50:17hiccups where we get a bit of delay but
00:50:21we’ve never really had a full outage
00:50:23since February 2016 when we started kind
00:50:28of moving to this that doesn’t mean it’s not
00:50:29going to happen and I think you know as
00:50:32Lambda becomes more and more
00:50:33important I assume they will start
00:50:35providing an SLA for it at some point
00:50:38so that’s an interesting limitation I
00:50:40guess those would be those would be the
00:50:46two key key limitations for people so I
00:50:49think we ran out of time I don’t know if
00:50:51we have kind of I’ll be around you know
00:50:54you can catch me in the
00:50:56corridor and then we’ll talk about this
00:50:57thing more I need to get other people to
00:50:59set up thank you very much
00:51:00[Applause]