
GOTO 2017 • Designing for the Serverless Age • Gojko Adzic



00:00:09 Okay, I guess we'll start. Thanks very much for deciding to be in this talk rather than something else. I'm Gojko, and what I'll be talking about is how serverless deployments impact how we design and deploy our systems.

00:00:27 I think one of the really interesting things happening at the moment is that this whole buzzword is exploding, and lots of people are approaching it from a technical perspective: is it stateless, is it functions, is it platform as a service? As an industry we like dealing with technical stuff; that's what we tend to do. But I think the whole serverless thing, whatever you decide to call it, is much, much more important to look at from a financial perspective. The way we approach designing and deploying applications, the things we take for granted today, have been built on our experiences of deploying systems over the last 20 or 30 years, and many of the constraints that existed over those 20 or 30 years no longer apply when deploying with AWS Lambda. That means many things we now take for granted as best practices are just solutions for constraints that no longer apply.

00:01:41 That was really interesting for me when we started getting our heads around Lambda deployments, and that's what I want to talk about: which of the things we now think of as best practices are actually just solutions for problems that are no longer applicable.

00:01:59 To give an example of what I'm going to talk about: I develop a collaboration app that helps people do mind maps online, and in February 2016 we started migrating from Heroku to AWS Lambda. It took us about one year to move everything, because we did it gradually. During that year we decreased our hosting costs by about half, while at the same time adding a bunch of new services, and our number of active users increased by 50% in the same period. So, all in all, with half the cost for 50% more users, my estimate is that we saved around 66%, or two thirds, on our hosting costs. That's really interesting if you look at it from the perspective of running a small business.

00:02:54 After I started publishing on this, a guy called Robert Chatley got in touch with me. He wanted to write a scientific paper on it, and together we wrote a proper scientific paper. Martin talked about science papers yesterday, so all my professors at the university are finally going to be proud of me and know that all the alcohol was worth something in the end. You can download it; it's called "Serverless Computing: Economic and Architectural Impact". "Serverless" is a horrible word, but that's okay. You can get a lot more detail there on the numbers I've talked about. But I realized as we were doing research for it that our results were not even that interesting, because we talked to people who saved something like 99% on their hosting costs by moving from other platforms to Lambda.

00:03:45 Heroku is reasonably cost-efficient anyway; moving from older generations of cloud hosting, or from on-premise hosting, to Lambda has an even bigger potential. The key thing that I think is really important to consider is the way Lambda is priced. My experience is mostly with AWS Lambda, although pretty much all the other cloud providers are copying its features and models now. The way AWS Lambda is priced fundamentally changes the incentives for deployments, and fundamentally changes the incentives for good architecture. Martin talked yesterday about engineering and how it's about working within constraints.

00:04:28 I think cost is one of the major constraints we have to work within, and the pricing model fundamentally changes it, which is really important.

00:04:38 In terms of pricing, Lambda charges per request and per 100-millisecond increment of execution time at a given memory allocation. These two things are really important to consider because they change how we pay for what we're using: we're charged not for reserved capacity but for actual usage, in 100-millisecond increments. So it doesn't matter whether you have 5 VMs or 500 VMs, whether you run three boxes or 5,000 boxes to process something; all that matters is how many requests came in, and how long they took to execute under what memory conditions.
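The pricing rule just described can be written down as a small cost model. This is a minimal Python sketch, not AWS's billing logic: it assumes roughly the list prices AWS published around 2017 ($0.20 per million requests, about $0.00001667 per GB-second) and the 100 ms billing increment mentioned in the talk. AWS has since moved to 1 ms increments, so treat the constants as placeholders.

```python
import math

# Assumed prices (USD): roughly AWS Lambda's published list prices around 2017.
# Check the current pricing page before relying on these numbers.
PRICE_PER_MILLION_REQUESTS = 0.20
PRICE_PER_GB_SECOND = 0.00001667
# 2017-era billing granularity; AWS has since moved to 1 ms increments.
BILLING_INCREMENT_MS = 100


def billed_duration_ms(duration_ms: float) -> int:
    """Round an execution time up to the next billing increment."""
    return math.ceil(duration_ms / BILLING_INCREMENT_MS) * BILLING_INCREMENT_MS


def lambda_cost(requests: int, duration_ms: float, memory_mb: int) -> float:
    """Cost of `requests` invocations, each running `duration_ms` with `memory_mb` allocated.

    There is no parameter for the number of VMs or instances: only the request
    count, the billed duration and the memory allocation enter the price.
    """
    request_cost = requests / 1_000_000 * PRICE_PER_MILLION_REQUESTS
    billed_seconds = billed_duration_ms(duration_ms) / 1000
    gb_seconds = requests * billed_seconds * (memory_mb / 1024)
    return request_cost + gb_seconds * PRICE_PER_GB_SECOND


if __name__ == "__main__":
    # One million 250 ms requests at 512 MB, each billed as 300 ms.
    print(f"${lambda_cost(1_000_000, 250, 512):.4f}")
    # No traffic means no compute bill: idle capacity costs nothing.
    print(lambda_cost(0, 250, 512))
```

The point of the shape is what is missing: nothing in the formula depends on how many machines AWS uses behind the scenes.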

00:05:29 This whole buzzword of serverless is horrible, because of course there are servers out there. But somebody came up with a really nice definition on Twitter the other day: a thing is serverless if you're only paying for actual usage, if you're not paying for capacity you have to plan and reserve. Historically, that's not been the case.

00:05:53 Historically, good architecture optimized for reserved resources. Twenty years ago, in my previous life, I worked on trading platforms that were deployed on immortal hardware that was never supposed to die. The storage cost more than my house, the processors were insanely expensive, and everything was duplicated and replicated, because it was never, ever supposed to die. But once you have a machine like that, you optimize for using what you've bought. You bundle stuff onto it, you put everything you can on it, and you're very careful not to exceed its capacity, because if you do, adding a couple more processors or a bit more storage costs an insane amount.

00:06:42 Then in 2006 Amazon came out with the idea that you can get a virtual machine running in about five or ten minutes, which was completely insane at the time. I worked for a big telecoms provider where it took nine months to provision a virtual machine internally, and now along comes Amazon and you can get a VM in about ten minutes. That was amazing, but it did not fundamentally change how we think about deployments: if you've got five virtual machines, you're paying for five virtual machines, so you're going to bundle everything into those five virtual machines. We teach people good software design, like decoupling and isolation and all those brilliant software design practices, and then, because you have five virtual machines, you put your payment services, your logging and your monitoring system all on the same five virtual machines, where they start interacting with each other.

00:07:43 We had exactly this problem. We were deploying lots of different payment services to reserved VMs, and one of the payment services did not clean up after itself; over time it filled up the temp space. Linux starts really misbehaving when you fill up the temp space, so that VM started going crazy, and all the payment services on that machine went down, although they were designed to be decoupled and isolated. When that machine went down there was a cascade: the remaining four machines started filling up their temp space very quickly, and everything imploded.

00:08:20 So although we teach people about good design and decoupling, we end up deploying stuff to save cash, because we need to reserve capacity. The next generation of cloud deployments, Google App Engine, Heroku and things like that, moved a lot of the responsibilities, like provisioning and monitoring, over to the cloud providers, but the model remained the same: you were paying for dynos on Heroku, for example.

00:08:48 Our app was deployed on Heroku. We have about 20 or 30 different exporters to different formats. If I'm going to run a primary and a failover for each of these exporters on separate, isolated VMs, and some exporters need a lot more capacity, five or six VMs for some, then for 30 exporters I need a hundred VMs if I really want to make it reliable and isolated. But some of those exporters, like Markdown, I wrote for myself; nobody else uses them. Others, like PDF, are used all the time. So I'll put the Markdown exporter on the same block of machines as PDF; I'm not going to create a separate block of machines, to save money. And of course we made a stupid mistake, and some of those things started interfering with each other because they were on the same machine.

00:09:40 Technically, we have a solution for this, and we had a solution for this: containers, isolation and so on. But as an average company, it's very difficult to dedicate the resources to manage all of that. What Lambda does is provide that as a service, so we start moving away from reserved capacity to utilized capacity. It doesn't matter how many VMs you need; Amazon is going to scale this and charge you for how much you've used. That's the request-based pricing that's really interesting.

00:10:13 What that means is that there's no more financial incentive for me to put the Markdown and the PDF exporter on the same VM. It's even worse, because they have different memory needs: Markdown barely needs any memory, while PDF needs a lot because it uses Ghostscript, and Ghostscript is a memory hog. If I put those two things on the same VM, I end up paying, for Markdown, for a high memory watermark that's a lot more than it needs; if I separate them, I don't. So the financial incentives here actually push people to unbundle apps into small, isolated modules. That's really interesting, because historically it was completely different.

00:10:48 One of the things I think people should think about more (and it's not microservices, it's not functions, you can buzzword this any way you want) is really decoupling tasks that have different memory constraints, different CPU constraints and different deployment needs, and thinking: if it doesn't matter how many VMs this runs on, I don't mind it being deployed separately. That's very liberating, and it's the complete opposite of what we've been taught to do.
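A rough Python sketch of why bundling the Markdown and PDF exporters costs more under per-memory pricing: a shared function has to be sized for the hungriest task, so the light task pays for memory it never uses. The workloads are made-up illustrative numbers, the prices are assumed 2017-era Lambda rates, and durations are assumed not to change with the memory setting (in reality Lambda allocates CPU in proportion to memory, so they may).

```python
import math

# Assumed price (USD) and billing increment, roughly AWS Lambda's 2017-era rates.
PRICE_PER_GB_SECOND = 0.00001667
INCREMENT_MS = 100


def compute_cost(invocations: int, duration_ms: float, memory_mb: int) -> float:
    """Compute-time cost only; per-request charges are identical in both layouts."""
    billed_s = math.ceil(duration_ms / INCREMENT_MS) * INCREMENT_MS / 1000
    return invocations * billed_s * (memory_mb / 1024) * PRICE_PER_GB_SECOND


def separate_cost(tasks):
    """Each task is its own function, sized for its own memory needs."""
    return sum(compute_cost(n, ms, mb) for n, ms, mb in tasks)


def bundled_cost(tasks):
    """All tasks share one function, which must be sized for the hungriest task."""
    shared_mb = max(mb for _, _, mb in tasks)
    return sum(compute_cost(n, ms, shared_mb) for n, ms, _ in tasks)


# Hypothetical monthly workloads: (invocations, duration in ms, memory needed in MB).
markdown_export = (100_000, 200, 128)
pdf_export = (10_000, 3_000, 1_536)

if __name__ == "__main__":
    tasks = [markdown_export, pdf_export]
    print(f"separate: ${separate_cost(tasks):.4f}")
    print(f"bundled:  ${bundled_cost(tasks):.4f}")
```

With these numbers almost all of the extra cost of bundling comes from running the cheap Markdown exports at the PDF exporter's memory size.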

00:11:28 Many practices, like blue-green deployments or centralized monitoring, are no longer that problematic, and that opens up some really interesting possibilities. For example, moving from Heroku to Lambda, we deleted a bunch of code that dealt with figuring out what's going on on a particular VM and how these things interact with each other. A lot of this infrastructure code is gone.

00:11:58 The next thing that I think historically happened, and no longer applies, is that good architecture, because of the whole idea of reserved resources, optimized for failover. What that meant was really complex state management. You wanted a warm failover machine, or a couple of warm failover machines, that can take the load very quickly if something fails, so data needs to be replicated, caches need to be replicated, data needs to be synchronized across replicated machines, and then there are cache invalidation policies people need to care about. There's a lot of really complex code needed to make this work well. But from the perspective of paying per request, failover machines are not doing anything: unless they're receiving requests, you're not paying for them. Then the big question becomes: do you actually need to plan for failover like that?

00:12:56 Another really interesting constraint that I think is changing with the whole Lambda architecture is the time to recover. Historically, that was a big driver for a lot of architectural practices; it's no longer that important, and that opens up some really interesting possibilities. What I think is becoming a lot more important is the time to start.

00:13:17 So when we talk about the time to start, I'll just show you something very quickly. Let's see if the gods of the Internet help us here. Okay, so if I do [inaudible], and I do [inaudible]... let's be nice.

00:14:17 So this is a web API that, on any request to /hello, will reply with "hi there". It's not that complicated.

00:14:28 And if I've not made a stupid mistake, we should be able to do something like this. I'm using an open-source deployment tool here, which we open-sourced while building MindMup. The first function we created was something like 30 lines of JavaScript code and about 250 lines of shell scripts to deploy it, and then I realized that the risk is no longer in the code; the risk is in the deployment. So we wanted properly unit-tested deployment and configuration, and we packaged up all the scripts. This thing is now on a URL on Amazon, and if I grab this URL... I've created a /hello, so if I curl /hello, I get "hi there".
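The demo itself is JavaScript deployed with the team's own tool, and its source isn't shown. As a rough Python equivalent, assuming API Gateway's Lambda proxy integration (the HTTP request arrives as a dict with keys such as `path` and `httpMethod`, and the function returns `statusCode`, `headers` and a string `body`), a /hello endpoint might look like this; the handler name and the 404 branch are illustrative choices, not taken from the talk.

```python
import json


def handler(event, context):
    """Minimal Lambda handler, assuming API Gateway's REST API proxy integration.

    Any HTTP method on /hello gets a plain-text greeting; any other path gets a 404.
    """
    if event.get("path") == "/hello":
        return {
            "statusCode": 200,
            "headers": {"Content-Type": "text/plain"},
            "body": "hi there",
        }
    return {
        "statusCode": 404,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"error": "not found"}),
    }


if __name__ == "__main__":
    # Simulate the event API Gateway would pass for `curl .../hello`.
    print(handler({"path": "/hello", "httpMethod": "GET"}, None))
```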

00:15:27 So right now this is running on one VM, but what's really interesting is that if all of you start attacking it now, it will handle the load perfectly. If nobody's using it, I don't have to pay for anything. If we have a massive spike now and then the spike goes away, that's perfectly fine as well; it will scale up and scale down. Plus, with this I get a bunch of things automatically: monitoring, logging, security, provisioning, all the operations stuff people want to deal with. I've created this thing called "go to Copenhagen", so if I go to my Lambdas now...

00:16:12 Come on, Internet. Okay, so we have a... that's not the right account. Good, good, good. So let's see, Amazon... four, one, three, six. So now you've seen my code. Come on, come on. So if I go to Lambda now... I have no idea what they've changed here. So this is the... okay. You can see how often I use this; I mainly use the command line. What I want to show you is that with this you get a bunch of operations things for free, immediately. And why is this [inaudible]... that doesn't matter. Let's go to functions... the functions page, "go to Copenhagen". Okay, so here it is, and I have lots of monitoring: errors, throttles, logs in CloudWatch. I have pretty much everything I need from an operations perspective already provided to me.

00:17:57 There's this whole buzzword that Lambda is moving us from DevOps to NoOps, and Simon Wardley, who's a researcher in the UK, said that if your company hasn't done DevOps yet, just don't bother: skip a whole generation and move to something like this. It's easy. Fair enough, it's not completely killing DevOps, but as I said, you have all the operations stuff already. I have a log showing when each invocation started, when it ended and how much memory it used, and this is amazingly good for optimizing stuff. With the typical architecture we normally had, figuring out how much money a particular function cost you to run was almost impossible; people optimized based on gut feel, and they optimized whole applications. Now I know exactly how much money I'm spending on this particular task, so I can decide whether I need to invest in optimizing it or not.
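The per-invocation log entries are enough to put a price on a single function. A minimal Python sketch, assuming the tab-separated `REPORT` line Lambda writes to CloudWatch Logs (with fields such as `Billed Duration: 200 ms` and `Memory Size: 128 MB`) and roughly the 2017-era list prices; the sample lines are made up, and the format should be checked against a real log group, since AWS has added fields over time.

```python
import re

# Assumed prices (USD), roughly AWS Lambda's 2017-era list prices.
PRICE_PER_REQUEST = 0.20 / 1_000_000
PRICE_PER_GB_SECOND = 0.00001667

# Pulls the billed duration and configured memory out of a REPORT log line.
REPORT_RE = re.compile(
    r"Billed Duration:\s*(?P<billed>\d+(?:\.\d+)?)\s*ms"
    r".*?Memory Size:\s*(?P<memory>\d+)\s*MB"
)


def cost_of_log(lines):
    """Sum the cost of every invocation recorded in an iterable of log lines."""
    total = 0.0
    for line in lines:
        if not line.startswith("REPORT"):
            continue  # START/END and application output carry no billing data
        match = REPORT_RE.search(line)
        if match is None:
            continue  # unexpected format: skip rather than guess
        billed_s = float(match.group("billed")) / 1000
        memory_gb = int(match.group("memory")) / 1024
        total += PRICE_PER_REQUEST + billed_s * memory_gb * PRICE_PER_GB_SECOND
    return total


# Made-up sample in the shape of a Lambda log stream.
sample = [
    "START RequestId: 1a2b Version: $LATEST",
    "END RequestId: 1a2b",
    "REPORT RequestId: 1a2b\tDuration: 102.25 ms\tBilled Duration: 200 ms\t"
    "Memory Size: 128 MB\tMax Memory Used: 35 MB",
]

if __name__ == "__main__":
    print(f"${cost_of_log(sample):.10f}")
```

Run over a day's logs per function, the same sum shows which task is actually worth optimizing.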

00:18:52 So that's really interesting. In terms of time to start, here are some empirical numbers that I got. These are not confirmed by AWS; AWS doesn't publish any numbers about this, but these are the numbers we generally got. For a completely new instance of a deployed app (so if you start hitting this now and a new instance needs to start and load the JavaScript), it takes less than one second to get the new instance. If you do a new deployment, a new version like I've just done, it takes about three or four seconds to set everything up.

00:19:29 That's the infrastructure part. What's left is how long our app takes to connect to the database, load up the cache, load up the data and so on. I think that, unlike previous architectures, where we were optimizing for quick failover, we need to start optimizing for quick start: lazy loading everything and making sure we don't add a lot of startup work, because if we can do that, this stuff just starts and dies on its own.

00:19:56 Now, there's a whole buzzword here where people talk about how we need to write this stuff in a stateless way, because there's a big confusion of buzzwords: some people talk about serverless as function as a service, and then functional programming, and functional programming is stateless, and people throw buzzwords around. Lambdas are not stateless; don't let anybody tell you that. Lambdas are stateful containers; it's just that you have no idea how many containers you're running. Once a container starts, you have no control over when it's going to stop, or whether the next request from the same user is going to hit the same container or a completely different one. So rather than developing for stateless, what I think we should be thinking about is designing for share-nothing. When the VM loads, it's perfectly fine to start caching things in that VM that are not user-specific.

00:20:53 That's how we save a lot of time on processing requests, and that's how we save a lot of cash, because you know there will be a VM running. It's just that you mustn't ever cache anything that is specific to a particular request. The state of a user session is not there; it needs to be somewhere else, and later we'll talk about quite an interesting option for where to push the user state. But it's not stateless. People who have never deployed any of this talk about Lambdas being stateless; they're not, and I think you lose a lot of money by treating them as stateless.

00:21:30 The next thing that I think historically happened is that it was really difficult to replicate production. People had a production system with expensive storage, or, even on the cloud, with good VMs, and that cost a lot, so you wouldn't want to pay the same amount for staging or for testing. So typically, unless you were LMAX, performance tests would run on a smaller copy of the data, on a smaller copy of production, and they were neither really relevant nor reliable. Generally, that made it really difficult to claim anything about production usage and production performance based on the testing system.

00:22:16 I have a couple of friends from Australia, and they claim (I don't know if this is true or not) that Facebook is broken in Australia all the time, because Facebook is really good at running business experiments on its users, and Australia is a market that's big enough to be statistically significant. It's English-speaking, so they don't have to spend a lot of time preparing software for it, and Facebook doesn't care too much about Australians; they're laid-back people anyway. So they can run experiments in Australia, and once they figure out what works and what doesn't, they can translate that to the US. When you think about something like that, it's incredibly useful, but up until five years ago, or even three years ago, it was only available to you if you were Google or Facebook. Running experiments like that is usually expensive: you need a complete, relevant copy of production, and you need to synchronize the data between the copies.

00:23:19 One of the best examples of how important that is, is a story called "40 shades of blue". I absolutely love it, because it's one of those rare stories where you can read both sides online. The head of design at Google wanted to change the color of the ad links on the homepage, and the developers challenged him a bit. He wanted to change it to a particular blue; overnight they ran 40 different shades of blue and measured how much people clicked on each. As a result, depending on whose side you read, this guy either quit or was fired, because the difference between the current color and his color was something like 250 million dollars, if you extend it to a whole year and 100% of the users.

00:24:10 Things like that are incredibly important to test, but for most people out there it was almost impossible, because a copy of production cost so much. Now, when we started doing Lambda, I had this brilliant, best idea in the world: I wanted to integrate the Wikidata knowledge graph with our app, so that when you press a question mark it automatically opens up related terms. It was going to be amazing, it was going to be brilliant, it was going to be fantastic. And my business partner said: no, that's a shitty idea, nobody's going to use that, let's not waste time on it. I generally don't like to ask for permission; I like to ask for forgiveness, so I decided I was going to do it anyway. And he said: I know you're going to do it anyway, but do not touch the production code, because if you put this in, we're going to have people screaming that we can't support the performance, we're going to have a bottleneck here, we're going to... And I said okay.

00:25:07 Then I realized that with Lambdas, you don't pay for stuff if people don't use it. So I spent two days, knocked up a quick Lambda version extended with this, deployed a completely separate copy of our production, and sent 20% of our users there. And lo and behold, I proved that it was a [inaudible] idea: nobody wants to use it. So we deleted that code without disturbing production.
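The talk doesn't say how the 20% of users were routed to the separate copy. One common way, shown here as a hypothetical Python sketch, is to hash a stable user identifier so each user consistently lands on the same deployment without any stored assignment state. The names `bucket`, `choose_deployment` and "wikidata-lookup" are illustrative; mapping the returned label to a real endpoint or Lambda alias is left out.

```python
import hashlib


def bucket(user_id: str, experiment: str) -> float:
    """Map a user to a stable number in [0, 1) for a given experiment.

    Hashing (experiment, user) keeps each user on the same side of the split
    across requests, and different experiments split users independently.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    # 48 bits are exactly representable as a float, so the result stays below 1.
    return int.from_bytes(digest[:6], "big") / 2**48


def choose_deployment(user_id: str, experiment: str, share: float) -> str:
    """Route `share` (e.g. 0.2 for 20%) of users to the experimental copy."""
    return "experiment" if bucket(user_id, experiment) < share else "production"


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(10_000)]
    routed = [choose_deployment(u, "wikidata-lookup", 0.2) for u in users]
    print(routed.count("experiment") / len(users))  # roughly 0.2
```

Because cost follows requests rather than reserved machines, the experimental copy costs nothing extra beyond the traffic it actually serves.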

00:25:39this is this is some you know something

00:25:41that traditionally we’d get into a fight

00:25:43who’s right who’s wrong guy would do it

00:25:45anyway and then that code would stay in

00:25:46the production and kind of you know cost

00:25:48us more to maintain it was amazing

00:25:50because we could run a quick experiment

00:25:51so because we’re paying for requests

00:25:53we’re not paying for reserve capacity it

00:25:56was exactly the same amount of money to

00:25:57send 80% of users to one version 20% to

00:26:00another or to send everybody to the same

00:26:03version or to have a version for

00:26:04Copenhagen or to have a version for

00:26:06go-to or to have a version for anything

00:26:08you want and actually kind of what this

00:26:10starts getting us to think about is

00:26:12moving away from thinking about

00:26:13production to thinking about multiple

00:26:15versions and multi versioning in lambda

00:26:17is incredibly well done I you know for

00:26:22my scenes like everybody else have done

00:26:24an Orion framework a logging framework

00:26:25and a data multi version in framework

00:26:27everybody’s done that when they young

00:26:28otherwise you don’t you know you don’t

00:26:30get to call yourself a developer and

00:26:32then you end up kind of suffering

00:26:34through maintaining that for 10

00:26:36years but that’s okay so a data multi

00:26:41versioning request multi versioning

00:26:42service multi versioning is really

00:26:43really difficult to do well at scale and

00:26:45Amazon probably have done it for their

00:26:47own needs and they’ve just exposed a T

00:26:49lambda so every lambda function gets a

00:26:51numerical version every time you deploy

00:26:53it and you can direct the calls for a

00:26:58particular function either to the last

00:27:01non-deployed version or to a particular

00:27:04numerical deployment or you can assign

00:27:06aliases to a numerical deployment like

00:27:08production testing staging so when we

00:27:10started doing this we started really

00:27:12kind of doing oh I have a production

00:27:14version a testing version and staging

00:27:15version and then we realized why I’ll

00:27:18just save a version to test this feature

00:27:19oh I’ll have a version to kind of you

00:27:21know when we’re doing something

00:27:22experimental and 5% of our users really

00:27:24really need this feature but it’s not

00:27:27ready for everybody else we can deploy a

00:27:29version for them and give it to them and

00:27:30test it on them early and it’s multi

00:27:33version is incredibly well done so I

00:27:35think this is a really interesting thing

00:27:36to start thinking about kind of how do

00:27:38we bundle our tasks because it has a

00:27:40massive impact on how we design how we

00:27:43deploy if we are going to design for a

00:27:45multi version universe where you know at

00:27:48the same time several versions of a

00:27:50wrapper running and different things are

00:27:51communicating with even I think the

00:27:52whole domain driven design concept of

00:27:54aggregates becomes incredibly more

00:27:56important because we want to make sure

00:27:59that kind of you know all the data for a

00:28:01particular version of this object

00:28:02travels together and because we are kind

00:28:06of don’t want to clog the communication

00:28:08things that we want to make sure

00:28:09aggregates are actually relatively

00:28:10minimal so that we can push them around

00:28:13and we can load them quickly but the

00:28:17movie to land has gotten us to thinking

00:28:20a lot more about what are our actual

00:28:21aggregates where the aggregate

00:28:23boundaries what needs to be in a

00:28:26particular version consistent with

00:28:28itself what can just kind of be

00:28:29different versions and and be okay so I

00:28:32think that’s a completely interesting

00:28:34kind of phenomenon that are not really

00:28:35seen in my code before that and because

00:28:38we’ve designed for multi versioning now

00:28:40up front as we’re migrating we can do

00:28:42lots of crazy things and the stuff that

00:28:44was the traditional available to you

00:28:46know companies that make billions and

00:28:48billions of dollars we can do now and

00:28:50we’re a two-person team and that that’s

00:28:53I think amazing as a kind of for what we

00:28:56can do from the platform so the next

00:29:00really interesting thing driven by the

00:29:03pricing model of AWS lambda is that

00:29:06different services charge for different

00:29:08things so you can save quite a lot of

00:29:10money by playing arbitrage with AWS

00:29:13against AWS that’s amazing

00:29:15and remember kind of lambda is as a

00:29:18processing service charges for the

00:29:20number of requests and time so if you can

00:29:23delegate work from lambda to stuff that

00:29:26does not charge for the number of

00:29:28requests and time you can save a lot of

00:29:32cash for example kind of API Gateway

00:29:35charges for the bytes being transferred

00:29:38and the number of requests at the same time

00:29:41kind of S3 that’s the storage system

00:29:43just charges for transfer it doesn’t

00:29:45care about the number of requests so one

00:29:49example of that is that we were on Heroku

00:29:52and before we started really thinking

00:29:53about this we would let people upload

00:29:58files for export to a server where the

00:30:02server would communicate with the

00:30:04storage and then the server would save

00:30:07stuff to storage it would run the

00:30:08converter it would kind of upload it

00:30:10back from the storage to the user what

00:30:12that means is that during that whole

00:30:13time the server is busy we’re paying for

00:30:16the server now if you’re uploading a

00:30:19hundred megabyte file the transfer from

00:30:21you to

00:30:22Amazon and the transfer from Amazon to

00:30:25you is actually most of the time

00:30:26the conversion time is relatively quick
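To put rough numbers on the relay described here, a minimal sketch of how long one export keeps a server busy when it relays both transfers versus when the client talks to storage directly. The file size, bandwidth (assumed equal in both directions) and conversion time are hypothetical inputs; nothing here calls AWS or Heroku, it is only the arithmetic.

```python
# Back-of-the-envelope model of how long one export keeps a server busy.
# File size, bandwidth and conversion time are hypothetical inputs.

def transfer_seconds(size_mb: float, bandwidth_mb_per_s: float) -> float:
    """Time to move a file of size_mb at the given bandwidth."""
    return size_mb / bandwidth_mb_per_s


def billed_seconds(size_mb: float, bandwidth_mb_per_s: float,
                   convert_s: float, proxied: bool) -> float:
    """Seconds of server time held for one upload -> convert -> download cycle.

    proxied=True:  the server relays the upload and the download, so it stays
                   busy for both transfers plus the conversion.
    proxied=False: the client uploads to and downloads from storage directly,
                   so the server is busy only for the conversion.
    """
    if proxied:
        return 2 * transfer_seconds(size_mb, bandwidth_mb_per_s) + convert_s
    return convert_s


if __name__ == "__main__":
    size_mb, bandwidth, convert_s = 100.0, 1.0, 2.0
    relayed = billed_seconds(size_mb, bandwidth, convert_s, proxied=True)
    direct = billed_seconds(size_mb, bandwidth, convert_s, proxied=False)
    print(f"relayed: {relayed:.0f} s, direct: {direct:.0f} s")
```

With a 100 MB file at 1 MB/s and a 2-second conversion, this prints `relayed: 202 s, direct: 2 s`, which is the kind of gap the talk is pointing at.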

00:30:28so we’re paying kind of for this

00:30:30operation for a long time and because

00:30:33S3 only charges

00:30:35for transfer if we can get people to

00:30:36upload directly to S3 and download

00:30:38directly from S3 we have reduced our

00:30:42server costs by a significant amount so

00:30:45some other services that are interesting

00:30:47to consider like Cognito the Amazon

00:30:49authentication and session service only

00:30:50charges for the number of users it

00:30:53doesn’t charge for the number of

00:30:54requests those users make it doesn’t

00:30:55charge for the capacity of the sessions

00:30:58it only charges for the number of users

00:31:00so moving session state into Cognito

00:31:02it’s a really really interesting way of

00:31:05kind of playing arbitrage with Amazon

00:31:06versus Amazon so kind of as a kind of an

00:31:11example I’ll show you later there’s also

00:31:12this thing called the IoT Gateway that

00:31:14we abuse massively IoT Gateway is

00:31:17designed to get low-power devices to

00:31:19talk to each other but we’ve built kind

00:31:21of real-time collaboration directly

00:31:23through we’re building real-time

00:31:25collaboration directly through IoT

00:31:26Gateway which kind of because it only

00:31:28charges for the number of messages it

00:31:30doesn’t charge for processing time it

00:31:32doesn’t charge for data transfer again

00:31:34you can arbitrage things nicely so kind

00:31:36of we started moving a lot more from

00:31:40thinking about applications and kind

00:31:43of managed apps like Heroku like Google

00:31:46App Engine to really being the glue between

00:31:47different platform services for me

00:31:50lambda is mostly about what’s missing

00:31:53from Amazon’s platform and how do I kind

00:31:55of glue those things together and

00:31:57what’s the kind of what’s the real

00:31:59business value of my code because it’s

00:32:02unlikely that developing a queue

00:32:04interface system is where I can provide

00:32:06the most value I can provide value

00:32:08developing kind of the small bits of

00:32:10processing those queue messages and

00:32:12that’s really really interesting so kind

00:32:14of just as an example that this is kind

00:32:17of how to get people to use S3 directly

00:32:21from a browser so on a server we have

00:32:23something like this the s3 there is the Amazon

00:32:26SDK so you get a request to upload a

00:32:30file you populate it with you know where

00:32:32it needs to go you limit the file size

00:32:34you

00:32:35kind of provide security stuff and then

00:32:37you get a signature and then return

00:32:39that to the browser that takes you know

00:32:4120 milliseconds then the browser spends

00:32:44ten minutes uploading the big file to S3

00:32:46where you’re just paying for transfer
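The server side of this upload flow can be sketched with the Python standard library alone. The talk uses the AWS SDK for it (in Python, boto3's `generate_presigned_post` does the equivalent job); the hand-rolled version below only illustrates what gets computed: a POST policy that pins the bucket, the key and a maximum file size, signed with an AWS Signature Version 4 key. Bucket, key, region and credentials are placeholders, and `now` is assumed to be a UTC time.

```python
import base64
import datetime as dt
import hashlib
import hmac
import json


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def signing_key(secret_key: str, date_stamp: str, region: str,
                service: str = "s3") -> bytes:
    """AWS Signature Version 4 key: HMAC chain over date, region, service, terminator."""
    k_date = _hmac(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac(k_date, region)
    k_service = _hmac(k_region, service)
    return _hmac(k_service, "aws4_request")


def presigned_post(bucket: str, key: str, region: str, access_key: str,
                   secret_key: str, now: dt.datetime, max_bytes: int,
                   expires_s: int = 600) -> dict:
    """URL and form fields that let a browser POST one file straight to S3.

    `now` must be a UTC time; all arguments are placeholders in this sketch.
    """
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")
    credential = f"{access_key}/{date_stamp}/{region}/s3/aws4_request"
    expiration = (now + dt.timedelta(seconds=expires_s)).strftime("%Y-%m-%dT%H:%M:%SZ")
    policy = {
        "expiration": expiration,
        "conditions": [
            {"bucket": bucket},
            {"key": key},                                  # where it needs to go
            ["content-length-range", 0, max_bytes],        # limit the file size
            {"x-amz-algorithm": "AWS4-HMAC-SHA256"},
            {"x-amz-credential": credential},
            {"x-amz-date": amz_date},
        ],
    }
    policy_b64 = base64.b64encode(json.dumps(policy).encode("utf-8")).decode("ascii")
    signature = hmac.new(signing_key(secret_key, date_stamp, region),
                         policy_b64.encode("utf-8"), hashlib.sha256).hexdigest()
    return {
        "url": f"https://{bucket}.s3.{region}.amazonaws.com/",
        "fields": {
            "key": key,
            "policy": policy_b64,
            "x-amz-algorithm": "AWS4-HMAC-SHA256",
            "x-amz-credential": credential,
            "x-amz-date": amz_date,
            "x-amz-signature": signature,
        },
    }
```

The server hands `url` and `fields` back to the browser, which sends a `multipart/form-data` POST with those fields followed by the file itself; the server's part is just the few milliseconds of signing.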

00:32:48then kind of this thing goes very

00:32:50quickly you go back and people download

00:32:53it directly from S3 from a signed URL so

00:32:56Amazon being Amazon of course there are

00:32:58five different ways of authorizing

00:33:00requests
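The download side of the same flow is a signed URL. A minimal standard-library sketch of an S3 query-string (SigV4) presigned GET URL, assuming virtual-hosted-style addressing and placeholder bucket, key and credentials; in practice the SDK's presigning helper (boto3's `generate_presigned_url`, for example) would build this.

```python
import datetime as dt
import hashlib
import hmac
from urllib.parse import quote


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def presigned_get_url(bucket: str, key: str, region: str, access_key: str,
                      secret_key: str, now: dt.datetime, expires_s: int = 300) -> str:
    """Query-string SigV4 URL letting a browser GET one object for a limited time.

    `now` must be a UTC time; all arguments are placeholders in this sketch.
    """
    host = f"{bucket}.s3.{region}.amazonaws.com"
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")
    scope = f"{date_stamp}/{region}/s3/aws4_request"
    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires_s),
        "X-Amz-SignedHeaders": "host",
    }
    # Canonical query string: sorted by name, names and values URI-encoded ("/" too).
    query = "&".join(f"{quote(k, safe='')}={quote(v, safe='')}"
                     for k, v in sorted(params.items()))
    path = "/" + quote(key, safe="/")          # keep "/" between path segments
    canonical_request = "\n".join([
        "GET", path, query,
        f"host:{host}\n",                      # canonical headers, newline-terminated
        "host",                                # signed headers
        "UNSIGNED-PAYLOAD",                    # presigned S3 URLs do not hash a body
    ])
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])
    k = _hmac(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    for part in (region, "s3", "aws4_request"):
        k = _hmac(k, part)
    signature = hmac.new(k, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"https://{host}{path}?{query}&X-Amz-Signature={signature}"
```

The URL carries its own expiry and signature, so S3 can authorize the download without the application server being involved at all.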

00:33:01there are signed URLs there’s the SigV4

00:33:05signing they use for API requests there’s

00:33:06giving people access through Cognito so

00:33:09there are lots and lots of ways you can

00:33:11get people to access directly one of

00:33:13these front-end services without going

00:33:16through a traditional server so kind of

00:33:19to really benefit from that

00:33:21financially I think what we started

00:33:23thinking about a lot more is giving the

00:33:25platform the roles that were

00:33:26traditionally associated with the server

00:33:28process things like being the gatekeeper

00:33:31things like being the

00:33:32orchestrator things like keeping kind of

00:33:34session storage if you push that away

00:33:38from the lambda that’s kind of the

00:33:41processor to the other parts of the

00:33:43platform you can save a lot of cash now

00:33:46here’s why I think this is so insane in

00:33:54September our app had something like

00:33:57400,000 active users it’s it’s not you

00:34:01know Google but it’s not Mickey Mouse as

00:34:03well and just so that I’m not cheating

00:34:06this is the live Amazon page so if I go to

00:34:14my billing dashboard and I look at my

00:34:19costs for where is it the September bill

00:34:24bill details September so for four

00:34:29hundred thousand active users in

00:34:30September we have paid 53 cents

00:34:38for lambda now beat that with your

00:34:41hosting costs and you know of

00:34:45course there are some other

00:34:47services like we paid four dollars for

00:34:49data transfer and then we paid something

00:34:52for API Gateway and something for Dynamo

00:34:54and things like that but all in all that

00:34:56the bill was a hundred bucks for a you

00:34:59know four hundred thousand active users

00:35:01that are kind of collaborating in real

00:35:03time now this is insane

00:35:05completely insane if you look at kind of

00:35:07equivalent stuff I was doing ten

00:35:11years ago that this is completely

00:35:13completely insane and you know add up

00:35:17all the multi versioning and everything

00:35:20else they provide not quite for free

00:35:23but included in the price that’s that’s

00:35:26why I think this is gorgeous and and and

00:35:29you know fantastic in so many ways so

00:35:30kind of another thing that started

00:35:35happening here as we started thinking

00:35:37more and more about arbitraging

00:35:39different services against each other we

00:35:40realized that kind of what’s been ingrained in

00:35:44my head for the last 30 years is do not

00:35:48trust users to talk to back-end

00:35:50resources like users are not allowed to

00:35:53talk to storage directly users are never

00:35:57ever ever allowed to connect to your

00:35:58database directly they have to go

00:36:01through a gatekeeper they have to go

00:36:03through a server because on the server

00:36:05we discard invalid requests we validate

00:36:08stuff we you know everything before the

00:36:10server we don’t trust everything after

00:36:12the server we trust that’s how you

00:36:14know the whole WebLogic thing

00:36:17evolved we have application servers we

00:36:19have storage we have clients and I think

00:36:23kind of especially if you start to use

00:36:25the platform on Amazon none of these

00:36:28things are actually physically back-end

00:36:30resources anymore S3 is available over

00:36:33HTTP Dynamo that’s the database is

00:36:35available over HTTP the fact that we are

00:36:39not letting users talk to it directly

00:36:42doesn’t mean it’s not available if

00:36:44somebody kind of guesses the name

00:36:46it’s there Amazon is making it available

00:36:48and because of that kind of Amazon is

00:36:51actually implementing really really good

00:36:54request level authorization policies
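Request-level authorization means every single call is checked against policies, whoever sends it. The sketch below is a deliberately simplified model of that evaluation (an explicit Deny wins, otherwise any matching Allow, otherwise an implicit deny). It ignores conditions, principals, `NotAction`/`NotResource` and policy variables, and it uses `fnmatch`, which also understands `[...]` patterns that IAM does not; the ARNs in the tests are made up.

```python
from fnmatch import fnmatchcase


def _matches(patterns, value: str, ignore_case: bool = False) -> bool:
    """IAM-style wildcard match: '*' matches any run of characters, '?' one character."""
    if isinstance(patterns, str):
        patterns = [patterns]
    if ignore_case:                      # IAM action names are case-insensitive
        value = value.lower()
        patterns = [p.lower() for p in patterns]
    return any(fnmatchcase(value, p) for p in patterns)


def is_allowed(policies: list, action: str, resource: str) -> bool:
    """Simplified per-request decision over a list of policy documents."""
    allowed = False
    for policy in policies:
        for stmt in policy["Statement"]:
            if not _matches(stmt["Action"], action, ignore_case=True):
                continue
            if not _matches(stmt["Resource"], resource):
                continue
            if stmt["Effect"] == "Deny":
                return False             # an explicit deny always wins
            allowed = True
    return allowed                       # no matching allow means implicit deny
```

Nothing in `is_allowed` looks at whether the caller is a lambda function or a browser, which is the property the argument here relies on.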

00:36:56each single request going from lambda to

00:37:01a database is authorized because they

00:37:03have no idea if your lambda is talking

00:37:05to the database or if somebody else’s lambda

00:37:06is talking to the database and it’s

00:37:08not your database anyway it’s their

00:37:09database and it’s kind of things like

00:37:14this are really really interesting from

00:37:15a perspective of thinking about well you

00:37:17know if it’s authorized per request

00:37:22what’s the damage in actually kind of

00:37:25authorizing clients to go there so

00:37:27Amazon gives you three or four different

00:37:29mechanisms for authorizing these

00:37:31requests including say including saying

00:37:33that this user is only allowed to write

00:37:35to this particular key in the database

00:37:37and only allowed to read from these keys

00:37:39in the database hierarchically or this

00:37:41user is only allowed to read from this

00:37:43part of the queue and post to these

00:37:45parts of the queue and the same

00:37:48authorization policies exactly the same

00:37:50apply if you talk to it from the client

00:37:52directly as if you go through the server
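The kind of scoped access described here can be written down as an IAM policy document. A sketch, assuming browsers get credentials from a Cognito identity pool for a role carrying this policy; the account, region, table and bucket names are hypothetical. `${cognito-identity.amazonaws.com:sub}` is the IAM policy variable for the caller's Cognito identity ID, and the `dynamodb:LeadingKeys` condition limits which partition keys the caller may touch.

```python
import json

# Hypothetical names for illustration only.
ACCOUNT_ID = "123456789012"
REGION = "us-east-1"
TABLE = "UserState"
BUCKET = "example-uploads"

# IAM policy variable: replaced at request time by the caller's Cognito identity ID.
USER_ID = "${cognito-identity.amazonaws.com:sub}"


def per_user_policy() -> dict:
    """Policy for a role that browsers assume via a Cognito identity pool."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # DynamoDB: only items whose partition key is the caller's own ID.
                "Effect": "Allow",
                "Action": ["dynamodb:GetItem", "dynamodb:PutItem",
                           "dynamodb:UpdateItem", "dynamodb:Query"],
                "Resource": f"arn:aws:dynamodb:{REGION}:{ACCOUNT_ID}:table/{TABLE}",
                "Condition": {
                    "ForAllValues:StringEquals": {"dynamodb:LeadingKeys": [USER_ID]}
                },
            },
            {
                # S3: read and write only under the caller's own key prefix.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{BUCKET}/{USER_ID}/*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(per_user_policy(), indent=2))
```

The same document governs a request whether it arrives from a lambda function using that role or from a browser holding the role's temporary credentials.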

00:37:54and then I realized well you know we’re

00:37:57just introducing latency by putting a

00:37:58server in the middle we’re just paying

00:38:00more and we started kind of using this

00:38:03like mad my brain still does not allow

00:38:07me to let users connect to the

00:38:09database because I’ve been a server-side

00:38:12developer for 20 years and it’s just

00:38:14wrong

00:38:15but we’re letting people talk to the

00:38:18storage directly we let people

00:38:19talk to the queues and kind of

00:38:21things like so you know maybe two years

00:38:23from now I’ll I’ll come into the talk

00:38:24and say no no everybody’s you know

00:38:25talking to the database and all our data

00:38:27is stolen and it’s horrible or you know

00:38:30but generally if you think about it’s

00:38:32not your database it’s Amazon’s database

00:38:33and it’s available using HTTP so it’s

00:38:38not back-end it’s content or it’s

00:38:41middleware of some kind so kind of we

00:38:44started moving away from kind of

00:38:46three-tier models to smart clients not

00:38:48not dumb terminal smart terminals where

00:38:51things connect directly so here’s a URL

00:38:55and this is a

00:38:58a prototype we’ve developed for a chat

00:39:00app that works on all browsers and works

00:39:04on all mobile devices and things like

00:39:06that and kind of it

00:39:09the source code is on GitHub you can

00:39:11look it up afterwards so kind of connect

00:39:24to this from your mobile phone or

00:39:26something like that and then you can

00:39:27kind of log in as a guest what that

00:39:30means is that we are now getting an

00:39:32authorization ID from Cognito I can make

00:39:36this login with the username and

00:39:37password log in through Google login

00:39:39through Facebook log in through many

00:39:41many different ways do two-factor

00:39:42authentication for a conference kind of

00:39:44show it’s you know open and then once I

00:39:49have this I can actually use their API

00:39:51directly from a browser to talk to any

00:39:54resource I want that I’m allowed to talk

00:39:55to so in this case we are talking

00:39:57directly to if you cannot get into the

00:40:00server listing just pump it up we are

00:40:02talking to the IoT Gateway the IoT

00:40:08Gateway is designed to kind of exchange

00:40:09messages between low-power devices but

00:40:11it’s actually allowing you to have a

00:40:15WebSocket interface as well how cool is

00:40:17that so you can use WebSockets on demand

00:40:20managed and paid at $5 per million

00:40:26messages so peanuts and they’re

00:40:30completely managed so you know we can

00:40:33get a million people connecting to this

00:40:35now or we can get five people connecting

00:40:38to this and it’s it’s all done

00:40:40operationally the source code is 30

00:40:42lines of code so it is just completely

00:40:45insane what we can do now with this

00:40:47stuff and how much it costs and I you

00:40:50know ten years ago I remember kind of

00:40:52evaluating lots of different push

00:40:53mechanisms where they were doing

00:40:55degrading to Flash they were doing

00:40:57long polling they were doing all these

00:40:59amazing things and I think the best

00:41:00thing we could choose they were asking for

00:41:02something like a hundred thousand quid a

00:41:04month I can get this now for five

00:41:08dollars on a million messages if

00:41:10nobody’s using it

00:41:11I don’t pay if people are using it and

00:41:14probably making money off it so you know

00:41:15perfectly fine to pay but there’s no

00:41:18upfront cost there’s no monthly

00:41:20maintenance cost this is insane
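The pay-per-message point is plain arithmetic. A small sketch using the $5 per million messages quoted in the talk as the default price; how a chat room's fan-out is counted (one billed message when a post is published plus one per participant it is delivered to) is an assumption here, not AWS's documented billing rule.

```python
# Pay-per-message cost, using the $5 per million messages quoted in the talk.

def messaging_cost(messages: int, price_per_million: float = 5.0) -> float:
    """Dollar cost of a number of messages; zero traffic costs zero."""
    return messages * price_per_million / 1_000_000


def chat_room_messages(posts: int, participants: int) -> int:
    """Messages billed for a chat room, ASSUMING each post counts once when it is
    published and once per participant it is delivered to (hypothetical rule)."""
    return posts * (1 + participants)


if __name__ == "__main__":
    billed = chat_room_messages(posts=10_000, participants=5)
    print(billed, f"${messaging_cost(billed):.2f}")
```

Under those assumptions, 10,000 posts in a five-person room are 60,000 messages, printed as `60000 $0.30`, and a month without traffic costs nothing.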

00:41:22it’s completely insane I think kind of

00:41:24this thing changes how we approach what

00:41:27we serve what we kind of do

00:41:30and I said you just format see a nice ah

00:41:36good good good good good

00:41:38so um so yeah I said that is you know

00:41:42from here you can just go to the

00:41:44link and you can get the source

00:41:46code for this so kind of in that respect

00:41:50what I want to say is good engineering

00:41:55good architecture is driven by

00:41:57constraints cost is one of the key

00:41:59constraints we have to deal with and

00:42:01deploying on lambda fundamentally

00:42:03changes the cost structure so lots of

00:42:06stuff that you know have evolved as best

00:42:08practices over the last 20 years kind of

00:42:12no longer apply and our challenge as a

00:42:15community over the next five 10 years is

00:42:16going to be to figure out what are you

00:42:19know just the shackles of the old world

00:42:21that we’re running with I mean you

00:42:23know I can talk about what’s wrong with

00:42:27lambda for you know days on end and this

00:42:29is not a silver bullet it doesn’t solve

00:42:31all the problems but I think it’s a

00:42:33really really interesting perspective if

00:42:35you can you know run a four hundred

00:42:38thousand users and pay fifty three cents

00:42:40for the whole thing it’s just insane and

00:42:43I think kind of the financial incentives

00:42:45of that are going to get pretty much

00:42:47everything that can run in lambda to run

00:42:49in lambda over the next five years and

00:42:52that’s why you know although most people

00:42:55here I assume are not really deploying

00:42:57things in production on lambda yet this

00:42:59is a good time to start investigating that

00:43:01and in particular running cheap

00:43:03experiments is amazing if you need to

00:43:06run a kind of cheap stupid experiment

00:43:08and you don’t know if it’s going to work

00:43:09out or not this is brilliant for that

00:43:10and then even if you do kind of an

00:43:12on-premise deployment later then you

00:43:14know what to integrate and and and what

00:43:16to throw away so kind of I think one of

00:43:21the key things that was you know a mind

00:43:24shift for us is

00:43:25is to start letting clients connect to

00:43:27back-end resources because there are no

00:43:28back-end resources anymore and what that

00:43:30means is that all of a sudden your app

00:43:32is not really running just on fifty or

00:43:36five or five hundred virtual machines

00:43:38it’s running on four hundred thousand

00:43:40client processes as well because you can

00:43:41push if you let people talk to back-end

00:43:44resources you can push orchestration you

00:43:46can push state to the client and all the

00:43:51stuff that kind of is difficult to

00:43:52manage in the distributed architectures

00:43:54push it to the clients machine where the

00:43:56client is a single client having a

00:43:58single state which simplifies things

00:44:00significantly so instead of kind of that

00:44:03that’s why we pay so little for lambda

00:44:04our app does not run on the VMs what

00:44:07runs on the VMs is the glue between

00:44:08different back-end services our app

00:44:11actually runs on four hundred thousand

00:44:14client processes that we do not have to

00:44:15pay for and that’s kind of the really

00:44:19really interesting mind shift here so

00:44:21kind of here are two URLs for more info

00:44:25this first one is my blog where I post a

00:44:29lot about this stuff because I’m

00:44:30incredibly excited about it the second

00:44:33one is the open source tool for

00:44:34deployment that I’ve shown you that kind

00:44:36of simplify stuff if you do in

00:44:38JavaScript that’s pretty much it thank

00:44:41you very much I hope I kind of tickled

00:44:43your imagination at least a bit

00:44:45and if anybody’s posted any questions I

00:44:47guess we can talk about that now do we

00:44:50have any questions where’s the I can

00:44:53read it out loud okay lovely how do you feel

00:44:56about lock-in kind of that’s a

00:45:00really interesting question and

00:45:02lock-in is problematic on several

00:45:05levels lots of people talk about lock-in

00:45:08in terms of lock-in with libraries

00:45:10lock-in with code if you use Oracle then

00:45:12you know you use Oracle’s libraries with

00:45:14lambda because the platform calls you

00:45:16not the other way around you’re actually

00:45:18not locked into the lambda API at all
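A Lambda handler in Python is an ordinary function the platform calls with an event and a context object, which is why the code itself is not tied to the platform. In the sketch below, `export_document` is hypothetical business logic, and the event is assumed to have the API Gateway proxy shape (a JSON string in `body`); only `handler` knows about that shape.

```python
import json


def export_document(doc: dict) -> dict:
    """Hypothetical business logic; it knows nothing about Lambda or HTTP."""
    return {"title": doc.get("title", "untitled"),
            "words": len(doc.get("body", "").split())}


def handler(event, context):
    """Lambda-style entry point: the platform calls this with (event, context).

    Assumes an API Gateway proxy-style event whose "body" is a JSON string;
    this thin adapter is the only Lambda-specific code.
    """
    doc = json.loads(event.get("body") or "{}")
    return {"statusCode": 200, "body": json.dumps(export_document(doc))}


if __name__ == "__main__":
    # Outside Lambda, the same function is simply called directly.
    fake_event = {"body": json.dumps({"title": "Plan", "body": "ship it soon"})}
    print(handler(fake_event, None))
```

Run as a script, this prints `{'statusCode': 200, 'body': '{"title": "Plan", "words": 3}'}`. Moving this function elsewhere is easy; the harder lock-in is in the platform services the clients talk to directly.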

00:45:20moving away from

00:45:22that would be trivial the big problem

00:45:25is you’re locked into the platform if you

00:45:27really want to get the benefits of

00:45:28lambda then you’re letting clients talk

00:45:29to storage directly you’re letting clients talk

00:45:31to the database directly and that’s

00:45:33where the lock-in happens now for us

00:45:36we’ve kind of decided

00:45:38commercially that going really for

00:45:41Amazon and using Amazon for

00:45:42everything is a good commercial decision

00:45:46it gives us the risk that Amazon’s not

00:45:48working but in my experience kind of

00:45:49they’re pretty solid and they’re much

00:45:51much better at doing the ops than I can so

00:45:55I I know that there are some tools like

00:45:57the serverless framework that allow you

00:45:59to deploy to multiple clouds and do kind

00:46:02of this hybrid thing but I guess the big

00:46:04problem then like you know doing a

00:46:06database independent deployment is you

00:46:09get to use the least common

00:46:10denominator of the whole thing and

00:46:11you’re never really using the platform

00:46:13what it is so kind of that’s a

00:46:16commercial decision that I guess

00:46:17everybody needs to make on their own but

00:46:19there’s definitely things like you know

00:46:20ports and adapters or hexagonal

00:46:22architecture and things that you can

00:46:24design stuff so if you do actually

00:46:26decide to move on to different cloud

00:46:27provider you know you will I I don’t

00:46:30I’ve never worked with a company where

00:46:32the whole investment in being able to

00:46:35move from the primary database paid off

00:46:37because they never moved away from the

00:46:39primary database so it’s a commercial

00:46:41decision there’s only two of us building

00:46:43this thing so I’d rather spend time

00:46:45delivering successful features than

00:46:47building kind of an abstract software

00:46:49system but you know if you have five

00:46:51kind of developers why not keep them

00:46:54busy so so the this disorder locked your

00:46:59kind of can I get Apple pay on your

00:47:01phone as well when you unlock it okay so

00:47:06we have how do you keep the state the

00:47:09session so we tend to push a lot to the

00:47:11clients directly and we tend to keep

00:47:13the state either in Cognito or in the

00:47:15users’ browsers Cognito does automatic

00:47:17synchronization across devices we don’t

00:47:21tend to use that a lot because our

00:47:22state is typically kind of the document

00:47:24you’re working on and we don’t need to

00:47:28keep a lot of that but Cognito is a

00:47:29pretty good way of kind of synchronizing

00:47:32stuff across devices using Dynamo using

00:47:35something like that to keep kind of the

00:47:36state per user is also pretty good

00:47:38because you can configure Dynamo to

00:47:41allow users to write only to a

00:47:43particular key or a sub-key

00:47:46so you can write only to your own state

00:47:48and read only from your own state for

00:47:50example that

00:47:51that would be a possibility mm-hmm if

00:47:58I’m DDoSed so uh I don’t know

00:48:02that’s never happened to us kind of

00:48:04lambda and API Gateway allow you to

00:48:07throttle things you can just configure

00:48:08throttling so you can say that you know

00:48:11I want to run up to a thousand

00:48:13concurrent functions of this or up to I

00:48:16think the limit by default is a thousand

00:48:18per function but then you can increase

00:48:21it or with API Gateway you can do

00:48:24throttling based on an API key an

00:48:26authorization key or kind of generally

00:48:29on an API endpoint so you can configure

00:48:31throttling so you can with the

00:48:34monitoring they have and things like

00:48:35that spot if you are kind of being DDoSed
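AWS documents API Gateway throttling as a token bucket with a steady-state rate and a burst size. A minimal model of that algorithm follows, with the time passed in explicitly so it can be tested; the numbers are illustrative, not AWS defaults. On the Lambda side, AWS documents the default concurrency limit as an account-level quota per region, shared by all functions, rather than a per-function one.

```python
class TokenBucket:
    """Token-bucket limiter: refills `rate` tokens per second, holds at most `burst`."""

    def __init__(self, rate: float, burst: int, now: float = 0.0):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)   # start full, so an initial burst is allowed
        self.updated = now

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time `now` (seconds) may proceed."""
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(float(self.burst), self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False   # a throttled API would answer HTTP 429 here
```

With `rate=10` and `burst=5`, seven requests at the same instant give five accepted and two rejected; a tenth of a second later exactly one more token has refilled.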

00:48:37again my assumption is that Amazon will

00:48:42protect against DDoS much much better

00:48:44than I can code that I don’t know kind

00:48:47of about anybody in the audience whether

00:48:49you feel you can build a better DDoS

00:48:51defense system than Amazon but certainly

00:48:54you can configure it to get an

00:48:57early warning and then figure out what

00:49:00to do from there so you don’t have to

00:49:02spend millions and millions and millions

00:49:04if you get DDoSed at the moment the

00:49:12biggest disadvantage is that lambda

00:49:14functions are limited to about five

00:49:17minutes of runtime
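A five-minute cap means long jobs have to be split across several executions. One common shape, sketched below under the assumption that the work is a list of independent items, checks the remaining time before each item and returns a cursor for the next invocation to resume from. `remaining_ms` stands in for the Lambda context's `get_remaining_time_in_millis()`, and `simulate` only fakes a clock; how the cursor reaches the next invocation (a queue message, a re-invocation, a workflow state) is left open.

```python
from typing import Callable, List, Optional


def process_batch(items: List[object], start: int,
                  remaining_ms: Callable[[], int],
                  work: Callable[[object], None],
                  safety_ms: int = 10_000) -> Optional[int]:
    """Work through items[start:] until the time budget runs low.

    Returns the index the next invocation should resume from, or None when
    every item has been processed. remaining_ms plays the role of the Lambda
    context's get_remaining_time_in_millis().
    """
    i = start
    while i < len(items):
        if remaining_ms() < safety_ms:
            return i          # checkpoint: hand the rest to a fresh invocation
        work(items[i])
        i += 1
    return None


def simulate(items: List[object], budget_ms: int, cost_ms: int) -> int:
    """Count invocations needed when each gets budget_ms and each item costs cost_ms.

    The clock is simulated; nothing here calls AWS.
    """
    invocations, cursor = 0, 0
    while True:
        invocations += 1
        left = [budget_ms]                      # fresh time budget per invocation

        def remaining() -> int:
            return left[0]

        def work(_item: object) -> None:
            left[0] -= cost_ms

        nxt = process_batch(items, cursor, remaining, work)
        if nxt is None:
            return invocations
        if nxt == cursor:
            raise RuntimeError("budget too small to process even one item")
        cursor = nxt
```

With a 300-second budget and 60 seconds per item, ten items need two invocations; the same checkpointing idea applies to the reconnect-and-resume approach for streaming sockets mentioned below.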

00:49:18so anything that takes longer than five

00:49:20minutes you need to split into multiple

00:49:22executions which means you can’t keep an

00:49:24open socket we were trying to develop

00:49:27something that talks to the Twitter APIs

00:49:28and Twitter doesn’t have a push API or

00:49:33you know if you’re a mortal you cannot

00:49:36get the push API you need to connect the

00:49:38socket and get them to kind of stream

00:49:41stuff to you and with lambda that’s not

00:49:43I mean it’s possible but you need to

00:49:45kind of load it every five minutes and

00:49:46then disconnect and reload the state so

00:49:48for something like that I would still

00:49:50use ECS another kind of reasonable

00:49:53disadvantage for many people is that

00:49:55there’s virtually no SLA for lambda

00:49:58or not virtually there’s actually no SLA

00:50:02and they don’t offer any SLAs

00:50:04yet so our experience is that

00:50:08kind of us-east-1 the region

00:50:13that is overloaded with everything

00:50:14because that’s the first one that

00:50:15started occasionally gets kind of

00:50:17hiccups where we get a bit of delay but

00:50:21we’ve never really had a full outage

00:50:23since February 2016 when we started kind

00:50:28of moving to this that doesn’t mean it’s not

00:50:29going to happen and I think you know as

00:50:32lambda becomes more and more

00:50:33important I assume they will start

00:50:35providing SLAs for it at some point

00:50:38so that’s an interesting limitation I

00:50:40guess those would be those would be the

00:50:46two key key limitations for people so I

00:50:49think we ran out of time I don’t know if

00:50:51we have kind of I’ll be around you know

00:50:54you can pick me up in the

00:50:56corridor and then we’ll talk about this

00:50:57thing more I need to get other people to

00:50:59set up thank you very much

00:51:00[Applause]