Just over ten years ago, Tim Wagner walked down a chain-link fenced office corridor towards The Chop[^1]. In his hand he held an Amazon 6-pager. Inside The Chop, already reading and annotating Wagner’s 6-pager, was Andy Jassy, then head of AWS and today CEO of the entire jungle. How stressed was Wagner feeling as he approached the door? This wasn’t a room named for charity and second chances.
Within The Chop’s walls, uncomforted by its Comfort Inn color scheme, Tim Wagner pitched. Andy Jassy was Jeff’s first ‘shadow’, having spent 18 months barely leaving the side of a man with a reputation for brilliance, micro-management, and scorn. By business magazine accounts, Jassy is every superlative you can name. By the biographical facts, he’s a did-it-all-right Harvard grad from New York City’s upper-middle. And though his former reports will remark on his intensity, he was not likely to wither Wagner with a question of Bezos-like contempt — Are you lazy or just incompetent? Jassy was also (in one telling) the collaborative founder of AWS. In those days, all new services were conceived only with Jassy’s backing. Would Wagner’s 6-pager become a birth announcement or an obituary?
Wagner’s golden moment
Wagner’s baby, AWS Lambda, emerged from the meeting with a green light — a new service was born. But what exactly had Tim pitched? Let’s situate things. Lambda was the headline release of November 2014, sandwiched between late-2013’s major release, Kinesis, and late-2015’s, Elastic Container Registry. Docker was two years old, and mega hot, but not yet the de facto industry standard. The idea that you’d fracture your monolith into many containerized microservices was hip and cool, but going beyond that to serverless functions was too much. So Wagner did not pitch a system to run your database or CRUD backend. “No one’s ever going to use this for video transcoding,” asserted Wagner, prodded by a sales executive to further mark out the limits of the proposed service. He was wrong.
Wagner’s pitch was for an event-driven glue service. Its team-player role was to react to events and perform a small amount of shuttling or transforming compute between other AWS services. It was originally sponsored by the AWS S3 team, born not to burn CPU cycles and print money in its own right, but instead to strengthen its peers and grow the overall AWS ecosystem.
Many will remember, and still see today, the indignation towards serverless, the scoffing that met anyone suggesting we might be done with servers. But Lambda’s 2014 announcement was mostly met with excitement and curiosity, not gatekeeping and derision. The service’s initial self-imposed limitations placed it safely and firmly in the ‘event-driven glue’ supporting role.
At launch, Lambda had the following restrictions:
- maximum 1 GiB RAM
- maximum 1 hyper-threaded vCPU core
- maximum 512 MiB filesystem space
- maximum 5 minute runtime limit
- only the Node.js runtime supported, and last, but certainly not least…
- maximum of 25 concurrent requests per account!
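To picture the supporting role those limits enforced, here is a minimal sketch of the kind of glue function Lambda was built for: wake on an S3 object-created event, do a little work, exit. It is written with today’s TypeScript/async signature rather than the callback-style Node.js 0.10 of 2014, and the event shape and names are simplified for illustration:

```ts
// Sketch of an early Lambda-style glue handler: react to an S3 upload
// notification and do a small amount of shuttling or transforming work.
// The simplified event shape here is illustrative, not AWS's exact types.
interface S3Record {
  s3: { bucket: { name: string }; object: { key: string } };
}

export const handler = async (event: { Records: S3Record[] }) => {
  for (const record of event.Records) {
    const bucket = record.s3.bucket.name;
    const key = record.s3.object.key;
    // The 'glue' step: a small transform or hand-off to another AWS service.
    console.log(`reacting to new object: s3://${bucket}/${key}`);
  }
  return { processed: event.Records.length };
};
```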
Lambda was from day dot a remarkable innovation in cloud computing, but obviously not ready to take on and replace the traditional serverful compute provided by AWS EC2 VMs. In fact, Wagner was careful to avoid even suggesting the idea.
At the time, AWS offered EC2 instances with 32 vCPUs and almost 250 GiB of RAM, and you could run them as long as you liked. Lambda was a pea-shooter by comparison, but really handy if you needed to reactively compute in response to changes in an S3 bucket. Over time, like the little engine that could, Lambda added more runtimes and lifted resource limits, its engineers satisfying a growing customer base’s requests with an I-think-I-can attitude, until the question could really be asked of the product’s true ambition.
It must not be named
“Don’t say serverless.”
That was a mental note Wagner carried with him to pitch-meetings and cross-team syncs. Lambda was sponsored by the S3 team and greenlit by Jassy. EC2, AWS’s largest product by revenue, was doing extremely well and had just picked up container support to ride the Docker DevOps hype train. The baby Lambda service was a team-player, and in no position to cannibalize its teammates.
Werner Vogels, CTO of Amazon, did not mention the word in his personal blog post announcing the innovative service. The Hacker News comment thread for the re:Invent announcement does not mention it either — that thread is instead peppered with people thinking of using it to replace their task queues.
Though it was certainly a concept marinating in the mind of Wagner and some engineers, no servers vs. serverless architectural battle began. Who then popularized this devious term, who started the fight, and did they know what they were getting themselves into?
What is serverless, really?
When the definitional task is not thrown to the wolves marketing department, serverless products are generally accepted to have at least four key features:
- No management of server racks or VMs.
- No pre-provisioning of capacity: instead, fast auto-scaling in response to work offered.
- Usage-based pricing: service costs scale with actual work done (i.e. usage), not with resources assigned to enable that work.
- Scale-to-zero: you don’t pay for idle compute capacity.
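The third and fourth features are easiest to see as arithmetic. A toy cost model, with deliberately made-up round-number prices, shows why they matter:

```ts
// Toy cost model contrasting provisioned and usage-based pricing.
// All prices are made-up round numbers, purely for illustration.
const INSTANCE_PER_MONTH = 50;       // flat fee for an always-on server
const PRICE_PER_REQUEST = 0.000002;  // usage-based: pay per unit of work

const provisionedCost = (_requests: number) => INSTANCE_PER_MONTH; // idle or busy, same bill
const usageBasedCost = (requests: number) => requests * PRICE_PER_REQUEST;

console.log(provisionedCost(0), usageBasedCost(0));                 // 50 vs 0: scale-to-zero
console.log(provisionedCost(1_000_000), usageBasedCost(1_000_000)); // 50 vs 2
```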
In the wild, some nominally serverless cloud offerings will drop one of these features, most often item four. For example, AWS Managed Streaming for Apache Kafka[^2] does not have scale-to-zero, charging a minimum of over $500/month to run a serverless cluster. Not cool.
AWS should really know better. In the beginning (2006), when serverless didn’t even exist as a concept, it birthed the biggest and best of the serverless services. The service that should have everyone counting themselves a serverless fan.
The king of serverless
AWS S3, the OG? It’s rarely thought of as part of the serverless clan, because it offers storage and not functions as-a-service, but it provides each of the four key features beautifully.
- Ever seen any S3 knob related to modifying the underlying servers or clusters? Of course you haven’t.
- Do you need to pre-provision a new bucket’s size before firehosing 1TB into it? No, you don’t.
- Do you pay for CPU cycles, EC2 VMs, or SSDs? No, you pay for bytes stored over time, because that makes the most sense in a storage service.
- How much does it cost to store zero bytes in an S3 bucket? Nothing.
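That question-and-answer list is visible in the API itself. A sketch with the AWS SDK for JavaScript v3 (bucket and key names are placeholders): there is no size, node count, or capacity parameter to pass, because there is nothing to provision:

```ts
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";

// Write to a bucket: no cluster to size, no capacity to reserve.
// Bucket and key names are placeholders.
const s3 = new S3Client({ region: "us-east-1" });

async function main() {
  await s3.send(
    new PutObjectCommand({
      Bucket: "my-existing-bucket",
      Key: "hello.txt",
      Body: "hello, serverless storage",
    })
  );
  // The bill is a function of bytes stored and requests made, nothing else.
}

main().catch(console.error);
```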
Store 1 byte or 1 zettabyte in S3, either way you enjoy all four features. It’s really a wonder of the modern world and 21st century infrastructure. In early 2022 AWS announced that S3 stores over 200 trillion objects for customers, and handles on average 100 million requests per second. Holy hell, imagine the number of servers involved in handling that scale! Thankfully, that’s not our problem.
The peerless success of the S3 cloud service demonstrates that the four core features of serverless are hugely desirable to engineers and businesses. But serverless advocates must also grudgingly acknowledge this: S3 is not often thought of as a victory of the serverless paradigm for a reason. It is an application-specific service, doing one narrow thing very well. When an engineer hears serverless they think of general-purpose functions, they think about compute, not storage.
It is fair though to credit S3 as a triumph of serverless ideals; go to aws.amazon.com/serverless and see that S3 is there, listed first under the “datastores” section. But it is not by itself the fulfillment of today’s serverless vision. S3 did not follow any serverless vision. It did not need the term to exist, and it indeed did not. AWS S3 was released way back in 2006, the first service available on the nascent cloud provider. It defined itself and grew in the years before VMs migrated from on-premises to cloud.
No, Lambda is rightly thought of as the preeminent serverless offering because it does sometimes compete directly with cloud VM servers. The originators of serverless did need the term to exist because they were competing with serverful computing. They couldn’t point to S3 to get their point across. They needed a new marketing term. But AWS S3 isn’t called relationless storage. Taxis aren’t called carless transport. These originators of serverless, what were they thinking?
On the origin of the name
“We had no idea it was going to be this whole thing.” - Chad Arimura, Iron.io founder
Enter any internet thread where serverless is being acrimoniously discussed, and you’ll more often than not find the negativity directed not at serverless per se, but at its name.
- “Dumb name. If there’s no servers where does the code run then?”
- “This is server erasure!”
- “’serverless’ is perhaps the stupidest marketing buzzword developers have come up with.”
The peanut gallery is annoyed and surely someone is to blame. Who gave us this unpopular term? As best I can discern, it was Ken Elkabany, who in early 2010 was building a proto-serverless product and really needed CIOs to know he wasn’t selling them servers.
In 2009 Ken Elkabany and Aaron Staley founded PiCloud, one of the first serverless cloud platforms. Ahead of their time, they launched years before AWS Lambda and right as the most innovative IT organizations were moving off-premises to cloud VM providers such as AWS and Rackspace. Ken would hold sales meetings with CIOs and CTOs who were only just coming to terms with EC2. He would try to pitch his platform as a different way to turn CapEx into OpEx, the cloud’s greatest promise. The executives weren’t getting it: “so I can rent servers from you?”
PiCloud needed a differentiating term, so Ken got thinking.
The best I came up with was serverless to unequivocally convey that we weren’t renting out servers like everybody else. I distinctly remember almost striking it out as the phrase was obnoxious to my technical side: of course there are servers underneath our abstraction, developers will roll their eyes! But from a marketing standpoint—we had no professional marketers—it seemed like an obvious choice. We went out with it in our first press release. - Ken Elkabany, Who coined the term serverless?
I spoke to Ken, and though he refused to claim the coinage, I can’t find evidence of an earlier usage. So it is to Ken that you can direct your ire. Don’t tell him I sent you.
A short spring season of serverless startups, by the sea shore
Ken’s PiCloud would go on to be killed in the market not by AWS Lambda, which it preceded, but by the most hyped developer tool of the 2010s: Docker. If you can remember or imagine back that far, there was a time when Docker did not exist and early containerization adopters like Ken and Aaron had to follow the Linux mailing list for the tinkering of a certain Serge and Stéphane who were building LXC. In Ken’s recollection, the 2013 release of Docker was immediately recognized as the beginning of the end for PiCloud. The company pivoted immediately from a Python-only custom runtime to being container-first, but didn’t catch hold of the whale’s flailing tail. By mid-2013, PiCloud was acqui-hired by Dropbox.
PiCloud was not the only early, pre-Lambda serverless platform. There are at least, and maybe only, two more: Iron.io and Sun Grid.
Iron.io was founded in 2011 by Chad Arimura and is remarkably still generally available. It started life as a true serverless platform written in Ruby, competing with the Heroku cloud platform, which had been newly acquired by Salesforce. Just a couple of years later, it joined PiCloud in also contending with Docker. Then a year later, in 2014, AWS Lambda was added to its problems. Public information on Iron.io’s growth is scant, but it can be assumed that it never gained a significant market share. Its blog tells a story of competing with and then conceding to the aforementioned larger and more popular products. While Iron.io does live on, its founders called time around 2017, joining Oracle.
Oracle would also acquire Sun Grid Engine, a 2000-2011 serverless compute offering so early that it’s dubious whether it should be called serverless at all. After all, this was pre-cloud. In order to enjoy a serverless compute product a company had to shell out the money for actual Sun Microsystems servers that they would then house and run ‘serverlessly’. Hmmm.
But Sun Grid Engine, Iron.io, and PiCloud were all genuinely pursuing the serverless goal of organizing compute around a unit other than the physical server or VM. Of course, each of these companies ran their customers’ code on servers, but the customer was higher up the abstraction ladder, in the clouds, contending with jobs, workers, or functions. At least in part, their new added level of abstraction was the problem for these startups.
As Ken found, their target customers were just beginning to wrestle with the idea of moving from on-premises data centers and their VMs to cloud VMs-over-API, a significant reimagining and rearchitecting of IT infrastructure. Remember, this was before the Docker craze — the hot new DevOps tool was Chef. To have customers leapfrog the new cloud VM model, these startups would have to provide compelling alternative methods of application distribution, configuration management, in-situ debug tooling (“where do I SSH?”), resource provisioning, and more. The first one of those concerns is meaty enough. Being startups of limited resources, PiCloud and Iron.io initially reduced the application distribution problem size significantly by focusing on single language runtimes. In PiCloud’s case, this was Python.
Building PiCloud’s platform for a single language cut down their platform engineering scope hugely, which raised the overall quality of the single-language developer experience they delivered. However, doing so also significantly reduced their addressable market, and while this might have been survivable, Docker exploded on the scene and captured the industry to such an extent that PiCloud and Iron.io found themselves pivoting and playing catch-up in a rapidly changing cloud development landscape they didn’t at all control. Voyaging Americans chase after a great whale, and again, it does not end well.
Nearly all startups die, and so the cause of death for any particular startup is overdetermined. But in a post-mortem of these pre-Lambda serverless startups what seems most damaging is an inability to onboard a significant number of engineers and IT operators into a new and somewhat complicated cloud abstraction (serverless functions) before Docker came and dominated. It is telling that AWS Lambda released one year after Docker without container support (you uploaded .zip files) and didn’t add it for six years. The serverless vision of the startups was ambitious. They really did want engineers to code and architect applications in a dramatically new way. AWS Lambda, on the other hand, despite being 100x better resourced, was ‘just’ an event-driven glue service, and thus shirked the fight with Docker.
In the earliest years of the cloud, there would be room only for one major evolution in developer practices, and since almost day one of the public Docker release, that space would be taken by containers and microservices, not serverless functions.
The wonderful microVMs
S3 is maybe the most successful serverless service ever, and it owes much of that success to its simplicity. The “blob”, in blob storage, is an undifferentiated sequence of bytes. S3 is ‘just’ a distributed string → string → blob map; it is the GOAT CRUD service. But when an engineer thinks of serverless, they don’t think of blobs, they think of “functions”, of general compute.
Functions? What functions?
What are these “functions” in the serverless functions-as-a-service (FaaS) offerings? Despite more than a decade of usage, and the fact that several of the major cloud providers offer capital-F Functions as a cloud service, no clear technical definition or standardization exists.
The already overloaded mathematical and programming term was likely originally chosen because the interface exposed to customers involved implementing the body of a single program function which accepted a structured input event, executed once, and then returned a response to be handled by the service. Sure, an entire program was loaded, initialized, and then torn down, but because all that was done to run just a single function once, it seems sane to say that it’s function execution that the service is providing, not program execution or generalized compute.
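AWS later made this model explicit with Lambda’s documented custom runtime API (2018): a runtime process initializes once, then loops, fetching one event at a time and posting back one response per event. A minimal sketch, assuming Node 18+ for the built-in fetch and omitting error handling:

```ts
// The shape of a Lambda runtime's life, per the documented custom runtime
// API: code above the loop runs once (program initialization); the loop
// body runs once per event (function execution). Error paths omitted.
const api = `http://${process.env.AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime`;

async function runtimeLoop(handler: (event: unknown) => Promise<unknown>) {
  while (true) {
    const next = await fetch(`${api}/invocation/next`); // blocks until an event arrives
    const requestId = next.headers.get("lambda-runtime-aws-request-id")!;
    const result = await handler(await next.json());    // the single function execution
    await fetch(`${api}/invocation/${requestId}/response`, {
      method: "POST",
      body: JSON.stringify(result),
    });
  }
}
```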
But of course, from the service’s perspective, and at the operating system level, no real tricks are being played, no corners cut. To run that single application function, in Python or Java or whatever, a process has to be started, inside a container, inside an x86 Linux VM, on a physical server running in a datacenter. Just like what would happen when running a server?
Yes! The serverful VM abstraction is standardized — mostly on account of the Linux operating system and the x86 instruction set. Server-centric application developers know what they’re dealing with: a (possibly containerized) process, running in an x86 VM with access to an IP address, some network bandwidth, a number of vCPU cores, RAM, disk, and possibly a GPU peripheral.
But you get all that with your serverless function too! That is the boldness, the promise, of serverless platform engineering. All the above is conjured up for your function execution on-demand, and when you no longer need it, it disappears. This ephemerality, furious construction and destruction, has always been the engine of software creation, its escalation driving the cost of software development and maintenance down towards nothing.
We engineers don’t want to pre-provision anything. Who wants to pre-allocate a dedicated process pool before starting their browser, or mark out a region of memory before opening their IDE? No one. Give things to us cheap, quick, and easy. And send it to the garbage when you notice we’re done with it.
For all this serverless function creation and destruction to work at all, the creation must happen fast — ideally in much less than a second. Users won’t wait. If you scale to zero, you must also often start from zero. Each layer of the serverless function stack has to be specifically engineered to start quickly, and sub-second VM starts are the paradigm’s boldest feat yet.
Firecracker
Tim Wagner quit his role as founder and general manager of AWS Lambda, joining Coinbase in August 2018. Lambda was by this point a big success, and the Lambda engineering team had just launched into production, after two years in development, perhaps its biggest technical leap yet: AWS Firecracker.
The most encouraging recent developments in the serverless ecosystem have been the virtualization innovations of AWS Firecracker and Google gVisor. As outlined above, serverless functions require use of all the major server components you find in serverful computing, including VMs. But unlike serverful computing, serverless has scale-to-zero, which punishes providers, not customers, for idleness. Idleness eats into profit, so serverless systems must run extra hot with the rapid churn of running programs, and many more customers are packed together.
The economics and scale of serverless applications demand that workloads from multiple customers run on the same hardware with minimal overhead, while preserving strong security and performance isolation. - Firecracker paper
This same demand exists for EC2, but in taking on the cost of idleness serverless providers have produced an escalation of already existing cloud computing dynamics. Cloud platforms must provide, typically, the x86 Linux compute environment as quickly and ephemerally as is acceptable to customers and profitable for the business. Serverless isn’t acceptable or profitable if operated at EC2 speeds. To go faster and cheaper, AWS’s distinguished engineers developed a new Virtual Machine Monitor named Firecracker. Basically, Firecracker runs on a server and starts tiny microVMs on behalf of customers really fast — 125ms for a minimal VM, and with throughput of up to 150 VMs per server per second. 🔥.
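Firecracker itself is refreshingly small to drive: a host process exposes a REST API on a Unix socket, and four PUTs get you a booted microVM. A sketch using Firecracker’s public API endpoints, with the socket, kernel, and rootfs paths as placeholders:

```ts
import http from "node:http";

// Issue a PUT to a running Firecracker process over its Unix socket.
function fcPut(path: string, body: object): Promise<void> {
  return new Promise((resolve, reject) => {
    const req = http.request(
      {
        socketPath: "/tmp/firecracker.socket", // placeholder socket path
        path,
        method: "PUT",
        headers: { "Content-Type": "application/json" },
      },
      (res) => {
        res.resume(); // drain the response body
        res.statusCode! < 300 ? resolve() : reject(new Error(`HTTP ${res.statusCode}`));
      }
    );
    req.on("error", reject);
    req.end(JSON.stringify(body));
  });
}

async function bootMicroVM() {
  await fcPut("/machine-config", { vcpu_count: 1, mem_size_mib: 128 });
  await fcPut("/boot-source", {
    kernel_image_path: "vmlinux", // placeholder kernel and rootfs paths
    boot_args: "console=ttyS0 reboot=k panic=1",
  });
  await fcPut("/drives/rootfs", {
    drive_id: "rootfs",
    path_on_host: "rootfs.ext4",
    is_root_device: true,
    is_read_only: false,
  });
  await fcPut("/actions", { action_type: "InstanceStart" }); // the ~125ms moment
}

bootMicroVM().catch(console.error);
```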
Firecracker (and gVisor) is a major step up in the serverless paradigm’s maturity. In the past, the first serverless startups used nascent LXC technology to provide container separation between customers but did not use a VM boundary, saving money and reducing latency but sacrificing security. Lambda originally used an EC2 VM boundary to separate customers, but this sacrificed performance and bore added cost through reduced utilization. What got you here won’t get you there; neither tradeoff is acceptable anymore.
AWS Lambda has started more Firecracker VMs in the last four years than were started in the entire preceding history of hardware-assisted virtual machines (1972-2018), literally trillions of VMs. And this is just one piece of the stack which facilitates serverless functions. All over the stack the major cloud providers are pushing their systems engineers to retool cluster computing for performant, secure, and productive serverless execution.
Future plans
We’ve just passed AWS Lambda’s 10th birthday, and it has been 14 years since the first serverless startups. What lies ahead for the paradigm?
This last decade has seen the inception and maturation of serverless FaaS, but no serious attempt to replace EC2 as a general purpose computing paradigm. EC2 can run your web server, your message broker, your database. EC2 powers AWS S3, SQS, and RDS because it is general purpose. For all the strides made, Lambda cannot power those services.
Innovations at the container and VM loading layers have significantly improved cold-start performance for serverless functions, but the serverless computing paradigm still presents performance difficulties to users that serverless platforms need to remove. Cold-start remains the most significant complaint, and Lambda SnapStart is evidence that serverless system engineers are pushing into the programming language layer seeking decreased latency.
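You can feel the cold-start tax from the outside with nothing more than a stopwatch around two back-to-back invocations: the first may land on a cold sandbox, the second on a warm one. A sketch with the AWS SDK for JavaScript v3 (the function name is a placeholder):

```ts
import { LambdaClient, InvokeCommand } from "@aws-sdk/client-lambda";

// Time two back-to-back invocations of the same function. The first may
// pay a cold start; the second should hit a warm sandbox.
const lambda = new LambdaClient({ region: "us-east-1" });

async function timedInvoke(label: string) {
  const start = performance.now();
  await lambda.send(new InvokeCommand({ FunctionName: "my-function" })); // placeholder name
  console.log(`${label}: ${(performance.now() - start).toFixed(0)} ms`);
}

async function main() {
  await timedInvoke("first (possibly cold)");
  await timedInvoke("second (warm)");
}

main().catch(console.error);
```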
Application-level state management and networked data exchange are also significant pain points. Serverless customers don’t have control over the placement of their code onto servers, and can’t expect that any two executing functions are running on the same machine. This increases the burden placed on serverless system engineers to improve the performance of persistent state management and data-passing activities. Same-server levels of performance are essential to many kinds of applications.
For now, creating general serverless solutions to distributed state management is too challenging. Instead of general solutions, the biggest new successes in serverless will be application and domain specific. Simply counting the beloved services on aws.amazon.com/serverless shows this has long been a winning strategy: S3, EFS, DynamoDB, SQS, SNS, Lambda. When done well, engineers and businesses wrap both arms around application-specific serverless and don’t let go. In Neon DB we have now seen the advent of serverless Postgres, and AI/ML hype is driving creation of serverless vector databases and serverless GPU access. Underlying all this serverless startup creation is a belief that serverless features (e.g. scale-to-zero) produce market-leading cloud services, and that the cloud computing market is now so large and old that specific cloud applications and verticals can start up and grow to big valuations in the presence of AWS, GCP, and Azure.[^3]
The original sin of the “serverless” name is that it seems to pose itself as a supplanter, not as an extender, an amplifier. Serverless will not supplant a single server; it will only wrap and interlink them. The best thing about serverless is that it demands more from computing infrastructure, and pushes the marginal cost of software development ever closer towards zero.
Serverless is the computing paradigm that reaches out and touches more servers than any other. This is not a paradox, just bad naming. For us engineers and our servers, less means more.
[^1]: Yes, in 2014 this was apparently acceptable interior design.

[^2]: This service name needs scale-to-zero.

[^3]: Storm in the Stratosphere: how the cloud will be reshuffled