Cloud Native Computing
Amazon Web Services recently joined the Cloud Native Computing Foundation, and I’m representing AWS as the CNCF board member, with Arun Gupta from our open source team coordinating technical engagement with projects and working groups. To explain what this is all about, I think it’s useful to look at what we mean by “cloud native,” and where the term came from.
Back in 2009, I was working at Netflix, and the engineering teams were figuring out some new application architecture patterns we would need to migrate to AWS. Some of us had learned how to automate deployments at scale from time spent working at eBay, Yahoo, and Google. We also learned new ideas from Werner Vogels and the AWS team. The result was a new set of fundamental assumptions that we baked into our architecture. In 2010, we started talking publicly about our cloud migration, and in 2012 we got the bulk of the platform released as a set of open source projects, collectively known as NetflixOSS.
While we didn’t invent most of these patterns, the fact that we gathered them together into an architecture, implemented it at scale, talked about it in public, and shared the code was influential in helping define what are often referred to as cloud native architectures.
Cloud native architectures take full advantage of on-demand delivery, global deployment, elasticity, and higher-level services. They enable huge improvements in developer productivity, business agility, scalability, availability, utilization, and cost savings.
On-demand delivery, taking minutes instead of weeks, is often the first reason people move to the cloud, but it doesn’t just shorten the deployment time for a traditional application: it also enables a new cloud native pattern of ephemeral and immutable deployments. In the old deployment model, where it takes weeks to get a resource, you hang on to it, order extra capacity in advance, and are reluctant to give it back, so you figure out how to update it in place. The cloud native pattern, instead, is to bake instances or build containers, deploy many identical copies for only as long as they are needed, shut them down when you are done, and create new images each time the code changes. NetflixOSS pioneered these concepts by baking Amazon Machine Images (AMIs); Docker later adopted the same immutable-image pattern as a core element of the container deployment model.
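As a concrete illustration, here is a minimal sketch of the bake-and-deploy cycle using boto3, the AWS SDK for Python. The instance ID, image name, and instance type are placeholders, and a real pipeline would typically drive this through a tool such as Packer and an Auto Scaling group rather than raw API calls.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# "Bake": snapshot a fully configured build instance into an immutable AMI,
# producing a new image for every code change.
image_id = ec2.create_image(
    InstanceId="i-0123456789abcdef0",  # placeholder build instance
    Name="myapp-build-42",             # placeholder image name
)["ImageId"]
ec2.get_waiter("image_available").wait(ImageIds=[image_id])

# Deploy: launch identical, disposable copies from the baked image,
# for only as long as they are needed.
instances = ec2.run_instances(
    ImageId=image_id,
    InstanceType="m5.large",
    MinCount=4,
    MaxCount=4,
)["Instances"]

# When the next image ships, terminate the old copies rather than
# updating them in place.
ec2.terminate_instances(InstanceIds=[i["InstanceId"] for i in instances])
```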
Deploying applications that span multiple datacenters is a relatively rare and complex pattern to implement, but cloud native architectures treat multi-zone and multi-region deployments as the default. To work effectively in this model, developers need a good understanding of distributed systems concepts; a discussion of the CAP theorem became a common interview topic at Netflix. Despite huge improvements in technology, the speed of light is a fundamental limit, so network latency, and cross-region latency in particular, is always going to be a constraint.
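To make that default concrete, here is a sketch, again with boto3, that spreads identical instances across several Availability Zones; the AMI ID and zone names are placeholders, and in practice an Auto Scaling group spanning multiple subnets would manage this for you.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Treat multi-zone deployment as the default: place identical copies in
# each Availability Zone so the loss of one zone leaves capacity running.
for zone in ["us-east-1a", "us-east-1b", "us-east-1c"]:
    ec2.run_instances(
        ImageId="ami-0123456789abcdef0",  # placeholder baked image
        InstanceType="m5.large",
        MinCount=2,
        MaxCount=2,
        Placement={"AvailabilityZone": zone},
    )
```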
Cloud native architectures are scalable. When I first spoke publicly about Netflix’s use of AWS in 2010, we were running front-end applications on a few thousand AWS instances, supporting about 16 million customers in the USA. Today, Netflix has fully migrated to AWS, serves over 100 million customers worldwide, and runs on over 100,000 instances. The implementation details have changed over the years, but the architectural patterns are the same.
Over time, components of cloud native architectures move from being experimental, through competing implementations, to being well-defined external services. We’ve seen this evolution with databases, data science pipelines, container schedulers, and monitoring tools. This is one place where the Cloud Native Computing Foundation acts as a filter and aggregator: the CNCF’s Technical Oversight Committee reviews projects, incubates them, and adopts them as they move from the experimental phase to the competing-implementation phase. For customers trying to track a fast-moving and confusing world, it’s helpful to regard CNCF as a brand endorsement for a loose collection of interesting projects. Because it is a loose collection rather than a single integrated cloud native architecture, there is no particular endorsement of any one project over another, either by CNCF members or by users of the projects.
The CNCF currently hosts ten projects: Kubernetes for container orchestration, Prometheus for monitoring, OpenTracing for distributed tracing, Fluentd for logging, Linkerd for service mesh, gRPC for remote procedure calls, CoreDNS for service discovery, containerd and rkt for container runtimes, and CNI for container networking. Many more projects are being incubated.
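To give a flavor of one of these projects, here is a minimal sketch of instrumenting a Python service with the official prometheus_client library; the metric names, port, and simulated work are arbitrary choices for illustration.

```python
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter("myapp_requests_total", "Total requests handled")
LATENCY = Histogram("myapp_request_latency_seconds", "Request latency")

@LATENCY.time()  # records how long each call takes
def handle_request():
    REQUESTS.inc()
    time.sleep(random.uniform(0.01, 0.1))  # stand-in for real work

if __name__ == "__main__":
    # Expose metrics at http://localhost:8000/metrics for Prometheus to scrape.
    start_http_server(8000)
    while True:
        handle_request()
```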
From the AWS perspective, we are interested in several CNCF projects and working groups. AWS was a founding member of the containerd project; we are excited to participate in the containerd community, and have lots of ideas for how we can give our customers a better experience. Our forthcoming ECS Task Networking capabilities are written as a CNI plugin, and we expect CNI to be the basis for all container-based networking on AWS. In addition, a recent CNCF survey reports that 63 percent of respondents host Kubernetes on Amazon EC2, and Arun is blogging about his experiences with several Kubernetes-on-AWS installers, starting with kops. We have plans for more Kubernetes blog posts and code contributions, and think there are opportunities to propose existing and future AWS open source projects for incubation by CNCF.
The charter of the open source team we are continuing to build at AWS is to engage with open source projects, communities, and foundations, and to help guide and encourage more contributions from AWS engineering. AWS is already a member of The Linux Foundation, which hosts CNCF, and we look forward to working with old and new friends on the shared goal of creating and driving the adoption of a new computing paradigm optimized for modern distributed systems.
Please follow @AWSOpen to keep up to date on open source at AWS.