At Infinity Works, many of us use AWS routinely, but we’re less experienced with Google Cloud Platform. So I decided to find out more and recently signed up for a one-day Google OnBoard event. I wasn’t surprised to see a lot of similarity between AWS and GCP, but there were a number of important differences too — and in many ways GCP looked better. Some of the differences were small conveniences, while others were much more fundamental.
What’s it like to use?
GCP’s web console works much like the AWS one, but the UI is noticeably clearer and easier to navigate. It also comes with one simple but extremely useful feature: rather than switching to a terminal and setting up ssh keys, you can drop straight into an interactive ssh session within your browser. This sounds like a gimmick, but watching the demos during the OnBoard day I could see how it might easily become the default way of jumping onto a box to check something quickly, and could make debugging much faster while you’re getting the infrastructure right. The CLI tools are also cleaner and offer an easier way to authenticate: instead of copying and pasting API access and secret keys, you can leverage your existing browser-based session to authenticate from the command line.
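As a rough sketch of that flow with the `gcloud` CLI (the project, instance and zone names here are hypothetical):

```shell
# Authenticate the CLI by reusing a browser session:
# this opens a browser window rather than asking for pasted keys.
gcloud auth login

# Set a default project so subsequent commands don't need --project.
gcloud config set project my-demo-project

# SSH onto an instance; gcloud generates and propagates keys for you.
gcloud compute ssh my-instance --zone europe-west1-b
```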
So what does GCP do?
GCP offers a slew of services from the basics like compute, networking, storage and database to the more esoteric areas of big data and machine learning. Arguably the big data and machine learning areas are where GCP has the strongest distinct advantage over other providers, but these are complex topics and were only covered briefly in the one day introductory course. The area that interested me most was the bread and butter compute and networking capabilities, because these are still the foundation of many deployments, and it’s important that they work well. And this gives a good benchmark by which to compare the two, because it’s an area I’m familiar with in AWS.
GCP Compute Services
GCP offers a sliding scale of compute services, from IaaS to PaaS to serverless functions.
The most basic unit of cloud compute is the humble on-demand VM. In GCP these are given the name Google Compute Engine (GCE). With GCE you’re free to run what you want on the VMs, but of course you don’t get any help from Google with that. You’re responsible for everything from the OS up.
But these days we tend to run many of our services as containers. And Google offers an excellent abstraction layer on top of GCE in the form of Google Container Engine (GKE), which allows you to deploy containers into managed Kubernetes clusters (hence the ‘K’ in GKE) running on top of managed — but user-visible — GCE VMs. This is particularly appealing to me because Kubernetes is complex to set up, but once up and running provides a much richer environment than AWS’s rather basic Elastic Container Service (ECS).
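A minimal sketch of spinning up a cluster and deploying a container (the cluster, image and deployment names are hypothetical):

```shell
# Create a small managed Kubernetes cluster; the underlying GCE VMs
# are visible in the console alongside your other instances.
gcloud container clusters create demo-cluster --num-nodes 3 --zone europe-west1-b

# Fetch credentials so kubectl talks to the new cluster.
gcloud container clusters get-credentials demo-cluster --zone europe-west1-b

# Deploy a container and expose it behind a load balancer.
kubectl run hello --image gcr.io/my-demo-project/hello:v1 --port 8080
kubectl expose deployment hello --type LoadBalancer --port 80 --target-port 8080
```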
Google offers two different flavours of App Engine, known as Flex and Standard.
Flex provides automated deployment and scaling for your own Docker images on top of managed GCE instances — a simpler alternative to GKE.
But in some ways, Standard is the more interesting of the two. It packages up your application and deploys it direct into Google’s native container scheduler, Borg — of which more later. This gives some distinct advantages. The Standard flavour responds to demand by scaling up the number of instances of your app, and does so phenomenally quickly: the scaling is triggered in seconds, and new instances start in milliseconds. You are also fully isolated from responsibility for the underlying platform, which means you no longer have to worry about keeping up to date with security patches and the like: Google does all that for you.
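Deployment is driven by a small `app.yaml` descriptor. A minimal sketch for the Standard environment might look like this (the project name is hypothetical, and the runtime shown is one of the fixed set of sandboxed runtimes Standard supports):

```shell
# Write a minimal App Engine Standard descriptor.
cat > app.yaml <<'EOF'
runtime: python27      # Standard supports a fixed set of language runtimes
api_version: 1
handlers:
- url: /.*
  script: main.app     # WSGI application object in main.py
EOF

# Deploy: Google builds, schedules and scales the instances for you.
gcloud app deploy app.yaml --project my-demo-project
```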
And it comes with some neat features. Your app can scale down to zero running instances and still receive requests, with instances being spun up on demand in time to service those requests. This lets you run small and quiet services virtually for free. Automatic tunable splitting of traffic between old and new versions of the app supports scenarios like multivariate testing and canary deploys. And you can instantly spin up an instance of any previously deployed version alongside the latest version, serving on its own subdomain for quick side-by-side comparison.
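The traffic splitting is exposed directly in the CLI. Assuming two deployed versions named `v1` and `v2` (hypothetical names), a 90/10 canary split might look like:

```shell
# Send 90% of traffic to v1 and 10% to the canary v2.
gcloud app services set-traffic default --splits v1=0.9,v2=0.1

# Any deployed version also remains addressable directly, e.g. at
# https://v2-dot-my-demo-project.appspot.com, for side-by-side comparison.
```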
Of course, this all comes at a cost. App Engine Standard only supports a limited set of languages at specific versions. And apps run in a sandbox which restricts certain actions such as direct file access, certain system calls and even allowable module imports. But if you can live with these limitations — and for many small targeted services this will be easy — then it’s definitely worth a closer look.
For those familiar with AWS, App Engine is analogous to Elastic Beanstalk, but it is built in quite a different way: whereas EB is a somewhat cobbled together wrapper around existing services, App Engine is a genuine abstraction layer and feels a lot cleaner.
Basic IaaS terminology
For those familiar with AWS, here is a comparison of the terms used in GCP and AWS.
- Account == AWS Account
- Project (no AWS equivalent): a wrapper around a set of resources
- Region == AWS Region
- Zone == AWS Availability Zone
- VPC ~ AWS VPC, except GCP VPCs are global, whereas an AWS VPC exists within a single region
- Subnet ~ AWS Subnet, except in GCP subnets can span zones, whereas an AWS subnet exists within a single zone
Also worth mentioning: in GCP, in addition to internal load balancers which operate within a single region (as AWS ELBs/ALBs do), you can have global load balancers, which balance traffic between regions at the packet level.
The global nature of GCP VPCs and the ease of global load balancing together make it much easier to configure systems which span regions for greater resilience.
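A rough sketch of assembling a global HTTP load balancer from the CLI, assuming instance groups already exist in two regions (all names here are hypothetical):

```shell
# Reserve a single global anycast IP.
gcloud compute addresses create web-ip --global

# A global HTTP load balancer is assembled from a health check,
# a backend service, a URL map and a proxy; the backends can be
# instance groups in any region.
gcloud compute health-checks create http basic-check
gcloud compute backend-services create web-backend --global \
  --protocol HTTP --health-checks basic-check
gcloud compute backend-services add-backend web-backend --global \
  --instance-group eu-group --instance-group-zone europe-west1-b
gcloud compute backend-services add-backend web-backend --global \
  --instance-group us-group --instance-group-zone us-central1-a
gcloud compute url-maps create web-map --default-service web-backend
gcloud compute target-http-proxies create web-proxy --url-map web-map
gcloud compute forwarding-rules create web-rule --global \
  --address web-ip --target-http-proxy web-proxy --ports 80
```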
Internet and NAT gateways
Networking is a key part of setting up a cloud IaaS system, and there are some subtle differences in how networking works between AWS and GCP.
- Each GCP VPC is Internet connected, and there is no need to explicitly define an Internet Gateway as in AWS.
- By default, the default route for GCP subnets is via the (implicit) Internet gateway, but it is possible to route via one or more NAT instances instead to allow outbound-only internet connectivity for instances which only have a private IP address.
- There is no equivalent of the AWS managed NAT Gateway: in GCP you need to manually configure one or more GCE VMs as NAT instances. This was the situation in AWS too until relatively recently, but I was surprised to see it still the case in GCP.
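Setting one up by hand looks roughly like this (instance names, tags and priorities are hypothetical): create a VM with IP forwarding enabled, then route outbound traffic from tagged private instances through it.

```shell
# Create the NAT instance; --can-ip-forward lets it pass traffic
# that is not addressed to itself.
gcloud compute instances create nat-gateway --can-ip-forward \
  --zone europe-west1-b --image-family debian-9 --image-project debian-cloud

# On the instance itself you still have to enable masquerading, e.g.:
#   sysctl -w net.ipv4.ip_forward=1
#   iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE

# Route default traffic from instances tagged 'no-ip' via the NAT VM,
# overriding the implicit internet gateway route for those instances.
gcloud compute routes create no-ip-internet-route \
  --destination-range 0.0.0.0/0 \
  --next-hop-instance nat-gateway \
  --next-hop-instance-zone europe-west1-b \
  --tags no-ip --priority 800
```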
Google’s solid engineering credentials are evident in their development of Borg, and also shine through in their extremely high performance internal and inter-region networks.
- Google regions are connected via a dedicated mesh fibre network, which all inter-zone traffic transits, benefiting from high bandwidth and low and stable latency.
- Within each zone, servers are connected via a proprietary petabit network with sub-ms latency. (http://www.datacenterknowledge.com/archives/2015/06/18/custom-google-data-center-network-pushes-1-petabit-per-second/)
Borg, containers and VMs
Arguably, AWS is built on top of VMs, whereas GCP is built on top of containers.
- Everything in Google runs on Borg, the in-house container scheduler which was the inspiration for Kubernetes. This includes Gmail, BigTable and all of Google Cloud Platform.
- Borg manages (Google-flavour) Linux containers to segregate processes and runs on bare metal.
- Google Cloud Platform allows users to provision Google Compute Engine VMs, including many flavours of Linux and Windows.
- Each VM runs as a single KVM process in a Borg-managed container. See Section 6.1 in the Borg paper.
- From experience, a GCE VM typically takes around 20 seconds to be available from the point of request.
- Although the implementation is unrelated, Rancher Labs have discussed and justified the approach in principle.
- Google Container Engine uses Kubernetes to manage Docker containers running within Linux VMs, which in turn run within Borg-managed containers.
All this got me thinking more deeply about virtualisation and containerisation.
- Here is an interesting article about how the lines between containerisation and virtualisation are being blurred: https://www.theregister.co.uk/2017/06/01/linux_open_source_container_threat_to_vmware_microsoft/
- And there are also ‘unikernels’: a single application compiled along with specialised libraries which provide components of an OS, together forming a single address-space executable image that can run directly on a hypervisor or even a bare metal machine. These specialist creations offer the security benefits of traditional full-OS virtualisation along with much of the startup speed and the meagre resource requirements of containers, but at the cost of extra complexity: whereas the other two approaches allow arbitrary precompiled binaries to be packaged and run, unikernels require specific compilation. See http://unikernel.org/
- And here is a history of the different ways to achieve virtualisation: http://www.theregister.co.uk/2011/07/11/a_brief_history_of_virtualisation_part_one/
- Another, somewhat quaint, history of virtualisation: http://www.everythingvm.com/content/history-virtualization
I was impressed with my first taste of GCP, and it definitely looks worth trying on a small project.