Balancing, spikes & speed: Architecting media distribution cloud services

Senior Software Engineer Henry Rodrick is part of the BBC Design & Engineering - Media Distribution team. He describes how the team tackled the architectural challenges faced when building a microservice with strict non-functional requirements.

Cloud technologies have radically changed how most organizations develop and deploy software, and the BBC is no exception. Within the BBC, we make heavy use of the AWS (Amazon Web Services) platform, including the CloudFormation infrastructure management tool and our own AWS-based deployment system, Cosmos.

While CloudFormation enables developers to programmatically define their cloud infrastructure, it doesn’t really provide a great way to deal with actual software deployments. Cosmos solves this problem by automating the process of building release images (AMIs) and providing an easy-to-use interface where releases can be advanced through the different stages of deployment (integration, test and live).

This is a great improvement over the BBC’s traditional deployment cycle, which generally requires an operations engineer to carry out configuration and installation on in-house servers. Because of this, many teams have now moved to Cosmos for the majority of their development.

In the Media Distribution team, our main systems are content caching servers, known as caching cells, that make up the BIDI Content Distribution Network (CDN). For these systems, we still rely heavily on physical machine deployments to meet our performance requirements. The caching cells also need to be deployed in certain geographical locations, meaning that a cloud based architecture restricted to Amazon’s data centres wouldn’t work, even if the performance were good enough.

Not all of our services in BIDI have to be deployed like that, however. The “brain” of the platform, which deals with traffic management and configuration, is a suite of microservices that aren’t tied to any particular data centre. For these services, AWS/Cosmos promises faster deployments and less operational overhead than our internal deployment platforms. It’s often cheaper in the end as well, even though AWS’ fine-grained pricing model makes it hard to quickly determine whether that’s going to be the case.

One of the BIDI related services that we decided to deploy in the cloud is called Croupier and is used when making decisions about which CDN to redirect BBC iPlayer clients to. Croupier is a lookup service which provides an HTTP GET endpoint where the iPlayer client’s IP is a parameter. The service is internal: it is only used by another (AWS based) BBC service called Media Selector, which is called by iPlayer before the user actually gets to see any video.

The performance of Croupier directly affects Media Selector response times and thus iPlayer user experience as a whole. In addition, iPlayer traffic varies a lot over the day and we don’t want user experience to degrade in the evening when massive peak hour traffic kicks in, for instance.

Because of this, some strict performance and security requirements had to be considered when architecting this service:

  • Croupier should have a mean response time < 50ms. Requests are considered failures if they take more than 100ms. Only a small number of such failures is tolerated since 99.5% availability is required.
  • The service needs to be able to cope with an average of 3,000 requests per second, with point-in-time spikes up to 14,000 requests per second.
  • Deployments of new Croupier versions should not cause any downtime.
  • Clients must use a specific BBC issued SSL/TLS certificate to connect to the service.
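As a rough illustration, the latency and availability figures above can be checked against a batch of measured response times (from a load test, say). This is only a sketch: the function and constant names are hypothetical, and it treats “availability” simply as the fraction of requests answered within 100ms.

```python
from statistics import mean

MEAN_TARGET_MS = 50.0          # mean response time must stay below this
FAILURE_THRESHOLD_MS = 100.0   # slower requests count as failures
REQUIRED_AVAILABILITY = 0.995  # 99.5% of requests must not fail


def meets_requirements(latencies_ms):
    """Return (ok, mean_ms, availability) for a batch of response times."""
    if not latencies_ms:
        raise ValueError("no samples")
    mean_ms = mean(latencies_ms)
    failures = sum(1 for t in latencies_ms if t > FAILURE_THRESHOLD_MS)
    availability = 1 - failures / len(latencies_ms)
    ok = mean_ms < MEAN_TARGET_MS and availability >= REQUIRED_AVAILABILITY
    return ok, mean_ms, availability
```

At the 14,000 requests per second peak, a 0.5% failure budget corresponds to 70 slow requests per second, which leaves little room for periodic latency spikes.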

Architecting for high availability and zero-downtime deployments

There’s a large number of architectural options available through AWS. Even though “serverless” architectures such as AWS Lambda now exist, they are a bit of a paradigm shift from what most developers are used to.

For example, the type of IP lookups performed by Croupier can be implemented efficiently using a read-only, in-memory data structure, which can be generated once and reused by subsequent lookups (and even from multiple threads). It’s not immediately clear whether this can be translated into efficient Lambda code, since no such state sharing is possible.
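To make the idea of a build-once, read-only lookup structure concrete, here is a minimal sketch. It is not Croupier’s actual implementation (which is a Java application); the class name, the labels and the assumption of non-overlapping IPv4 ranges are all ours. The table is sorted once at construction time and then only read, so concurrent lookups need no locking.

```python
import bisect
import ipaddress


class RangeTable:
    """Immutable IPv4 range table: built once, then only read."""

    def __init__(self, entries):
        # entries: iterable of (cidr_string, label); ranges are assumed
        # not to overlap.
        rows = []
        for cidr, label in entries:
            net = ipaddress.ip_network(cidr)
            rows.append((int(net.network_address),
                         int(net.broadcast_address), label))
        rows.sort()
        self._starts = tuple(r[0] for r in rows)
        self._ends = tuple(r[1] for r in rows)
        self._labels = tuple(r[2] for r in rows)

    def lookup(self, ip):
        """Return the label of the range containing ip, or None."""
        key = int(ipaddress.ip_address(ip))
        # Find the last range that starts at or before the address.
        i = bisect.bisect_right(self._starts, key) - 1
        if i >= 0 and key <= self._ends[i]:
            return self._labels[i]
        return None


# Hypothetical data using documentation-only address ranges.
table = RangeTable([("198.51.100.0/24", "cdn-b"), ("192.0.2.0/24", "cdn-a")])
```

Each lookup is a binary search over the sorted range starts, so the cost grows logarithmically with the number of ranges.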

Another problem with Lambda is that, as with much of AWS, you get a great set of building blocks but not necessarily the tooling. Normally Cosmos would provide this, but its Lambda support was still in development when work on Croupier started in March 2016.

Because of these concerns, we settled for a more traditional architecture; basically a cluster of near-identical servers sitting behind some kind of load balancer. In AWS speak this means using Elastic Compute Cloud (EC2) instances, managed by an Auto-Scaling Group (ASG) and load balanced using the Elastic Load Balancer (ELB, also known as the Classic Load Balancer).

So what about the availability and deployment requirements in this setup?

The Auto-Scaling Group came in handy in multiple ways, but perhaps not in the way suggested by its name. First of all, the ASG used together with the ELB makes up a pretty nice automatic instance manager that terminates and re-spawns instances based on custom health checks.

For Croupier, we simply defined the ASG MinSize to make sure that we always have enough server capacity to deal with our expected load (based on load test results). An important step here was to implement the health check in a way that actually tests whether real Croupier lookups can be performed – a simple ping-like dummy endpoint would potentially report the instance as healthy even when it isn’t.
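The difference between a ping-like check and one that exercises a real lookup can be sketched as follows. The function name, the probe address and the expected answer are hypothetical; the point is only that the check goes through the same code path as a real request and reports the instance as unhealthy if that path fails for any reason.

```python
def health_status(lookup, probe_ip, expected):
    """Return the HTTP status code to give the load balancer's health check.

    lookup is the same callable that serves real requests; probe_ip is an
    address whose correct answer (expected) is known in advance.
    """
    try:
        healthy = lookup(probe_ip) == expected
    except Exception:
        # Any failure in the real lookup path marks the instance unhealthy.
        healthy = False
    return 200 if healthy else 503
```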

To make sure that the capacity is maintained without any downtime when deploying new versions of Croupier, we used the ASG’s AutoScalingRollingUpdate update policy. This policy provides a few useful parameters such as MinInstancesInService and UpdateMaxBatchSize, which allow us to replace instances one by one while maintaining our required number of healthy instances. A caveat is that the ASG’s MaxSize parameter must be set to at least MinInstancesInService + UpdateMaxBatchSize; otherwise the rolling update will fail because of conflicting policies.
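The sizing caveat can be illustrated with a simplified model of a rolling update. This is not how CloudFormation implements the policy internally; it assumes that the group runs at exactly the minimum number of instances and that each batch of replacements is launched before the old instances are terminated. The function and parameter names are ours.

```python
def plan_rolling_update(min_in_service, batch_size, max_size):
    """Return a list of (batch, peak_instance_count) steps.

    Simplified model: the group starts with min_in_service old instances,
    and every step launches `batch` new instances before terminating the
    same number of old ones.
    """
    if batch_size < 1:
        raise ValueError("batch size must be at least 1")
    if max_size < min_in_service + batch_size:
        raise ValueError(
            "MaxSize must be at least MinInstancesInService + batch size")
    steps = []
    remaining = min_in_service  # old instances still to be replaced
    while remaining > 0:
        batch = min(batch_size, remaining)
        steps.append((batch, min_in_service + batch))
        remaining -= batch
    return steps
```

With four instances replaced one at a time, the group briefly holds five instances during each step, which is why a MaxSize of four is rejected in this model.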

Croupier at scale

On paper, the “auto-scaling” aspect of ASG sounds great: instances are automatically provisioned (and removed) based on some performance metric, such as CPU or memory usage, to make sure that there’s enough computational power without wasting money on idle instances. In practice, as explained in many articles online (for instance the Netflix tech blog), it’s a bit of a double-edged sword. First of all, it’s difficult to deal with sudden traffic spikes, but even with predictable “smooth” traffic patterns a lot of analysis and care is needed. It’s likely that you will end up with a policy that is really restrictive about scaling down (to avoid a disaster), which means that your ASG is over-provisioned most of the time.

For Croupier, we decided to avoid automatic scaling altogether. In addition to the general difficulties of getting the scaling policy right, the Croupier traffic patterns are not actually that smooth. Request rates are expected to sometimes spike from almost zero to thousands per second!

The compromise solution we ended up with is to use the so-called t2 class of EC2 instances. These instances have “burstable” performance, meaning that they can deal with shorter bursts of CPU usage, as long as the average CPU usage over time doesn’t go above a certain baseline.
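Burstable performance is accounted for in CPU credits, where one credit corresponds to one vCPU running at 100% for one minute. The sketch below tracks a credit balance minute by minute for a single-vCPU instance. The earn rate and balance cap are passed in as parameters and the numbers used in any example are purely illustrative; check the figures for the actual instance size. The model simply clamps the balance at zero rather than simulating the throttling that happens when credits run out.

```python
def simulate_credits(cpu_by_minute, earn_per_hour, start_balance, max_balance):
    """Return the credit balance after each minute.

    cpu_by_minute: CPU utilisation per minute as a fraction (0.0 to 1.0)
    for a single-vCPU instance.
    """
    balance = float(start_balance)
    history = []
    for cpu in cpu_by_minute:
        balance += earn_per_hour / 60.0  # credits earned this minute
        balance -= cpu                   # 1 credit = 1 vCPU-minute at 100%
        balance = max(0.0, min(balance, float(max_balance)))
        history.append(balance)
    return history
```

When utilisation equals the earn rate (for example 10% CPU with 6 credits earned per hour) the balance stays flat; a sustained burst above that drains it, which is the trend to watch for in the CloudWatch metrics.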

By monitoring the performance of these instances using CloudWatch, we can tell if the traffic baseline has increased over time and manually scale up the size of the ASG.

ELBs and TLS termination

The Elastic Load Balancer has two possible modes of operation: HTTP and TCP. Initially, HTTP mode looked like the better option for Croupier since the ELB would have the obvious advantage of knowing what’s logically happening over the wire, as opposed to just distributing TCP connections across a number of EC2 instances. In HTTP there would also be the advantage of built-in HTTPS termination.

Unfortunately, the TLS termination requirements for Croupier made it impossible to use HTTP mode. HTTPS configuration for ELB is fairly basic and doesn’t provide a way to perform the issuer verification we’re required to do (allowing certain BBC certificates only). Because of this, we had to configure the ELB to operate in TCP mode and add custom TLS termination on the EC2 instances.

Our options here were either to implement HTTPS directly in the Croupier server software (a Java application) or to put it behind an HTTPS-capable web server such as Apache or Nginx using reverse proxying. In the end, we decided to terminate HTTPS traffic through Apache because a lot of BBC-specific configuration to deal with certificate verification and revocation was readily available.

Latency turned out to be the biggest issue with this solution at first. Apache out of the box is not optimised for long HTTP keep-alives (it actually seems to be optimised to mitigate memory leaks from things like bad old PHP scripts by closing connections often). Opening new connections results in extra TLS handshakes, which in turn require extra round trips causing latency spikes. Fortunately, most web servers (including Apache) allow you to monitor and configure keep-alive handling and other relevant things, but be prepared to spend some time load testing and tuning!
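A back-of-the-envelope model shows why connection reuse matters so much here. It assumes a full TLS 1.2 handshake (two round trips) on top of the TCP handshake (one round trip) for every new connection, and ignores session resumption; the round-trip and server times in any example are made-up numbers, not measurements from Croupier.

```python
TCP_HANDSHAKE_RTTS = 1
TLS12_FULL_HANDSHAKE_RTTS = 2


def request_latency_ms(rtt_ms, server_ms, new_connection):
    """Latency of one request, counting network round trips."""
    rtts = 1  # the HTTP request/response itself
    if new_connection:
        rtts += TCP_HANDSHAKE_RTTS + TLS12_FULL_HANDSHAKE_RTTS
    return rtts * rtt_ms + server_ms


def mean_latency_ms(rtt_ms, server_ms, requests_per_connection):
    """Mean latency when each connection serves a fixed number of requests."""
    if requests_per_connection < 1:
        raise ValueError("need at least one request per connection")
    first = request_latency_ms(rtt_ms, server_ms, new_connection=True)
    reused = request_latency_ms(rtt_ms, server_ms, new_connection=False)
    n = requests_per_connection
    return (first + (n - 1) * reused) / n
```

With a 10ms round trip and 5ms of server time, a request on a fresh connection takes 45ms against 15ms on a reused one, so a server that closes connections often pushes many requests towards the 50ms mean target.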

Latency spikes caused by load test clients simultaneously reconnecting, before tuning Apache.

The not-so-elastic load balancer

Similar to ASG, the Elastic Load Balancer attempts to make full use of the dynamic nature of the cloud and provision computational resources as needed. In this case, it means the size of the load balancer itself (how ELBs work under the hood is not publicly documented, but “size” in this context basically means how much traffic they can handle).

One important thing we learned about ELBs is that they start out in a default state, but are actually allowed to scale down to some minimum size if idle. This “cold” state turned out to have a huge impact on request latency, as seen in the following load test graph, where the majority of requests took longer than 100ms and therefore count as failures by definition.

Load test performed using a “cold” ELB instance.

Amazon’s guide to load testing with ELBs suggests ramping up traffic slowly. However, in our case we really had to warm up the ELB before testing and going live, since the cold ELB wasn’t good enough (in latency terms) for any amount of traffic at all.

It’s possible to ask AWS support to “pre-warm” your ELBs to avoid scaling down between tests. We did this by running load tests (simulating peak traffic) and then telling support at what time the ELB reached the warmed-up state.

Unfortunately, there isn’t much monitoring available to tell you what ELBs are actually up to. However, ELB load balancing is essentially DNS based, so regular name server lookups (using a tool such as dig) will give you an idea of when the ELB’s underlying instances are being replaced.

17:44:46 GMT        60 IN A        60 IN A
17:45:46 GMT        60 IN A        60 IN A

The ELB IPs change roughly every 15 minutes when scaling up, so if the DNS entry has stayed the same for 30 minutes it’s likely that the scaling has settled down. This is what our testing suggests, at least; Amazon doesn’t provide a lot of information about this.
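The same observation can be automated by polling DNS and checking how long the published address set has stayed unchanged. The sketch below is an assumption-laden stand-in for the dig approach: it resolves through the operating system’s resolver (which may cache answers), the function names are ours, and the 30 minute window is simply the rule of thumb from our testing.

```python
import socket


def resolve_ips(hostname):
    """Return the set of IPv4 addresses currently published for hostname."""
    infos = socket.getaddrinfo(hostname, 443, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    return frozenset(info[4][0] for info in infos)


def seconds_stable(observations):
    """How long the most recent IP set has been observed unchanged.

    observations: chronological list of (unix_time, ip_set) pairs.
    """
    if not observations:
        return 0.0
    last_time, last_ips = observations[-1]
    since = last_time
    # Walk backwards until the address set differs from the latest one.
    for t, ips in reversed(observations):
        if ips != last_ips:
            break
        since = t
    return last_time - since


def looks_settled(observations, window_s=30 * 60):
    """True if the IP set has not changed for at least window_s seconds."""
    return seconds_stable(observations) >= window_s
```

A polling loop would append `(time.time(), resolve_ips(name))` to the list once a minute and stop the warm-up once `looks_settled` returns true.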

Wrapping up

AWS is a great toolbox for cloud development. That being said, the “elastic” features are not a magic solution to scaling and still require a lot of careful tuning and testing, except perhaps for services with a very trivial workload.

The ELB is not a silver bullet for load balancing, and many teams at the BBC rely on custom solutions instead (plain DNS load balancing through AWS Elastic IPs and Route53, for example). The platform is constantly evolving, however: AWS recently introduced a new Application Load Balancer, but we’re still waiting for a truly general solution to the problem.
