BBC News on HTTPS
Principal Software Engineer, BBC News
A few weeks ago the BBC News website finished transitioning to HTTPS. The green padlock you should now see next to the web address is probably the biggest publicly visible technical change to the site since it relocated from news.bbc.co.uk in 2011. Even so, a question we’re often asked is “why did it take so long?”
Before answering that, it’s worth remembering why HTTPS (or more accurately, TLS) has come to be seen as a must-have feature for all web applications. In the early days, secure technologies such as SSL were largely the preserve of e-commerce websites. The padlock assured the user of both the site’s authenticity and the encryption of their credit card details in transit. The use of these technologies has expanded in recent years, with campaigns such as HTTPS Everywhere and Secure the News promoting the adoption of HTTPS across the board. Meanwhile, browser vendors such as Google are taking steps to label sites that do not use HTTPS as ‘Not secure’. Clearly, changes to the web landscape and user expectations mean that universal HTTPS is here to stay.
As a public service, we have to ensure that BBC News is available to the widest possible audience, regardless of device, browser or use of assistive technology. We champion the ideal of graceful degradation of service as far as possible. But in a climate of anxiety around fake news, it’s vital that users are able to determine that articles have not been tampered with and that their browsing history is private to them. HTTPS helps with both, as it makes it far more difficult for ISPs to track which articles and videos you’re looking at, or to selectively suppress individual pieces of content. We’ve seen cases outside the UK, on some of our World Service sites, where foreign governments have tried to do this.
Our plan for migrating the News website was relatively straightforward, built on extensive groundwork already done to move World Service sites (such as BBC Hindi) to HTTPS. Until recently, anyone accessing BBC News over HTTPS was redirected (‘downgraded’) to HTTP. This changed in March when we enabled access via both protocols and began an iterative process of chasing down a multitude of bugs, while we worked on updating links, feeds and metadata to reflect the new address. Colleagues in BBC bureaux around the world helped us detect access issues in different geographical areas early (we discovered, for example, that in India a government-mandated network block initially made the site totally inaccessible).
At the same time, we compared the page load performance of real users across HTTP and HTTPS. This revealed that many of those on HTTPS had a slower experience, because our assets are served from a relatively large number of domains and each additional domain adds the overhead of negotiating another TLS connection. To offset this impact, we decided to extend the project to include some performance improvements to the site. Our final step was to reapply the redirect in the other direction, ‘upgrading’ HTTP users to HTTPS section by section (though even here we had to proceed with caution, initially making the redirect temporary in case it had to be reversed).
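To illustrate the cautious, section-by-section ‘upgrade’ described above, here is a minimal sketch in Python. It is not our production configuration: the function name, the example.com address and the section paths are hypothetical, and it assumes that a temporary redirect means HTTP status 302 and a permanent one means 301.

```python
from urllib.parse import urlsplit


def upgrade_redirect(url, upgraded_prefixes, permanent=False):
    """Return (status, location) if the request should be upgraded, else None.

    upgraded_prefixes is a hypothetical list of path prefixes (site sections)
    that have already been switched over to HTTPS.
    """
    parts = urlsplit(url)
    # Requests that are already on HTTPS need no redirect.
    if parts.scheme != "http":
        return None
    # Only upgrade the sections that have been switched over so far.
    if not any(parts.path.startswith(prefix) for prefix in upgraded_prefixes):
        return None
    # 302 is easy to reverse; 301 may be cached by browsers for a long time.
    status = 301 if permanent else 302
    return status, parts._replace(scheme="https").geturl()


# Hypothetical usage: only one section has been upgraded so far.
print(upgrade_redirect("http://www.example.com/news/technology", ["/news/technology"]))
print(upgrade_redirect("http://www.example.com/news/world", ["/news/technology"]))
```

Starting with a temporary status matters because browsers can cache a permanent redirect for a long time, which would make a rollback much harder to guarantee.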
There were other challenges. The work had to be fitted around major events that place restrictions on our platforms, including a Royal Wedding and local elections in the UK.
Many of the bugs mentioned above fall into the class of ‘mixed content’, where the browser detects non-HTTPS assets being loaded on an otherwise secure page. This is a particular challenge for BBC News due to the site’s long and complex history, since almost every page published since the site launched in 1997 is still available. Though it appears externally to be a single website, it is really a patchwork of technical architectures, mainly because of differing requirements. Our election coverage demands real-time updates combined with scalability to cope with huge traffic levels, while one-off interactives need a flexibility and richness of experience that goes beyond our standard templates.
Over the last twenty years, publishing systems for content on News pages have come and gone, having been replaced or made obsolete. Although newer content is published through dynamic web applications that can be readily modified, what lies beneath this sometimes resembles layers of sedimentary rock. In practice, this means that tracking down historical mixed content and working out how to change it is not always straightforward. We developed our own ‘crawler’ to help us find such problems, and had to come up with some crafty workarounds to address some of the most inaccessible bugs. A number of these tasks are still in progress. We also have a major ongoing project to convert some older audiovisual content to a format that can be delivered securely, but this will take time.
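For readers curious about what detecting mixed content involves, the core check can be sketched in a few lines of Python using only the standard library. This is an illustration, not our crawler: the names are hypothetical, it inspects a single page’s HTML rather than crawling a site, and it only covers a handful of common tags (it would miss, for example, assets requested from CSS or JavaScript).

```python
from html.parser import HTMLParser

# Tag -> attribute that causes the browser to fetch a subresource.
# Ordinary links (<a href>) are navigation, not subresources, so they are
# deliberately left out.
RESOURCE_ATTRS = {
    "img": "src",
    "script": "src",
    "iframe": "src",
    "audio": "src",
    "video": "src",
    "source": "src",
    "embed": "src",
    "object": "data",
    "link": "href",
}


class MixedContentFinder(HTMLParser):
    """Collects (tag, url) pairs for subresources requested over plain HTTP."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        attr_name = RESOURCE_ATTRS.get(tag)
        if attr_name is None:
            return
        attr_map = dict(attrs)
        # Only stylesheet links are fetched as assets; rel="canonical" and
        # similar are metadata and do not trigger a mixed content warning.
        if tag == "link" and "stylesheet" not in (attr_map.get("rel") or "").lower():
            return
        url = (attr_map.get(attr_name) or "").strip()
        if url.lower().startswith("http://"):
            self.findings.append((tag, url))


def find_mixed_content(html):
    """Return the insecure subresources referenced by one HTML document."""
    finder = MixedContentFinder()
    finder.feed(html)
    finder.close()
    return finder.findings


# Hypothetical page fragment: only the image is reported.
sample = '<a href="http://example.com/">link</a><img src="http://example.com/pic.jpg">'
print(find_mixed_content(sample))
```

A real crawler would also need to fetch pages, follow links and cope with two decades of varied markup, which is where most of the effort goes.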
Even then, some mixed content just cannot be fixed economically, and one or two errors will remain. Such pages still work, with the occasional browser warning, similar to how BBC News pages from the late 90s are still usable. We confined our efforts to content available on www.bbc.co.uk, leaving older domains as a historical record. We think users would rather we spent more of our time on building the future of the website.
BBC News is now only available over HTTPS, and the padlock (combined with the web address) hopefully gives users of the site confidence that what they read and watch was published by the BBC and is private to them. We hope you agree it was worth the wait.