Keeping data free, reliable and up-to-date

Dr Jeni Tennison, technical director of the Open Data Institute, on using open data principles to create a flexible, intuitive and trustworthy website, Legislation.gov.uk

The National Archives is the official government archive for the UK, which holds over 1000 years of the nation’s records for public use and provides expertise to hundreds of public sector bodies on how to store, manage and share government data. Five years ago, it was decided that the two websites containing all UK legislation data needed to be integrated into one comprehensive public resource. To this end, the National Archives joined forces with Her Majesty’s Stationery Office, the official publisher of all Acts of Parliament, to create what would eventually become Legislation.gov.uk.

In this article, we look at how the creators of Legislation.gov.uk used open data principles to create a flexible, intuitive and trustworthy website, which provides all UK legislation data for anyone to access and re-use. 

In website terms, ‘data’ can refer to:

  • pages that users can link to
  • snippets of content that users can embed into their own pages
  • dumps of data that can be downloaded into users’ own databases
  • searches over information
  • updates on that information

The priorities for the team creating the Legislation.gov.uk website were to signpost the new website clearly for users who were not confident about how legislation works and to make it easily accessible to users who wanted to re-use the data available.

"We were particularly interested in the re-user experience." – Dr. Jeni Tennison

This meant that the data published on Legislation.gov.uk needed to be available in a variety of formats and be openly licensed, so that people could build commercial applications on top of it.

Legislation.gov.uk’s system architecture supports this flexible re-use of data, starting from an XML database at the back end. The team found that XML is the best starting format for data because it can easily be transformed into other formats.

[Figure: System architecture]

"Good open data is available in a standard, structured format so that it can be easily processed"

Legislation.gov.uk’s strategy was to ensure that the same data underlies all the different formats that people can access. In addition to a standard PDF, users can access the XML data behind the document and re-use that same data in an iPhone app. The same was done for browsing and searching: underlying any search is an Atom feed of the data itself, which a data re-user can ‘hook into’.
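
As a rough illustration of what ‘hooking into’ a feed might look like, the sketch below reads the entries of an Atom feed with Python’s standard library. The sample feed, its titles and its example.org links are invented for illustration; they are not the actual Legislation.gov.uk feed, whose exact contents are not described here.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Atom syndication format.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical sample feed, invented for illustration only.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Search results</title>
  <entry>
    <title>Example Act 2010</title>
    <link href="http://example.org/id/act/2010/1"/>
    <updated>2012-01-01T00:00:00Z</updated>
  </entry>
  <entry>
    <title>Example Regulations 2011</title>
    <link href="http://example.org/id/regs/2011/2"/>
    <updated>2012-02-01T00:00:00Z</updated>
  </entry>
</feed>"""


def read_entries(feed_xml):
    """Return a (title, link, updated) tuple for each entry in an Atom feed."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall("atom:entry", ATOM_NS):
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        updated = entry.findtext("atom:updated", default="", namespaces=ATOM_NS)
        link = entry.find("atom:link", ATOM_NS)
        href = link.get("href") if link is not None else None
        entries.append((title, href, updated))
    return entries


for title, href, updated in read_entries(SAMPLE_FEED):
    print(title, href, updated)
```

A re-user would replace the sample string with the feed returned by a real search and could then poll it to pick up new or changed items.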

This potential for data re-use has resulted in intermediaries creating apps such as MobileLegislate, which allows people to read legislation on their phones. The consumer is guaranteed that the data they are reading comes from an official government source, and the publisher also benefits because the app brings it a wider audience. Since Legislation.gov.uk does not have the resources to create apps or eBooks, having re-users willing to develop and tailor these products is incredibly valuable.

Another example is the Longman Law bespoke app, which allows law professors to create a customised law textbook from the legislation data on the website. The user specifies the sections they want to include and the app creates a book, which professors can then sell to their students.

"Good open data has guaranteed availability and consistency over time"

One of the major problems with legislation publishing is that laws constantly amend and affect other laws, and keeping up with the changes requires major resources. Making changes to legislation data is a painstaking task; the effects of changing a piece of legislation have to be individually identified by hand, then input using an editing interface.

The in-house team can apply about 10,000 effects per year, but parliaments and assemblies create about 15,000 effects per year, so the shortfall grows by roughly 5,000 effects every year. There is already a backlog of 100,000 unapplied effects. This compromises the publisher’s ability to guarantee that data is up to date and accurate.
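
The arithmetic behind that shortfall can be sketched in a few lines. The figures are the approximate ones quoted above; the assumption that both rates stay constant from year to year is ours, made purely for illustration.

```python
def backlog_after(years, backlog=100_000, created_per_year=15_000, applied_per_year=10_000):
    """Project the backlog of unapplied effects, assuming constant yearly rates."""
    return backlog + years * (created_per_year - applied_per_year)


for years in (1, 5, 10):
    print(years, backlog_after(years))
```

Under these assumptions the backlog never shrinks unless the number of effects applied each year rises above the number created, which is where outside contributors come in.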

"When we ask people to make their data available for free, they want to know how they can get something back"

The benefit of the open data strategy is it encourages intermediaries with a vested interest in the integrity of certain data (in this case, those working in the legal sector) to invest in the maintenance of that data. Legislation.gov.uk now has partners like the Practical Law Company who are investing time, expertise and money to help update legislation.

[Figure: visualisation]

"We’re no longer just publishers. We’ve had to revise the way we support the new requirements"

The open data strategy means that the Legislation.gov.uk website now works more interactively with its partners and users. Instead of the previous ‘read’ platform, it is now a ‘read-write’ platform – a dynamic native web interface allowing people to interact with the information behind the scenes. This helps re-users to interpret the information they are seeing, as they are able to access the HTML behind a page and see exactly how the data has been presented.

The team working on updating legislation have also benefited from the new level of interactivity, with new visualisations which allow them to track their progress. Team members can look at any piece of legislation and see a dashboard showing the points in time where it has been updated, and what still needs to be done to it.

RDF is an ideal format for creating graphs and visualisations like the one below:

[Figure: Partners]

"We’ve adopted the principle of using the right format for the right job"

The project has revealed that different formats have different benefits.

HTML
HTML (HyperText Markup Language) is the ‘lingua franca’, a common language that everyone understands. It is simple to understand, easy to maintain and is supported by every browser.

JSON
JSON (JavaScript Object Notation) is simple and easy to read. It is data-oriented rather than document-oriented. 

XML
XML (Extensible Markup Language) is flexible and supports multiple data formats; multiple views of the same content are easily rendered.

RDF
RDF (Resource Description Framework) can capture information from unstructured, semi-structured or structured sources. 
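
To make the idea of ‘multiple views of the same content’ concrete, here is a minimal sketch that starts from one XML fragment and renders it as both JSON and HTML. The element names, the attribute and the text are hypothetical; they do not reflect the actual schema used by Legislation.gov.uk.

```python
import json
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical fragment: element and attribute names are invented.
SOURCE_XML = """<section id="s1">
  <title>Short title</title>
  <text>This Act may be cited as the Example Act.</text>
</section>"""


def to_record(xml_text):
    """Parse the XML source into a plain dictionary."""
    root = ET.fromstring(xml_text)
    return {
        "id": root.get("id", ""),
        "title": root.findtext("title", default=""),
        "text": root.findtext("text", default=""),
    }


def to_json(record):
    """Data-oriented view of the record."""
    return json.dumps(record, sort_keys=True)


def to_html(record):
    """Document-oriented view of the same record, with values escaped."""
    return '<div id="{id}"><h2>{title}</h2><p>{text}</p></div>'.format(
        id=escape(record["id"]),
        title=escape(record["title"]),
        text=escape(record["text"]),
    )


record = to_record(SOURCE_XML)
print(to_json(record))
print(to_html(record))
```

Both outputs are derived from the same source, so a correction made once in the XML shows up in every view.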

The only change to the system architecture has been the introduction of a triplestore to store the RDF data. This sits side-by-side with the back-end XML database.
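
For readers unfamiliar with the term, a triplestore holds statements of the form subject, predicate, object and answers pattern queries over them. The toy version below keeps triples in a Python list; the identifiers and predicate names are invented, and a production triplestore would be queried with a language such as SPARQL rather than a function like this.

```python
# Hypothetical triples; identifiers and predicate names are invented.
TRIPLES = [
    ("act/2010/1", "title", "Example Act 2010"),
    ("act/2010/1", "amendedBy", "regs/2011/2"),
    ("regs/2011/2", "title", "Example Regulations 2011"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


# Which items amend which?
print(match(TRIPLES, predicate="amendedBy"))
```

Because every statement has the same three-part shape, relationships between items can be followed like edges in a graph, which is what makes the format convenient for visualisations.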

"URLs are key to enabling the different sources of data to work together"

Although underpinned by one single ‘work’, an item of legislation will have different ‘expressions’ according to time and location. URLs address these different versions. There are also URLs which address the different formats, meaning that by adding on suffixes /data.pdf, /data.xml or /data.htm, the user can view the data in whichever format they wish.
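
The suffix convention can be expressed as a small helper. The three suffixes are the ones named above; the function name, the format keys and the example item address are assumptions made for illustration.

```python
# Suffixes as described in the text; the dictionary keys are our own labels.
FORMAT_SUFFIXES = {
    "pdf": "/data.pdf",
    "xml": "/data.xml",
    "html": "/data.htm",
}


def format_url(item_url, fmt):
    """Append the suffix for the requested format to an item's URL."""
    try:
        suffix = FORMAT_SUFFIXES[fmt]
    except KeyError:
        raise ValueError(f"unsupported format: {fmt!r}") from None
    return item_url.rstrip("/") + suffix


# Example item address, assumed for illustration.
item = "http://www.legislation.gov.uk/ukpga/2010/15"
print(format_url(item, "xml"))
```

Keeping the format in the URL means a re-user can switch between views of the same item by changing only the suffix.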

 "Our model creates a feedback loop where users can help improve the quality of your data" 

The benefit of open data for Legislation.gov.uk is clear, as it has provided an incentive for outside parties to help maintain the information on the website. It has also provided a wider audience for published information by making the data available for re-use via apps and eBooks, and has provided new analysis techniques that those working to update the website can take advantage of. 

If this works for legislation, then it has the potential to work in many other places. Any company dissatisfied with the quality of its data should consider using the open data model, as it can help the company find partners who can improve it.

 

This article is based on a presentation given by Dr Jeni Tennison, technical director of the Open Data Institute (ODI), at BBC’s Develop 2012 conference. Jeni is one of the country's foremost developers in open data and web technologies. She pioneered the use of open data APIs within the public sector through legislation.gov.uk. She is known internationally for her work with both XML and Linked Data, and is a member of the W3C's Technical Architecture Group.