
Zeitgeist - the most shared BBC links on Twitter


Theo Jones | 10:45 UK time, Wednesday, 14 July 2010

Zeitgeist is a prototype that highlights the most shared BBC webpages on Twitter: a digest that links people to the hottest BBC pages. The project is part of a larger area of exploration into how the BBC can use real-time trending data to enrich user experiences. One of our recent projects, Music Trends, shows how the artists played on BBC radio are trending on other music services, such as Last.fm and We are Hunted.

We developed Zeitgeist as a simple information source for users and as a way to give our production teams insight into users' interests and behaviours. There are some interesting commercial alternatives available, such as backtweets, twitturls, twitturly and tweetmeme, which are worth checking out, but we had some specific requirements for our prototype.

The system uses a custom-built ingest chain that draws on Twitter's public APIs to find tweets containing a BBC URL. As it's running in real time, these links come and go depending on what Twitter users are talking about. You can see the 'liveness' in the Last 24 hours view or take a broader view of the Last 7 days.

Zeitgeist uses the web page's URL and metadata to determine where it comes from and assign it a category, e.g. Technology, Entertainment, Blogs or iPlayer. These categories give links a context for the user and a means of navigating deeper.
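As a rough illustration of how a category might be assigned, here is a minimal Ruby sketch. The path prefixes, the `CATEGORY_RULES` table, the `categorise` method and the `"category"` metadata key are all assumptions made for this example. The actual rules Zeitgeist uses are not described here.

```ruby
require "uri"

# Hypothetical prefix rules, for illustration only.
CATEGORY_RULES = [
  ["/news/technology", "Technology"],
  ["/news/entertainment", "Entertainment"],
  ["/blogs", "Blogs"],
  ["/iplayer", "iPlayer"]
].freeze

# Returns a category name for a BBC URL, or nil for non-BBC or malformed URLs.
# `metadata` stands in for values read from the page; the "category" key is assumed.
def categorise(url, metadata = {})
  uri = URI.parse(url)
  host = uri.host.to_s.downcase
  return nil unless host == "bbc.co.uk" || host.end_with?(".bbc.co.uk")

  # Prefer in-page metadata when it is present, then fall back to the URL path.
  return metadata["category"] if metadata["category"]

  path = uri.path.to_s
  rule = CATEGORY_RULES.find { |prefix, _name| path.start_with?(prefix) }
  rule ? rule.last : "Other"
rescue URI::InvalidURIError
  nil
end
```

Keeping the rules in a table rather than in branching code means a new category can be added without touching the lookup logic.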

Zeitgeist homepage screenshot

The links are ranked by tweet count (including retweets) for the chosen time period. Each entry details the page title, category, media type, short description and when it was first tweeted. The date of publication is indicated where available, as it's not just new links that seem to get picked up on Twitter.
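The ranking described above can be sketched in a few lines of Ruby. This is an illustration under stated assumptions, not the Zeitgeist code: the `Tweet` struct, the `rank_links` method and its parameters are hypothetical, and ties are broken alphabetically by URL purely to keep the output stable.

```ruby
# One record per tweet (or retweet) that mentions a resolved BBC URL.
Tweet = Struct.new(:url, :created_at)

# Counts tweets per URL inside the time window ending at `now`
# and returns [url, count] pairs, most tweeted first.
def rank_links(tweets, now:, window_seconds:, limit: 10)
  cutoff = now - window_seconds
  counts = Hash.new(0)
  tweets.each do |tweet|
    counts[tweet.url] += 1 if tweet.created_at >= cutoff && tweet.created_at <= now
  end
  counts.sort_by { |url, count| [-count, url] }.first(limit)
end
```

Calling `rank_links(tweets, now: Time.now, window_seconds: 24 * 60 * 60)` would correspond to a 'Last 24 hours' style view, and `7 * 24 * 60 * 60` to a 'Last 7 days' view.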

We have a different view for BBC employees (shown below), which allows us to see the tweet history of each page, a full list of tweets, the most retweeted messages, hashtags and keywords. We are unable to show this to everyone as the messages would need to be moderated.

Zeitgeist detail page screenshot

We use the Twitter streaming API in two ways: to access the Gardenhose sample stream, which provides a subset of the full Twitter message stream at a rate of about 100 messages per second, and to track "BBC" as a keyword. These messages are then fed into a pipeline of processes written in Ruby, connected by queues provided by RabbitMQ, a fast and reliable messaging server.

These are the stages that each incoming tweet goes through:

  1. Twitter combines a retweet with its original tweet; these are split so that both messages are delivered to the pipeline
  2. A tweet from the API contains a lot of extraneous data which needs to be removed, such as the user's page background colour
  3. Links in the message are extracted and resolved by following redirections and expanding shortened links (bit.ly provides a dedicated API for this)
  4. Only tweets containing links to BBC pages are kept. Automatically generated BBC tweets from accounts such as on_bbc1 are filtered out and links to the BBC Homepage are also removed as they skew the results
  5. The remaining tweets and links are saved to the database
  6. The link category is determined by its domain and in-page metadata

Zeitgeist ingest chain diagram
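Stages 2 to 4 can be sketched as small Ruby functions. This is a minimal illustration, not the production code. The field names follow the general shape of a tweet from the Twitter API, but exactly which fields are kept is an assumption. The `redirects` argument is a hypothetical stand-in for real HTTP redirect following or a link-expansion API call, so the example runs without network access. Treating only the root path of `www.bbc.co.uk` as the Homepage is also an assumption.

```ruby
require "uri"

# The source names on_bbc1 as one automated account; a real list would be longer.
AUTOMATED_ACCOUNTS = ["on_bbc1"].freeze

# Stage 2: keep only the fields later stages need (the choice of fields is assumed).
def strip_tweet(raw)
  { id: raw["id"], text: raw["text"].to_s, user: raw.dig("user", "screen_name") }
end

# Stage 3a: pull http(s) links out of the message text.
def extract_links(text)
  text.scan(%r{https?://\S+})
end

# Stage 3b: follow redirections until a URL has no further target.
# `redirects` is anything that responds to [] (a Hash here, or a Proc that
# performs a real lookup); it returns the next URL or nil.
def resolve(url, redirects, max_hops: 5)
  max_hops.times do
    target = redirects[url]
    return url unless target
    url = target
  end
  url
end

# Stage 4: true for BBC pages other than the Homepage.
def bbc_page?(url)
  uri = URI.parse(url)
  host = uri.host.to_s.downcase
  return false unless host == "bbc.co.uk" || host.end_with?(".bbc.co.uk")

  homepage = ["bbc.co.uk", "www.bbc.co.uk"].include?(host) && ["", "/"].include?(uri.path.to_s)
  !homepage
rescue URI::InvalidURIError
  false
end

# Runs one raw tweet through stages 2 to 4.
# Returns the stripped tweet with its BBC links, or nil if it is filtered out.
def process(raw, redirects)
  tweet = strip_tweet(raw)
  return nil if AUTOMATED_ACCOUNTS.include?(tweet[:user])

  links = extract_links(tweet[:text])
          .map { |link| resolve(link, redirects) }
          .select { |link| bbc_page?(link) }
  links.empty? ? nil : tweet.merge(links: links)
end
```

In the real chain each stage runs as its own process and passes messages over a queue; here they are chained in one function to keep the example short.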

We split these steps into separate processes for two reasons: it's easier to develop and test a process if it does only one thing; and, more importantly, it allows us to balance different parts of the system depending on load. For example, only one process is required to strip data out of tweets, but ten are needed to resolve URLs. By load balancing this way, we can maintain a steady throughput of messages without any stage becoming overloaded.
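The idea of giving each stage its own number of workers can be shown in miniature. The sketch below is an assumption-laden stand-in: it uses threads and Ruby's built-in `Queue` inside one process, whereas Zeitgeist uses separate processes connected by RabbitMQ queues. The names `start_stage`, `run_pipeline` and `STOP` are hypothetical, and the 'resolve' work is simulated with a short sleep.

```ruby
STOP = :stop # sentinel telling a worker to shut down

# Starts `workers` threads that each take items from `input`,
# apply the block, and push the result onto `output`.
def start_stage(workers, input, output, &work)
  Array.new(workers) do
    Thread.new do
      loop do
        item = input.pop
        break if item.equal?(STOP)
        output << work.call(item)
      end
    end
  end
end

# Two stages: a cheap strip stage and a slow resolve stage with more workers.
def run_pipeline(raw_tweets, strip_workers: 1, resolve_workers: 10)
  raw_q = Queue.new
  stripped_q = Queue.new
  resolved_q = Queue.new

  strippers = start_stage(strip_workers, raw_q, stripped_q) { |tweet| tweet[:text] }
  resolvers = start_stage(resolve_workers, stripped_q, resolved_q) do |text|
    sleep 0.01 # simulated network latency of a URL lookup
    text.upcase
  end

  raw_tweets.each { |tweet| raw_q << tweet }

  # Shut down stage by stage so no item is lost.
  strip_workers.times { raw_q << STOP }
  strippers.each(&:join)
  resolve_workers.times { stripped_q << STOP }
  resolvers.each(&:join)

  results = []
  results << resolved_q.pop until resolved_q.empty?
  results
end
```

Because the slow stage has ten workers, ten lookups can be in flight at once while the single strip worker keeps up easily. Results arrive in no particular order, so callers should not rely on ordering.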

To make Zeitgeist, we have had to handle large data sets at high speed. As a rough guide, the Zeitgeist ingest chain handles about 300,000 tweets an hour, of which about 900 contain links, and 500 of those link to the BBC. Finally, short lists work well, as there's a steep drop-off of tweets lower down the chart and, as you might expect, the majority of links point to BBC News articles.
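To put those rough figures in proportion, the short Ruby calculation below converts them into a per-second rate and two percentages. The numbers are the approximate ones quoted above; the method name `funnel_stats` is made up for this example.

```ruby
# Turns the rough hourly figures into a rate and two shares.
def funnel_stats(tweets_per_hour:, with_links:, bbc_links:)
  {
    per_second: tweets_per_hour / 3600.0,                   # average ingest rate
    link_share_pct: 100.0 * with_links / tweets_per_hour,   # tweets that contain a link
    bbc_share_pct: 100.0 * bbc_links / with_links           # of those, links to the BBC
  }
end

stats = funnel_stats(tweets_per_hour: 300_000, with_links: 900, bbc_links: 500)
puts format("%.1f tweets/s, %.1f%% with links, %.1f%% of those to the BBC",
            stats[:per_second], stats[:link_share_pct], stats[:bbc_share_pct])
```

On these figures the chain averages roughly 83 tweets a second, about 0.3% of tweets carry a link, and a little over half of those links point to the BBC.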

Zeitgeist is now up and running for a limited period and we trust that you'll find it an interesting resource. We think a system like this could feed into BBC Search as a ranking signal, act as an additional real-time feed for News recommendations, or power a 'news on the move' mobile service. In any case, it shows how audiences can help shape and prioritise content.

Visit the BBC Zeitgeist prototype


