Why the BBC removed microformat DateTime patterns from bbc.co.uk...

Jonathan Hassell | 14:33 UK time, Friday, 4 July 2008

... and what we are doing to bring them back

A couple of weeks ago we decided to start removing from BBC sites those microformats that use the DateTime pattern, the most popular of which is hCalendar.

This pattern was provided to give non-BBC programmers an API (an "application programming interface" - these allow computers as well as human readers to use BBC data) with which they could create software using the information on sites such as /programmes and iPlayer.

Unfortunately the pattern had a number of flaws, which I'll summarise here:

In terms of accessibility: using the DateTime pattern causes some screen readers (in non-default configurations) to read out the contents of the title attribute rather than the text content of the element, so users hear data designed to be understood by computers rather than information designed to be understood by people.

In terms of usability: using the DateTime pattern causes a tooltip containing this machine-readable data to appear when the user hovers the mouse over the text content. Some technical users may understand "1998-03-12T09:30:00-05:00", but the majority of BBC users will not.

Because of the above problems, we changed our semantic markup standard, adding a rule that the title attribute MUST contain human-readable data.

This is why microformats have started to disappear from BBC sites.

We need to uphold the needs of our users, and see if we can find alternative patterns which do not have these negative usability and accessibility side-effects before programmers start building too much software which depends on the DateTime pattern in its current form.

The BBC have engaged with the microformats community to come up with alternative patterns. While this is a complex process, I hope that through this engagement an alternative pattern will be found which satisfies all the demands on it, from a programming, web standards, usability and accessibility perspective.

My colleague Jake Archibald (who is a Senior Client Side developer in BBC FM&T) has more technical detail on this decision below, and the latest summary of the debate around these alternative patterns.

Jonathan Hassell is Head of User Experience & Accessibility, BBC Future Media & Technology

The Microformat DateTime Pattern

Michael Smethurst blogged about the removal of this pattern from bbc.co.uk/programmes on the BBC Radio Labs blog a couple of weeks ago, and then the world went mad.

The RDFa guys started claiming victory and a small war broke out in the microformats community around alternatives to the pattern.

I'd like to clear up exactly why we don't support the current pattern, and what alternatives have been proposed.

What are microformats?

The HTML elements we use in modern web development are from a specification released in 1999. The web has evolved considerably since then and there are notable gaps in the specification. A developer can, using HTML, identify some content as computer code, but cannot identify a telephone number or a date.

microformat_arm_wrestling.jpg

The microformats community create HTML patterns for describing things such as contact details and calendar events. Other programs and websites can read these patterns and present the data in another way. An example of this is the Operator plugin for Firefox, which recognises the hCalendar microformat and lets the user add detected events to their Google calendar. (N.B. image from nennett on Flickr)

What's wrong with the current datetime pattern?

<p>
  To be held on
  <abbr class="dtstart" title="1998-03-12T08:30:00-05:00">
    12 March 1998 from 8:30am EST
  </abbr>
  until
  <abbr class="dtend" title="1998-03-12T09:30:00-05:00">
    9:30am EST
  </abbr>
</p>

The screen reader issue

This is the most commonly discussed issue, but it's not as big as people suggest. Some screen readers in non-default configurations will read out the contents of the title attribute rather than the text content of the element, meaning users may hear the machine data rather than the human data.

Personally, I believe that screen readers should read the title attribute rather than the text content by default (I'll come back to this later), but they don't, so it's not that much of an issue.

The tooltip issue

This is the biggest issue in my opinion. When you use the above pattern, a tooltip will appear containing the content of the title attribute when the user hovers the mouse over the text content. Like the screen reader issue, this is presenting machine data to the human user. Some technical users may understand "1998-03-12T09:30:00-05:00", but the majority of BBC users will not.

The semantic issue

The HTML4 and XHTML2 specifications say the <abbr> element is for marking up an abbreviated form with the expanded form in the title attribute, and the <abbr> element should be used around each instance of the abbreviated form. The HTML4 specification says the content of the title attribute may be presented to the user, so you can conclude that the content is intended for humans.

On the other hand, the XHTML2 spec is vague, defining the title attribute as "meta-information about the element on which it is set". In XHTML2 land, the microformat use of <abbr> seems valid.

HTML5 defines <abbr> as an abbreviation or acronym, with an optional expansion via the title attribute. In my opinion, this is the best definition of <abbr>. Expansions should only be used when they're needed. So it would be used like this:

<p>I am 6<abbr title="foot">ft</abbr> tall and work for the BBC</p>

Here I have expanded 'ft', because I read it as 'foot', whereas I read 'BBC' as each letter individually. This is why I believe screen readers should read from the title attribute of <abbr> elements rather than their text content.

The BBC's decision

Because of the above issues, we changed our semantic markup standard, adding a rule that the title attribute MUST contain human readable data. This is why some microformats have started to disappear from BBC sites.

What are the alternatives?

RDFa is a possible alternative, but BBC sites will require an exemption from our standards and guidelines before they can use it, because pages containing RDFa attributes don't validate as XHTML strict.

Microformats are an excellent way of adding additional semantic value to a page without compromising validation. However, we can't use them if they create usability or accessibility issues.

Alternatives to the datetime pattern have already been proposed which attempt to solve the current problems. Here's a quick overview of 3 proposals...

Empty elements with title:
<p>
  To be held on
  <span class="dtstart" title="1998-03-12T08:30:00-05:00"></span>
  12 March 1998 from 8:30am EST until
  <span class="dtend" title="1998-03-12T09:30:00-05:00"></span>
  9:30am EST
</p>

Here, empty elements are used to create key-value pairs using class and title. Screen readers ignore the empty element and the hover area for the tooltip is zero-width so it won't appear to the user in normal circumstances. For microformat parsers, there's little change from the current implementation, as most (if not all) do not require an <abbr> element.
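
To illustrate why the element name matters so little to a parser, here is a minimal sketch in Python using only the standard library. It is not how Operator or any real microformat parser works; the names TitleDateParser and parse_title_dates are made up for this example, and it assumes the title always holds a full ISO 8601 value. It reads the title from any element carrying 'dtstart' or 'dtend', so the current <abbr> markup and the empty <span> markup give the same result.

```python
from datetime import datetime
from html.parser import HTMLParser


class TitleDateParser(HTMLParser):
    """Hypothetical sketch: collect ISO 8601 values from the title attribute
    of any element whose class list contains 'dtstart' or 'dtend'.
    The tag name is deliberately ignored."""

    PROPERTIES = ("dtstart", "dtend")

    def __init__(self):
        super().__init__()
        self.found = {}

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        title = attributes.get("title")
        if title is None:
            return
        for name in self.PROPERTIES:
            if name in classes:
                # Assumption: the title is a complete ISO 8601 date-time.
                self.found[name] = datetime.fromisoformat(title)


def parse_title_dates(html):
    """Return a dict mapping 'dtstart' / 'dtend' to datetime objects."""
    parser = TitleDateParser()
    parser.feed(html)
    parser.close()
    return parser.found
```

Feeding either version of the example markup to parse_title_dates should give identical dictionaries, which is the point: the cost of this proposal falls on authors and tidying tools rather than on parsers.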

However, it still puts machine data in the title attribute, so it has the same semantic issue as the current pattern. It also raises a new question: should an empty element have a title at all?

It's also been raised that some CMS / tidying systems have issues with empty elements, making them self-closing or removing them completely.

Data in the class attribute:
<p>
  To be held on
  <span class="dtstart data-1998-03-12T08:30:00-05:00">
    12 March 1998 from 8:30am EST
  </span>
  until
  <span class="dtend data-1998-03-12T09:30:00-05:00">
    9:30am EST
  </span>
</p>

Here, the machine data is moved into the class attribute. The content of the class attribute is never presented as human readable data, and the HTML4 specification lists one of its roles as "For general purpose processing by user agents". The developer is free to use whatever element is semantically best. Microformat parsers would have to find the element with the identifying class, such as 'dtstart', then look in the same attribute for the data class beginning 'data-'. Elements could have many identifying classes, but only one data class.
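
The lookup a parser would need for this proposal can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the names DataClassParser and parse_data_classes are hypothetical, the data class is assumed to hold an ISO 8601 value, and elements with no data class or with more than one are simply skipped.

```python
from datetime import datetime
from html.parser import HTMLParser

PROPERTIES = ("dtstart", "dtend")
DATA_PREFIX = "data-"


class DataClassParser(HTMLParser):
    """Hypothetical sketch: find elements with an identifying class, then
    read the value from the single 'data-' class in the same attribute."""

    def __init__(self):
        super().__init__()
        self.found = {}

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        names = [c for c in classes if c in PROPERTIES]
        data = [c[len(DATA_PREFIX):] for c in classes if c.startswith(DATA_PREFIX)]
        # Many identifying classes are allowed, but exactly one data class.
        if not names or len(data) != 1:
            return
        value = datetime.fromisoformat(data[0])
        for name in names:
            self.found[name] = value


def parse_data_classes(html):
    """Return a dict mapping 'dtstart' / 'dtend' to datetime objects."""
    parser = DataClassParser()
    parser.feed(html)
    parser.close()
    return parser.found
```

Because an element can carry several identifying classes, one element with class "dtstart dtend" and a single data class would set both properties to the same value.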

However, despite the "general purpose" definition of the class attribute, it's an unusual use of the attribute and not in line with the object-oriented concept of 'class'. Also, a principle of microformats is to keep data visible to humans, whereas this proposal intentionally hides data in the class attribute.

Here's a link to further discussion of the data class proposal

Date and time separation using value excerption:
<p>
  To be held on
  <span class="dtstart dtend">
    <abbr class="value" title="1998-03-12">
      12 March 1998
    </abbr>
  </span>
  from
  <span class="dtstart">
    <abbr class="value" title="08:30">
      8:30am
    </abbr>
    <abbr class="value" title="-0500">
      EST
    </abbr>
  </span>
  until
  <span class="dtend">
    <abbr class="value" title="09:30">
      9:30am
    </abbr>
    <abbr class="value" title="-0500">
      EST
    </abbr>
  </span>
</p>

Here, the time information is split up into separate parts. The <abbr> element and title attribute are used to provide the machine alternative to individual date parts. The machine data is still displayed to the user via a tooltip and potentially read by a screen reader, but splitting it up makes it feel more human (and keeps the data visible). A single span can represent both the start and end times, ideal for situations like the above where the start date is only mentioned once, as it is also the end date. Parsers would have to collect all the elements with an identifying class such as 'dtstart', then gather all the <abbr> elements within with class 'value'. The parser would recognise the string patterns in the title attributes (as they are not in a particular order) and construct a full date from them.
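
To give a feel for the extra work this asks of parsers, here is a rough Python sketch of the collect-and-assemble step. It is an assumption-laden illustration rather than the proposal's reference algorithm: ValueExcerptionParser, build_datetime and parse_value_excerption are names invented for this example, and the three regular expressions are my guess at what "recognising the string patterns" would mean in practice.

```python
import re
from datetime import datetime
from html.parser import HTMLParser

PROPERTIES = ("dtstart", "dtend")
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")      # e.g. 1998-03-12
TIME_RE = re.compile(r"\d{2}:\d{2}(:\d{2})?")   # e.g. 08:30
OFFSET_RE = re.compile(r"[+-]\d{2}:?\d{2}")     # e.g. -0500 or -05:00


class ValueExcerptionParser(HTMLParser):
    """Hypothetical sketch: gather the title of every <abbr class="value">
    found inside an element with an identifying class."""

    def __init__(self):
        super().__init__()
        self.open_elements = []  # stack of (tag, identifying classes)
        self.parts = {name: [] for name in PROPERTIES}

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        title = attributes.get("title")
        if tag == "abbr" and "value" in classes and title is not None:
            # Credit the part to every enclosing dtstart / dtend element.
            for _, names in self.open_elements:
                for name in names:
                    self.parts[name].append(title)
        self.open_elements.append((tag, [c for c in classes if c in PROPERTIES]))

    def handle_endtag(self, tag):
        if tag not in [open_tag for open_tag, _ in self.open_elements]:
            return  # stray end tag: ignore it
        while self.open_elements.pop()[0] != tag:
            pass  # also discard unclosed elements nested inside


def build_datetime(parts):
    """Assemble one datetime from parts given in any order."""
    date = time = offset = None
    for part in parts:
        if DATE_RE.fullmatch(part):
            date = part
        elif TIME_RE.fullmatch(part):
            time = part
        elif OFFSET_RE.fullmatch(part):
            offset = part
    if date is None:
        return None
    iso = date
    if time is not None:
        iso += "T" + time
        if offset is not None:
            iso += offset[:3] + ":" + offset[-2:]  # normalise to +HH:MM
    return datetime.fromisoformat(iso)


def parse_value_excerption(html):
    """Return a dict mapping 'dtstart' / 'dtend' to datetime (or None)."""
    parser = ValueExcerptionParser()
    parser.feed(html)
    parser.close()
    return {name: build_datetime(parts) for name, parts in parser.parts.items()}
```

Even as a sketch, this needs a stack to track nesting and pattern matching to tell dates, times and timezone offsets apart, which is noticeably more than the single attribute lookup the other two proposals require.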

However, this pattern seems complicated for both implementors and parsers, involves more elements and can require multiple 'dtstart' and 'dtend' classes, as in the example above. The semantic issues around <abbr> and title remain, as does the possibility of screen readers reading the machine value rather than the human value. Tooltips would still be presented to the user, which may be unwanted and potentially confusing. Non-technical users may not be used to seeing dates year first, or timezones represented in that way.

Here's a link to further discussion of the separation proposal

Where now?

It's clear that none of the proposals are ideal, and as usual semantics play a big part in the debates between them. The microformats community need to come up with a solution that solves the issues with the current pattern and doesn't create any new ones. Once they do that, microformats such as hCalendar (in their new form) will begin to reappear on BBC sites.

Jake Archibald is a Senior CSD in BBC Future Media & Technology

Comments

  • Comment number 1.

    is it using php or what language? great it is good

  • Comment number 2.

    It is ok for me that the BBC did these changes...

  • Comment number 3.

    "The RDFa guys started claiming victory"

    Actually I was quite relieved at the lack of triumphalism from RDF enthusiasts when this news broke. The only post I saw which really cast things as a conflict/fight was John Resig's. And now yours. I've been encouraging folks lately not to exaggerate the nature of this supposed "conflict" (eg. see http://www.slideshare.net/danbri/one-big-happy-family/) ... any chance you could show people holding hands next time instead of arm-wrestling? ;)

  • Comment number 4.

    Hi Dan,

    Both Jake and I saw your talk at the vEvent session you did, I completely agreed with your sentiment.

    I would personally like to see sensible adoption of both Microformats and RDFa across bbc.co.uk and I suspect I'm not alone. Hopefully in the not too distant future we can demonstrate a one-big-happy-family product using both RDFa and Microformats.

    @vanwebid - Microformats and the various proposals for the datetime pattern issue mentioned above are more agreed conventions for standardised data representation in HTML than a language. See http://microformats.org/about/

  • Comment number 5.

    Looks like the bbc did kick off a year old debate again; and yes, you did spark some action to rectify semantics and clarity.

    However, some of your concerns are not as dire as intimated. Tooltips aren't really a problem of microformats specifically, it's just what (some) browsers do; and if someone can read the tooltip then they have also read the text.

    As for screen readers, can you report on actual complaints? Those that use them are pretty nifty around their environment.

    By killing them off, I'd say you've hindered usability and the programmable web as I can no longer save programmes with Operator and the like.

  • Comment number 6.

    Tooltips are a problem of the datetime pattern, they put data which isn't human readable in a place that is presented to the user. Yes, it's the browser that's presenting this information, but microformat patterns must take current user agent behaviour into consideration.

    Having data which isn't human readable presented to the user is unacceptable, even if it's alongside a human readable version.

    The assumption that all screen reader users have advanced skills in IT is common yet wrong. The screen reader users in screencasts, articles and presentations are skilled in IT, but this is no more representative of the general user base than the authors of any web development blog. Screen reader users are not born with these skills.

  • Comment number 7.

    All perspective isn't it? The title attribute wasn't designed for tooltipping.

    Why is it 'unacceptable'? All sorts of innovation happens on the web which enhance usability. There are always compromises when actually getting things done. It's all too easy to criticise.

    Assumptions; which none were made, should also not be made on how we potentially hinder users.

  • Comment number 8.

    The HTML4 spec refers to the title attribute being used for tooltips, but that doesn't really matter because it's what browsers do, it happens in the real world and is seen by real users.

    I can't see how it's acceptable to have machine data visible to the user. I don't think a user viewing the page should have to work around bits of machine data while viewing a page. Machine data should be visible to the machine only. The 'view' should be a human friendly representation of machine data. "1998-03-12T09:30:00-05:00" is not human friendly.

  • Comment number 9.

    It's quite amusing that we agree about browsers; as I said the same at 5 in this thread!

    As for the spec, it states that the title attribute is advisory information and that the 'tool tip' is a browser vendor's interpretation of this. So as I said, it wasn't designed as such (note the inverted commas).

    I understand the issue around what is viewed, hence the whole debate. However, my point is - is it really such a big one? As I also said at 5: People viewing the site will see and read the text easily; they're not likely to start hovering over everything and get annoyed at the ISO datetimes. Are they?

  • Comment number 10.

    @ritchielee - we've just added iCal support to /programmes so now you can add and subscribe to programmes via your calendar. It's rather more powerful than the original hCal implementation

    http://www.bbc.co.uk/blogs/radiolabs/2008/07/some_ical_views_onto_programme.shtml

    No individual episode addition so far but that will come soon

  • Comment number 11.

    What up with Value Class Pattern?
    http://microformats.org/blog/2009/05/12/value-class-pattern/

    You back on board?
