Future Media Standards & Guidelines

HTML Metadata Standards v1.0.2 (superseded)

1.1. The HTML metadata standard specifies requirements and recommendations for embedding metadata - information about a document - in all new HTML pages published on BBC online (bbc.co.uk).

1.2. The provision of accurate metadata in HTML is essential to the success of any site. Metadata significantly improves search results and helps users find relevant material.

1.3. Absent, inconsistent or inappropriate metadata is a barrier to users.

1.4. Careful consideration should be given to metadata early in the HTML authoring process to ensure that there is a consistent scheme for generating it appropriately.

1.5. Title

1.5.1. The title of a page appears in the title bar of a browser and is intended to be readable by users.

1.5.2. The title is used by both internal and external search engines to index the contents of a page and is given extra weighting in determining relevant results to a search query. (Ref. 1)

1.5.3. The title is displayed in search results and is an important way for users to assess the relevance of a page.

1.5.4. Every page MUST have a unique title. The title should be concise and descriptive.

Properties

[site]
The name of the site
[section]
The name of the section or subsection of the site (optional)
[page title]
The title of the page

Production rule

BBC - [site] [section] - [page title]

Examples with site and page title:

<title>BBC - Eastenders - News</title>
<title>BBC - Buffy - Angel</title>

Examples with site, section and page title:

<title>BBC - Five Live Football - FA Cup</title>
<title>BBC - Manchester Travel - Trains</title>
<title>BBC - GCSE Bitesize Chemistry - Patterns of Behaviour</title>
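
The production rule above can be sketched as a small helper. This is an illustration only: the function name build_title and its parameters are hypothetical and not part of this standard, and it assumes the optional section is simply left out when a page has none.

```python
from html import escape
from typing import Optional


def build_title(site: str, page_title: str, section: Optional[str] = None) -> str:
    """Return a <title> element following 'BBC - [site] [section] - [page title]'."""
    # The section is optional, so only the non-empty parts are joined with a space.
    site_part = " ".join(part.strip() for part in (site, section or "") if part.strip())
    text = f"BBC - {site_part} - {page_title.strip()}"
    # Escape &, < and > so the text is safe inside the element.
    return f"<title>{escape(text, quote=False)}</title>"
```

For instance, assuming "Five Live" is the site and "Football" the section, build_title("Five Live", "FA Cup", section="Football") produces the first of the three-part examples above.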

2. <meta> and <link> elements

2.1. HTML lets authors express metadata within the document, using the <meta> element, or outside the document, by linking to the metadata with a <link> element. (Ref. 2)

2.2. Metadata is used by both internal and external search engines in indexing the contents of a page and is given extra weighting in determining relevant results to a search query. (Ref. 1)

2.3. Keywords

2.3.1. The keyword metadata element is used by the BBC's search engines when indexing the contents of a page. The keywords it contains are given extra weighting when determining relevant results to a search query.

2.3.2. Keywords in metadata have no significant positive effect on major external search engines. On the contrary, repeating or overusing key terms in metadata (sometimes called "stuffing" or "stacking") may negatively affect search rankings. (Ref. 1)

2.3.3. A specific set of keywords SHOULD only appear on the page that would be most relevant to a search query that included those terms.

2.3.4. Section and sub-section home pages MUST have a unique set of keywords specific to the section.

2.3.5. Content pages MUST NOT repeat the keywords of their parent section home page as this significantly reduces the effectiveness of indexing and the relevance of search results.

2.3.6. Content pages SHOULD have unique keywords specific to that particular page. If this is not practical, the content of the keywords tag should be left empty.

2.3.7. Keywords SHOULD be as specific as possible and SHOULD be in order of priority. Individual words and phrases MUST be separated by a comma. The total number of keywords SHOULD NOT exceed twenty words and individual keywords MUST NOT be repeated as this will be penalised by the major external search engines.

2.3.8. There is no need to include the keywords "BBC" or "British Broadcasting Corporation" in the HTML metadata for every page.

Properties

[keywords]
A list of keywords and key phrases, separated by commas

Production rule

<meta name="keywords" content="[keywords]">

Example

<meta name="keywords" content="Albert Square, location, plan">
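
Some of the rules in 2.3.7 lend themselves to an automated check. The sketch below is one possible reading, not a definitive implementation: the function name check_keywords is hypothetical, repeated keywords are compared case-insensitively (an assumption), and the twenty-word limit is counted across all keywords and phrases together (also an assumption).

```python
from typing import List


def check_keywords(content: str) -> List[str]:
    """Return a list of problems found in a keywords content string."""
    problems = []
    # Keywords and phrases are separated by commas; empty items are ignored.
    keywords = [item.strip() for item in content.split(",") if item.strip()]

    # Individual keywords MUST NOT be repeated (compared case-insensitively here).
    seen = set()
    for keyword in keywords:
        key = keyword.lower()
        if key in seen:
            problems.append(f"repeated keyword: {keyword}")
        seen.add(key)

    # The total SHOULD NOT exceed twenty words (counted across all phrases here).
    word_count = sum(len(keyword.split()) for keyword in keywords)
    if word_count > 20:
        problems.append(f"too many words: {word_count}")

    return problems
```

An empty list means no problem was found by these two checks; the other rules in 2.3, such as uniqueness across pages, need knowledge of the whole site and are not covered.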

2.4. Description

2.4.1. The description is used by both internal and external search engines in indexing the contents of a page and is given extra weighting in determining relevant results to a search query.

2.4.2. Every page MUST have a description.

2.4.3. The description may be used by search engines to present a summary of the page to the searcher. Generally speaking, if the search term exactly matches a word or phrase in the description, and the description appears to be a coherent sentence rather than a list of keywords, the description will be used. Google follows this rule strictly; the BBC's internal search engine usually does. Therefore the description should be an accurate, appealing summary of the contents of a particular page and MUST NOT exceed thirty words.

2.4.4. If the search term does not match a phrase in the description, the search engine will use its own algorithm to decide how to describe the page. This may be a short sentence in the page which contains the exact search term; it may be a summary generated from content in the page; or it may be a description of the page from the DMOZ project.

2.4.5. (If necessary, pages may include meta tags which will tell the Google and MSN crawlers never to use the DMOZ description.) (Ref. 13, 14)

2.4.6. Section and sub-section home pages MUST have a unique description specific to the section.

2.4.7. Content pages SHOULD have a unique description specific to that particular page. If this is not practical, the description may repeat that of the parent section.

Properties

[description]
An accurate and appealing summary or abstract of the page contents.

Production rule

<meta name="description" content="[description]">

Example

http://news.bbc.co.uk/1/hi/entertainment/tv_and_radio/2352193.stm

<meta name="description" content="EastEnders actress Barbara Windsor is leaving the soap after eight years - but promises she will return in 2004.">
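
The requirements in 2.4.2 and 2.4.3 (a description is mandatory and MUST NOT exceed thirty words) could be enforced when the tag is generated. The following is a minimal sketch under stated assumptions: the function name description_meta is hypothetical, and words are counted by splitting on whitespace, which this standard does not define.

```python
from html import escape


def description_meta(description: str, max_words: int = 30) -> str:
    """Return a description <meta> tag, or raise ValueError if a rule is broken."""
    words = description.split()
    if not words:
        raise ValueError("every page MUST have a description")
    if len(words) > max_words:
        raise ValueError(f"description has {len(words)} words; the limit is {max_words}")
    # Normalise whitespace and escape &, <, > and quotes for use in an attribute value.
    content = escape(" ".join(words), quote=True)
    return f'<meta name="description" content="{content}">'
```

Raising an error rather than truncating is a deliberate choice in this sketch: a description cut off mid-sentence would no longer be the coherent summary that 2.4.3 asks for.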

2.5. Dates

2.5.1. These elements are used by the BBC's own search engine to help order and present search results.

2.5.2. This standard uses metadata terms from the Dublin Core Metadata Initiative (Ref. 3) and follows their guidelines for encoding those terms in HTML. (Ref. 4)

2.5.3. The use of standard metadata terms and encoding means that the information is useful to the widest range of processing applications. It is worth noting however that the major search engines infer a document's age from the date it was first encountered by their spiders, not from its metadata. (Ref. 1)

2.5.4. Created Date

2.5.4.1. The "DCTERMS.created" metadata property records the original publication date of the document.

2.5.4.2. Every page MUST have a "DCTERMS.created" metadata property.

2.5.4.3. Every page with a "DCTERMS.created" metadata property MUST also contain a link to the dcterms schema in the head of the document:

Properties

[date]
ISO8601 date.
The date MUST follow the W3C Profile of ISO 8601 (Ref. 19) with a granularity of "Complete date:" or finer. Time zones may either be handled by expressing dates in UTC or expressing a time zone offset, as described in the W3C profile.

Production rule

<link rel="schema.DCTERMS" href="http://purl.org/dc/terms/" />
<meta name="DCTERMS.created" content="[date]" />

Examples

<head>
<title>BBC - Eastenders - News</title>
<link rel="schema.DCTERMS" href="http://purl.org/dc/terms/" />
<meta name="DCTERMS.created" content="2000-12-31" />
</head>

<head>
<title>BBC - Buffy - Angel</title>
<link rel="schema.DCTERMS" href="http://purl.org/dc/terms/" />
<meta name="DCTERMS.created" content="2000-12-31T08:49:37+00:00" />
</head>
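
As an illustration of the production rule for the created date, the sketch below renders both required lines from a Python date or datetime. The function name created_lines is hypothetical. It assumes that times are converted to UTC and written with an explicit +00:00 offset, which is one of the two time zone options the W3C profile allows.

```python
from datetime import date, datetime, timezone
from typing import Union


def created_lines(published: Union[date, datetime]) -> str:
    """Return the schema link and DCTERMS.created meta lines for a publication date."""
    if isinstance(published, datetime):
        # datetime is a subclass of date, so it has to be tested first.
        if published.tzinfo is None:
            raise ValueError("datetime must be timezone-aware")
        # Express the time in UTC with an explicit +00:00 offset.
        value = published.astimezone(timezone.utc).isoformat(timespec="seconds")
    else:
        value = published.isoformat()  # complete date in YYYY-MM-DD form
    return (
        '<link rel="schema.DCTERMS" href="http://purl.org/dc/terms/" />\n'
        f'<meta name="DCTERMS.created" content="{value}" />'
    )
```

Naive datetimes are rejected rather than guessed at, because a time without a zone cannot be expressed in the W3C profile.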

2.5.5. Modified date

2.5.5.1. The "DCTERMS.modified" metadata property records the date the contents of the page were last changed.

2.5.5.2. Every page SHOULD have a "DCTERMS.modified" metadata property. If it is not practical to produce this information reliably, the element MUST be omitted.

Properties

[date]
ISO8601 date.
The date MUST follow the W3C Profile of ISO 8601 (Ref. 19) with a granularity of "Complete date:" or finer. Time zones may either be handled by expressing dates in UTC or expressing a time zone offset, as described in the W3C profile.

Production rule

<meta name="DCTERMS.modified" content="[date]" />

Examples

<head>
<title>BBC - Eastenders - News</title>
<link rel="schema.DCTERMS" href="http://purl.org/dc/terms/" />
<meta name="DCTERMS.created" content="2000-12-31" />
<meta name="DCTERMS.modified" content="2001-01-02" />
</head>

<head>
<title>BBC - Buffy - Angel</title>
<link rel="schema.DCTERMS" href="http://purl.org/dc/terms/" />
<meta name="DCTERMS.created" content="2000-12-31T08:49:37+00:00" />
<meta name="DCTERMS.modified" content="2001-01-02T17:32Z" />
</head>
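
Because malformed dates are easy to introduce by hand, a date value can be checked before publication. The sketch below is a hypothetical validator (the name is_w3c_date is not part of this standard) for the W3C profile at "Complete date" granularity or finer. It assumes that a time zone designator is mandatory whenever a time is present, as the profile's listed formats show.

```python
import re
from datetime import date

# Complete date, optionally followed by a time and a mandatory time zone designator.
_W3C_DATE = re.compile(
    r"(\d{4}-\d{2}-\d{2})"
    r"(?:T(\d{2}):(\d{2})(?::(\d{2})(?:\.\d+)?)?"
    r"(?:Z|[+-](\d{2}):(\d{2})))?",
    re.ASCII,
)


def is_w3c_date(value: str) -> bool:
    """Check a string against the W3C profile at 'Complete date' granularity or finer."""
    match = _W3C_DATE.fullmatch(value)
    if match is None:
        return False
    day, hour, minute, second, tz_hour, tz_minute = match.groups()
    try:
        date.fromisoformat(day)  # rejects impossible dates such as 2001-02-30
    except ValueError:
        return False
    # Range-check whichever time fields are present.
    limits = ((hour, 23), (minute, 59), (second, 59), (tz_hour, 23), (tz_minute, 59))
    return all(field is None or int(field) <= limit for field, limit in limits)
```

A value such as "2000-12" is rejected because its granularity is coarser than a complete date, and a time without "Z" or an offset is rejected because no time zone is given.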

2.5.6. Date of reference

2.5.6.1. If the contents of the page are strongly related to a particular date, it is possible to advertise this in the page metadata.

2.5.6.2. If HTML authors choose to do this, they MUST use the "temporal" metadata term from the Dublin Core standard (Ref. 3), with the "Period" encoding scheme (Ref. 15) following the encoding rules for HTML. (Ref. 4)

Examples

A page about the Great Depression:

<link rel="schema.DCTERMS" href="http://purl.org/dc/terms/" />
<meta name="DCTERMS.temporal" scheme="DCTERMS.Period" content="name=The Great Depression; start=1929; end=1939;" />

A page about events occurring on 28th November 1990:

<link rel="schema.DCTERMS" href="http://purl.org/dc/terms/" />
<meta name="DCTERMS.temporal" scheme="DCTERMS.Period" content="start=1990-11-28; end=1990-11-28;" />
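
The content values in the examples above follow a simple "component=value;" pattern, which can be sketched as follows. The function name period_content is hypothetical, and only the name, start and end components used in the examples are handled; the full Period encoding scheme (Ref. 15) is not reproduced here.

```python
from typing import Optional


def period_content(start: str, end: str, name: Optional[str] = None) -> str:
    """Build the content value for a DCTERMS.temporal element using the Period scheme."""
    components = []
    if name:
        components.append(f"name={name}")
    components.append(f"start={start}")
    components.append(f"end={end}")
    # Each component is terminated by a semicolon, matching the examples above.
    return " ".join(f"{component};" for component in components)
```

Calling period_content("1929", "1939", name="The Great Depression") reproduces the content value of the first example.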

2.6. Copyright and license

2.6.1. If a page is copyrighted it MUST link to the copyright statement in the head of the document, using the standard W3C defined copyright 'link' element type.

2.6.2. The standard BBC terms and conditions are at http://www.bbc.co.uk/terms/.

2.6.3. If a page includes a link to the copyright or terms and conditions in the body of the page, the link SHOULD contain a rel="copyright" attribute.

2.6.4. Some documents may be released under a particular license, such as a Creative Commons or Creative Archive license. Links to such license statements SHOULD include a rel="license" attribute. (Ref. 12) This helps search engines that search for documents by license terms.

Examples

Copyright link in head of document

<link rel="copyright" href="http://www.bbc.co.uk/terms/" />

Copyright link in body of document

<a rel="copyright" href="/go/homepage/int/foot/-/terms/" class="bo">© MMVI</a>

License link in body of document

<a rel="license" href="http://creativecommons.org/licenses/by-nd/2.0/">Some rights reserved</a>

2.7. Authorship

2.7.1. If an article is attributable to an author with a byline, that information SHOULD be reflected in the HTML metadata.

Properties

[author]
The full name of the bylined author. This field SHOULD NOT contain authors' names if they do not receive a byline in the article. It SHOULD NOT be filled with a non-specific name such as "BBC" or "British Broadcasting Corporation".

Production rule

<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
<meta name="DC.creator" content="[author]" />

Examples

<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
<meta name="DC.creator" content="Humphrey Hawksley" />

2.8. Other metadata

2.8.1. HTML authors are free to use any other metadata elements which suit their purposes.

2.8.2. If possible it is better to use standard elements rather than a BBC-specific scheme. HTML authors SHOULD at least check the Dublin Core standard to see if it already describes the metadata they wish to encode. (Ref. 3)

2.8.3. If the metadata is not part of the XHTML specification, the document MUST link to the relevant profile which identifies the metadata. (Ref. 17) HTML authors MAY place an XMDP (Ref. 18) specification at the profile URI, but this is not necessary.

Examples

Linking to the Dublin Core metadata profile:

<head profile="http://dublincore.org/documents/dcq-html/">

A document which includes the hCard and hCalendar microformats: (Ref. 16)

<head profile="http://www.w3.org/2006/03/hcard http://microformats.org/wiki/hcalendar-profile">

3. Robot exclusion

3.1. Robots (sometimes called "spiders") are programs that traverse many pages on the Web by recursively retrieving linked pages. The most significant robots are those which trawl the web to create indexes for search engines.

3.2. In some cases it may be desirable to exclude particular pages from search engine indexes.

3.3. For example, a page SHOULD be excluded from search indexes in any of the following cases:

  • Any page without standard toolbar navigation, for instance a page that will appear in a pop-up window.
  • Any page that is not independently meaningful out of context, for instance a page presented in response to a form submission.
  • Any page that should not be accessible from a search, for instance the answers to a quiz.

3.4. The robots exclusion protocol (Ref. 8) is a method that allows Web site administrators to indicate which parts of their sites should not be visited by robots. The most reliable and convenient way to exclude a site is to add it to the robots.txt file at www.bbc.co.uk/robots.txt. Contact the Assistant Producer for Search to add your site to this file.

3.5. Excluding pages from search indexes using the Robots <meta> tag

3.5.1. Sometimes it is more convenient to exclude individual pages by placing a robots <meta> tag in the HTML. (Ref. 11) All the large search engine robots support this tag (Ref. 9, 10), as does the BBC's search engine.

Example

Instruct robots not to index the page and not to follow any of the links on the page:

<meta name="robots" content="noindex,nofollow" />
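
When auditing pages, it can be useful to read back which robots directives a page actually declares. The sketch below uses Python's standard html.parser module. The class and function names are hypothetical, and it only reports the directives found; it does not model how any particular search engine interprets them.

```python
from html.parser import HTMLParser
from typing import Set


class RobotsMetaParser(HTMLParser):
    """Collect the directives of every <meta name="robots"> element in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: Set[str] = set()

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        # Directives are comma-separated; normalise case and surrounding spaces.
        self.directives.update(
            token.strip().lower() for token in content.split(",") if token.strip()
        )


def robots_directives(html_text: str) -> Set[str]:
    """Return the set of robots directives declared in the given HTML text."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    parser.close()
    return parser.directives
```

A page is excluded from indexing by this mechanism when "noindex" is in the returned set.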

4. Pragma: no-cache

4.1. Occasionally HTML authors insert the following line into their documents in the mistaken belief that it will prevent the page being cached by browsers and proxies.

<meta http-equiv="Pragma" content="no-cache" />

4.2. The HTTP specification does not set any guidelines for Pragma response headers; instead, Pragma request headers (the headers that a browser sends to a server) are discussed. Although a few caches may honour this header, the majority won't, and it won't have any effect. (Ref. 5) Internet Explorer supports a special (non-standard) usage of the HTTP Pragma: no-cache response header for backwards compatibility with HTTP 1.0 servers, which only prevents caching when used over a secure connection. (Ref. 6)

4.3. In addition to the problems with the Pragma HTTP header, there is at least one known problem with the Pragma HTTP-EQUIV <meta> tag. (Ref. 7)

4.4. Therefore the Pragma http-equiv <meta> tag MUST NOT be used.
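
A simple automated check can flag pages that still contain the prohibited tag. This is a sketch only, using Python's standard html.parser module; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class PragmaMetaFinder(HTMLParser):
    """Record whether a <meta http-equiv="Pragma"> element appears in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        # Attribute names arrive lower-cased; values keep their original case.
        if tag == "meta" and (dict(attrs).get("http-equiv") or "").lower() == "pragma":
            self.found = True


def uses_pragma_meta(html_text: str) -> bool:
    """Return True if the HTML text contains a Pragma http-equiv <meta> tag."""
    finder = PragmaMetaFinder()
    finder.feed(html_text)
    finder.close()
    return finder.found
```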

5. References

6. Examples

An article about a political event:

<head profile="http://dublincore.org/documents/dcq-html/">
<title>BBC - On This Day - 28th November - 1990: Tearful farewell from Iron Lady</title>
<link rel="schema.DCTERMS" href="http://purl.org/dc/terms/" />
<link rel="copyright" href="http://www.bbc.co.uk/terms/" />
<meta name="keywords" content="28, 28th, November, 1990, politics and protest UK, politicians, history, news, archive, video, audio" />
<meta name="description" content="Margaret Thatcher formally tenders her resignation to the Queen and leaves Downing Street for the last time." />
<meta name="DCTERMS.created" content="2002-12-07T08:14:38Z" />
<meta name="DCTERMS.modified" content="2002-12-19T10:25:06Z" />
<meta name="DCTERMS.temporal" scheme="DCTERMS.Period" content="start=1990-11-28; end=1990-11-28;" />
</head>

7. Document history

Date | Version | Change | Author
18/11/2008 | v1.0.2 | Minor revisions around usage of term bbc.co.uk. | Victoria Jolliffe
30/09/2008 | v1.0.1 | Revised to include capitalisation of DC and DCTERMS, and to replace 'dc.author' with 'DC.creator'. | Kevin Hinde
04/12/2006 | v1.0 | Completely revised, incorporating proposals from Working Group | Kevin Hinde
01/02/2002 | v0.7 | Endorsed by Web Standards group |
07/01/2002 | v0.6 | Revision of title tag section from AS |
01/11/2001 | v0.5 | Removed version field |
01/11/2001 | v0.4 | Revisions to date metadata |
31/10/2001 | v0.3 | Minor revisions after comments from AS |
27/10/2001 | v0.2 | Substantially redrafted and reformatted | William Cooper
16/10/2001 | v0.1 | Initial draft | Anne Smith

Document editor: Editor, Standards & Guidelines. If you have any comments, questions or requests relating to this document, please contact the Editor, Standards & Guidelines.

Like all other Future Media Standards & Guidelines, this page is updated on a regular basis, through the process described on About Standards & Guidelines.
