Future Media Standards & Guidelines

HTML Metadata Standards v1.2

1.1 The HTML metadata standard specifies requirements and recommendations for embedding metadata - information about a document - in all new HTML pages published on BBC online (bbc.co.uk).

1.2 The provision of accurate metadata in HTML is essential to the success of any site. Metadata significantly improves search results and helps users find relevant material.

1.3 Absent, inconsistent or inappropriate metadata is a barrier to users.

1.4 Careful consideration SHOULD be given to metadata early in the HTML authoring process to ensure that there is a consistent scheme for generating it appropriately.

1.5 Title

1.5.1 The title of a page appears in the title bar of a browser and is intended to be readable by users.

1.5.2 The title is used by both internal and external search engines to index the contents of a page and is given extra weighting in determining relevant results to a search query. (Ref. 1)

1.5.3 The title is displayed in search results and is an important way for users to assess the relevance of a page.

1.5.4 Every page MUST have a unique title. The title should be concise and descriptive.

1.5.5 The following production rule MUST be followed for all title tags of BBC websites.

BBC - [combined context] - [page title] [: optional text including keywords]

where:

[combined context] is the most important 1 or 2 levels of context (e.g. site and section), separated by a space. Good examples are: Football Weymouth; CBeebies Charlie and Lola; Bitesize KeyStage 4.

[page title] is the title of the page

[: optional text including keywords] is optional text including keywords, which must be preceded by a space, a colon and a space

1.5.6 A title tag without optional text SHOULD contain no more than 64 characters. If optional text is included the title tag SHOULD contain no more than a total of 128 characters.

1.5.7 A title tag SHOULD only contain 'BBC' once, i.e. do not have BBC in both the first and second part of your title tag.

Examples of well written title tags

<title>BBC - Eastenders - News</title>
<title>BBC - The Apprentice - Candidates</title>

<title>BBC - Five Live Football - FA Cup</title>
<title>BBC - Manchester Travel - Trains</title>
<title>BBC - GCSE Bitesize Chemistry - Patterns of Behaviour</title>
<title>BBC - London Radio - Vanessa Feltz : phone-in, current, news, topical</title>
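
The title rules in 1.5.5 to 1.5.7 can be checked mechanically. The following is a minimal sketch, not part of the standard: the function names build_title and check_title are hypothetical, and it assumes that the character limits in 1.5.6 apply to the text inside the title element rather than to the markup around it.

```python
import re


def build_title(context: str, page_title: str, optional_text: str = "") -> str:
    """Assemble a title following the production rule in 1.5.5."""
    title = f"BBC - {context} - {page_title}"
    if optional_text:
        # Optional text is preceded by a space, a colon and a space.
        title += f" : {optional_text}"
    return title


def check_title(title: str, has_optional_text: bool = False) -> list[str]:
    """Return a list of problems; an empty list means none were found."""
    problems = []
    # Assumption: the limits in 1.5.6 count the characters of the title text.
    limit = 128 if has_optional_text else 64
    if len(title) > limit:
        problems.append(f"title is {len(title)} characters; limit is {limit}")
    if not title.startswith("BBC - "):
        problems.append("title does not start with 'BBC - '")
    # Whole-word match, so a name such as 'CBBC' is not counted as 'BBC'.
    if len(re.findall(r"\bBBC\b", title)) > 1:
        problems.append("'BBC' appears more than once")
    return problems
```

For instance, build_title("Eastenders", "News") gives the first example title above, and check_title returns an empty list for it.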

Examples of title tags over 64 characters in length

When the search terms "minibeasts mr tumble" are typed into Google, the first search result shows the first 64 characters of the title tag only. The term "minibeasts" has been lost from the title.

[Image: Minibeasts Mr Tumble search result]

When the search terms "weymouth stephen beer" are searched for using Google within the BBC site, the first 3 search results show only the first 64 characters of the title tags. In each case the term "stephen beer" has been lost from the title.

[Image: Weymouth Stephen Beer search result]

2 <meta> and <link> elements

2.1 HTML allows authors to express metadata within the document using the <meta> element, or outside the document by linking to the metadata with a <link> element. (Ref. 2)

2.2 Metadata is used by both internal and external search engines in indexing the contents of a page and is given extra weighting in determining relevant results to a search query. (Ref. 1)

2.3 Keywords

2.3.1 The keyword metadata element is used by the BBC's search engines when indexing the contents of a page. The keywords it contains are given extra weighting when determining relevant results to a search query.

2.3.2 Keywords in metadata have no significant positive effect on major external search engines. On the contrary, repeating or overusing key terms in metadata (sometimes called "stuffing" or "stacking") can negatively affect search rankings. (Ref. 1)

2.3.3 A specific set of keywords SHOULD only appear on the page that would be most relevant to a search query that included those terms.

2.3.4 Section and sub-section home pages MUST have a unique set of keywords specific to the section.

2.3.5 Content pages MUST NOT repeat the keywords of their parent section home page as this significantly reduces the effectiveness of indexing and the relevance of search results.

2.3.6 Content pages SHOULD have unique keywords specific to that particular page. If this is not practical, the content of the keywords tag SHOULD be left empty.

2.3.7 Keywords SHOULD be as specific as possible and SHOULD be in order of priority.

2.3.8 Individual keywords and phrases MUST be separated by a comma.

2.3.9 The total number of keywords SHOULD NOT exceed twenty words and individual keywords MUST NOT be repeated as this will be penalised by the major external search engines.

NOTE There is no need to include the keywords "BBC" or "British Broadcasting Corporation" in the HTML metadata for every page.

2.3.10 The following production rule MUST be followed for keyword metadata tags on all pages.

<meta name="keywords" content="[keywords]">

where:

[keywords] is a list of keywords and key phrases, separated by commas

Example

<meta name="keywords" content="Albert Square, location, plan">
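
The keyword rules in 2.3.8 to 2.3.10 lend themselves to an automated check. The sketch below is illustrative only: check_keywords and keywords_meta are hypothetical names, words are counted as whitespace-separated tokens, and repeats are compared case-insensitively, which the standard does not specify.

```python
from html import escape


def check_keywords(keywords: list[str]) -> list[str]:
    """Return a list of problems; an empty list means none were found."""
    problems = []
    cleaned = [k.strip() for k in keywords]
    if any("," in k for k in cleaned):
        problems.append("a keyword contains a comma, which is the separator")
    # 2.3.9: the total SHOULD NOT exceed twenty words.
    word_count = sum(len(k.split()) for k in cleaned)
    if word_count > 20:
        problems.append(f"{word_count} words in total; limit is 20")
    # 2.3.9: individual keywords MUST NOT be repeated.
    # Assumption: 'News' and 'news' count as the same keyword.
    seen = set()
    for keyword in cleaned:
        key = keyword.lower()
        if key in seen:
            problems.append(f"keyword repeated: {keyword}")
        seen.add(key)
    return problems


def keywords_meta(keywords: list[str]) -> str:
    """Build the keywords tag from 2.3.10, escaping the attribute value."""
    content = ", ".join(k.strip() for k in keywords)
    return '<meta name="keywords" content="' + escape(content, quote=True) + '">'
```

Calling keywords_meta(["Albert Square", "location", "plan"]) reproduces the example tag above.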

2.4 Description

2.4.1 The description is used by both internal and external search engines in indexing the contents of a page and is given extra weighting in determining relevant results to a search query.

2.4.2 Every page MUST have a description.

2.4.3 The description should be an accurate, appealing summary of the contents of a particular page and MUST NOT exceed thirty words.

The description can be used by some search engines to present a summary of the page to the searcher. Generally speaking, if the search term exactly matches a word or phrase in the description, and the description appears to be a coherent sentence rather than a list of keywords, the description will be used. Google follows this rule strictly; the BBC's internal search engine usually does.

2.4.4 If the search term does not match a phrase in the description, the search engine will use its own algorithm to decide how to describe the page. This may be a short sentence in the page which contains the exact search term; it may be a summary generated from content in the page; or it may be a description of the page from the DMOZ project.

2.4.5 If necessary, pages MAY include meta tags which will tell the Google and MSN crawlers never to use the DMOZ description. (Ref. 13, 14)

2.4.6 Section and sub-section home pages MUST have a unique description specific to the section.

2.4.7 Content pages SHOULD have a unique description specific to that particular page. If this is not practical, the description MAY repeat that of the parent section.

2.4.8 The following production rule MUST be followed for descriptions on all pages:

<meta name="description" content="[description]">

where:

[description] is an accurate and appealing summary or abstract of the page contents.

Example

An example of a good description for the article at http://news.bbc.co.uk/1/hi/entertainment/tv_and_radio/2352193.stm is given below:

<meta name="description" content="EastEnders actress Barbara Windsor is leaving the soap after eight years - but promises she will return in 2004.">
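
The thirty-word limit in 2.4.3 and the production rule in 2.4.8 could be enforced when the tag is generated. This is a sketch under stated assumptions: description_meta is a hypothetical helper, and a "word" is taken to be any whitespace-separated token, which the standard does not define.

```python
from html import escape

MAX_DESCRIPTION_WORDS = 30  # limit from 2.4.3


def description_meta(description: str) -> str:
    """Build the description tag, rejecting empty or over-long text."""
    # Assumption: words are whitespace-separated tokens.
    words = description.split()
    if not words:
        raise ValueError("every page MUST have a description")
    if len(words) > MAX_DESCRIPTION_WORDS:
        raise ValueError(
            f"description has {len(words)} words; limit is {MAX_DESCRIPTION_WORDS}"
        )
    # Collapse runs of whitespace and escape characters unsafe in attributes.
    content = escape(" ".join(words), quote=True)
    return '<meta name="description" content="' + content + '">'
```

Escaping the value keeps the tag well formed when a description contains quotation marks or ampersands.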

2.5 Dates

2.5.1 These elements are used by the BBC's own search engine to help order and present search results.

2.5.2 This standard uses metadata terms from the Dublin Core Metadata Initiative (Ref. 3) and follows their guidelines for encoding those terms in XHTML. (Ref. 4)

2.5.3 The use of standard metadata terms and encoding means that the information is useful to the widest range of processing applications. It is worth noting however that the major search engines infer a document's age from the date it was first encountered by their web crawlers (also called spiders and robots), not from its metadata. (Ref. 1)

2.5.4 Created Date

2.5.4.1 The "DCTERMS.created" metadata property records the original publication date of the document.

2.5.4.2 Every page MUST have a "DCTERMS.created" metadata property.

2.5.4.3 Every page with a "DCTERMS.created" metadata property MUST also contain a link to the DCTERMS schema in the head of the document, as shown in the production rule in 2.5.4.5.

2.5.4.4 The date MUST follow the W3C Profile of ISO 8601 (Ref. 19) with a granularity of "Complete date:" or finer. Time zones may either be handled by expressing dates in UTC or expressing a time zone offset, as described in the W3C profile.

2.5.4.5 The following production rule MUST be followed for created dates on all pages:

<link rel="schema.DCTERMS" href="http://purl.org/dc/terms/" />
<meta name="DCTERMS.created" content="[date]" />

where:

[date] is a date in the W3C Profile of ISO 8601, e.g. yyyy-mm-dd

Examples
<head>
<title>BBC - Eastenders - News</title>
<link rel="schema.DCTERMS" href="http://purl.org/dc/terms/" />
<meta name="DCTERMS.created" content="2000-12-31" />
</head>

<head>
<title>BBC - Buffy - Angel</title>
<link rel="schema.DCTERMS" href="http://purl.org/dc/terms/" />
<meta name="DCTERMS.created" content="2000-12-31T08:49:37+00:00" />
</head>

2.5.5 Modified date

2.5.5.1 The "DCTERMS.modified" metadata property records the date the contents of the page were last changed.

2.5.5.2 Every page SHOULD have a "DCTERMS.modified" metadata property. If it is not practical to produce this information reliably, the element MUST be omitted.

2.5.5.3 The date MUST follow the W3C Profile of ISO 8601 (Ref. 19) with a granularity of "Complete date:" or finer. Time zones may either be handled by expressing dates in UTC or expressing a time zone offset, as described in the W3C profile.

2.5.5.4 The following production rule MUST be followed for modified dates on all pages:

<meta name="DCTERMS.modified" content="[date]" />

Examples
<head>
<title>BBC - Eastenders - News</title>
<link rel="schema.DCTERMS" href="http://purl.org/dc/terms/" />
<meta name="DCTERMS.created" content="2000-12-31" />
<meta name="DCTERMS.modified" content="2001-01-02" />
</head>

<head>
<title>BBC - Buffy - Angel</title>
<link rel="schema.DCTERMS" href="http://purl.org/dc/terms/" />
<meta name="DCTERMS.created" content="2000-12-31T08:49:37+00:00" />
<meta name="DCTERMS.modified" content="2001-01-02T17:32Z" />
</head>
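
The date format required by 2.5.4.4 and 2.5.5.3 can be validated before a page is published. The sketch below is one possible check, not a reference implementation: is_valid_w3c_date is a hypothetical name, and the pattern covers only the granularities of the W3C profile from complete date upwards (date, date with hours and minutes, and date with seconds and optional decimal fraction).

```python
import re
from datetime import date

# Complete date, optionally followed by a time and a time zone designator.
W3C_DATE = re.compile(
    r"(\d{4})-(\d{2})-(\d{2})"
    r"(?:T(\d{2}):(\d{2})(?::(\d{2})(?:\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?"
)


def is_valid_w3c_date(value: str) -> bool:
    """Return True if value is a complete date or finer in the W3C profile."""
    match = W3C_DATE.fullmatch(value)
    if match is None:
        return False
    year, month, day, hour, minute, second, zone = match.groups()
    try:
        # Rejects impossible calendar dates such as 30 February.
        date(int(year), int(month), int(day))
    except ValueError:
        return False
    if hour is not None:
        if int(hour) > 23 or int(minute) > 59:
            return False
        if second is not None and int(second) > 59:
            return False
        if zone != "Z" and (int(zone[1:3]) > 23 or int(zone[4:6]) > 59):
            return False
    return True
```

A value with year-only or year-and-month granularity is rejected, because the standard requires at least a complete date.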

2.5.6 Date of reference

2.5.6.1 If the contents of the page are strongly related to a particular date, it is possible to advertise this in the page metadata.

2.5.6.2 If HTML authors choose to do this, they MUST use the "temporal" metadata term from the Dublin Core standard (Ref. 3), with the "Period" encoding scheme (Ref. 15) following the encoding rules for XHTML. (Ref. 4)

Examples

A page about the Great Depression:

<link rel="schema.DCTERMS" href="http://purl.org/dc/terms/" />
<meta name="DCTERMS.temporal" scheme="DCTERMS.Period" content="name=The Great Depression; start=1929; end=1939;" />

A page about events occurring on 28th November 1990:

<link rel="schema.DCTERMS" href="http://purl.org/dc/terms/" />
<meta name="DCTERMS.temporal" scheme="DCTERMS.Period" content="start=1990-11-28; end=1990-11-28;" />
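
The two examples above share one shape, which a small helper could generate. This is a sketch only: period_meta is a hypothetical name, and the "name=...; start=...; end=...;" layout is inferred from the examples in this section rather than taken from the full Period encoding scheme (Ref. 15).

```python
from html import escape


def period_meta(start: str = "", end: str = "", name: str = "") -> str:
    """Build a DCTERMS.temporal tag in the layout used by the examples."""
    if not (start or end):
        raise ValueError("a period needs a start or an end")
    for value in (start, end, name):
        # A semicolon inside a value would be read as a component separator.
        if ";" in value:
            raise ValueError("values must not contain ';'")
    parts = []
    if name:
        parts.append(f"name={name}")
    if start:
        parts.append(f"start={start}")
    if end:
        parts.append(f"end={end}")
    content = escape("; ".join(parts) + ";", quote=True)
    return (
        '<meta name="DCTERMS.temporal" scheme="DCTERMS.Period" content="'
        + content
        + '" />'
    )
```

The schema link shown in the examples is still needed in the head of the document; the helper produces only the meta element.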

2.6 Copyright and license

2.6.1 If a page is copyrighted it MUST link to the copyright statement in the head of the document, using the standard W3C defined copyright 'link' element type.

2.6.2 The standard BBC terms and conditions are at http://www.bbc.co.uk/terms/.

2.6.3 If a page includes a link to the copyright or terms and conditions in the body of the page, the link SHOULD contain a rel="copyright" attribute.

2.6.4 Some documents are released under a particular license, such as a Creative Commons or Creative Archive license. Links to such license statements SHOULD include a rel="license" attribute. (Ref. 12) This helps search engines which search for documents by license terms.

Examples

Copyright link in head of document

<link rel="copyright" href="http://www.bbc.co.uk/terms/" />

Copyright link in body of document

<a rel="copyright" href="/go/homepage/int/foot/-/terms/" class="bo">© MMVI</a>

License link in body of document

<a rel="license" href="http://creativecommons.org/licenses/by-nd/2.0/">Some rights reserved</a>

2.7 Authorship

2.7.1 If an article is attributable to an author with a byline, that information SHOULD be reflected in the HTML metadata.

2.7.2 The following production rule MUST be followed for inserting author details on pages:

<link rel="schema.DCTERMS" href="http://purl.org/dc/terms/" />
<meta name="DC.creator" content="[author]" />

where:

[author] is the full name of the bylined author.

The author field SHOULD NOT contain authors' names if they do not receive a byline in the article. It SHOULD NOT be filled with a non-specific name such as "BBC" or "British Broadcasting Corporation".

Example

<link rel="schema.DCTERMS" href="http://purl.org/dc/terms/" />
<meta name="DC.creator" content="Humphrey Hawksley" />

2.8 Other metadata

2.8.1 HTML authors are free to use any other metadata elements which suit their purposes.

2.8.2 If possible it is better to use standard elements rather than a BBC-specific scheme. HTML authors SHOULD at least check the Dublin Core standard to see if it already describes the metadata they wish to encode. (Ref. 3)

2.8.3 If the metadata is not part of the HTML specification, the document MUST link to the relevant profile which identifies the metadata. (Ref. 17) HTML authors MAY place an XMDP (Ref. 18) specification at the profile URI, but this is not necessary.

Examples

Linking to the Dublin Core metadata profile:

<head profile="http://dublincore.org/documents/dcq-html/">

A document which includes the hCard and hCalendar microformats: (Ref. 16)

<head profile="http://www.w3.org/2006/03/hcard http://microformats.org/wiki/hcalendar-profile">

3 Excluding your pages from search engines

3.1 Robots (sometimes called "spiders" or "webcrawlers") are programs that traverse many pages in the Web by recursively retrieving linked pages. The most significant robots are those which trawl the web to create indexes for search engines.

3.2 In some cases it may be desirable to exclude particular pages from search engine indexes.

3.3 For example, a page SHOULD be excluded from search indexes in any of the following cases:

  • Any page without standard toolbar navigation, for instance a page that will appear in a pop-up window.
  • Any page that is not independently meaningful out of context, for instance a page presented in response to a form submission.
  • Any page that should not be accessible from a search, for instance the answers to a quiz.

3.4 The robots exclusion protocol (Ref. 8) is a method that allows Web site administrators to indicate which part of their sites should not be visited by robots. The most reliable and convenient way to exclude a site is to add it to the robots.txt file at www.bbc.co.uk/robots.txt. Contact the Assistant Producer for Search to add your site to this file.

3.5 Excluding pages from search indexes using the Robots <meta> tag

3.5.1 Sometimes it is more convenient to exclude individual pages by placing a robots <meta> tag in the HTML. (Ref. 11) All the large search engine robots support this tag (Ref. 9, 10), as does the BBC's search engine.

Example

Instruct robots not to index the page and not to follow any of the links on the page:

<meta name="robots" content="noindex,nofollow" />
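
Before publishing, it can be useful to confirm which robots directives a page actually carries. The following sketch uses Python's standard html.parser module; the names RobotsMetaParser and robots_directives are hypothetical, and the code only reads the tag, it makes no claim about how any particular search engine interprets it.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of every robots <meta> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        for directive in content.split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html_text: str) -> set:
    """Return the set of robots directives found in html_text."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    parser.close()
    return parser.directives
```

A page that should stay out of search indexes would be expected to yield a set containing "noindex".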

4 Pragma: no-cache

4.1 Occasionally HTML authors insert the following line into their documents in the mistaken belief that it will prevent the page being cached by browsers and proxies.

<meta http-equiv="Pragma" content="no-cache" />

4.2 The HTTP specification does not set any guidelines for Pragma response headers; instead, Pragma request headers (the headers that a browser sends to a server) are discussed. Although a few caches may honour this header, the majority won't, and it won't have any effect. (Ref. 5) Internet Explorer supports a special (non-standard) usage of the HTTP Pragma: no-cache response header for backwards compatibility with HTTP 1.0 servers, which only prevents caching when used over a secure connection. (Ref. 6)

4.3 In addition to the problems with the Pragma HTTP header, there is at least one known problem with the Pragma HTTP-EQUIV <meta> tag. (Ref. 7)

4.4 Therefore the Pragma http-equiv <meta> tag MUST NOT be used.

5 References

6 Examples

An article about a political event:

<head profile="http://dublincore.org/documents/dcq-html/">
<title>BBC - On This Day - 28th November - 1990: Tearful farewell from Iron Lady</title>
<link rel="schema.DCTERMS" href="http://purl.org/dc/terms/" />
<link rel="copyright" href="http://www.bbc.co.uk/terms/" />
<meta name="keywords" content="28, 28th, November, 1990, politics and protest UK, politicians, history, news, archive, video, audio" />
<meta name="description" content="Margaret Thatcher formally tenders her resignation to the Queen and leaves Downing Street for the last time." />
<meta name="DCTERMS.created" content="2002-12-07T08:14:38Z" />
<meta name="DCTERMS.modified" content="2002-12-19T10:25:06Z" />
<meta name="DCTERMS.temporal" scheme="DCTERMS.Period" content="start=1990-11-28; end=1990-11-28;" />
</head>

BBC © 2014 The BBC is not responsible for the content of external sites.