1.1. The HTML metadata standard specifies requirements and recommendations for embedding metadata - information about a document - in all new HTML pages published on BBC online (bbc.co.uk).
1.2. The provision of accurate metadata in HTML is essential to the success of any site. Metadata significantly improves search results and helps users find relevant material.
1.3. Absent, inconsistent or inappropriate metadata is a barrier to users.
1.4. Careful consideration should be given to metadata early in the HTML authoring process to ensure that there is a consistent scheme for generating it appropriately.
1.5.1. The title of a page appears in the title bar of a browser and is intended to be readable by users.
1.5.2. The title is used by both internal and external search engines to index the contents of a page and is given extra weighting in determining relevant results to a search query. (Ref. 1)
1.5.3. The title is displayed in search results and is an important way for users to assess the relevance of a page.
1.5.4. Every page MUST have a unique title. The title should be concise and descriptive.
Title components: [site], [section], [page title]
Title format: BBC - [site] [section] - [title]
<title>BBC - EastEnders - News</title>
<title>BBC - Buffy - Angel</title>
<title>BBC - Five Live Football - FA Cup</title>
<title>BBC - Manchester Travel - Trains</title>
<title>BBC - GCSE Bitesize Chemistry - Patterns of Behaviour</title>
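The title format above can be generated consistently by a small helper. The following is a minimal sketch, not a BBC tool: the function name `build_title` is hypothetical, and treating [section] as optional is an assumption drawn from the examples, several of which appear to have no separate section.

```python
from html import escape


def build_title(site: str, title: str, section: str = "") -> str:
    """Return a <title> element in the form 'BBC - [site] [section] - [title]'."""
    # Join the site and the optional section with a single space,
    # skipping any part that is empty.
    site_part = " ".join(part.strip() for part in (site, section) if part.strip())
    text = f"BBC - {site_part} - {title.strip()}"
    # Escape &, < and > so the text is safe inside the element.
    return f"<title>{escape(text, quote=False)}</title>"


if __name__ == "__main__":
    print(build_title("GCSE Bitesize", "Patterns of Behaviour", section="Chemistry"))
```

Generating titles from one function makes it harder for individual pages to drift from the agreed pattern.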
<meta> and <link> elements
2.1. HTML lets authors express metadata within the document, using the <meta> element, or outside the document, by linking to the metadata with a <link> element. (Ref. 2)
2.2. Metadata is used by both internal and external search engines in indexing the contents of a page and is given extra weighting in determining relevant results to a search query. (Ref. 1)
2.3.1. The keyword metadata element is used by the BBC's search engines when indexing the contents of a page. The keywords it contains are given extra weighting when determining relevant results to a search query.
2.3.2. Keywords in metadata have no significant positive effect on major external search engines. On the contrary, repeating or overusing key terms in metadata (sometimes called "stuffing" or "stacking") may negatively affect search rankings. (Ref. 1)
2.3.3. A specific set of keywords SHOULD only appear on the page that would be most relevant to a search query that included those terms.
2.3.4. Section and sub-section home pages MUST have a unique set of keywords specific to the section.
2.3.5. Content pages MUST NOT repeat the keywords of their parent section home page as this significantly reduces the effectiveness of indexing and the relevance of search results.
2.3.6. Content pages SHOULD have unique keywords specific to that particular page. If this is not practical, the content of the keywords tag should be left empty.
2.3.7. Keywords SHOULD be as specific as possible and SHOULD be in order of priority. Individual words and phrases MUST be separated by a comma. The total number of keywords SHOULD NOT exceed twenty words and individual keywords MUST NOT be repeated as this will be penalised by the major external search engines.
2.3.8. There is no need to include the keywords "BBC" or "British Broadcasting Corporation" in the HTML metadata for every page.
<meta name="keywords" content="[keywords]">
<meta name="keywords" content="Albert Square, location, plan">
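The mechanical parts of 2.3.7 (comma separation, no repeated keywords, no more than twenty words) can be checked automatically. The sketch below is illustrative only: `check_keywords` is a hypothetical name, and comparing keywords case-insensitively is an assumption that the standard does not state.

```python
def check_keywords(content: str, max_words: int = 20) -> list[str]:
    """Return a list of problems found in a keywords content string."""
    problems = []
    # Keywords and phrases are separated by commas; ignore empty entries.
    keywords = [k.strip() for k in content.split(",") if k.strip()]

    # Flag repeated keywords (compared case-insensitively, an assumption).
    seen = set()
    for keyword in keywords:
        key = keyword.lower()
        if key in seen:
            problems.append(f"repeated keyword: {keyword}")
        seen.add(key)

    # Count the words across all keywords and phrases.
    total_words = sum(len(k.split()) for k in keywords)
    if total_words > max_words:
        problems.append(f"{total_words} words exceeds the limit of {max_words}")

    return problems


if __name__ == "__main__":
    print(check_keywords("Albert Square, location, plan"))
```

An empty list means no problems were found. The check cannot judge whether keywords are specific or well prioritised; that remains an editorial decision.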
2.4.1. The description is used by both internal and external search engines in indexing the contents of a page and is given extra weighting in determining relevant results to a search query.
2.4.2. Every page MUST have a description.
2.4.3. The description may be used by search engines to present a summary of the page to the searcher. Generally speaking, if the search term exactly matches a word or phrase in the description, and the description appears to be a coherent sentence rather than a list of keywords, the description will be used. Google follows this rule strictly; the BBC's internal search engine usually does. Therefore the description should be an accurate, appealing summary of the contents of a particular page and MUST NOT exceed thirty words.
2.4.4. If the search term does not match a phrase in the description, the search engine will use its own algorithm to decide how to describe the page. This may be a short sentence in the page which contains the exact search term; it may be a summary generated from content in the page; or it may be a description of the page from the DMOZ project.
2.4.5. (If necessary, pages may include meta tags which will tell the Google and MSN crawlers never to use the DMOZ description.) (Ref. 13, 14)
2.4.6. Section and sub-section home pages MUST have a unique description specific to the section.
2.4.7. Content pages SHOULD have a unique description specific to that particular page. If this is not practical, the description may repeat that of the parent section.
<meta name="description" content="[description]">
http://news.bbc.co.uk/1/hi/entertainment/tv_and_radio/2352193.stm
<meta name="description" content="EastEnders actress Barbara Windsor is leaving the soap after eight years - but promises she will return in 2004.">
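The thirty-word limit in 2.4.3 and the requirement in 2.4.2 that a description exists can be tested with a short script. This is a sketch under stated assumptions: the function names are hypothetical, and a "word" is taken to be a run of ASCII letters, digits and apostrophes, which is only one possible definition.

```python
import re


def count_words(text: str) -> int:
    """Count runs of ASCII letters, digits and apostrophes as words."""
    return len(re.findall(r"[A-Za-z0-9']+", text))


def check_description(description: str, max_words: int = 30) -> list[str]:
    """Return a list of problems found in a description string."""
    problems = []
    words = count_words(description)
    if words == 0:
        problems.append("description is empty")
    elif words > max_words:
        problems.append(f"{words} words exceeds the limit of {max_words}")
    return problems


if __name__ == "__main__":
    example = (
        "EastEnders actress Barbara Windsor is leaving the soap after "
        "eight years - but promises she will return in 2004."
    )
    print(count_words(example), check_description(example))
```

Whether a description is accurate and appealing still needs human review; the script only covers the countable requirements.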
2.5.1. These elements are used by the BBC's own search engine to help order and present search results.
2.5.2. This standard uses metadata terms from the Dublin Core Metadata Initiative (Ref. 3) and follows their guidelines for encoding those terms in HTML. (Ref. 4)
2.5.3. The use of standard metadata terms and encoding means that the information is useful to the widest range of processing applications. It is worth noting however that the major search engines infer a document's age from the date it was first encountered by their spiders, not from its metadata. (Ref. 1)
2.5.4.1. The "DCTERMS.created" metadata property records the original publication date of the document.
2.5.4.2. Every page MUST have a "DCTERMS.created" metadata property.
2.5.4.3. Every page with a "DCTERMS.created" metadata property MUST also contain a link to the DCTERMS schema in the head of the document:
<link rel="schema.DCTERMS" href="http://purl.org/dc/terms/" />
<meta name="DCTERMS.created" content="[date]" />
<head>
<title>BBC - EastEnders - News</title>
<link rel="schema.DCTERMS" href="http://purl.org/dc/terms/" />
<meta name="DCTERMS.created" content="2000-12-31" />
</head>
<head>
<title>BBC - Buffy - Angel</title>
<link rel="schema.DCTERMS" href="http://purl.org/dc/terms/" />
<meta name="DCTERMS.created" content="2000-12-31T08:49:37+00:00" />
</head>
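The two date forms shown above (a plain date, and a date-time with a timezone offset) match what Python's standard `isoformat()` methods produce, so the elements can be generated directly from a date value. The following is a minimal sketch; `created_meta` is a hypothetical helper, and rejecting date-times without a timezone is a design choice made for this example, not a requirement taken from the standard.

```python
from datetime import date, datetime, timezone
from typing import Union


def created_meta(created: Union[date, datetime]) -> str:
    """Return the schema link and DCTERMS.created meta element for a page."""
    # A date-time without a timezone would be ambiguous, so refuse it.
    if isinstance(created, datetime) and created.tzinfo is None:
        raise ValueError("datetime values must carry a timezone")
    # isoformat() gives YYYY-MM-DD for a date and, for a timezone-aware
    # datetime without microseconds, YYYY-MM-DDThh:mm:ss+hh:mm.
    return (
        '<link rel="schema.DCTERMS" href="http://purl.org/dc/terms/" />\n'
        f'<meta name="DCTERMS.created" content="{created.isoformat()}" />'
    )


if __name__ == "__main__":
    print(created_meta(date(2000, 12, 31)))
    print(created_meta(datetime(2000, 12, 31, 8, 49, 37, tzinfo=timezone.utc)))
```

Emitting the schema link together with the property keeps the pairing required by 2.5.4.3 in one place.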
2.5.5.1. The "DCTERMS.modified" metadata property records the date the contents of the page were last changed.
2.5.5.2. Every page SHOULD have a "DCTERMS.modified" metadata property. If it is not practical to produce this information reliably, the element MUST be omitted.
<meta name="DCTERMS.modified" content="[date]" />
<head>
<title>BBC - EastEnders - News</title>
<link rel="schema.DCTERMS" href="http://purl.org/dc/terms/" />
<meta name="DCTERMS.created" content="2000-12-31" />
<meta name="DCTERMS.modified" content="2001-01-02" />
</head>
<head>
<title>BBC - Buffy - Angel</title>
<link rel="schema.DCTERMS" href="http://purl.org/dc/terms/" />
<meta name="DCTERMS.created" content="2000-12-31T08:49:37+00:00" />
<meta name="DCTERMS.modified" content="2001-01-02T17:32Z" />
</head>
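Because 2.5.5.2 requires the modified date to be omitted when it cannot be produced reliably, a publishing system might run a sanity check before emitting it. The sketch below checks only that the modified value is not earlier than the created value. The function names are hypothetical, and treating values without a timezone as UTC is an assumption.

```python
from datetime import datetime, timezone


def parse_w3cdtf(value: str) -> datetime:
    """Parse a date or date-time string of the kind used in the examples."""
    # Older Python versions do not accept a trailing "Z", so spell it out.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    # Assumption: values without a timezone are treated as UTC.
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def modified_is_plausible(created: str, modified: str) -> bool:
    """Return True when the modified value is not earlier than the created value."""
    return parse_w3cdtf(modified) >= parse_w3cdtf(created)


if __name__ == "__main__":
    print(modified_is_plausible("2000-12-31", "2001-01-02"))
```

A page that fails this check would, under 2.5.5.2, be better served by omitting the modified property than by publishing a value known to be wrong.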
2.5.6.1. If the content of the page is strongly related to a particular date or period, it is possible to advertise this in the page metadata.
2.5.6.2. If HTML authors choose to do this, they MUST use the "temporal" metadata term from the Dublin Core standard (Ref. 3), with the "Period" encoding scheme (Ref. 15) following the encoding rules for HTML. (Ref. 4)
A page about the Great Depression:
<link rel="schema.DCTERMS" href="http://purl.org/dc/terms/" />
<meta name="DCTERMS.temporal" scheme="DCTERMS.Period" content="name=The Great Depression; start=1929; end=1939;" />
A page about events occurring on 28th November 1990:
<link rel="schema.DCTERMS" href="http://purl.org/dc/terms/" />
<meta name="DCTERMS.temporal" scheme="DCTERMS.Period" content="start=1990-11-28; end=1990-11-28;" />
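The content values in the two examples above follow a simple "key=value;" pattern, so they can be assembled programmatically. This is a minimal sketch based only on those examples; the function names are hypothetical and the sketch does not cover other components of the Period encoding scheme.

```python
from html import escape
from typing import Optional


def period_content(start: str, end: str, name: Optional[str] = None) -> str:
    """Build a content value in the 'name=...; start=...; end=...;' form."""
    parts = []
    if name:
        parts.append(f"name={name}")
    parts.append(f"start={start}")
    parts.append(f"end={end}")
    # Components are separated by "; " and the value ends with ";".
    return "; ".join(parts) + ";"


def temporal_meta(start: str, end: str, name: Optional[str] = None) -> str:
    """Return a DCTERMS.temporal meta element using the Period scheme."""
    content = escape(period_content(start, end, name), quote=True)
    return (
        '<meta name="DCTERMS.temporal" scheme="DCTERMS.Period" '
        f'content="{content}" />'
    )


if __name__ == "__main__":
    print(temporal_meta("1929", "1939", name="The Great Depression"))
```

For a single day, the same date is passed as both start and end, as in the 28th November 1990 example.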
2.6.1. If a page is copyrighted it MUST link to the copyright statement in the head of the document, using the standard W3C defined copyright 'link' element type.
2.6.2. The standard BBC terms and conditions are at http://www.bbc.co.uk/terms/.
2.6.3. If a page includes a link to the copyright or terms and conditions in the body of the page, the link SHOULD contain a rel="copyright" attribute.
2.6.4. Some documents may be released under a particular license, such as a Creative Commons or Creative Archive license. Links to such license statements SHOULD include a rel="license" attribute. (Ref. 12) This helps search engines that search for documents by license terms.
Copyright link in head of document
<link rel="copyright" href="http://www.bbc.co.uk/terms/" />
Copyright link in body of document
<a rel="copyright" href="/go/homepage/int/foot/-/terms/" class="bo">© MMVI</a>
License link in body of document
<a rel="license" href="http://creativecommons.org/licenses/by-nd/2.0/">Some rights reserved</a>
2.7.1. If an article is attributable to an author with a byline, that information SHOULD be reflected in the HTML metadata.
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
<meta name="DC.creator" content="[author]" />
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
<meta name="DC.creator" content="Humphrey Hawksley" />
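Each example above pairs a prefixed metadata name with a schema link. A page can be checked to confirm that every prefix used in a <meta> name is declared by a matching <link rel="schema.PREFIX"> element. The sketch below uses Python's standard HTML parser. The class and function names are hypothetical, and extending the requirement in 2.5.4.3 from DCTERMS to every prefixed name is an assumption based on the Dublin Core encoding guidelines (Ref. 4).

```python
from html.parser import HTMLParser


class SchemaLinkChecker(HTMLParser):
    """Collect prefixes used in <meta> names and declared in <link> elements."""

    def __init__(self):
        super().__init__()
        self.used = set()
        self.declared = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta":
            # A name such as "DCTERMS.created" uses the prefix "DCTERMS".
            name = attributes.get("name") or ""
            if "." in name:
                self.used.add(name.split(".", 1)[0])
        elif tag == "link":
            # A rel value such as "schema.DCTERMS" declares that prefix.
            rel = attributes.get("rel") or ""
            if rel.startswith("schema."):
                self.declared.add(rel[len("schema."):])


def undeclared_prefixes(html: str) -> set:
    """Return the prefixes that are used in the page but never declared."""
    checker = SchemaLinkChecker()
    checker.feed(html)
    checker.close()
    return checker.used - checker.declared


if __name__ == "__main__":
    page = '<head><meta name="DC.creator" content="Humphrey Hawksley" /></head>'
    print(undeclared_prefixes(page))
```

An empty set means every prefix found has a corresponding schema link.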
2.8.1. HTML authors are free to use any other metadata elements which suit their purposes.
2.8.2. If possible it is better to use standard elements rather than a BBC-specific scheme. HTML authors SHOULD at least check the Dublin Core standard to see if it already describes the metadata they wish to encode. (Ref. 3)
2.8.3. If the metadata is not part of the XHTML specification, the document MUST link to the relevant profile which identifies the metadata. (Ref. 17) HTML authors MAY place an XMDP (Ref. 18) specification at the profile URI, but this is not necessary.
Linking to the Dublin Core metadata profile:
<head profile="http://dublincore.org/documents/dcq-html/">
A document which includes the hCard and hCalendar microformats: (Ref. 16)
<head profile="http://www.w3.org/2006/03/hcard http://microformats.org/wiki/hcalendar-profile">
3.1. Robots (sometimes called "spiders") are programs that traverse many pages in the Web by recursively retrieving linked pages. The most significant robots are those which trawl the web to create indexes for search engines.
3.2. In some cases it may be desirable to exclude particular pages from search engine indexes.
3.3. For example, a page SHOULD be excluded from search indexes in any of the following cases:
3.4. The robots exclusion protocol (Ref. 8) is a method that allows Web site administrators to indicate which parts of their sites should not be visited by robots. The most reliable and convenient way to exclude a site is to add it to the robots.txt file at www.bbc.co.uk/robots.txt. Contact the Assistant Producer for Search to add your site to this file.
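The effect of a robots.txt rule can be tested before it is deployed, using the `urllib.robotparser` module from Python's standard library. In the sketch below the file contents, the path `/example-site/` and the helper name `is_crawlable` are all hypothetical; the real file at www.bbc.co.uk/robots.txt will differ.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /example-site/
"""


def is_crawlable(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text allows user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_crawlable(EXAMPLE_ROBOTS_TXT, "http://www.bbc.co.uk/example-site/index.shtml"))
    print(is_crawlable(EXAMPLE_ROBOTS_TXT, "http://www.bbc.co.uk/other/index.shtml"))
```

This only models how a compliant robot interprets the file. Robots that ignore the protocol are not affected by it.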
<meta> tag
3.5.1. Sometimes it is more convenient to exclude individual pages by placing a robots <meta> tag in the HTML. (Ref. 11) All the large search engine robots support this tag (Ref. 9, 10), as does the BBC's search engine.
Instruct robots not to index the page and not to follow any of the links on the page:
<meta name="robots" content="noindex,nofollow" />
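To confirm which directives a page actually carries, the robots <meta> tag can be read back with Python's standard HTML parser. This is an illustrative sketch with hypothetical names. It lower-cases the directives for comparison, which is an assumption about how robots treat case.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from any robots <meta> tags in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            # Directives are comma-separated, for example "noindex,nofollow".
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def robots_directives(html: str) -> set:
    """Return the set of robots directives found in the HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


if __name__ == "__main__":
    print(robots_directives('<meta name="robots" content="noindex,nofollow" />'))
```

A page whose set contains "noindex" is asking robots not to index it; an empty set means no robots <meta> tag was found.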
4.1. Occasionally HTML authors insert the following line into their documents in the mistaken belief that it will prevent the page being cached by browsers and proxies.
<meta http-equiv="Pragma" content="no-cache" />
4.2. The HTTP specification does not set any guidelines for Pragma response headers; instead, Pragma request headers (the headers that a browser sends to a server) are discussed. Although a few caches may honour this header, the majority won't, and it won't have any effect. (Ref. 5) Internet Explorer supports a special (non-standard) usage of the HTTP Pragma: no-cache response header for backwards compatibility with HTTP 1.0 servers, which only prevents caching when used over a secure connection. (Ref. 6)
4.3. In addition to the problems with the Pragma HTTP header, there is at least one known problem with the Pragma HTTP-EQUIV <meta> tag. (Ref. 7)
4.4. Therefore the Pragma http-equiv <meta> tag MUST NOT be used.
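Because the prohibition in 4.4 is absolute, it is easy to enforce with an automated check at publication time. The following sketch, using hypothetical names, reports whether a page contains a Pragma http-equiv <meta> tag.

```python
from html.parser import HTMLParser


class PragmaMetaFinder(HTMLParser):
    """Detect <meta http-equiv="Pragma" ...> tags in a page."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # Compare the attribute value case-insensitively.
        if tag == "meta" and (attributes.get("http-equiv") or "").lower() == "pragma":
            self.found = True


def uses_pragma_meta(html: str) -> bool:
    """Return True if the HTML contains a Pragma http-equiv <meta> tag."""
    finder = PragmaMetaFinder()
    finder.feed(html)
    finder.close()
    return finder.found


if __name__ == "__main__":
    print(uses_pragma_meta('<meta http-equiv="Pragma" content="no-cache" />'))
```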
An article about a political event:
<head profile="http://dublincore.org/documents/dcq-html/">
<title>BBC - On This Day - 28th November - 1990: Tearful farewell from Iron Lady</title>
<link rel="schema.DCTERMS" href="http://purl.org/dc/terms/" />
<link rel="copyright" href="http://www.bbc.co.uk/terms/" />
<meta name="keywords" content="28, 28th, November, 1990, politics and protest UK, politicians, history, news, archive, video, audio" />
<meta name="description" content="Margaret Thatcher formally tenders her resignation to the Queen and leaves Downing Street for the last time." />
<meta name="DCTERMS.created" content="2002-12-07T08:14:38Z" />
<meta name="DCTERMS.modified" content="2002-12-19T10:25:06Z" />
<meta name="DCTERMS.temporal" scheme="DCTERMS.Period" content="start=1990-11-28; end=1990-11-28;" />
</head>
| Date | Version | Change | Author |
|---|---|---|---|
| 18/11/2008 | v1.0.2 | Minor revisions around usage of term bbc.co.uk. | Victoria Jolliffe |
| 30/09/2008 | v1.0.1 | Revised to include capitalisation of DC and DCTERMS, and to replace 'dc.author' with 'DC.creator'. | Kevin Hinde |
| 04/12/2006 | v1.0 | Completely revised, incorporating proposals from Working Group | Kevin Hinde |
| 01/02/2002 | v0.7 | Endorsed by Web Standards group | |
| 07/01/2002 | v0.6 | Revision of title tag section from AS | |
| 01/11/2001 | v0.5 | Removed version field | |
| 01/11/2001 | v0.4 | Revisions to date metadata | |
| 31/10/2001 | v0.3 | Minor revisions after comments from AS | |
| 27/10/2001 | v0.2 | Substantially redrafted and reformatted | William Cooper |
| 16/10/2001 | v0.1 | Initial draft | Anne Smith |
Document editor: Editor, Standards & Guidelines. If you have any comments, questions or requests relating to this document, please contact the Editor, Standards & Guidelines.
Like all other Future Media Standards & Guidelines, this page is updated on a regular basis, through the process described on About Standards & Guidelines.