Introduction
This page provides a brief introduction to the use of HTML <meta> tags, which can be used to make searching more effective and search results more useful. That is achieved by providing additional information to search engines, helping them to find relevant documents and provide more informative descriptions in search results. The "robots" <meta> tag is covered separately - see Excluding search engines.
Search engines vary in how (or whether) they use information from meta tags. The advice here applies to the server providing the University's site-wide search facility and will also benefit searches using many of the Internet-wide search engines.
Comments and questions about the topics covered in this document can be sent to search-support@uis.cam.ac.uk
The description and keywords meta tags
When indexing an HTML page, search engines typically extract the initial part of the document (excluding HTML tags and other "extraneous" content) to form a summary which will give an indication of what the document is about. The summary and document title will be the reader's main indication of whether it is worth following the link to the document. The words found in the text as a whole contribute to the search engine's database and will be used to match documents to search requests. That is often sufficient, but in other cases it may be helpful to give additional information through meta tags, to provide a better summary or to list additional keywords for which the document would be relevant.
If the default summary will not give a clear indication of what the document is about and distinguish it from others (e.g. a collection of reports all starting with a near-identical standard preamble) or there is little or no coherent descriptive text (e.g. a page containing primarily non-text content such as graphical buttons linking to other documents), the description meta tag can be used to provide a summary.
The keywords meta tag can be used to provide supplementary words which will be checked against search terms and may help to locate documents that are relevant, even though their text does not include the words. Examples of when this may be useful include related technical terms that do not appear in the document but for which it would be a good match, or where variations in spelling (UK versus US English, etc.) or usage (e.g. course "prospectus" versus "catalog") could mean that documents would be missed by searches unless additional keywords are supplied.
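As an illustration of the spelling and usage case, a page whose text consistently says "prospectus" and "programme" might list the alternatives that searchers are likely to type. This is only a sketch: the page and the choice of words are hypothetical, and how much use a given search engine makes of them will vary.

```html
<!-- Hypothetical course page: the body text uses only "prospectus" and "programme" -->
<meta name="keywords" content="prospectus, catalog, catalogue, programme, program">
```

With these keywords in place, a search for "course catalog" could match the page even though that word never appears in its text.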
Some search engines (including the University's site-wide search engine) give additional "weight" to text in the keywords and description meta tags, since it should be especially descriptive of the page; this leads to a higher "relevance ranking" in the search results. Consequently, it may be useful for the keywords meta tag to include some words that also appear in the text, though scattered mentions in the main text may be just as effective. Note also that excessive repetition of words may have the reverse effect, a very low relevance score: search engines penalise it to counter abuse by sites that use such tricks to appear in search results ahead of their competitors.
There is no standardisation of how search engines will interpret the keywords meta tag, so there are no definitive guidelines. What's needed may depend (for example) on whether the search engine does case-sensitive matching. At the same time, the list of keywords should be kept reasonably short. Lowercase is likely to be best for normal words, but with uppercase where that is how the word would normally be seen (e.g. Cambridge). [When using the University's site-wide search engine, lowercase in search terms will also match uppercase in the documents, but uppercase in search terms will only match uppercase in the documents.]
All meta tags must be placed within the HEAD section of the web page, and the text within the meta tags must not itself contain any HTML tags. It is recommended that the text supplied by a description meta tag be limited to at most 50 words and preferably fewer, since search engines vary in how much of the text they will use. The same applies to keywords meta tags, and keywords should not be repeated except possibly to include variations in case, e.g. Cambridge and cambridge. Some search engines require distinct words and phrases in the keywords to be separated by commas, and doing so is likely to be harmless for those that do not need it.
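The placement and formatting rules above can be sketched as a minimal page skeleton. The department name, title and wording are invented for illustration; only the structure is the point.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Example Department: Annual Report 2020</title>
  <!-- Plain text only, no HTML tags inside content, well under 50 words -->
  <meta name="description" content="Annual report of the Example Department for 2020, covering teaching, research funding and staff changes.">
  <!-- Comma-separated words and phrases; "Cambridge" is repeated only as a case variation -->
  <meta name="keywords" content="annual report, research funding, Cambridge, cambridge">
</head>
<body>
  <p>Page content goes here.</p>
</body>
</html>
```

Both meta tags sit inside the HEAD section, and the keywords list is kept short rather than trying to anticipate every possible search term.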
Suppose your page contains:
<meta name="keywords" content="tertiary education, innovation, Cambridge University">
<meta name="description" content="The University of Cambridge is one of the oldest universities in the world and one of the largest in the United Kingdom. Its reputation for outstanding academic achievement is known world-wide and reflects the intellectual achievement of its students, as well as the world-class original research carried out by the staff of the University and the Colleges.">
the site search will do two things with these tags:
- It will index both fields as words, so a search on either Cambridge or "tertiary education" will match.
- It will show the "description" with the search results. Instead of showing the first few lines of the page as the summary, the result would be shown as:
University of Cambridge
The University of Cambridge is one of the oldest universities in the world and one of the largest in the United Kingdom. Its reputation for outstanding academic achievement is known world-wide and reflects the intellectual achievement of its students, as well as the world-class original research carried out by the staff of the University and the Colleges.
Dublin Core metadata
There is another form of metadata, known as Dublin Core (because it was produced as a result of a meeting in Dublin, Ohio), which is designed to give a much fuller description of the information. Our site-wide search engine does collect Dublin Core metadata, although most search engines do not.
Unless you have special requirements, such as wanting Dublin Core information on your resources collected into a specific index or using RDF Dublin Core as on pages at http://www.direct.gov.uk/, it probably isn't worth the time and effort to put Dublin Core metadata into your files. If you do use it, Simple Dublin Core, which provides the scheme's standard 15 elements with no further qualification, should be enough. Should you want to find out more, a good starting point is the Dublin Core Metadata Initiative website at http://dublincore.org/.
The following pairs of "traditional" and Dublin Core meta tags are treated as synonyms by the University's site-wide search engine:
- date & dc.date
- publisher & dc.publisher
- description & dc.description
- keywords & dc.subject
so that, for example, a dc.description tag would have the same effect as a description tag.
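For instance, a page that prefers the Dublin Core names might carry tags along the following lines. This is a sketch under assumptions: the content values are invented, and the date is written as YYYY-MM-DD because that is the form Dublin Core generally recommends, not because the site-wide search engine is known to require it.

```html
<head>
  <title>Example Department: Annual Report 2020</title>
  <!-- Treated like name="description" by the site-wide search engine -->
  <meta name="dc.description" content="Annual report of the Example Department for 2020, covering teaching, research funding and staff changes.">
  <!-- Treated like name="keywords" -->
  <meta name="dc.subject" content="annual report, research funding, Cambridge">
  <meta name="dc.publisher" content="University of Cambridge">
  <!-- Assumed date format; check what your target search engine expects -->
  <meta name="dc.date" content="2021-04-01">
</head>
```

Since each pair is treated as synonymous, there should be no need to supply both the traditional and the Dublin Core form of the same tag for the site-wide search.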
Last updated: April 2021