
Why should I use the language attribute?

From: Deborah Cawkwell <deborah.cawkwell@bbc.co.uk>
Date: Fri, 18 Jun 2004 00:31:27 +0100
Message-ID: <418B7E44473AC34488C9E730D09FF3CF027F8C79@bbcxue204.national.core.bbc.co.uk>
To: <public-i18n-geo@w3.org>
Why should I use the language attribute in web pages?

The language attribute unambiguously specifies the 'natural language' of web page content.
HTML, XHTML and XML differ in how the language attribute is specified. See http://www.w3.org/International/O-HTML-tags.html
Applications exist that can use natural language metadata about content to deliver the most relevant information to users, based on their language preferences. The more content that is tagged, and tagged correctly, the more useful and pervasive such applications will become. Metadata that indicates content language can assist many audiences for a particular page or section of a page. For example, authoring tools can supply appropriate spelling and grammar checking based on the language of a segment. Translation tools can use the tags to help recognize sections of text in a particular language. Search engines can group or filter results based on the user's preferences. And user agents can (and do) use the content language to select language-appropriate fonts, which improves the overall user experience of the page.
The language attribute should always be used to indicate the primary language of the web page, on the main page container element (typically the html element). If the language changes within the main page container element, this should be reflected on a sub-container element, eg, span, div, td or p.
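
As a rough illustration of how an application might read this markup, the Python sketch below resolves the language of each run of text by inheriting the nearest declared lang value from its ancestors. The class name LangResolver, the helper resolve_languages and the sample markup are invented for this example; it uses only the standard library html.parser module and ignores xml:lang and HTTP headers for brevity.

```python
from html.parser import HTMLParser

# Elements that have no closing tag, so they are never pushed on the stack.
VOID_ELEMENTS = {"br", "hr", "img", "input", "link", "meta"}


class LangResolver(HTMLParser):
    """Collects (language, text) pairs, inheriting lang from ancestors."""

    def __init__(self):
        super().__init__()
        self.stack = []     # one (tag, effective language) entry per open element
        self.segments = []  # (language, text) pairs in document order

    def current_lang(self):
        return self.stack[-1][1] if self.stack else None

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        declared = dict(attrs).get("lang")
        # An element without its own lang inherits the language of its parent.
        self.stack.append((tag, declared or self.current_lang()))

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open element, if there is one.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.segments.append((self.current_lang(), text))


# Hypothetical page: English overall, with one French word marked up.
SAMPLE = """
<html lang="en">
  <body>
    <p>The French for cat is <span lang="fr">chat</span>.</p>
  </body>
</html>
"""


def resolve_languages(markup):
    parser = LangResolver()
    parser.feed(markup)
    parser.close()
    return parser.segments


if __name__ == "__main__":
    for lang, text in resolve_languages(SAMPLE):
        print(lang, repr(text))
```

The point of the sketch is that the declaration on the main container covers everything by default, so only the exceptions need their own lang value.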

The 'lang' attribute assists speech synthesizers and Braille translators; it is required by the W3C Web Accessibility Initiative (WAI) and by governmental policies enforced in some countries, eg, the Disability Discrimination Act in the UK.
Page rendering
CSS2 makes powerful use of the language attribute through the :lang() pseudo-class (http://www.w3.org/International/questions/qa-css-lang.html). For example, you might want to use a different font size depending on the language:
<style type="text/css">
:lang(ar)   {
    font-family: "Traditional Arabic", serif;
    font-size: 125%;
}
:lang(fr)   {
    font-size: 100%;
}
</style>
Currently, this does not work in Microsoft Internet Explorer.
A common use for meta is to specify keywords that a search engine may use to improve the quality of search results. When several meta elements provide language-dependent information about a document, search engines may filter on the xml:lang attribute to display search results using the language preferences of the user.
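
One way to picture this kind of filtering is the Python sketch below. It assumes the language-tagged keyword sets have already been extracted from the meta elements into (language, keywords) pairs; the sample data and the function name pick_keywords are made up for illustration, and the matching rule (try the full tag, then drop trailing subtags) is a simple fallback chosen for the example, not a description of what any particular search engine does.

```python
# Hypothetical keyword sets, as they might be read from several meta elements
# that differ only in their language attribute.
META_KEYWORDS = [
    ("en-us", "vacation, Greece, sunshine"),
    ("en", "holiday, Greece, sunshine"),
    ("fr", "vacances, Grèce, soleil"),
]


def pick_keywords(entries, preferences):
    """Return the keywords for the first user preference that can be matched.

    Each preferred tag is tried as given, then with trailing subtags removed
    (for example "en-gb" falls back to "en"). Returns None if nothing matches.
    """
    by_lang = {lang.lower(): keywords for lang, keywords in entries}
    for wanted in preferences:
        subtags = wanted.lower().split("-")
        while subtags:
            candidate = "-".join(subtags)
            if candidate in by_lang:
                return by_lang[candidate]
            subtags.pop()  # drop the last subtag and try a broader tag
    return None


if __name__ == "__main__":
    # A user who prefers British English, then French.
    print(pick_keywords(META_KEYWORDS, ["en-GB", "fr"]))
```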
The 'xml:lang' attribute is the standard way to identify language information in XML. [Information about tasks]
cf Google

You might think information about natural language could be inferred from the character encoding. However, character encoding does not enable unambiguous identification of a natural language: there *must* be a 1:1 mapping between encoding and language for this inference to work... and there isn't one. For example, a single character encoding can be used for many languages, eg, Latin 1 (iso-8859-1) can encode both French and English, as well as a great many other languages. In addition, the character encoding can vary for a single language, eg, Arabic can be encoded with 'windows-1256' or 'iso-8859-6' or 'utf-8' (or another Unicode encoding).
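
The two halves of this argument can be checked with a few lines of Python. The sketch below uses only the built-in str.encode and bytes.decode methods; the sample sentences and the helper name encode_all are invented for the example.

```python
# One encoding, many languages: both sentences fit in iso-8859-1, so knowing
# the encoding says nothing about which language the text is written in.
FRENCH = "Le café est prêt"
ENGLISH = "The coffee is ready"
french_bytes = FRENCH.encode("iso-8859-1")
english_bytes = ENGLISH.encode("iso-8859-1")

# One language, many encodings: the same Arabic word under three encodings.
ARABIC = "مرحبا"
ARABIC_ENCODINGS = ["windows-1256", "iso-8859-6", "utf-8"]


def encode_all(text, encodings):
    """Encode the same text under each named encoding."""
    return {name: text.encode(name) for name in encodings}


if __name__ == "__main__":
    print(french_bytes)
    print(english_bytes)
    for name, data in encode_all(ARABIC, ARABIC_ENCODINGS).items():
        # The byte sequences differ, yet each decodes to the same Arabic text.
        print(name, data.hex(), data.decode(name) == ARABIC)
```

Since neither direction of the mapping is unique, the language has to be declared explicitly rather than guessed from the encoding.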
FAQ: HTTP and meta language information - http://www.w3.org/International/questions/qa-http-and-lang.html

Character encoding - http://www.w3.org/TR/2004/WD-charmod-20040225/#sec-Digital

XML 1.0 - http://www.w3.org/TR/REC-xml/#sec-lang-tag

[Will check following - from previous]
HTML 4.01 Specification W3C Recommendation 24 December 1999: http://www.w3.org/TR/html401/struct/dirlang.html#h-8.1.3.

XHTML 2.0 W3C Working Draft 5 August 2002 http://www.w3.org/TR/2002/WD-xhtml2-20020805/mod-meta.html

Web Accessibility Initiative: lang attribute - http://www.w3.org/TR/WCAG10/#gl-abbreviated-and-foreign

Tutorial: Language markup in XHTML and CSS (DRAFT): http://www.w3.org/International/tutorials/tutorial-lang.html

Authoring Techniques for XHTML & HTML Internationalization: Specifying the language of content 1.0 - http://www.w3.org/International/geo/html-tech/tech-lang.html

FAQ: Styling using the lang attribute: http://www.w3.org/International/questions/qa-css-lang.html

FAQ: Two-letter or three-letter language codes: http://www.w3.org/International/questions/qa-lang-2or3.html

From the usability perspective: http://diveintoaccessibility.org/day_7_identifying_your_language.html

An interesting view on Google usage across cultures:




Received on Thursday, 17 June 2004 19:31:31 UTC
