FAQ: Why should I use the 'lang' attribute? from Deborah Cawkwell on 2004-06-14 (public-i18n-geo@w3.org from June 2004)

From: Deborah Cawkwell <deborah.cawkwell@bbc.co.uk>
Date: Mon, 14 Jun 2004 23:20:46 +0100
To: "GEO" <public-i18n-geo@w3.org>
Message-ID: <418B7E44473AC34488C9E730D09FF3CF027F8C40@bbcxue204.national.core.bbc.co.uk>

Hi All

In feedback about my last attempt at this FAQ, the group suggested I make a strong initial argument. I hope I've done that (but I'm sure there will be some input - which of course I welcome).

I have to admit that I've not really worked (more) on the 'applications' part... So I welcome any text fragments that I could incorporate tomorrow night (from 19:00 BST/GMT+0100), so that on our Wednesday teleconference, we might get somewhere near publishing this.

Best regards to all & thanks (to any contributors)

Deborah

-------------------------------------

QUESTION

Why should I use the 'lang' attribute?

ANSWER

Overview

The 'lang' attribute contains information about the 'natural' language of content.

A 'natural' ('human') language is one in which people communicate with one another, such as Arabic or Brazilian Portuguese. This contrasts with an 'artificial' language, such as C or Perl, in which people communicate with machines.

It is useful to identify the language of content and to make that language information 'semantically' available, so that it can serve people's needs better. For example, when searching for information, it is useful to narrow the search to the languages that the searcher can understand. In addition, it may be desirable to display different natural languages in the ways their users expect; for example, quotation marks are written differently in different languages.

The 'lang' attribute serves to identify the 'language of content' unambiguously. Other clues to that language, such as 'character encoding', do not, and may change over time. Currently, natural language is sometimes inferred from the character encoding. However, a character encoding does not uniquely identify a natural language. One character encoding can be used for multiple natural languages, eg, Latin 1 (iso-8859-1) can encode both French and English. Conversely, a single language can be written in several encodings, eg, Arabic could be encoded with 'windows-1256' or 'iso-8859-6' or 'utf-8' (or another Unicode encoding).

Unicode - which can encode all languages - is likely to become the dominant encoding, because it can resolve many problems. As it does, character encoding will cease to be of any use for identifying the natural language(s) of web content. An additional problem is that character encoding may be specified in different places: in the HTTP header and/or in a meta element, where that encoding relates to the whole page (forms can be an exception).

The more pages that are correctly marked up with appropriate semantic language information, the more applications will emerge to harness it, to deliver information relevant to people in the languages they understand.

Implementation

The 'lang' attribute can be applied to the HTML container of the whole web page, ie, the html element, or to individual HTML elements (span, div, td, p, etc) when their language differs from the 'primary' language of the page. [What will happen when people use multiple languages as a matter of course - with Unicode, I think this is inevitable?]

The language of an element is specified slightly differently in HTML and XML, eg:

<html lang="en" xml:lang="en"...

lang='en' = HTML markup
xml:lang='en' = X(HT)ML markup

When using XHTML, both attributes should be used, with the same value.
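
As a minimal sketch of the two levels described above (the page text and the French phrase are invented for illustration), an XHTML 1.0 page might declare English as its primary language on the html element and mark a single French phrase with a span:

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
  <head>
    <title>Language markup example</title>
  </head>
  <body>
    <!-- This paragraph inherits English from the html element -->
    <p>The French for "good morning" is
      <!-- The span overrides the primary language for this phrase only -->
      <span lang="fr" xml:lang="fr">bonjour</span>.</p>
  </body>
</html>
```

Elements without their own 'lang' attribute take the language of the nearest ancestor that has one, so only the exceptions need marking.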

Application

Accessibility
The 'lang' attribute assists speech synthesizers and Braille translators; it is required by the W3C Web Accessibility Initiative (WAI) guidelines and by government policies enforced in some countries, eg, the UK (Disability Discrimination Act) [other countries? contact WAI? and/or specifically request this information from users - useful way to get people more involved?]

Page rendering
CSS2 makes powerful use of the 'lang' attribute through the ':lang()' pseudo-class (http://www.w3.org/International/questions/qa-css-lang.html).
Unfortunately it doesn't work in IE yet. [Clarify scope - changes with versions and operating systems - in order to keep FAQs up-to-date - refer to tests & results from tests] But the concept of language-specific styling is a very powerful one. [Need to add some examples.]
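
One possible example, offered as a sketch rather than tested guidance: the page below uses the CSS2 ':lang()' pseudo-class with the 'quotes' property so that q elements get quotation marks suited to their language. The sentences and the choice of marks are illustrative assumptions, and browsers without CSS2 generated-content support will not show the effect.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
  "http://www.w3.org/TR/html4/strict.dtd">
<html lang="en">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    <title>Language-specific quotation marks</title>
    <style type="text/css">
      /* Choose the quotation marks according to the element's language */
      q:lang(en) { quotes: '“' '”'; }
      q:lang(fr) { quotes: '«' '»'; }
      /* Generate the marks around every q element */
      q:before { content: open-quote; }
      q:after  { content: close-quote; }
    </style>
  </head>
  <body>
    <!-- English is inherited from the html element -->
    <p>She said <q>hello</q>.</p>
    <!-- The q element here inherits French from its parent p -->
    <p lang="fr">Elle a dit <q>bonjour</q>.</p>
  </body>
</html>
```

Because ':lang()' matches on the inherited language, the second q element is styled as French even though it carries no 'lang' attribute of its own.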

Search
A common use for meta is to specify keywords that a search engine may use to improve the quality of search results. When several meta elements provide language-dependent information about a document, search engines may filter on the xml:lang attribute to display search results using the language preferences of the user. (http://www.w3.org/TR/2002/WD-xhtml2-20020805/mod-meta.html)
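
A sketch of what such language-dependent meta elements could look like in HTML 4.01, where the attribute is 'lang' rather than 'xml:lang'. The keyword values are illustrative, and whether any particular search engine makes use of them is an open question here:

```html
<!-- Keywords for readers of US English -->
<meta name="keywords" lang="en-us" content="vacation, Greece, sunshine">
<!-- Keywords for readers of French -->
<meta name="keywords" lang="fr" content="vacances, Gr&egrave;ce, soleil">
```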

XML
The 'xml:lang' attribute is the standard way to identify language information in XML. [Information about tasks]
cf Google

Processing
eg XSLT

USEFUL LINKS

FAQ: HTTP and meta language information - http://www.w3.org/International/questions/qa-http-and-lang.html

[Will check following - from previous]
HTML 4.01 Specification W3C Recommendation 24 December 1999: http://www.w3.org/TR/html401/struct/dirlang.html#h-8.1.3

XHTML 2.0 W3C Working Draft 5 August 2002 http://www.w3.org/TR/2002/WD-xhtml2-20020805/mod-meta.html

Web Accessibility Initiative: lang attribute - http://www.w3.org/TR/WCAG10/#gl-abbreviated-and-foreign

Tutorial: Language markup in XHTML and CSS (DRAFT): http://www.w3.org/International/tutorials/tutorial-lang.html

Authoring Techniques for XHTML & HTML Internationalization: Specifying the language of content 1.0 - http://www.w3.org/International/geo/html-tech/tech-lang.html

FAQ: Styling using the lang attribute: http://www.w3.org/International/questions/qa-css-lang.html

FAQ: Two-letter or three-letter language codes: http://www.w3.org/International/questions/qa-lang-2or3.html

From the usability perspective: http://diveintoaccessibility.org/day_7_identifying_your_language.html

An interesting view on Google usage across cultures:
http://www.google.com/press/zeitgeist2003.html

http://www.google.com/press/zeitgeist.html


Received on Monday, 14 June 2004 18:20:48 UTC