Re: FAQ: Why should I use the 'lang' attribute? from Najib Tounsi on 2004-06-15 (public-i18n-geo@w3.org from June 2004)

From: Najib Tounsi <ntounsi@emi.ac.ma>
Date: Tue, 15 Jun 2004 19:17:17 +0000
To: Deborah Cawkwell <deborah.cawkwell@bbc.co.uk>
Cc: GEO <public-i18n-geo@w3.org>
Message-ID: <40CF4B3D.50201@emi.ac.ma>
Hi Deborah,

Thanks for your FAQ. Really good. Please find below a few small comments that I hope are helpful.

Best Regards,

Najib

-- 
Najib TOUNSI (mailto:tounsi@w3.org)
Bureau W3C au Maroc (http://www.w3c.org.ma/)
Ecole Mohammadia d'Ingenieurs, BP 765 Agdal-RABAT Maroc (Morocco)
Phone : +212 (0) 37 68 71 74  Fax : +212 (0) 37 77 88 53
Mobile: +212 (0) 61 22 00 30


Deborah Cawkwell wrote:

>Hi All
>
>In feedback about my last attempt at this FAQ, the group suggested I make a strong initial argument. I hope I've done that (but I'm sure there will be some input - which of course I welcome).
>
>I have to admit that I've not really worked (more) on the 'applications' part... So I welcome any text fragments that I could incorporate tomorrow night (from 19:00 BST/GMT+0100), so that on our Wednesday teleconference, we might get somewhere near publishing this. 
>
>Best regards to all & thanks (to any contributors)
>
>Deborah
>
>-------------------------------------
>
>QUESTION
>
>Why should I use the 'lang' attribute?
>
>
>ANSWER
>
>Overview
>
>The 'lang' attribute contains information about the 'natural' language of content. 
>
>A 'natural' ('human') language is one that people use to communicate with one another, such as Arabic or Brazilian Portuguese. This contrasts with an 'artificial' language, such as C or Perl, which people use to communicate with machines.
>
>It is useful to identify the language of content and to make that language information 'semantically' available, so that it can serve people's needs better. For example, when searching for information, it is useful to narrow that search to the languages that the searcher can understand. In addition, it may be desirable to display different natural languages in ways familiar to users of those languages; for example, quotation marks are written differently in different languages.
>
>The 'lang' attribute serves to identify the 'language of content' unambiguously. Other means of inferring that language, such as 'character encoding', do not uniquely identify the natural language and may change over time. One character encoding can be used for multiple natural languages, eg, Latin 1 (iso-8859-1) can encode both French and English. In addition, the character encoding can vary for a single language, eg, Arabic could be encoded with 'windows-1256' or 'iso-8859-6' or 'utf-8' (or another Unicode encoding).
>  
>
However, windows-1256 and iso-8859-6 are 8-bit extensions of ASCII, whereas 
UTF-8 is Unicode. It may not be essential here, but maybe you should 
emphasize the Unicode fact? ;-)
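
For instance, here is a minimal sketch of a UTF-8 page (the sentences are 
only my illustrative text, not taken from the FAQ). The charset declaration 
is the same for the French and the Arabic paragraph, so only the 'lang' 
attributes say which language each one is in.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd">
<html lang="fr">
<head>
  <!-- One encoding for the whole page: it says nothing about the language -->
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  <title>Bienvenue</title>
</head>
<body>
  <!-- Inherits lang="fr" from the html element -->
  <p>Bonjour tout le monde.</p>
  <!-- Same encoding, different language: only lang tells them apart -->
  <p lang="ar" dir="rtl">مرحبا بالجميع.</p>
</body>
</html>
```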

>Unicode - which can encode all languages - is likely to become the dominant encoding form, because it can resolve many problems.  Therefore, character encoding will cease to have any use at all for identifying natural language(s) of web content. An additional problem is that character encoding may be specified in different places: in the HTTP header and/or in a meta tag, where that encoding relates to the whole page (forms can be an exception). 
>
>The more pages that are correctly marked up with appropriate semantic language information, the more applications will emerge to harness it, to deliver information relevant to people in the languages they understand.
>
>
>Implementation
>
>The 'lang' attribute can be applied to the HTML container of the whole web page, ie, the HTML element, or to individual HTML elements (span, div, td, p, etc) when the language varies from that specified as the 'primary' language. [What will happen when people use multiple languages as a matter of course - with Unicode, I think this is inevitable?] 
>
>The 'lang' attribute of an HTML element is specified slightly differently in HTML and XML, eg: 
>
><html lang="en" xml:lang="en"...
>
>lang='en' = HTML markup
>xml:lang='en' = X(HT)ML markup
>
>When using XHTML both syntaxes should be used.
>  
>
Always?
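
As far as I know, the XHTML 1.0 compatibility guidelines (Appendix C) 
recommend using both attributes when the page is served as text/html, 
while for XHTML processed as XML, xml:lang alone should be enough. A 
minimal sketch of the first case, with sample text of my own, could be:

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<!-- Primary language of the page, given in both syntaxes -->
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  <title>Language markup example</title>
</head>
<body>
  <!-- The span overrides the primary language for one French word -->
  <p>The French for "thank you" is
     <span lang="fr" xml:lang="fr">merci</span>.</p>
</body>
</html>
```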

>
>Application
>
>Accessibility
>The 'lang' attribute assists speech synthesizers and Braille translators; it is required by the W3C Web Accessibility Initiative (WAI) and enforced by governmental policies in some countries, eg, the UK's Disability Discrimination Act [other countries? contact WAI? and/or specifically request this information from users - useful way to get people more involved?] 
>
>Page rendering
>CSS2 uses the 'lang' attribute powerfully as a pseudo class.  (http://www.w3.org/International/questions/qa-css-lang.html).
>Unfortunately it doesn't work in IE yet. [Clarify scope - changes with versions and operating systems - in order to keep FAQs up-to-date - refer to tests & results from tests]  But the concept of language specific styling is a very powerful one. [Need to add some examples.]
>  
>
Yes, the lang pseudo class is useful. As an example, you might want to 
use a different font size depending on the language.

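<style type="text/css">
/* Arabic: "Traditional Arabic" renders small, so enlarge it */
:lang(ar)   {
    font-family: "Traditional Arabic", serif;
    font-size: 125%;
}

/* French: default size */
:lang(fr)   {
    font-family: Arial, sans-serif;
    font-size: 100%;
}
</style>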

(see http://www.w3c.org.ma/Tests/lang.html for a test page)

In bilingual pages, I often increase the size of Arabic characters when 
using the "Traditional Arabic" font, because this font looks very small 
next to Latin characters.
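
To make this concrete, here is a sketch of such a bilingual page. The 
sentences and the 125% value are only illustrative assumptions; any 
element carrying lang="ar" is matched by the :lang(ar) rule.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd">
<html lang="fr">
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  <title>Page bilingue</title>
  <style type="text/css">
    /* Enlarge Arabic text so it matches the Latin text visually */
    :lang(ar) { font-family: "Traditional Arabic", serif; font-size: 125%; }
  </style>
</head>
<body>
  <!-- French paragraph: inherits lang="fr", keeps the default size -->
  <p>Bienvenue au Maroc.</p>
  <!-- Arabic paragraph: matched by :lang(ar), so it is shown larger -->
  <p lang="ar" dir="rtl">مرحبا بكم في المغرب.</p>
</body>
</html>
```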

>Search
>A common use for meta is to specify keywords that a search engine may use to improve the quality of search results. When several meta elements provide language-dependent information about a document, search engines may filter on the xml:lang attribute to display search results using the language preferences of the user. (http://www.w3.org/TR/2002/WD-xhtml2-20020805/mod-meta.html)
>XML
>The 'xml:lang' attribute is the standard way to identify language information in XML. [Information about tasks]
>cf Google
>
>Processing
>eg XSLT
>
>
>USEFUL LINKS
>
>FAQ: HTTP and meta language information - http://www.w3.org/International/questions/qa-http-and-lang.html
>[Will check following - from previous]
>HTML 4.01 Specification W3C Recommendation 24 December 1999: http://www.w3.org/TR/html401/struct/dirlang.html#h-8.1.3.
>XHTML 2.0 W3C Working Draft 5 August 2002 http://www.w3.org/TR/2002/WD-xhtml2-20020805/mod-meta.html
>Web Accessibility Initiative: lang attribute - http://www.w3.org/TR/WCAG10/#gl-abbreviated-and-foreign
>Tutorial: Language markup in XHTML and CSS (DRAFT): http://www.w3.org/International/tutorials/tutorial-lang.html
>Authoring Techniques for XHTML & HTML Internationalization: Specifying the language of content 1.0 - http://www.w3.org/International/geo/html-tech/tech-lang.html
>FAQ: Styling using the lang attribute: http://www.w3.org/International/questions/qa-css-lang.html
>FAQ: Two-letter or three-letter language codes: http://www.w3.org/International/questions/qa-lang-2or3.html
>From the usability perspective: http://diveintoaccessibility.org/day_7_identifying_your_language.html
>An interesting view on Google usage across cultures:
>http://www.google.com/press/zeitgeist2003.html
>http://www.google.com/press/zeitgeist.html
>
Received on Tuesday, 15 June 2004 15:15:14 UTC