- From: Richard Ishida <ishida@w3.org>
- Date: Wed, 16 Jun 2004 13:28:02 +0100
- To: "'Deborah Cawkwell'" <deborah.cawkwell@bbc.co.uk>, "'GEO'" <public-i18n-geo@w3.org>
Hi Deborah,

I agree with all of Addison's excellent notes. My notes below...

============
Richard Ishida
W3C

contact info: http://www.w3.org/People/Ishida/

W3C Internationalization: http://www.w3.org/International/

> -----Original Message-----
> From: public-i18n-geo-request@w3.org
> [mailto:public-i18n-geo-request@w3.org] On Behalf Of Deborah Cawkwell
> Sent: 14 June 2004 23:21
> To: GEO
> Subject: FAQ: Why should I use the 'lang' attribute?
>
> Hi All
>
> In feedback about my last attempt at this FAQ, the group
> suggest I make a strong initial argument. I hope I've done
> that (but I'm sure there will be some input - which of course
> I welcome).
>
> I have to admit that I've not really worked (more) on the
> 'applications' part... So I welcome any text fragments that I
> could incorporate tomorrow night (from 19:00 BST/GMT+0100),
> so that on our Wednesday teleconference, we might get
> somewhere near publishing this.
>
> Best regards to all & thanks (to any contributors)
>
> Deborah
>
> -------------------------------------
>
> QUESTION
>
> Why should I use the 'lang' attribute?

I feel like we should limit this to '... in HTML', or widen it to wording that would include xml:lang. Dunno. At least we should say very early that xml:lang is relevant too.

>
>
> ANSWER
>
> Overview
>
> The 'lang' attribute contains information about the 'natural'
> language of content.

Mention xml:lang here.

>
> A 'natural' ('human') language is a language with which
> people communicate with one another such as Arabic or
> Brazilian Portuguese. This stands in comparison to an
> 'artificial' language, such as C or Perl, with which people
> communicate with machines.
>
> It is useful to identify the language of content and to make
> that language information 'semantically' available, so that
> it can serve people's needs better. For example, when
> searching for information, it is useful to narrow that search
> to the languages that the searcher can understand.
> In addition, it may be desirable to display different natural
> languages in ways known by users of those languages, for
> example, quotation marks have different written
> representations in different languages.
>

There are some issues with the next two paragraphs that I think Addison described well. I also see the stuff related to character encoding as somewhat tangential to the main argument, so I think it should either appear under a subheading, or possibly even as a note in the margin.

> The 'lang' attribute serves to uniquely identify the
> 'language of content'.
> Other means of identifying that
> language of content, such as 'character encoding', do not
> uniquely identify the natural language and may change over
> time. Currently, natural language could be identified by
> 'character encoding'. However, that character encoding does
> not uniquely identify a natural language. One character
> encoding can be used for multiple natural languages, eg,
> Latin 1 (iso-8859-1) can encode both French and English. In
> addition, the character encoding can vary over a single
> language, eg, Arabic could be encoded with 'windows-1256' or
> 'iso-8859-6' or 'utf-8' (or another Unicode encoding).
>
> Unicode - which can encode all languages - is likely to
> become the dominant encoding form, because it can resolve
> many problems. Therefore, character encoding will cease to
> have any use at all for identifying natural language(s) of
> web content. An additional problem is that character encoding
> may be specified in different places: in the http header
> and/or in a metatag, where that encoding relates to the whole
> page (forms can be an exception).

I really like Addison's proposed text for the next paragraph, but his text sums up much of what you have in the answer so far (excluding the character related stuff).
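
The point that a character encoding does not identify a language can be checked directly. What follows is only a minimal Python sketch, not part of the draft FAQ; the sample strings are illustrative assumptions. It shows one encoding (iso-8859-1) carrying both French and English, and one Arabic string stored under the three encodings named in the draft.

```python
# Sample strings are illustrative assumptions, not taken from the FAQ draft.
french = "Où est la bibliothèque ?"
english = "Where is the library?"
arabic = "مرحبا"

# One encoding, two languages: both strings encode cleanly as iso-8859-1.
for text in (french, english):
    text.encode("iso-8859-1")  # raises UnicodeEncodeError if unrepresentable

# One language, several encodings: same text, different byte sequences.
arabic_bytes = {
    name: arabic.encode(name)
    for name in ("windows-1256", "iso-8859-6", "utf-8")
}
for name, data in arabic_bytes.items():
    # Each byte sequence decodes back to the identical Arabic string.
    assert data.decode(name) == arabic
    print(name, data.hex())
```

Nothing in the bytes or in the encoding label says "French" or "Arabic"; that information has to come from language markup such as lang or xml:lang.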
>
> The more pages that are correctly marked up with appropriate
> semantic language information, the more applications will
> emerge to harness it, to deliver information relevant to
> people in the languages they understand.
>
>
> Implementation
>
> The 'lang' attribute can be applied to the HTML container of
> the whole web page, ie, the HTML element, or to individual
> HTML elements (span, div, td, p, etc) when the language
> varies from that specified as the 'primary' language.

I think this is useful information, but your next question worries me...

> [What
> will happen when people use multiple languages as a matter of
> course - with Unicode, I think this is inevitable?]

Nothing changes. Unicode documents have a primary language just like documents in any other encoding. I think you are letting yourself be confused by the idea that character encodings express language. They don't.

>
> The 'lang' attribute of an HTML element is specified slightly
> differently in HTML and XML, eg:
>
> <html lang="en" xml:lang="en"...
>
> lang='en' = HTML markup
> xml:lang='en' = X(HT)ML markup
>
> When using XHTML both syntaxes should be used.

The implementation detail is a little more complex than you describe it here because lang should not be used in xhtml 1.1 (nor XML). I would prefer to see a pointer to the language declaration tutorial and the relevant techniques doc, rather than an attempt to re-state how to do it. This FAQ is about *why* one should do it, not how.

As I said before, I think one should allude to the fact that xml:lang may be required in addition to / in place of lang. But I think we should do so in the very first para of the answer.

>
>
> Application

Applications?

>
> Accessibility
> The 'lang' attribute assists speech synthesizers and Braille
> translators; it is required by the W3C Web Accessibility
> Initiative (WAI) and enforced governmental policies in some
> countries, eg, UK - Disability Discrimination Act (UK)
> [other
> countries? contact WAI?
> and/or specifically request this
> information from users - useful way to get people more involved?]

You could look through http://www.w3.org/WAI/Policy/ but I think your example of the UK is sufficient to make your point.

>
> Page rendering
> CSS2 uses the 'lang' attribute powerfully as a pseudo class.
> (http://www.w3.org/International/questions/qa-css-lang.html).
> Unfortunately it doesn't work in IE yet. [Clarify scope -
> changes with versions and operating systems - in order to
> keep FAQs up-to-date - refer to tests & results from tests]
> But the concept of language specific styling is a very
> powerful one. [Need to add some examples.]
>
> Search
> A common use for meta

I think you are talking about the meta element specifically, rather than meta information in general, so you should say so.

> is to specify keywords that a search
> engine may use to improve the quality of search results. When
> several meta elements provide language-dependent information
> about a document, search engines may filter on the xml:lang
> attribute to display search results using the language
> preferences of the user.
> (http://www.w3.org/TR/2002/WD-xhtml2-20020805/mod-meta.html)

Language information expressed with the lang attribute might also be useful for searching. I don't know much about this area, but folks from the information science community at the Unicode conference seemed to be requesting that the lang (and xml:lang) attributes be fully deployed to help their searches.

> XML
> The 'xml:lang' attribute is the standard way to identify
> language information in XML. [Information about tasks] cf Google

Not sure how this is relevant here.

>
> Processing
> eg XSLT

You could also mention that this is/will be useful for spellchecking documents during authoring.
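
As an illustration of the processing case (a spellchecker choosing a dictionary per language, say), here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It is not from the draft FAQ: the sample document and the helper name text_by_language are hypothetical. It groups text runs by the xml:lang value in effect, treating the attribute as inherited by child elements.

```python
import xml.etree.ElementTree as ET

# The xml: prefix is bound to this namespace by the XML specification.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample document, invented for illustration.
SAMPLE = """<doc xml:lang="en">
  <p>Hello world</p>
  <p xml:lang="fr">Bonjour le monde</p>
  <p>Goodbye <em xml:lang="de">Welt</em></p>
</doc>"""


def text_by_language(element, inherited=None, out=None):
    """Collect text runs keyed by the language in effect (xml:lang inherits)."""
    out = {} if out is None else out
    lang = element.get(XML_LANG, inherited)
    if element.text and element.text.strip():
        out.setdefault(lang, []).append(element.text.strip())
    for child in element:
        text_by_language(child, lang, out)
        # Tail text belongs to the parent, so it takes the parent's language.
        if child.tail and child.tail.strip():
            out.setdefault(lang, []).append(child.tail.strip())
    return out


runs = text_by_language(ET.fromstring(SAMPLE))
for lang, texts in runs.items():
    print(lang, texts)
```

A tool built this way could hand each group to a language-specific checker; without the markup it would have to guess the language of every run.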
>
>
> USEFUL LINKS
>
> FAQ: HTTP and meta language information -
> http://www.w3.org/International/questions/qa-http-and-lang.html
> [Will check following - from previous]
> HTML 4.01 Specification W3C Recommendation 24 December 1999:
> http://www.w3.org/TR/html401/struct/dirlang.html#h-8.1.3.
> XHTML 2.0 W3C Working Draft 5 August 2002
> http://www.w3.org/TR/2002/WD-xhtml2-20020805/mod-meta.html
> Web Accessibility Initiative: lang attribute -
> http://www.w3.org/TR/WCAG10/#gl-abbreviated-and-foreign
> Tutorial: Language markup in XHTML and CSS (DRAFT):
> http://www.w3.org/International/tutorials/tutorial-lang.html
> Authoring Techniques for XHTML & HTML Internationalization:
> Specifying the language of content 1.0 -
> http://www.w3.org/International/geo/html-tech/tech-lang.html
> FAQ: Styling using the lang attribute:
> http://www.w3.org/International/questions/qa-css-lang.html
> FAQ: Two-letter or three-letter language codes:
> http://www.w3.org/International/questions/qa-lang-2or3.html
> From the usability perspective:
> http://diveintoaccessibility.org/day_7_identifying_your_language.html
> An interesting view on Google usage across cultures:
> http://www.google.com/press/zeitgeist2003.html
> http://www.google.com/press/zeitgeist.html

Note that I've started to link to the topic index for general pointers to additional information on a topic. See for example the right hand column of http://www.w3.org/International/questions/qa-lang-priorities.html

Of course you can still link to specific articles of particular interest.

Hope that helps.
RI
Received on Wednesday, 16 June 2004 08:28:09 UTC