- From: Richard Ishida <ishida@w3.org>
- Date: Wed, 16 Jun 2004 13:28:02 +0100
- To: "'Deborah Cawkwell'" <deborah.cawkwell@bbc.co.uk>, "'GEO'" <public-i18n-geo@w3.org>
Hi Deborah,
I agree with all of Addison's excellent notes.
My notes below...
============
Richard Ishida
W3C
contact info:
http://www.w3.org/People/Ishida/
W3C Internationalization:
http://www.w3.org/International/
> -----Original Message-----
> From: public-i18n-geo-request@w3.org
> [mailto:public-i18n-geo-request@w3.org] On Behalf Of Deborah Cawkwell
> Sent: 14 June 2004 23:21
> To: GEO
> Subject: FAQ: Why should I use the 'lang' attribute?
>
> Hi All
>
> In feedback about my last attempt at this FAQ, the group
> suggested I make a strong initial argument. I hope I've done
> that (but I'm sure there will be some input - which of course
> I welcome).
>
> I have to admit that I've not really worked (more) on the
> 'applications' part... So I welcome any text fragments that I
> could incorporate tomorrow night (from 19:00 BST/GMT+0100),
> so that on our Wednesday teleconference, we might get
> somewhere near publishing this.
>
> Best regards to all & thanks (to any contributors)
>
> Deborah
>
> -------------------------------------
>
> QUESTION
>
> Why should I use the 'lang' attribute?
I feel like we should limit this to '... in HTML', or widen it to wording that would include xml:lang. Dunno. At least we should say very early that xml:lang is relevant too.
>
>
> ANSWER
>
> Overview
>
> The 'lang' attribute contains information about the 'natural'
> language of content.
Mention xml:lang here.
>
> A 'natural' ('human') language is a language with which
> people communicate with one another such as Arabic or
> Brazilian Portuguese. This stands in comparison to an
> 'artificial' language, such as C or Perl, with which people
> communicate with machines.
>
> It is useful to identify the language of content and to make
> that language information 'semantically' available, so that
> it can serve people's needs better. For example, when
> searching for information, it is useful to narrow that search
> to the languages that the searcher can understand. In
> addition, it may be desirable to display different natural
> languages in ways known by users of those languages; for
> example, quotation marks have different written
> representations in different languages.
>
There are some issues with the next two paragraphs that I think Addison described well. I also see the stuff related to character encoding as somewhat tangential to the main argument, so I think it should either appear under a subheading, or possibly even as a note in the margin.
> The 'lang' attribute serves to uniquely identify the
> 'language of content'.
> Other means of identifying that
> language of content, such as 'character encoding', do not
> uniquely identify the natural language and may change over
> time. Currently, natural language could be identified by
> 'character encoding'. However, that character encoding does
> not uniquely identify a natural language. One character
> encoding can be used for multiple natural languages, eg,
> Latin 1 (iso-8859-1) can encode both French and English. In
> addition, the character encoding can vary over a single
> language, eg, Arabic could be encoded with 'windows-1256' or
> 'iso-8859-6' or 'utf-8' (or another Unicode encoding).
>
> Unicode - which can encode all languages - is likely to
> become the dominant encoding form, because it can resolve
> many problems. Therefore, character encoding will cease to
> have any use at all for identifying natural language(s) of
> web content. An additional problem is that character encoding
> may be specified in different places: in the http header
> and/or in a metatag, where that encoding relates to the whole
> page (forms can be an exception).
I really like Addison's proposed text for the next paragraph, but his text sums up much of what you have in the answer so far (excluding the character related stuff).
>
> The more pages that are correctly marked up with appropriate
> semantic language information, the more applications will
> emerge to harness it, to deliver information relevant to
> people in the languages they understand.
>
>
> Implementation
>
> The 'lang' attribute can be applied to the HTML container of
> the whole web page, ie, the HTML element, or to individual
> HTML elements (span, div, td, p, etc) when the language
> varies from that specified as the 'primary' language.
I think this is useful information, but your next question worries me...
> [What
> will happen when people use multiple languages as a matter of
> course - with Unicode, I think this is inevitable?]
Nothing changes. Unicode documents have a primary language just like documents in any other encoding. I think you are letting yourself be confused by the idea that character encodings express language. They don't.
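To make the point concrete, here is a minimal Python sketch. The sample strings and the helper names (can_encode, encode_all) are my own assumptions for illustration, not anything from the draft. It shows one encoding carrying two languages, and one language surviving three encodings unchanged.

```python
# Sample strings are illustrative assumptions, not taken from the FAQ draft.
FRENCH = "Où est la bibliothèque ?"
ENGLISH = "Where is the library?"
ARABIC = "مرحبا"
ARABIC_ENCODINGS = ("windows-1256", "iso-8859-6", "utf-8")


def can_encode(text, encoding):
    """Return True if every character of text exists in the encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


def encode_all(text, encodings):
    """Encode the same text once per encoding, keyed by encoding name."""
    return {name: text.encode(name) for name in encodings}


if __name__ == "__main__":
    # One encoding, two languages: Latin-1 cannot tell French from English.
    print(can_encode(FRENCH, "iso-8859-1"), can_encode(ENGLISH, "iso-8859-1"))

    # One language, three encodings: the bytes differ, the text does not.
    for name, data in encode_all(ARABIC, ARABIC_ENCODINGS).items():
        print(name, data, data.decode(name) == ARABIC)
```

Nothing in the bytes says which language they hold, so that information has to come from the markup.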
>
> The 'lang' attribute of an HTML element is specified slightly
> differently in HTML and XML, eg:
>
> <html lang="en" xml:lang="en"...
>
> lang='en' = HTML markup
> xml:lang='en' = X(HT)ML markup
>
> When using XHTML both syntaxes should be used.
The implementation detail is a little more complex than you describe it here, because lang should not be used in XHTML 1.1 (or in XML generally). I would prefer to see a pointer to the language declaration tutorial and the relevant techniques doc, rather than an attempt to restate how to do it. This FAQ is about *why* one should do it, not how.
As I said before, I think one should allude to the fact that xml:lang may be required in addition to / in place of lang. But I think we should do so in the very first para of the answer.
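For anyone wondering what a consuming application does with the two attributes, here is a rough Python sketch. It is not taken from any real user agent. The class name LangCollector, the helper language_runs and the sample markup are all hypothetical, and it assumes well-formed markup in which every non-void element is closed. It lets xml:lang take precedence over lang when both are present (as XHTML 1.0 specifies), and lets an element without either attribute inherit the language of its parent.

```python
from html.parser import HTMLParser

# Elements that never have an end tag, so they must not affect nesting.
VOID = {"br", "hr", "img", "input", "link", "meta"}


class LangCollector(HTMLParser):
    """Record each text run with the language in effect for it."""

    def __init__(self):
        super().__init__()
        self.stack = [None]  # language in effect at each nesting level
        self.runs = []       # (language, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        attrs = dict(attrs)
        # Prefer xml:lang, then lang, then the parent's language.
        lang = attrs.get("xml:lang") or attrs.get("lang") or self.stack[-1]
        self.stack.append(lang)

    def handle_endtag(self, tag):
        if tag not in VOID and len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        if data.strip():
            self.runs.append((self.stack[-1], data.strip()))


def language_runs(markup):
    """Return (language, text) pairs for well-formed markup."""
    parser = LangCollector()
    parser.feed(markup)
    parser.close()
    return parser.runs


# Hypothetical sample: English page with a French phrase.
SAMPLE = (
    '<html lang="en" xml:lang="en"><body>'
    '<p>The motto is <span lang="fr" xml:lang="fr">je me souviens</span>.</p>'
    "</body></html>"
)

if __name__ == "__main__":
    for lang, text in language_runs(SAMPLE):
        print(lang, repr(text))
```

The declaration on the html element sets the primary language, and the span overrides it only for the French phrase.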
>
>
> Application
Applications?
>
> Accessibility
> The 'lang' attribute assists speech synthesizers and Braille
> translators; it is required by the W3C Web Accessibility
> Initiative (WAI) and enforced governmental policies in some
> countries, eg, UK - Disability Discrimination Act (UK)
> [other
> countries? contact WAI? and/or specifically request this
> information from users - useful way to get people more involved?]
You could look through http://www.w3.org/WAI/Policy/ but I think your example of the UK is sufficient to make your point.
>
> Page rendering
> CSS2 uses the 'lang' attribute powerfully as a pseudo class.
> (http://www.w3.org/International/questions/qa-css-lang.html).
> Unfortunately it doesn't work in IE yet. [Clarify scope -
> changes with versions and operating systems - in order to
> keep FAQs up-to-date - refer to tests & results from tests]
> But the concept of language specific styling is a very
> powerful one. [Need to add some examples.]
>
> Search
> A common use for meta
I think you are talking about the meta element specifically, rather than meta information in general, so you should say so.
> is to specify keywords that a search
> engine may use to improve the quality of search results. When
> several meta elements provide language-dependent information
> about a document, search engines may filter on the xml:lang
> attribute to display search results using the language
> preferences of the user.
> (http://www.w3.org/TR/2002/WD-xhtml2-20020805/mod-meta.html)
Language information expressed with the lang attribute might also be useful for searching. I don't know much about this area, but folks from the information science community at the Unicode conference seemed to be requesting that the lang (and xml:lang) attributes be fully deployed to help their searches.
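As a rough illustration of how a search tool might use the declared language, here is a minimal Python sketch. The document list and function names are hypothetical and do not describe any real search engine. It uses the prefix matching that RFC 3066 describes for language ranges, which is also how the CSS2 :lang() selector matches: a range of "en" matches "en" and "en-GB", but not "eng".

```python
def matches(language_range, tag):
    """True if tag equals the range, or starts with the range plus a hyphen."""
    language_range, tag = language_range.lower(), tag.lower()
    return tag == language_range or tag.startswith(language_range + "-")


def filter_by_language(documents, preferred):
    """Keep titles whose declared language matches any preferred range."""
    return [
        title
        for title, tag in documents
        if any(matches(language_range, tag) for language_range in preferred)
    ]


# Hypothetical index entries: (title, declared language of the page).
DOCUMENTS = [
    ("Weather report", "en-GB"),
    ("Bulletin météo", "fr"),
    ("Wetterbericht", "de-AT"),
    ("Forecast", "en"),
]

if __name__ == "__main__":
    print(filter_by_language(DOCUMENTS, ["en"]))
    print(filter_by_language(DOCUMENTS, ["fr", "de"]))
```

A filter like this only works if pages declare their language in the first place, which is the argument the FAQ is making.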
> XML
> The 'xml:lang' attribute is the standard way to identify
> language information in XML. [Information about tasks] cf Google
Not sure how this is relevant here.
>
> Processing
> eg XSLT
You could also mention that this is/will be useful for spellchecking documents during authoring.
>
>
> USEFUL LINKS
>
> FAQ: HTTP and meta language information -
> http://www.w3.org/International/questions/qa-http-and-lang.html
> [Will check following - from previous]
> HTML 4.01 Specification W3C Recommendation 24 December 1999:
> http://www.w3.org/TR/html401/struct/dirlang.html#h-8.1.3.
> XHTML 2.0 W3C Working Draft 5 August 2002
> http://www.w3.org/TR/2002/WD-xhtml2-20020805/mod-meta.html
> Web Accessibility Initiative: lang attribute -
> http://www.w3.org/TR/WCAG10/#gl-abbreviated-and-foreign
> Tutorial: Language markup in XHTML and CSS (DRAFT):
> http://www.w3.org/International/tutorials/tutorial-lang.html
> Authoring Techniques for XHTML & HTML Internationalization:
> Specifying the language of content 1.0 -
> http://www.w3.org/International/geo/html-tech/tech-lang.html
> FAQ: Styling using the lang attribute:
> http://www.w3.org/International/questions/qa-css-lang.html
> FAQ: Two-letter or three-letter language codes:
> http://www.w3.org/International/questions/qa-lang-2or3.html
> From the usability perspective:
> http://diveintoaccessibility.org/day_7_identifying_your_language.html
> An interesting view on Google usage across cultures:
> http://www.google.com/press/zeitgeist2003.html
> http://www.google.com/press/zeitgeist.html
Note that I've started to link to the topic index for general pointers to additional information on a topic. See for example the right hand column of http://www.w3.org/International/questions/qa-lang-priorities.html
Of course you can still link to specific articles of particular interest.
Hope that helps.
RI
Received on Wednesday, 16 June 2004 08:28:09 UTC