
RE: what's the language of a document ?

From: Phillips, Addison <addison@amazon.com>
Date: Tue, 27 Oct 2009 11:03:00 -0400
To: Tex Texin <textexin@xencraft.com>, "'Ian Hickson'" <ian@hixie.ch>, "'John Cowan'" <cowan@ccil.org>
CC: "public-html@w3.org" <public-html@w3.org>, "www-international@w3.org" <www-international@w3.org>
Message-ID: <C7A5719F1E562149BA9171F58BEE2CA41298590727@EX-IAD6-B.ant.amazon.com>
Tex,

You have never been allowed to tag individual elements with more than one language tag. I think what Hixie is saying is that, for any given span of text in the document, there can be exactly one language associated with it. When the language cannot be determined (perhaps due to conflicting information, such as a list), no language is applied.

In the Internationalization WG's tutorial on tagging language in HTML, the lang attribute is called the "document processing language". The outer-most element in an HTML document is <html> and the language declared on that element is the default for the document. This does not make HTML monolingual.
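
As a rough illustration of that inheritance, here is a minimal sketch in Python. It assumes a well-formed, XML-parseable document; the helper name effective_lang is hypothetical and is not taken from any specification. The nearest lang attribute on the element or one of its ancestors wins, and the empty string stands for "no language declared".

```python
import xml.etree.ElementTree as ET


def effective_lang(root, target):
    """Hypothetical helper: nearest lang on target or an ancestor, else ""."""
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parents = {child: parent for parent in root.iter() for child in parent}
    node = target
    while node is not None:
        lang = node.get("lang")
        if lang is not None:
            return lang
        node = parents.get(node)  # None once we step past the root
    return ""  # nothing declared anywhere up the tree


doc = ET.fromstring(
    '<html lang="en"><body>'
    '<p>Hello <q lang="fr">raclette</q></p>'
    "</body></html>"
)
untagged = ET.fromstring("<html><body><p>Hello</p></body></html>")

print(effective_lang(doc, doc.find(".//q")))            # declared on the element
print(effective_lang(doc, doc.find(".//p")))            # inherited from <html>
print(effective_lang(untagged, untagged.find(".//p")))  # nothing declared
```

The language on <html> acts only as a default here: any descendant can override it, which is why a document-level declaration does not make the document monolingual.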

The Content-Language header (and associated META tag), by contrast, can be used to declare the intended audience of a document. Certainly a document can serve more than one audience and be in more than one language.

I don't think I agree that the "default" value (when that language is not declared) ought to be the tag 'und', but I don't think that's what Hixie is saying. There is a subtle difference between "the language of this document has not been determined" and making the tag actually be 'und'. I hope that the default value remains the empty tag, not something else.

> 
> So if someone attempts to be specific and declares content-language
> to be "es-mx,es-ar" for mexico and argentina,
> or perhaps declares "en, en-us" then that information is thrown
> away in favor of unknown?

I would say "that information isn't artificially applied to specific elements in the document".
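
A small Python sketch of that distinction, using the values quoted above. Both function names are hypothetical, and the rule in element_default is only an illustration of the behavior described in this thread (a list yields no element-level language), not a statement of what any specification requires.

```python
def parse_content_language(header_value):
    """Split a Content-Language value into its individual language tags."""
    return [tag.strip() for tag in header_value.split(",") if tag.strip()]


def element_default(header_value):
    """Hypothetical rule: a single tag may serve as a default for elements;
    a list describes audiences, so no language is applied ("")."""
    tags = parse_content_language(header_value)
    return tags[0] if len(tags) == 1 else ""


audience = parse_content_language("es-mx,es-ar")
print(audience)                        # the audience information is kept
print(element_default("es-mx,es-ar"))  # but not applied to elements
print(element_default("en"))           # a single tag is unambiguous
```

Nothing is thrown away in this sketch: the parsed list still describes the intended audience, it just is not used to tag spans of text.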

> 
> Also, does this change to the document default language impact just
> html behavior, or embedded scripting languages as well?

Actually, it's not a change: this has always been true.

Embedded scripting languages aren't "in a language" from the point of view of HTML. When they access the DOM tree, they can access the language tagging hierarchy like any other DOM processor (although this isn't always convenient).

> 
> If there were code that checks for language and performs different
> actions based on languages in the document, that is affected as
> well?

Code where?

Presumably code processing a document would process spans of text, not the entire document all at once.
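
A sketch of what such span-by-span processing might look like, using Python's standard html.parser module. The class name LangRuns and the short VOID_ELEMENTS list are assumptions made for this example, and the input is assumed to close every non-void element explicitly.

```python
from html.parser import HTMLParser

# Assumed (incomplete) list of elements that never have an end tag.
VOID_ELEMENTS = {"br", "hr", "img", "input", "link", "meta"}


class LangRuns(HTMLParser):
    """Hypothetical helper: collect (language, text) runs from markup."""

    def __init__(self):
        super().__init__()
        self.stack = [""]  # language in effect; "" means none declared
        self.runs = []     # (language, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        lang = dict(attrs).get("lang")
        # An element without its own lang inherits the enclosing one.
        self.stack.append(self.stack[-1] if lang is None else lang)

    def handle_endtag(self, tag):
        if tag not in VOID_ELEMENTS and len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        if data.strip():
            self.runs.append((self.stack[-1], data))


parser = LangRuns()
parser.feed('<p lang="en">It contains the word <q lang="fr">raclette</q>.</p>')
parser.close()
for lang, text in parser.runs:
    print(repr(lang), repr(text))
```

Each run of text carries exactly one language tag, so code that branches on language can work run by run rather than on the document as a whole.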

> 
> Why does the default need to be monolingual?

Because that's how xml:lang and lang work. Besides, they aren't "monolingual" per se. They are "one language tag for a given context", with nesting. If one wishes to mark up, say, a French word in an English sentence, use a <span> (or other element) to do it:

   <p lang="en">In no sense is this sentence in two languages, even though it
                contains the word <q lang="fr">raclette</q>.</p>

Addison

Addison Phillips
Globalization Architect -- Lab126

Internationalization is not a feature.
It is an architecture.




Received on Tuesday, 27 October 2009 15:03:41 GMT
