RE: Language tags on root

On Thu, 27 Jan 2005, Misha Wolf wrote:

> I like the idea of specifying the primary language on the root,
> and then specifying other languages as they occur in the tree.

That's a very natural approach, so natural that it's difficult to suggest
any alternative. Even in a bilingual dictionary, one of the languages can
usually be classified as primary (by its use in explanations, for
example). If two (or more) languages are used in a completely balanced
way, one simply has to pick one of them.
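
To illustrate, here is a minimal sketch (the page content is invented for
the example): the root declares English as the primary language, and only
the fragments that deviate from it are marked.

   <html lang="en">
   <head><title>A French phrase</title></head>
   <body>
   <p>The French word <span lang="fr">pomme de terre</span>
   means "potato"; the German word is
   <span lang="de">Kartoffel</span>.</p>
   </body>
   </html>

Everything without a lang attribute of its own inherits "en" from the
root.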

> The code for multiple languages is "mul":
>    http://www.loc.gov/standards/iso639-2/englangn.html#mn

That might be suitable for characterizing documents in some situations,
but the language of an HTML document can be described in much more detail,
element by element, with lang attributes.

> The appropriate way to provide a list of languages is to use
> the appropriate HTTP header and/or the <meta> element.  HTML
> allows:
>
>    <meta http-equiv="Content-Language" content="fr, de, en">

HTML per se allows <meta http-equiv="foo" content="bar"> as well, so being
allowed by the syntax says nothing about what such a tag means.

But the confusion around Content-Language is interesting. Some authoring
tools seem to generate meta tags with it, instead of generating lang or
xml:lang attributes. But what is it for? If you read its description in the
HTTP specification (RFC 2616, section 14.12) carefully, you'll notice a
difference. It does not actually specify the content language. Instead, it
specifies the language of the intended audience. It's difficult to say what
the difference is in practice, but they are two different things.

Moreover, Content-Language: fr, de, en would mean (according to the
protocol) that the document is intended for people who know French or
German or English.
Most naturally, this would mean that the document contains the same
content in all of those languages. More liberally, one might say that it
is sufficient that _some_ of the content is understandable to people who
know French, etc.
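
As a sketch of the "most natural" case (a fragment only, with texts made
up for illustration), the meta tag names the audiences, while the lang
attributes say which part is in which language:

   <meta http-equiv="Content-Language" content="fr, de, en">
   ...
   <div lang="fr"><p>Bienvenue.</p></div>
   <div lang="de"><p>Willkommen.</p></div>
   <div lang="en"><p>Welcome.</p></div>

The meta tag alone cannot tell where one language ends and the next
begins; the lang attributes can.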

-- 
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/

Received on Thursday, 27 January 2005 16:50:12 UTC