
RE: Multiple languages in the lang attribute

From: Jukka Korpela <jukka.korpela@tieke.fi>
Date: Thu, 21 Nov 2002 16:10:00 +0200
Message-ID: <621574AE86FAD3118D1D0000E22138A95BDF65@TIEKE1>
To: w3c-wai-ig@w3.org

Stephen Garcia wrote:

> It is my understanding that only one language can be specified in the
> lang attribute of any tag.

That is correct. The prose descriptions are somewhat vague and dated (e.g.,
the HTML 4.01 specification refers to RFC 1766, which has been superseded by
RFC 3066), but the formal syntax makes it clear that only one code is
permitted: the HTML lang attribute is declared as taking a NAME value and the
xml:lang attribute as taking an NMTOKEN value, and neither kind of value can
contain a comma or a space.

Since lang attributes can be used with arbitrary granularity, introducing
new elements (like <span>) if needed, there would be little point in having
several language codes in one lang attribute. Instead of saying that
something is English and Spanish, you can say that an element has content in
English and use lang="es" for those inner elements that have content in
Spanish. The lang attribute only specifies the dominant or basic language of
the content.

(Or is that so clear? Could multiple languages be used to indicate, say,
'this is in a language A or B, I don't know which', or to indicate 'this
word is originally in language A but partially adapted to language B as a
loanword'? Anyway, currently a single language is permitted.)
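
As a minimal sketch of the nesting approach described above (the sentence
itself is just an invented illustration), the outer element declares English
and only the Spanish words get their own lang attribute:

```html
<p lang="en">
  The Spanish word for "library" is
  <span lang="es">biblioteca</span>,
  not <span lang="es">librería</span>, which means "bookshop".
</p>
```

Each lang attribute here still carries exactly one language code; the mixture
is expressed by the element structure, not by the attribute value.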

> But in the following example several
> languages may be specified in the http response:
> 
> <meta http-equiv="Content-Language" content="en-gb, es" />

That's actually not an HTTP response but an "emulated" (or, dare I say,
"fake") response - originally such constructs were meant to be potentially
handled by _servers_, which would use them to send actual HTTP headers, but
this never really caught on; instead, some browsers started peeking at
such <meta> tags and behaving, in part, _as if_ a server had sent an actual
HTTP header.

Anyway, by the HTTP protocol,
Content-Language: en-gb, es
would say that the document is intended for people who know British English
_or_ Spanish. Typically, this is used for a document that contains the same
information in both languages. In practice this implies that those languages
are used in the content, but it also says something much more specific: who
the intended audience is.

Ref.: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.12

> as a solution of having known mixtures of languages, we could allow
> multiple languages to be specified in the lang attribute (even though
> this is not currently suggested by the standards):
> 
> <html xml:lang="en-gb, es" lang="en-gb, es">
> 
> should then both the british english and spanish dictionaries 
> be loaded to interpret the page?

An interesting idea.

How would words that could be either English or Spanish be handled? There are
surely many such words. What about words that do not appear in any
dictionary, such as proper names or neologisms?

Maybe the order could be significant, a priority order.

-- 
Jukka Korpela, senior adviser 
TIEKE Finnish Information Society Development Centre
http://www.tieke.fi/
Diffuse Business Guide to Web Accessibility and Design for All:
http://www.diffuse.org/accessibility.html
Received on Thursday, 21 November 2002 09:10:31 GMT
