Re: XHTML: Suggestion to add a attribute for multi language documents

On Sat, 15 Apr 2006, Matthias Mauch wrote:

> If an XHTML document have following language attribute
>
> <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
>
> it's possible that this document uses english language.

I couldn't say it any better - as a description of the real-world 
situation. In reality, the lang and xml:lang attributes are more or less 
just code decoration and technobabble. They are so often wrong that they 
cannot be trusted. That's why search engines have always ignored them.

Weeks ago, I noticed that a page declares lang="sa" (which means Sanskrit) 
but actually contains Northern Sámi, which is of course a completely 
different language. I informed the page owners and got an autoreply that 
the message had been received and would be handled. No other reply, no fix, 
despite my explicit explanation of the situation and the way to correct 
it. It's a page by the Supreme Court of Finland. My conclusion: nobody 
there even understands the issue, or cares to consult anyone who does.

I'm afraid language markup is a lost cause. It is in reality much more 
reliable to deduce the language from the actual content, heuristically. 
For short fragments of text, things might be different. But who wants to 
declare the language of each foreign name or other foreign word? Nobody, 
not even the W3C, despite claiming conformance to W3C WAI recommendations 
that require _all_ changes in document language to be marked up.
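
To illustrate what "deducing the language heuristically" can mean, here is 
a deliberately crude Python sketch that counts common function words. The 
word lists and the function name guess_language are made up for this 
example; a real detector would use far more data (such as character 
n-gram statistics) and cover many more languages.

```python
import re

# Tiny illustrative stopword lists (an assumption for this sketch;
# real language detectors rely on much larger statistical models).
STOPWORDS = {
    "en": {"the", "and", "of", "to", "is", "in", "that", "it"},
    "de": {"der", "die", "das", "und", "ist", "nicht", "ein", "zu"},
}

def guess_language(text):
    """Return the language code whose stopwords occur most often, or None."""
    # Split into lowercase words made of letters only (Unicode-aware).
    words = re.findall(r"[^\W\d_]+", text.lower())
    scores = {
        lang: sum(word in stopwords for word in words)
        for lang, stopwords in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    # With no stopword hits at all, refuse to guess.
    return best if scores[best] > 0 else None

english = guess_language("The lang attribute is supposed to declare the main language")
german = guess_language("Das ist nicht die Sprache der Seite")
unknown = guess_language("Korpela")
```

Note that the single word in the last call gives the heuristic nothing to 
work with, which is the "short fragments" caveat in practice.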

So much for reality. We now return to the fantasyland of language 
markup.

> But if the
> content uses english and german language there ist no language
> attribute for multi language.

The lang attribute is supposed to declare the _main_ language used inside 
an element (such as <html>), to be overridden for inner elements as 
needed.
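
As a rough illustration of that inheritance rule, the following Python 
sketch walks an XHTML fragment and reports the language in effect for each 
element. The sample document and the function name effective_languages are 
invented for this example, and only xml:lang is consulted (the lang 
attribute is ignored for brevity).

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predefined "xml:" prefix to this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def effective_languages(elem, inherited=None):
    """Yield (tag, language) pairs; a language is inherited unless overridden."""
    lang = elem.get(XML_LANG, inherited)
    # Drop the "{namespace}" part of the tag name for readability.
    yield elem.tag.split("}")[-1], lang
    for child in elem:
        yield from effective_languages(child, lang)

# Hypothetical sample document: English overall, one German phrase.
doc = """<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<body>
<p>Hello <span xml:lang="de" lang="de">Guten Tag</span></p>
</body>
</html>"""

result = list(effective_languages(ET.fromstring(doc)))
for tag, lang in result:
    print(tag, lang)
```

The body and p elements carry no language attribute of their own and pick 
up "en" from the html element; only the span overrides it with "de".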

You could use lang="mul" (the ISO 639-2 code for "multiple languages"), 
but you are advised against it. The specifications are somewhat obscure, 
though.

> At present the only way to do this
> is to define in the DocType the first language

The DocType has nothing to do with this.

> <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
>
> and switch in the content with
>
> <span lang="de" xml:lang="den">
>
> to the german part.

You mean xml:lang="de", right? Anyway, that's what you are supposed to do. 
It expresses the same idea as you wish to express but much more clearly: 
instead of saying that the document is mixed language, or mixed English 
and German, you express the latter _and_ indicate exactly which parts are 
English and which parts are German. There's nothing fundamentally wrong 
with this idea, except that nobody really cares about language markup.

> This should works with screenreaders.

It works with a handful of specialized browsers. The problem is that the 
vast majority of pages don't do language markup, or do it _wrong_, so even 
the small number of people using those browsers don't benefit much. This 
in turn means that there's little motivation for authors to use language 
markup.

> Is it
> possible the add an attribute who defines a document as multi
> language?

You already have two attributes that could be used for that, lang and 
xml:lang. I guess you mean that they should be extended to allow values 
like lang="de,en", but that would really just add to the confusion.

-- 
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/

Received on Saturday, 15 April 2006 20:47:42 UTC