- From: Jukka K. Korpela <jkorpela@cs.tut.fi>
- Date: Sat, 15 Apr 2006 23:47:27 +0300 (EEST)
- To: Matthias Mauch <matthias.mauch@aadmm.de>
- cc: www-html@w3.org
- Message-ID: <Pine.GSO.4.64.0604152330290.11009@korppi.cs.tut.fi>
On Sat, 15 Apr 2006, Matthias Mauch wrote:

> If an XHTML document have following language attribute
>
> <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
>
> it's possible that this document uses english language.

I couldn't say it any better - as a description of the real-world situation. In reality, the lang and xml:lang attributes are more or less just code decoration and technobabble. They are so often wrong that they cannot be trusted. That's why search engines have always ignored them.

Weeks ago, I noticed that a page declares lang="sa" (which means Sanskrit) but actually contains Northern Sámi, which is of course a completely different language. I informed the page owners and got an autoreply saying that the message had been received and would be handled. No other reply, no fix, despite my explicit explanation of the situation and of the way to correct it. It's a page by the Supreme Court of Finland. My conclusion: nobody there even understands the issue, or cares to consult anyone who does.

I'm afraid language markup is a lost cause. In reality, it is much more reliable to deduce the language heuristically from the actual content.

For short fragments of text, things might be different. But who wants to declare the language of each foreign name or other foreign word? Nobody, not even the W3C, despite its claims of conformance to the W3C WAI recommendations, which say that _all_ changes in document language be marked up.

So much for reality. We now return to a visit to fantasyland of language markup.

> But if the
> content uses english and german language there ist no language
> attribute for multi language.

The lang attribute is supposed to declare the _main_ language used inside an element (such as <html>), to be overridden for inner elements as needed. You could use lang="mul", but you are advised against it. The specifications are somewhat obscure, though.
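To illustrate the "main language, overridden for inner elements" model, here is a minimal sketch using Python's standard xml.etree.ElementTree. It assumes a well-formed XHTML fragment and follows XHTML 1.0 Appendix C, where xml:lang takes precedence over lang when both are present. The function names declared_language and effective_languages are hypothetical, and this is not how any particular browser or screen reader actually resolves languages. Each element gets the nearest declaration on itself or an ancestor, and the sketch also flags elements where lang and xml:lang disagree.

```python
import xml.etree.ElementTree as ET

# ElementTree reports namespaced names in Clark notation: {namespace-uri}local.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
XHTML_NS = "{http://www.w3.org/1999/xhtml}"


def declared_language(elem):
    """Return the language declared on this element itself, or None.

    Assumption (XHTML 1.0 Appendix C): xml:lang wins over lang if both exist.
    """
    value = elem.get(XML_LANG)
    return value if value is not None else elem.get("lang")


def effective_languages(elem, inherited=None):
    """Yield (local tag, effective language, mismatch) for elem and descendants.

    The effective language is inherited from the parent unless the element
    declares its own; mismatch is True when lang and xml:lang both appear
    on the element but have different values.
    """
    own = declared_language(elem)
    current = own if own is not None else inherited
    xml_lang, lang = elem.get(XML_LANG), elem.get("lang")
    mismatch = xml_lang is not None and lang is not None and xml_lang != lang
    yield elem.tag.replace(XHTML_NS, ""), current, mismatch
    for child in elem:
        yield from effective_languages(child, current)


# Hypothetical sample: an English document with one German span, keeping the
# xml:lang="den" slip from the question above.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<body><p>Hello, <span lang="de" xml:lang="den">guten Tag</span>.</p></body>
</html>"""

if __name__ == "__main__":
    for row in effective_languages(ET.fromstring(SAMPLE)):
        print(row)
```

Run on the sample, html, body and p all resolve to "en", while the span resolves to "den" and is flagged as a mismatch. That is exactly the kind of silent error that makes declared languages untrustworthy in practice.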
> At present the only way to do this
> is to define in the DocType the first language

The DocType has nothing to do with this.

> <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
>
> and switch in the content with
>
> <span lang="de" xml:lang="den">
>
> to the german part.

You mean xml:lang="de", right? Anyway, that's what you are supposed to do. It expresses the same idea as you wish to express, but much more clearly: instead of saying that the document is mixed language, or mixed English and German, you express the latter _and_ indicate exactly which parts are English and which parts are German. There's nothing fundamentally wrong with this idea, except that nobody really cares about language markup.

> This should works with screenreaders.

It works with a handful of specialized browsers. The problem is that the vast majority of pages don't use language markup, or use it _wrong_, so even the small number of people using those browsers don't benefit much. This in turn means that authors have little motivation to use language markup.

> Is it
> possible the add an attribute who defines a document as multi
> language?

You already have two attributes that could be used for that, lang and xml:lang. I guess you mean that they should be extended to allow values like lang="de,en", but that would really just add to the confusion.

-- 
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/
Received on Saturday, 15 April 2006 20:47:42 UTC