
Re: meta content-language

From: Martin Duerst <duerst@it.aoyama.ac.jp>
Date: Mon, 25 Aug 2008 17:49:58 +0900
Message-Id: <6.0.0.20.2.20080825120759.03121180@localhost>
To: "Mark Davis" <mark.davis@icu-project.org>, "Julian Reschke" <julian.reschke@gmx.de>
Cc: "Leif Halvard Silli" <lhs@malform.no>, "Ian Hickson" <ian@hixie.ch>, "HTML WG" <public-html@w3.org>, "www-international@w3.org" <www-international@w3.org>

At 23:04 08/08/22, Mark Davis wrote:
>I'm kinda lost in this thread so far.

This may be due to the fact that you don't seem to be too
familiar with existing practice and history.


>It seems to me the questions at hand are:
>
>1. Distinction in Language. Should there be a distinction in interpretation between the language set via lang attribute and meta content?
>
><html lang="foo">
>and
><meta http-equiv="Content-Language" content="foo"/>
>
>My take is that any such distinction would be a departure from current practice, and too fine a distinction for the vast majority of people to be able to follow.

Such a distinction IS current practice. The former can only
contain one language, the latter can contain a priority list.
Also, the former is used on the browser side or by editing tools,
whereas the latter is used by the server side (see e.g. the
examples that Roy gave).
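
The difference in value shape can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: the function names
parse_content_language and parse_lang_attribute are hypothetical, the
splitting is a plain comma split, and no BCP 47 validation is attempted.

```python
def parse_content_language(value):
    """Split a Content-Language value (HTTP header or <meta> content)
    into an ordered list of language tags.

    Assumption: tags are comma-separated, with optional whitespace.
    """
    return [tag.strip() for tag in value.split(",") if tag.strip()]


def parse_lang_attribute(value):
    """Return the single language tag of a lang / xml:lang attribute.

    An empty value is treated as "unknown" and returned as None.
    A comma-separated list is rejected, because the attribute
    carries exactly one language.
    """
    tag = value.strip()
    if "," in tag:
        raise ValueError("lang takes a single language tag, not a list")
    return tag or None
```

With these definitions, parse_content_language("de, fr, en") yields the
ordered list of three tags, while parse_lang_attribute("de, fr, en") is
rejected and parse_lang_attribute("de") returns the single tag.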

As for "too fine a distinction for the vast majority of people
to be able to follow", the people that we need to follow this
distinction are Web page/content creators for pages with
multilingual content.

The distinction is clearly given at
http://www.w3.org/International/tutorials/language-decl/#Slide0060.
If you think this is too difficult, and can be improved upon,
please tell us why/how.


>2. Language Inheritance. If there are conflicting languages, what should win? (or in other words, what's the inheritance?)
>
>(HTTP) Content-Language: lang1
><meta http-equiv="Content-Language" content="lang2"/>
><html lang="lang4" xml:lang="lang3">
><p lang="lang5">

[please note that <meta> comes after <html> in an HTML document]


>My take is that HTML5 has it right, that the winner/inheritance should be in the above order: lang5 wins over lang4 over lang3 over lang2 over lang1.

What HTML5 currently says may make some sense if argued ab initio.
Based on existing standards and practice, ignoring lang2 for
language-oriented processing is well justified because that is
widespread practice.
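
The two positions can be compared with a small Python sketch. The function
effective_language and its parameter names are hypothetical, and the
consult_meta switch is only one possible reading of "ignoring lang2": it
assumes the HTTP header is still consulted as a last resort, which the
discussion above does not settle.

```python
def effective_language(p_lang=None, html_lang=None, html_xml_lang=None,
                       meta_content_language=None,
                       http_content_language=None,
                       consult_meta=True):
    """Return the first declared language, most specific source first.

    The order follows the quoted proposal: element lang (lang5), then
    <html lang> (lang4), then <html xml:lang> (lang3), then the <meta>
    Content-Language (lang2), then the HTTP header (lang1).
    With consult_meta=False the <meta> value is skipped (assumption).
    """
    candidates = [p_lang, html_lang, html_xml_lang]
    if consult_meta:
        candidates.append(meta_content_language)
    candidates.append(http_content_language)
    for value in candidates:
        if value is not None:
            return value
    return None  # nothing declared anywhere
```

The two readings differ only for documents that declare a language in
<meta> and nowhere in the markup itself; for every document with a lang
attribute on the relevant element or on <html>, both give the same result.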


>3. Language Values. Should the value of any of these fields be a single language tag or also allow a priority list (both as defined by BCP47)? 
>
>Note that it can be zero (""), which is equivalent to "und" (Unknown language) in BCP 47.
>
>Here I think we'd be somewhat better off if the value could be a priority list, eg "de, fr, en". For example, if the html lang value were "de, fr, en", that would mean that there wasn't any substantial amount of linguistic content other than these three, and that the relationship was de >= fr >= en. Due to the ordering, if you had software that could only handle a single language, then de would be that value.
>
>Documents may contain a mixture of languages, and allowing them to be tagged at a high level with a priority list would allow people to reflect that reality without having to tag each and every element with the right language. Software can make use of that information, for example, in ranking the document with respect to the language of search queries. With a search query in "fr", a document with html lang of "de, fr" could be treated differently than if it just had "de".
>
>However, that may be too big a departure from current practice.

As you say in a followup post, HTTP Content-Language and <meta
(because it is equivalent to HTTP Content-Language) take a language
priority list, but the lang and xml:lang attributes don't.

My take is that this is as it should be: Documents are often enough
multilingual that it would be a bad idea to ignore this case.

On the other hand, individual document pieces can at some level be
identified as being in one (or no) language. Allowing multiple
languages for document pieces would only bring very, very limited
benefits at significantly higher costs (even if we could design
HTML and XML anew and would not have to consider the existing base).

There are multiple possible semantics for multiple languages
(I'm using the attribute name multilang to not confuse people):
- Alternative, unclear (e.g. <span multilang='en, fr'>cat</span>)
- Alternative, both (e.g. <span multilang='en, fr'>excellent</span>;
  sure there are better examples)
- Summary (e.g. <p multilang='en, fr'>He said "Oui"</p>)

Obviously, having all of these doesn't help much for applications,
and having only one of these eliminates the others. Probably the
last one is what most people might expect, but it isn't really
necessary assuming that the markup is reasonably designed, i.e.
we can say <p lang='en'>He said "<span lang='fr'>Oui</span>"</p>.
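
To illustrate why nested single-language markup is sufficient, here is a
minimal Python sketch that recovers the language of each text run from such
markup. The class LangTracker and the function language_runs are
hypothetical names; the sketch assumes every element has an explicit end
tag and looks only at the lang attribute (not xml:lang).

```python
from html.parser import HTMLParser


class LangTracker(HTMLParser):
    """Record (language, text) pairs using simple lang inheritance."""

    def __init__(self):
        super().__init__()
        self.stack = []  # (tag name, language in effect) per open element
        self.runs = []   # collected (language, text) pairs

    def handle_starttag(self, tag, attrs):
        # An element inherits its parent's language unless it sets lang.
        inherited = self.stack[-1][1] if self.stack else None
        lang = dict(attrs).get("lang", inherited)
        self.stack.append((tag, lang))

    def handle_endtag(self, tag):
        # Assumption: well-nested markup with explicit end tags.
        if self.stack and self.stack[-1][0] == tag:
            self.stack.pop()

    def handle_data(self, data):
        lang = self.stack[-1][1] if self.stack else None
        self.runs.append((lang, data))


def language_runs(markup):
    """Return the list of (language, text) runs found in markup."""
    parser = LangTracker()
    parser.feed(markup)
    parser.close()
    return parser.runs
```

Applied to the paragraph above, language_runs returns three runs: the
English text before the quotation, the French word "Oui", and the closing
English quotation mark. No multi-valued attribute is needed to get there.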

And given that most XML applications (e.g. XSLT) have difficulty
handling even simple language information correctly, it doesn't
seem a good idea to bother applications with something more
complicated.

Regards,    Martin.


#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@it.aoyama.ac.jp     
Received on Monday, 25 August 2008 08:53:16 GMT
