- From: Martin Duerst <duerst@it.aoyama.ac.jp>
- Date: Mon, 25 Aug 2008 17:49:58 +0900
- To: "Mark Davis" <mark.davis@icu-project.org>, "Julian Reschke" <julian.reschke@gmx.de>
- Cc: "Leif Halvard Silli" <lhs@malform.no>, "Ian Hickson" <ian@hixie.ch>, "HTML WG" <public-html@w3.org>, "www-international@w3.org" <www-international@w3.org>
At 23:04 08/08/22, Mark Davis wrote:

> I'm kinda lost in this thread so far.

This may be due to the fact that you don't seem to be too familiar with existing practice and history.

> It seems to me the questions at hand are:
>
> 1. Distinction in Language. Should there be a distinction in interpretation between the language set via lang attribute and meta content?
>
> <html lang="foo">
> and
> <meta http-equiv="Content-Language" content="foo"/>
>
> My take is that any such distinction would be a departure from current practice, and too fine a distinction for the vast majority of people to be able to follow.

Such a distinction IS current practice. The former can only contain one language, the latter can contain a priority list. Also, the former is used on the browser side or by editing tools, whereas the latter is used by the server side (see e.g. the examples that Roy gave).

As for "too fine a distinction for the vast majority of people to be able to follow", the people that we need to follow this distinction are Web page/contents creators for pages with multilingual content. The distinction is clearly given at http://www.w3.org/International/tutorials/language-decl/#Slide0060. If you think this is too difficult, and can be improved upon, please tell us why/how.

> 2. Language Inheritance. If there are conflicting languages, what should win? (or in other words, what's the inheritance?)
>
> (HTTP) Content-Language: lang1
> <meta http-equiv="Content-Language" content="lang2"/>
> <html lang="lang4" xml:lang="lang3">
> <p lang="lang5">

[please note that <meta> comes after <html> in an HTML document]

> My take is that HTML5 has it right, that the winner/inheritance should be in the above order: lang5 wins over lang4 over lang3 over lang2 over lang1.

What HTML5 currently says may make some sense if argued ab initio. Based on existing standards and practice, ignoring lang2 for language-oriented processing is well justified because it is wide practice.

> 3. Language Values.
> Should the value of any of these fields be a single language tag or also allow a priority list (both as defined by BCP47)?
>
> Note that it can be zero (""), which is equivalent to "und" (Unknown language) in BCP 47.
>
> Here I think we'd be somewhat better off if the value could be a priority list, eg "de, fr, en". For example, if the html lang value were "de, fr, en", that would mean that there wasn't any substantial amount of linguistic content other than these three, and that the relationship was de >= fr >= en. Due to the ordering, if you had software that could only handle a single language, then de would be that value.
>
> Documents may contain a mixture of languages, and allowing them to be tagged at a high level with a priority list would allow people to reflect that reality without having to tag each and every element with the right language. Software can make use of that information, for example, in ranking the document with respect to the language of search queries. With a search query in "fr", a document with html lang of "de, fr" could be treated differently than if it just had "de".
>
> However, that may be too big a departure from current practice.

As you say in a followup post, HTTP Content-Language and <meta> (because it is equivalent to HTTP Content-Language) take a language priority list, but the lang and xml:lang attributes don't. My take is that this is as it should be: Documents are often enough multilingual that it would be a bad idea to ignore this case. On the other hand, individual document pieces can at some level be identified as being in one (or no) language.

Allowing multiple languages for document pieces would only bring very, very limited benefits at significantly higher costs (even if we could design HTML and XML anew and would not have to consider the existing base).

There are multiple possible semantics for multiple languages (I'm using the attribute name multilang to not confuse people):

- Alternative, unclear (e.g.
  <span multilang='en, fr'>cat</span>)
- Alternative, both (e.g. <span multilang='en, fr'>excellent</span>; sure there are better examples)
- Summary (e.g. <p multilang='en, fr'>He said "Oui"</p>)

Obviously, having all of these doesn't help much for applications, and having only one of these eliminates the others. Probably the last one is what most people might expect, but it isn't really necessary assuming that the markup is reasonably designed, i.e. we can say <p lang='en'>He said "<span lang='fr'>Oui</span>"</p>. And given that most XML applications (e.g. XSLT) have difficulty handling even simple language information correctly, it doesn't seem a good idea to bother applications with something more complicated.

Regards, Martin.

#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@it.aoyama.ac.jp
Received on Monday, 25 August 2008 08:53:18 UTC
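
Illustration (not part of the original message): the two mechanisms discussed in the message can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not how any browser or server implements it. The names LangRuns, lang_runs and parse_priority_list are hypothetical; the first two assign each text run the lang value of its nearest ancestor element, and the third splits a Content-Language style value into an ordered list of tags without validating them against BCP 47.

```python
from html.parser import HTMLParser

# Elements that never have an end tag; they must not affect the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class LangRuns(HTMLParser):
    """Collect (language, text) pairs; hypothetical helper for illustration.

    Each text run gets the lang of its nearest ancestor that declares one.
    Only the lang attribute is consulted (xml:lang is ignored here).
    """

    def __init__(self, default_lang=""):
        super().__init__()
        self.stack = [default_lang]  # inherited language, innermost last
        self.runs = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        lang = dict(attrs).get("lang")
        # An element without its own lang inherits from its parent.
        self.stack.append(lang if lang is not None else self.stack[-1])

    def handle_endtag(self, tag):
        if tag not in VOID and len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        if data.strip():
            self.runs.append((self.stack[-1], data))


def lang_runs(markup, default_lang=""):
    """Return the list of (language, text) runs found in markup."""
    parser = LangRuns(default_lang)
    parser.feed(markup)
    parser.close()
    return parser.runs


def parse_priority_list(value):
    """Split a value such as "de, fr, en" into an ordered list of tags."""
    return [tag.strip() for tag in value.split(",") if tag.strip()]


if __name__ == "__main__":
    print(lang_runs("<p lang='en'>He said \"<span lang='fr'>Oui</span>\"</p>"))
    print(parse_priority_list("de, fr, en"))
```

With the nested markup from the message, the word Oui comes out tagged fr and the surrounding text en, so a single-valued lang per element is enough to describe the mixed paragraph. A consumer that can handle only one language would take the first entry of the priority list.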