
RE: meta content-language

From: CE Whitehead <cewcathar@hotmail.com>
Date: Tue, 26 Aug 2008 12:12:17 -0400
Message-ID: <BLU109-W29A865DCFC55613A0029AEB3660@phx.gbl>
To: HTML WG <public-html@w3.org>, "www-international@w3.org" <www-international@w3.org>

Hi, I'm sure I'm not following something because of limited computer time, but I think it is as it should be to have certain tags used to indicate text processing needs and certain tags used to indicate the target audience. I think Ishida's "Internationalization Best Practices" suggests that if developers follow the recommendation to tag pages appropriately, using the tags indicating text processing (html lang='' or xml:lang='') and the tags indicating the audience (HTTP Content-Language header; meta content), then applications will start to make use of these tags:
 
"The more content is tagged and tagged correctly, the more useful and pervasive such applications [that can use info about the natural language] will become" (http://www.w3.org/TR/i18n-html-tech-lang/).

Mark is right that authors may not necessarily tag every run of text appropriately, but software such as MS Word does this very well! (Hand tagging my page, De La Salle's journal, http://teacherweb.com/Fl/Cocoa/CEWhitehead/HTMLPage15.stm#notesvocabulary, was the biggest pain, and I'm sure I left off a few tags!)

Still, it would be lovely to have a multilang tag (multilang='en, fr') for the server in the case of a page designed to, say, teach French to English speakers or vice versa (English to French), where the content is in two languages, or for a page aimed at bilinguals in, say, Spanish and English, where for whatever reason the content is in both languages (a bilingual reading assessment, or something else that needs to be in both). This is for a page whose readers absolutely have some grasp of both languages and where a priority list would not make sense.

I'm not sure how the user asks for a page that is in two languages, though. In the interim, a tag such as <meta http-equiv="Content-Language" content="mul, en, fr"> is o.k. with me (but I think I recall from discussions long ago that a lot of people did not like having content tagged "mul" for almost any purpose).

--C. E. Whitehead
cewcathar@hotmail.com

Mark Davis wrote:
> I think it is a hopeless task to distinguish between these semantics. It is hard enough
> to get people to say that a document is in Japanese (correctly), let alone to get them
> to make the very fine point that this document is in Japanese, but intended for
> French readers. That is far, far too fine a distinction for people to reliably make,
> even such knowledgeable people as "Web page/contents creators for pages
> with multilingual content". The article has the feeling of trying to retrofit an
> existing situation in making a distinction that frankly, is an exceedingly small
> percentage of even multilingual contents, and will (I predict) never be followed
> with any reliability.
> . . .
> What percentage of multilingual documents actually go to the trouble of marking each
> and every language run? Take a wild guess, and we'll see how accurate you are.

The author can write the information about who the page is targeting into a description of the page (where the author has access to the top of the page; you do not have access to the top of the page for all pages that are created with online editors/programs; your page headers are truncated and the body of your page is pasted into the page they create).

> From: duerst@it.aoyama.ac.jp
>
> At 23:04 08/08/22, Mark Davis wrote:
> > I'm kinda lost in this thread so far.
>
> This may be due to the fact that you don't seem to be too
> familiar with existing practice and history.
>
> > It seems to me the questions at hand are:
> >
> > 1. Distinction in Language. Should there be a distinction in interpretation between the language set via lang attribute and meta content?
> >
> > <html lang="foo">
> > and
> > <meta http-equiv="Content-Language" content="foo"/>
> >
> > My take is that any such distinction would be a departure from current practice, and too fine a distinction for the vast majority of people to be able to follow.
>
> Such a distinction IS current practice. The former can only
> contain one language, the latter can contain a priority list.
> Also, the former is used on the browser side or by editing tools,
> whereas the latter is used by the server side (see e.g. the
> examples that Roy gave).
>
> As for "too fine a distinction for the vast majority of people
> to be able to follow", the people that we need to follow this
> distinction are Web page/contents creators for pages with
> multilingual content.
>
> The distinction is clearly given at
> http://www.w3.org/International/tutorials/language-decl/#Slide0060.
> If you think this is too difficult, and can be improved upon,
> please tell us why/how.

I think it is fine to have some tagging reserved for indicating text processing needs and some reserved for indicating the audience. I see no problem with this distinction.

> > 2. Language Inheritance. If there are conflicting languages, what should win? (or in other words, what's the inheritance?)
> >
> > (HTTP) Content-Language: lang1
> > <meta http-equiv="Content-Language" content="lang2"/>
> > <html lang="lang4" xml:lang="lang3">
> > <p lang="lang5">
>
> [please note that <meta> comes after <html> in an HTML document]
>
> > My take is that HTML5 has it right, that the winner/inheritance should be in the above order: lang5 wins over lang4 over lang3 over lang2 over lang1.
>
> What HTML5 currently says may make some sense if argued ab initio.
> Based on existing standards and practice, ignoring lang2 for
> language-oriented is well justified because it is wide practice.
>
> > 3. Language Values. Should the value of any of these fields be a single language tag or also allow a priority list (both as defined by BCP47)?
> >
> > Note that it can be zero (""), which is equivalent to "und" (Unknown language) in BCP 47.
> >
> > Here I think we'd be somewhat better off if the value could be a priority list, eg "de, fr, en". For example, if the html lang value were "de, fr, en", that would mean that there wasn't any substantial amount of linguistic content other than these three, and that the relationship was de >= fr >= en. Due to the ordering, if you had software that could only handle a single language, then de would be that value.
> >
> > Documents may contain a mixture of languages, and allowing them to be tagged at a high level with a priority list would allow people to reflect that reality without having to tag each and every element with the right language. Software can make use of that information, for example, in ranking the document with respect to the language of search queries. With a search query in "fr", a document with html lang of "de, fr" could be treated differently than if it just had "de".
> >
> > However, that may be too big a departure from current practice.
>
> As you say in a followup post, HTTP Content-Language and <meta>
> (because it is equivalent to HTTP Content-Language) take a language
> priority list, but the lang and xml:lang attributes don't.
>
> My take is that this is as it should be: Documents are often enough
> multilingual that it would be a bad idea to ignore this case.
>
> On the other hand, individual document pieces can at some level be
> identified as being in one (or no) language. Allowing multiple
> languages for document pieces would only bring very, very limited
> benefits at significantly higher costs (even if we could design
> HTML and XML anew and would not have to consider the existing base).
>
> There are multiple possible semantics for multiple languages
> (I'm using the attribute name multilang to not confuse people):
> - Alternative, unclear (e.g. <span multilang='en, fr'>cat</span>)
> - Alternative, both (e.g. <span multilang='en, fr'>excellent</span>;
>   sure there are better examples)
> - Summary (e.g. <p multilang='en, fr'>He said "Oui"</p>)

I see no sense in having this for individual runs of text, as <p lang='en'>He said "<span lang='fr'>Oui</span>"</p> would do the trick, as M. Duerst has noted. But it would be lovely to have this kind of a tag for the server in the case of a page designed to, say, teach French to English speakers or vice versa (English to French), where the content is in two languages, or for a page aimed at bilinguals in, say, Spanish and English, where for whatever reason the content is in both languages (a bilingual reading assessment, or something else that needs to be in both). This would be for a page with about equal content in both languages, where a priority list would not make sense.

I'm not sure how the user asks for a page that is in two languages, though. In the interim, a tag such as <meta http-equiv="Content-Language" content="mul, en, fr"> is o.k. with me (but I think I recall from discussions long ago that a lot of people did not like having content tagged "mul" for almost any purpose).

--C. E. Whitehead

> Obviously, having all of these doesn't help much for applications,
> and having only one of these eliminates the others. Probably the
> last one is what most people might expect, but it isn't really
> necessary assuming that the markup is reasonably designed, i.e.
> we can say <p lang='en'>He said "<span lang='fr'>Oui</span>"</p>.
>
> And given that most XML applications (e.g. XSLT) have difficulties
> to handle even simple language information correctly, it doesn't
> seem a good idea to bother applications with something more
> complicated.
>
> Regards, Martin.
>
>
> #-#-# Martin J. Dürst, Assoc. Professor, Aoyama Gakuin University
> #-#-# http://www.sw.it.aoyama.ac.jp mailto:duerst@it.aoyama.ac.jp
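
For readers who want to see the inheritance order from question 2 spelled out, here is a minimal Python sketch. It assumes the order Mark Davis describes (element lang, then html lang, then html xml:lang, then meta, then the HTTP header) and, following his remark about software that can only handle one language, takes the first tag of any priority list. The names parse_priority_list, LangDeclarations and effective_language are hypothetical, and treating an empty value as "not declared" is a simplification. It is an illustration of the thread's discussion, not what any specification requires.

```python
from html.parser import HTMLParser


def parse_priority_list(value):
    """Split a comma-separated language priority list such as "de, fr, en"."""
    return [tag.strip() for tag in (value or "").split(",") if tag.strip()]


class LangDeclarations(HTMLParser):
    """Collect the document-level language declarations from a page."""

    def __init__(self):
        super().__init__()
        self.html_lang = None       # from <html lang="...">
        self.html_xml_lang = None   # from <html xml:lang="...">
        self.meta_languages = []    # from <meta http-equiv="Content-Language">

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "html":
            self.html_lang = attrs.get("lang") or None
            self.html_xml_lang = attrs.get("xml:lang") or None
        elif tag == "meta" and (attrs.get("http-equiv") or "").lower() == "content-language":
            self.meta_languages = parse_priority_list(attrs.get("content"))


def effective_language(page, http_content_language=None, element_lang=None):
    """Pick a single language, most specific declaration first (assumed order)."""
    decl = LangDeclarations()
    decl.feed(page)
    candidates = [
        element_lang,                                     # lang5 in the thread
        decl.html_lang,                                   # lang4
        decl.html_xml_lang,                               # lang3
        *decl.meta_languages[:1],                         # lang2, first tag only
        *parse_priority_list(http_content_language)[:1],  # lang1, first tag only
    ]
    for tag in candidates:
        if tag:
            return tag
    return "und"  # BCP 47 tag for an undetermined language
```

Under Martin's view that ignoring lang2 is justified for text processing, the meta entry would simply be left out of the candidate list; the Content-Language values would then only describe the intended audience.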
Received on Tuesday, 26 August 2008 16:13:02 GMT
