RE: ISSUE-88 / Re: what's the language of a document ?

From: CE Whitehead <cewcathar@hotmail.com>
Date: Sat, 13 Mar 2010 16:45:55 -0500
Message-ID: <SNT142-w2073B267D1ED1133850628B3300@phx.gbl>
To: <xn--mlform-iua@xn--mlform-iua.no>, <addison@amazon.com>
CC: <www-international@w3.org>, <public-html@w3.org>, <ishida@w3.org>, <ian@hixie.ch>

Hi. My comments, mainly addressed to Leif, are below.

> Date: Fri, 12 Mar 2010 23:20:45 +0100
> From: xn--mlform-iua@xn--mlform-iua.no
> To: cewcathar@hotmail.com
> CC: addison@amazon.com; www-international@w3.org; public-html@w3.org; ishida@w3.org; ian@hixie.ch
> Subject: RE: ISSUE-88 / Re: what's the language of a document ?
> 
> CE Whitehead, Fri, 12 Mar 2010 15:18:01 -0500 - in reply to Addison:
> ....
> >> I think it would be appropriate to include text allowing the 
> >> user-agent to infer the value of @lang from a properly 
> >> formed <meta> tag in cases in which @lang is not present or empty. 
> >> In that case, I would expect that the first
> >> language in the <meta> tag would be assigned as an inferred value 
> [ ... ]
> >> Additional language tags can be trimmed off using a 
> >> well-placed call to strtok (for example). This is what our working 
> >> group's Change Proposal proposes.
> >
> > I thought this was Leif's proposal; perhaps I am mistaken here; but I 
> > thought Leif meant to read the first value of the http header to
> > infer the text-processing language when none was declared in
> > the xml or html element, and to otherwise consider two values o.k.
> > Thanks.
> 
> I think it was clear that I offered an alternative proposal to the one 
> from the I18N WG. ;-) And I believe it was clear that this exact 
> point is where my proposal differs from the I18N proposal ... 
> 
> Let us argue based on the algorithm in HTML4:
> 
> > 8.1.2 Inheritance of language codes
> > An element inherits language code information according to the 
> > following order of precedence (highest to lowest):
> > * The lang attribute set for the element itself.
> > * The closest parent element that has the lang attribute set (i.e., 
> > the lang attribute is inherited).
> > * The HTTP "Content-Language" header (which may be configured in a 
> > server). For example: Content-Language: en-cockney
> > * User agent default values and user preferences.
> 


ME] Default values should come last, so this order sounds right, except that, other things being equal, the first value in the HTTP header should also take precedence over the second.
I am not against automatic language detection either, and that can be worked into this algorithm.
So let's keep the order above, but specify that, for text processing, the first value of the header takes precedence over the second.
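A minimal sketch of the HTML4 order of precedence quoted above, with the one change proposed here: when the HTTP Content-Language header lists several tags, the first tag is taken as the text-processing language. The function names, and the representation of an element's ancestry as a list of lang values (None meaning no lang attribute), are assumptions for illustration, not taken from HTML4 or from any user agent.

```python
from typing import Optional, Sequence


def parse_language_list(value: Optional[str]) -> list:
    """Split a comma-separated Content-Language value into tags, keeping order."""
    if not value:
        return []
    return [tag.strip() for tag in value.split(",") if tag.strip()]


def text_processing_language(
    lang_chain: Sequence[Optional[str]],
    http_content_language: Optional[str],
    ua_default: str,
) -> str:
    """Hypothetical resolver following the HTML4 8.1.2 order of precedence.

    lang_chain holds the lang attribute of the element itself, then of each
    ancestor up to the root; None means the attribute is not set.
    """
    # Steps 1 and 2: the element's own lang, else the closest ancestor's lang.
    for lang in lang_chain:
        if lang is not None:
            return lang
    # Step 3: the HTTP Content-Language header; proposed rule: first tag wins.
    tags = parse_language_list(http_content_language)
    if tags:
        return tags[0]
    # Step 4: user agent default values and user preferences.
    return ua_default


print(text_processing_language([None, None], "de, en", "en-US"))  # de
print(text_processing_language([None, "fr"], "de, en", "en-US"))  # fr
```

The remaining tags are simply not consulted for text processing; nothing here stops a processor from keeping them for other purposes.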


> The second-to-last step - the HTTP Content-Language header option - only 
> works when it represents just one language. And since user agents 
> don't implement the last step ("User agent default values and user 
> preferences"), and since multiple language tags inside the <meta> C-L 
> mean that no language can be inferred (without a change to the above 
> algorithm, such as looking for the first language tag), we have, in 
> my view, a problem to solve.


ME] When the HTTP header lists two languages, I can of course pick one or the other arbitrarily and set the lang attribute of my html element to it, and usually I do. Sometimes, however, the content is so mixed that R. I. has actually suggested in his internationalization best practices that, in such cases only, declaring the text-processing language be deferred until you reach the divs or other elements whose content is in one language or the other.
(If no text-processing language is declared, the solution still seems to be to let the first value of the HTTP header be the text-processing language; see above.)

 

> It is not very difficult to understand the issues that I described 
> above either. My proposal doesn't affect the format and rules for 
> <meta> C-L. It only takes its side effects into account.


ME] Taking the first language of meta Content-Language as the text-processing language in no way says you cannot still declare other languages;
it just means that, for the text-processing language, when none is declared elsewhere,
the first language declared in meta Content-Language takes precedence.
This does not affect the format and rules for <meta> C-L either.
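One way to picture this: the <meta> Content-Language value keeps every tag as a statement about the audience, and only its first tag is consulted, as a fallback text-processing language, when the root element has no lang attribute. The class and function names below are hypothetical; this is a sketch of the proposal under that assumption, not an existing API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MetaContentLanguage:
    """Hypothetical reading of <meta http-equiv="Content-Language" content="...">."""
    audience: List[str] = field(default_factory=list)  # every tag, in order

    @property
    def processing(self) -> Optional[str]:
        # Only the first tag serves as the fallback text-processing language.
        return self.audience[0] if self.audience else None


def read_meta(content: str) -> MetaContentLanguage:
    tags = [t.strip() for t in content.split(",") if t.strip()]
    return MetaContentLanguage(audience=tags)


def document_language(root_lang: Optional[str], meta_content: str) -> Optional[str]:
    # A lang attribute on the root element always wins over <meta>.
    if root_lang is not None:
        return root_lang
    return read_meta(meta_content).processing


meta = read_meta("nn, nb, en")
print(meta.audience)                          # ['nn', 'nb', 'en']
print(document_language(None, "nn, nb, en"))  # nn
```

No tag is dropped from the declaration; the first one merely gains an extra role.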

 

(I'm not that picky about which proposal goes through; maybe I do not understand all the technical repercussions. I just want to be able to specify more than one language in meta elements and HTTP headers;
we do need some system for determining a text-processing language.

Of course, in many cases the language can be guessed from the character strings, which is another option where pages that do not conform to the standards must be handled.
So we just have to spell out, in order, the ways of declaring the text-processing language and the priority assigned to each; I think you have all about done that.)

 

> 
> Really, Validator.nu should not only check the content of the <meta> 
> content-language element, but should also check the content-language 
> header coming from the server. And, whenever the HTTP content-language 
> from the server offers more than one language, then the validator 
> should warn/recommend that the root element should have a lang attribute, 
> since otherwise, there is no way to obtain the language of the document.
> 
> Even if we specify - now - that it is the first language tag that 
> counts: Existing content out there doesn't know anything about this. 
> Some content could also rely on the current behaviour w.r.t. how user 
> agents interpret meta elements with multiple values. And I am not even 
> sure that it is any good to give any special heed to the first language 
> tag of the Content-Language - if that means ignoring the rest as possible 
> fallback options. 
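The validator check Leif suggests could look roughly like this. It is only a sketch: the function name, its inputs and the warning text are hypothetical and are not taken from Validator.nu.

```python
from typing import List, Optional


def language_warnings(
    http_content_language: Optional[str],
    meta_content_language: Optional[str],
    root_lang: Optional[str],
) -> List[str]:
    """Hypothetical check: warn when the only language information is a
    Content-Language value (HTTP header or <meta>) listing several tags."""
    if root_lang is not None:
        return []  # the root element declares the language explicitly
    warnings = []
    for source, value in (
        ("HTTP Content-Language header", http_content_language),
        ("<meta> Content-Language", meta_content_language),
    ):
        tags = [t.strip() for t in (value or "").split(",") if t.strip()]
        if len(tags) > 1:
            warnings.append(
                f"{source} lists {len(tags)} languages ({', '.join(tags)}); "
                "add a lang attribute to the root element."
            )
    return warnings


for w in language_warnings("de, en", None, None):
    print(w)
```

Checking both sources matters because a page can carry a single-language <meta> while the server sends several languages, or the reverse.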
 
ME] Agreed; we should treat the remaining tags as indicators of the possible audience
(and perhaps processors could also use them as the candidate languages to match against in automatic language detection?)

 

From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no> 
Date: Sat, 13 Mar 2010 06:01:52 +0100
> W.r.t. the I18N group: Take the suggestion that the order of the 
> language tags inside the <meta> content-language element should be 
> significant. So what if there is no <meta> content-language element? 
> But instead, there are several content-languages specified on the 
> server? Do you want to regulate the order of the language tags coming 
> from the server as well? 

 

ME] In my opinion, yes, but only if the tags coming from the server are used to determine the text-processing language.

 

Best,

C. E. Whitehead
cewcathar@hotmail.com
> -- 
> leif halvard silli
> 
Received on Saturday, 13 March 2010 21:46:34 GMT
