Re: meta content-language

From: Felix Sasaki <fsasaki@w3.org>
Date: Fri, 22 Aug 2008 13:50:26 +0900
Message-ID: <48AE4592.8030104@w3.org>
To: Martin Duerst <duerst@it.aoyama.ac.jp>
CC: Henri Sivonen <hsivonen@iki.fi>, Richard Ishida <ishida@w3.org>, 'Ian Hickson' <ian@hixie.ch>, 'HTML WG' <public-html@w3.org>, www-international@w3.org, Erik van der Poel <erikv@google.com>

Martin Duerst wrote:
> I fully agree with Richard/Addison/Mark and others.

Me too.

> A few more comments below.

I agree with your comments too and have just one additional one.

> [this was written mostly yesterday, so some of it is
> repeating some points]
> At 16:36 08/08/15, Henri Sivonen wrote:
>> On Aug 14, 2008, at 19:08, Richard Ishida wrote:
>>> I would recommend that we keep the language attributes for declaring
>>> the default language of the content (the text-processing language) and
>>> not muddy the waters by using meta Content-Language declarations to
>>> fulfill a similar role, because:
>>> 1. the acceptable values are different and the meta approach is
>>> incompatible with declaring the text-processing language
>> The spec could make multiple language tags in Content-Language
>> non-conforming and could make processing pick the first language tag.
> The syntax uses the http-equiv attribute. It should be really obvious
> that this stands for "HTTP equivalent". It should also be pretty
> obvious that it would be a really bad idea to try to create some
> arbitrary differences between the HTTP Content-Language header
> and the equivalent in the <meta> tag.
>>> 2. the meta approach is really not used by anything according to the
>>> tests I did
> The original idea of the http-equiv meta data was to provide data
> to the server to put into the HTTP headers. So even if everything
> went according to plan, it wouldn't be surprising, because
> Richard is testing the browser side.
> It turned out that parsing the file itself on the server side
> was too slow when you just want to throw out pages, so this
> never caught on. [except for the cases that Roy mentions]
> The use of the charset parameter in Content-Type in <meta> is
> the big exception, in that it is actually used, but on the
> browser side.
>> Given that people do put and have put language declarations there, is  
>> it good to keep ignoring that data?
> Maybe not. But on the browser side, it has been ignored
> for more than 10 years, without any problems. Why try to
> fix something that's not broken and confuse a lot of people?
> I'm also continuing to wonder about the goals for HTML5.
> My understanding, at one time, was that it was an attempt
> to very carefully define current "broken" browser behavior,
> in order to save future implementers the hassle of
> reverse-engineering, and to help existing browsers to
> converge on the same "broken" behavior if they wanted
> to do so.
> I also understand that HTML5 adds some new features,
> e.g. in the area of forms, to narrow the gap between
> HTML and XHTML+XForms.
> It is new to me that arbitrary changes to well-defined
> and consistently implemented behavior are also part of
> HTML5.
>> Of course, if the data is *wrong* significantly more often than  
>> lang='' (assuming that the correctness level of lang='' establishes an  
>> implicit data quality baseline), it would be good to ignore it. My  
>> guess is that HTTP-level Content-Language is more likely to be wrong  
>> (it sure is less obvious to diagnose) than any HTML-level declaration.  
>> (Due to Ruby's Postulate:
>> http://intertwingly.net/slides/2004/devcon/68.html )
> I guess Google might be able to come up with some data.
> I have copied Erik van der Poel, an expert in this area.
> My guess is that:
> - Authors who declare something usually use lang/xml:lang,
>   and meta maybe as an addition.
> - Some tools may use meta, but the chance that the author
>   corrects this if necessary is low (this is different from
>   the charset case, because the charset case is very
>   visible/actionable).
>>> 3. the question of inheritance is unclear when using the meta
>>> statement for declaring the text-processing language
>> The spec now makes it clear.
> Please don't just fix the details and keep the big problems.
>>> If the meta statement continues to be allowed, I suggest that it is
>>> used in the same way as a Content-Language declaration in the HTTP
>>> header, i.e. as metadata about the document as a whole, but that such
>>> usage is kept separate from use for defining the language of a range
>>> of content. As far as I can tell, although Frontpage uses it and
>>> people on the Web recommend its use, it has no effect at all on
>>> content, and wouldn't be missed if it were dropped.
>> What purpose does metadata serve if it isn't actionable?
> Good question. But most typical metadata (author, title, summary,...)
> isn't really actionable either.

There are organizations, e.g. digital libraries, that care a lot about
the creation of metadata in HTML, including author, title etc., and that
make it actionable, e.g. for their search facilities. Compared to the
whole web this is a minority for sure, but I think such use cases should
be taken into account as well, since they could help to get past the
"chicken and egg" problem of metadata content and metadata processing
tools.
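
To illustrate what "actionable" might mean for such a tool, here is a
minimal sketch in Python (standard library only) of collecting <meta>
metadata from a page so that a search facility could index it. The class
name, the function names and the sample page are invented for this
example and are not taken from any existing digital library software.
The sketch keeps the Content-Language value (which, as in HTTP, may be a
comma-separated list) separate from the lang attribute on the root
element.

```python
from html.parser import HTMLParser


class MetadataCollector(HTMLParser):
    """Collects <meta> entries and the root lang attribute of a page."""

    def __init__(self):
        super().__init__()
        self.named = {}        # from <meta name="..." content="...">
        self.http_equiv = {}   # from <meta http-equiv="..." content="...">
        self.root_lang = None  # from <html lang="...">

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        attributes = dict(attrs)
        if tag == "html":
            self.root_lang = attributes.get("lang")
        elif tag == "meta":
            content = attributes.get("content") or ""
            name = attributes.get("name")
            equiv = attributes.get("http-equiv")
            if name:
                self.named[name.lower()] = content
            elif equiv:
                self.http_equiv[equiv.lower()] = content


def collect_metadata(html_text):
    """Parses html_text and returns the filled-in collector."""
    collector = MetadataCollector()
    collector.feed(html_text)
    collector.close()
    return collector


def content_languages(collector):
    """Splits the meta Content-Language value into a list of language tags."""
    value = collector.http_equiv.get("content-language", "")
    return [tag.strip() for tag in value.split(",") if tag.strip()]


# Hypothetical sample page, made up for this illustration.
SAMPLE_PAGE = """<html lang="en">
<head>
<meta http-equiv="Content-Language" content="en, fr">
<meta name="author" content="A. N. Author">
<meta name="description" content="An example record">
<title>Example</title>
</head><body></body></html>"""

if __name__ == "__main__":
    page = collect_metadata(SAMPLE_PAGE)
    print("author:", page.named.get("author"))
    print("audience languages:", content_languages(page))
    print("text-processing language:", page.root_lang)
```

A search facility could then index the author and description fields,
and treat the Content-Language list as information about the intended
audience rather than as the language of any particular range of text.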


> Regards,    Martin.
> #-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
> #-#-#  http://www.sw.it.aoyama.ac.jp      mailto:duerst@it.aoyama.ac.jp    
Received on Friday, 22 August 2008 05:01:03 UTC
