Re: meta content-language from Felix Sasaki on 2008-08-22 (www-international@w3.org from July to September 2008)

From: Felix Sasaki <fsasaki@w3.org>
Date: Fri, 22 Aug 2008 13:50:26 +0900
To: Martin Duerst <duerst@it.aoyama.ac.jp>
CC: Henri Sivonen <hsivonen@iki.fi>, Richard Ishida <ishida@w3.org>, 'Ian Hickson' <ian@hixie.ch>, 'HTML WG' <public-html@w3.org>, www-international@w3.org, Erik van der Poel <erikv@google.com>
Message-ID: <48AE4592.8030104@w3.org>
Martin Duerst wrote:
> I fully agree with Richard/Addison/Mark and others.
>   

me too.

> A few more comments below.
>   

I agree on your comments too and have just one additional one.

> [this was written mostly yesterday, so some of it is
> repeating some points]
>
> At 16:36 08/08/15, Henri Sivonen wrote:
>   
>> On Aug 14, 2008, at 19:08, Richard Ishida wrote:
>>
>>     
>>> I would recommend that we keep the language attributes for declaring  
>>> the
>>> default language of the content (the text-processing language) and  
>>> not muddy
>>> the waters by using meta Content-Language declarations to fulfill a  
>>> similar
>>> role, because:
>>> 1. the acceptable values are different and the meta approach is  
>>> incompatible
>>> with declaring the text-processing language
>>>       
>> The spec could make multiple language tags in Content-Language non-conforming and could make processing pick the first language tag.
>>     
>
> The syntax uses the http-equiv attribute. It should be really obvious
> that this stands for "HTTP equivalent". It should also be pretty
> obvious that it would be a really bad idea to try to create some
> arbitrary differences between the HTTP Content-Language header
> and the equivalent in the <meta> tag.
>
>   
>>> 2. the meta approach is really not used by anything according to the  
>>> tests I
>>> did
>>>       
>
> The original idea of the http-equiv meta data was to provide data
> to the server to put into the HTTP headers. So even if everything
> went according to plan, it wouldn't be surprising, because
> Richard is testing the browser side.
>
> It turned out that parsing the file itself on the server side
> was too slow when you just want to throw out pages, so this
> never caught on. [except for the cases that Roy mentions]
>
> The use of the charset parameter in Content-Type in <meta> is
> the big exception, in that it is actually used, but on the
> browser side.
>
>
>   
>> Given that people do put and have put language declarations there, is  
>> it good to keep ignoring that data?
>>     
>
> Maybe not. But on the browser side, it has been ignored
> for more than 10 years, without any problems. Why try to
> fix something that's not broken and confuse a lot of people?
>
>
> I'm also continuing to wonder about the goals for HTML5.
>
> My understanding, at one time, was that it was an attempt
> to very carefully define current "broken" browser behavior,
> in order to save future implementers the hassle of
> reverse-engineering, and to help existing browsers to
> converge on the same "broken" behavior if they wanted
> to do so.
>
> I also understand that HTML5 adds some new features,
> e.g. in the area of forms, to lower the gap between
> HTML and XHTML+XForms.
>
> It is new to me that arbitrary changes to well-defined
> and consistently implemented behavior are also part of
> HTML5.
>
>
>   
>> Of course, if the data is *wrong* significantly more often than  
>> lang='' (assuming that the correctness level of lang='' establishes an  
>> implicit data quality baseline), it would be good to ignore it. My  
>> guess is that HTTP-level Content-Language is more likely to be wrong  
>> (it sure is less obvious to diagnose) than any HTML-level declaration.  
>> (Due to Ruby's Postulate:
>> http://intertwingly.net/slides/2004/devcon/68.html )
>>     
>
> I guess Google might be able to come up with some data.
> I have copied Erik van der Poel, an expert in this area.
>
> My guess is that:
> - Authors who declare something usually use lang/xml:lang,
>   and meta maybe as an addition.
> - Some tools may use meta, but the chance that the author
>   corrects this if necessary is low (this is different from
>   the charset case, because the charset case is very
>   visible/actionable).
>
>   
>>> 3. the question of inheritance is unclear when using the meta  
>>> statement for
>>> declaring the text-processing language
>>>       
>> The spec now makes it clear.
>>     
>
> Please don't just fix the details and keep the big problems.
>
>   
>>> If the meta statement continues to be allowed, I suggest that it is  
>>> used in
>>> the same way as a Content-Language declaration in the HTTP header,  
>>> ie. as
>>> metadata about the document as a whole, but that such usage is kept  
>>> separate
>>> from use for defining the language of a range of content. As far as  
>>> I can
>>> tell, although Frontpage uses it and people on the Web recommend its  
>>> use, it
>>> has no effect at all on content, and wouldn't be missed if it were  
>>> dropped.
>>>       
>> What purpose does metadata serve if it isn't actionable?
>>     
>
> Good question. But most typical metadata (author, title, summary,...)
> isn't really actionable either.
>   

There are organizations, e.g. digital libraries, that care a lot about 
the creation of metadata in HTML, including author, title etc., and that 
make it actionable, e.g. for their search facilities. Compared to the 
whole web this is certainly a minority, but I think such use cases should 
be taken into account as well, since they could help move the "chicken 
and egg" problem of metadata content and metadata processing tools 
forward.
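
As a rough illustration of what "actionable" could mean for such a
search facility, here is a minimal sketch in Python (standard library
only). The names MetaCollector and collect_metadata and the sample page
are hypothetical, made up for this example; it only gathers the meta
values and the root lang attribute and does not decide which language
declaration should win.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Hypothetical helper: gather <meta> values and the root lang attribute."""

    def __init__(self):
        super().__init__()
        self.metadata = {}      # keys are lowercased name/http-equiv values
        self.root_lang = None   # value of <html lang="...">, if present

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "html" and self.root_lang is None:
            self.root_lang = attrs.get("lang")
        elif tag == "meta":
            # A meta element carries either name=... or http-equiv=...
            key = attrs.get("name") or attrs.get("http-equiv")
            content = attrs.get("content")
            if key and content is not None:
                self.metadata[key.lower()] = content


def collect_metadata(html_text):
    """Return (root_lang, metadata dict) for one HTML document."""
    parser = MetaCollector()
    parser.feed(html_text)
    parser.close()
    return parser.root_lang, parser.metadata


# Made-up sample page, for illustration only.
SAMPLE = """<html lang="en">
<head>
<meta name="author" content="Example Author">
<meta http-equiv="Content-Language" content="en, fr">
<title>Example</title>
</head>
<body><p>Hello</p></body>
</html>"""

root_lang, metadata = collect_metadata(SAMPLE)
```

An indexer could then store metadata["author"] as a search field, while
keeping the Content-Language value as information about the document as
a whole, separate from root_lang.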

Felix

> Regards,    Martin.
>
>
>
> #-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
> #-#-#  http://www.sw.it.aoyama.ac.jp      mailto:duerst@it.aoyama.ac.jp    
>
>
>
Received on Friday, 22 August 2008 05:01:03 UTC