Re: meta content-language from Martin Duerst on 2008-08-22 (public-html@w3.org from August 2008)

From: Martin Duerst <duerst@it.aoyama.ac.jp>
Date: Fri, 22 Aug 2008 11:16:14 +0900
To: Henri Sivonen <hsivonen@iki.fi>, Richard Ishida <ishida@w3.org>
Cc: "'Ian Hickson'" <ian@hixie.ch>, "'HTML WG'" <public-html@w3.org>, <www-international@w3.org>, "Erik van der Poel" <erikv@google.com>
Message-Id: <6.0.0.20.2.20080821183723.09e03d68@localhost>

I fully agree with Richard/Addison/Mark and others.
A few more comments below.
[this was written mostly yesterday, so some of it
repeats points that have already been made]

At 16:36 08/08/15, Henri Sivonen wrote:
>
>On Aug 14, 2008, at 19:08, Richard Ishida wrote:
>
>> I would recommend that we keep the language attributes for declaring the
>> default language of the content (the text-processing language) and not muddy
>> the waters by using meta Content-Language declarations fulfill a similar
>> role, because:
>> 1. the acceptable values are different and the meta approach is incompatible
>> with declaring the text-processing language
>
>The spec could make multiple language tags in Content-Language non-conforming and could make processing pick the first language tag.

The syntax uses the http-equiv attribute. It should be really obvious
that this stands for "HTTP equivalent". It should also be pretty
obvious that it would be a really bad idea to try to create some
arbitrary differences between the HTTP Content-Language header
and the equivalent in the <meta> tag.
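
As a rough illustration of the point above (not taken from any spec
text): the HTTP Content-Language header carries a comma-separated list
of language tags, so an http-equiv equivalent would naturally be parsed
the same way. The Python sketch below uses a hypothetical helper,
parse_content_language, and made-up sample values; it only shows what a
"pick the first language tag" rule would discard.

```python
def parse_content_language(value):
    """Split a Content-Language value into its language tags.

    Hypothetical helper for illustration only: the same function is
    applied whether the value came from the HTTP header or from
    <meta http-equiv="Content-Language" content="...">.
    """
    return [tag.strip() for tag in value.split(",") if tag.strip()]


# Assumed sample values, identical in both places.
header_value = "de, en"  # as sent in an HTTP response header
meta_value = "de, en"    # as written in the content attribute of <meta>

tags = parse_content_language(header_value)

# "HTTP equivalent": both sources yield the same list of tags.
assert tags == parse_content_language(meta_value)

# A "pick the first tag" rule keeps one tag and drops the rest.
first_only = tags[0]
dropped = tags[1:]
```

With a multi-valued declaration like the one assumed here, the
"first tag" rule gives the meta form a meaning that the header form
does not have.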

>> 2. the meta approach is really not used by anything according to the tests I
>> did

The original idea of the http-equiv meta data was to provide data
to the server to put into the HTTP headers. So even if everything
had gone according to plan, Richard's result wouldn't be surprising,
because he is testing the browser side.

It turned out that parsing the file itself on the server side
was too slow when you just want to throw out pages, so this
never caught on. [except for the cases that Roy mentions]

The use of the charset parameter in Content-Type in <meta> is
the big exception, in that it is actually used, but on the
browser side.
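
For readers unfamiliar with that exception, here is a minimal Python
sketch of reading the charset from a <meta http-equiv="Content-Type">
declaration. The class and function names are hypothetical, and the
sketch is a simplification: it works on already-decoded text using the
standard library's html.parser, whereas a real browser has to scan the
raw bytes before it knows the encoding.

```python
from html.parser import HTMLParser


class MetaCharsetSniffer(HTMLParser):
    """Record the charset of the first <meta http-equiv="Content-Type">."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        # html.parser lowercases attribute names; values may be None.
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("http-equiv", "").lower() != "content-type":
            return
        # The content value looks like "text/html; charset=utf-8".
        for part in attributes.get("content", "").split(";")[1:]:
            key, _, value = part.strip().partition("=")
            if key.strip().lower() == "charset" and value.strip():
                self.charset = value.strip().strip("\"'").lower()


def sniff_meta_charset(html_text):
    """Return the declared charset in lowercase, or None if absent."""
    sniffer = MetaCharsetSniffer()
    sniffer.feed(html_text)
    return sniffer.charset
```

A Content-Language declaration in the same position is simply skipped
by this sketch, which mirrors the browser-side situation described
above.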


>Given that people do put and have put language declarations there, is  
>it good to keep ignoring that data?

Maybe not. But on the browser side, it has been ignored
for more than 10 years, without any problems. Why try to
fix something that's not broken and confuse a lot of people?


I'm also continuing to wonder about the goals for HTML5.

My understanding, at one time, was that it was an attempt
to very carefully define current "broken" browser behavior,
in order to save future implementers the hassle of
reverse-engineering, and to help existing browsers to
converge on the same "broken" behavior if they wanted
to do so.

I also understand that HTML5 adds some new features,
e.g. in the area of forms, to narrow the gap between
HTML and XHTML+XForms.

It is new to me that arbitrary changes to well-defined
and consistently implemented behavior are also part of
HTML5.


>Of course, if the data is *wrong* significantly more often than  
>lang='' (assuming that the correctness level of lang='' establishes an  
>implicit data quality baseline), it would be good to ignore it. My  
>guess is that HTTP-level Content-Language is more likely to be wrong  
>(it sure is less obvious to diagnose) than any HTML-level declaration.  
>(Due to Ruby's Postulate:
>http://intertwingly.net/slides/2004/devcon/68.html )

I guess Google might be able to come up with some data.
I have copied Erik van der Poel, an expert in this area.

My guess is that:
- Authors who declare something usually use lang/xml:lang,
  and meta maybe as an addition.
- Some tools may use meta, but the chance that the author
  corrects this if necessary is low (this is different from
  the charset case, because the charset case is very
  visible/actionable).

>> 3. the question of inheritance is unclear when using the meta statement for
>> declaring the text-processing language
>
>The spec now makes it clear.

Please don't just fix the details and keep the big problems.

>> If the meta statement continues to be allowed, I suggest that it is used in
>> the same way as a Content-Language declaration in the HTTP header, ie. as
>> metadata about the document as a whole, but that such usage is kept separate
>> from use for defining the language of a range of content. As far as I can
>> tell, although Frontpage uses it and people on the Web recommend its use, it
>> has no effect at all on content, and wouldn't be missed if it were dropped.
>
>What purpose does metadata serve if it isn't actionable?

Good question. But most typical metadata (author, title, summary,...)
isn't really actionable either.

Regards,    Martin.



#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp      mailto:duerst@it.aoyama.ac.jp
Received on Friday, 22 August 2008 02:17:52 UTC