Re: Null change proposal for ISSUE-88 (mark II) from Kornel Lesinski on 2010-04-03 (public-html@w3.org from April 2010)

From: Kornel Lesinski <kornel@geekhood.net>
Date: Sat, 03 Apr 2010 17:34:20 +0100
To: "Julian Reschke" <julian.reschke@gmx.de>
Cc: "public-html@w3.org" <public-html@w3.org>
Message-ID: <op.valhniezptj49s@aimac.local>

On Sat, 03 Apr 2010 10:00:32 +0100, Julian Reschke <julian.reschke@gmx.de>  
wrote:

>> I agree that difference between http-equiv and HTTP is constant source
>> of confusion. Authors mistakenly think it is equivalent of HTTP headers,
>> and that most/all HTTP headers would work that way (e.g. there's lots of
>> documents with HTTP cache directives in HTML).
>>
>> Obviously, it's not an HTTP header equivalent (unless HTTP will require
>> HTTP clients to parse HTML) – the name is very misleading.
>
> Why would HTTP want to make requirements on HTML processing?

Because unless HTML is parsed at some point (by an HTTP server, proxy or
client), <meta> won't affect HTTP. For example, with content negotiation
and Vary: Accept-Language, the wrong version may be cached if the language
is declared only in an HTML pragma. So it's not really an HTTP header
equivalent; it's something else that only superficially looks like an HTTP
header.

HTML5 defines http-equiv as accepting a specific set of values, plus
HTTP-like pragmas registered in the WHATWG registry under certain
conditions. It does not simply accept HTTP headers.

> I think we already heard about these use cases. Just because *browsers*  
> do not support them doesn't mean that it's not used in other frameworks,  
> and there's really no reason to make those documents non-compliant.

Could you point out such frameworks? How would they use such vague  
information in a useful way?

I've grepped over 600000 documents from dotnetdotcom.org and found 52835
content-language pragmas, of which only 867 contained a comma (the same
method finds 361666 content-type pragmas).
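
For illustration, a line-based count of this kind could be sketched in
Python roughly as follows. This is not the command I actually ran; the
regular expression and the function names are assumptions, and the pattern
only matches a <meta> tag that sits on one line with http-equiv written
before a quoted content attribute.

```python
import re

def pragma_pattern(name):
    # Hypothetical pattern: a single-line <meta> tag whose http-equiv names
    # the pragma, followed later in the same tag by a quoted content value.
    return re.compile(
        r'<meta\s[^>]*http-equiv\s*=\s*["\']?' + re.escape(name) +
        r'["\']?[^>]*\scontent\s*=\s*["\']([^"\'>]*)["\']',
        re.IGNORECASE,
    )

def count_pragmas(lines, name="content-language"):
    """Return (total declarations, declarations whose value has a comma)."""
    pattern = pragma_pattern(name)
    total = with_comma = 0
    for line in lines:
        # Matching line by line, like grep: tags split across lines are missed.
        for match in pattern.finditer(line):
            total += 1
            if "," in match.group(1):
                with_comma += 1
    return total, with_comma
```

Calling count_pragmas(lines, "content-type") on the same input would give
the comparison figure for content-type pragmas.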

These are the most popular values (after normalization of case and
whitespace):

  100 nl,en
   77 de,at,ch
   39 fr,en
   34 en-us,english
   33 de,deutsch
   26 fr,fr-be,fr-ca,fr-lu,fr-ch
   26 en,us
   26 de,ch,at
   20 de,en
   18 fr,fr-be,fr-ch,fr-lu,fr-mc
   18 de,at
   17 it,it-ch
   16 nl,nl-be
   15 el,en
   14 deutsch,de
   12 it,en,fr,de,es
   12 en,th
   12 de,de-ch,de-at,de-lu,de-li
   11 es,es-es
   10 pt,pt-pt
    9 en-us,en-ca,en-au,en-bz,en-jm,en-nz,en-ph,en-tt
    8 german,deutsch,allemand,de
    8 en,us,fr,de,es,ca,nl,dk,it,pt,pl
    8 en,en-us
    7 en-us,en
    6 zh,zh-hk,zh-cn,zh-sg,zh-tw
    6 el,en-us
    6 de,at,ch,deutsch,german
    5 fr,french
    5 en-us,en-gb

Note that many of those are just trying to cover all possible spellings of  
one language.
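
A minimal sketch of how such a tally could be produced, assuming that
"normalization" means lowercasing and removing all whitespace (the exact
rules are an assumption here, and all function names are hypothetical).
The single_language helper flags values whose entries all share one
primary subtag, e.g. "fr,fr-be,fr-ch"; it does not recognize alternative
spellings such as "de,deutsch".

```python
from collections import Counter

def normalize(value):
    """Lowercase a pragma value and drop all whitespace (assumed rules)."""
    return "".join(value.lower().split())

def tally(values):
    """Count declarations per normalized value, most common first."""
    return Counter(normalize(v) for v in values).most_common()

def single_language(value):
    """True if every comma-separated entry shares one primary subtag."""
    tags = [tag for tag in normalize(value).split(",") if tag]
    primaries = {tag.split("-")[0] for tag in tags}
    return len(primaries) == 1
```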

This is just a rough estimate. Grepping would miss a <meta> that was
spread across multiple lines. I also looked at individual declarations
rather than documents, so even a few messy documents could distort such a
small sample, and I haven't checked whether the declarations match the
document content.

Compared to the number of Content-Type pragmas in the same sample,
Content-Language seems popular enough to include in the spec.

However, declarations with more than one language are very rare and  
usually contain invalid/redundant information.  Based on this data I agree  
with the spec.

-- 
regards, Kornel Lesiński

Received on Saturday, 3 April 2010 16:35:05 UTC