RE: Validator case-sensitive bug for CHARSET?

From: Martin Duerst <duerst@it.aoyama.ac.jp>
Date: Wed, 08 Aug 2007 10:32:39 +0900
Message-Id: <6.0.0.20.2.20070808101932.0a29bc10@localhost>
To: "McDonald, Ira" <imcdonald@sharplabs.com>, "David Dorward" <david@dorward.me.uk>, "Ernest Unrau" <ejunrau@mts.net>
Cc: "www-validator Community" <www-validator@w3.org>, <www-international@w3.org>

It's very clear that the charset tags themselves are case-insensitive,
i.e. US-ASCII is as good as us-ascii is as good as uS-aSCii or any
other variant. It's also clear that for HTML, element and attribute
names are case-insensitive.
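
As a rough illustration of what case-insensitive charset tags mean in
practice, here is a small Python sketch. It uses Python's own codec
registry as a stand-in for the IANA registry, which is an assumption: the
two do not cover exactly the same names. The helper name same_charset is
made up for this example.

```python
import codecs

def same_charset(a, b):
    """Return True if two charset labels resolve to the same codec.

    codecs.lookup() ignores case when resolving a label, so this
    treats "US-ASCII", "us-ascii" and "uS-aSCii" as the same charset.
    Raises LookupError if Python does not know a label.
    """
    return codecs.lookup(a).name == codecs.lookup(b).name

if __name__ == "__main__":
    print(same_charset("US-ASCII", "us-ascii"))
    print(same_charset("US-ASCII", "uS-aSCii"))
    print(same_charset("US-ASCII", "ISO-8859-4"))
```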

The question is whether the charset parameter on the Content-Type
HTTP header is case-sensitive or case-insensitive. Olivier earlier
said that he wasn't able to find anything relevant in the HTTP spec,
but I found this at http://www.ietf.org/rfc/rfc2616.txt:

>>>>>>>>
3.7 Media Types

   HTTP uses Internet Media Types [17] in the Content-Type (section
   14.17) and Accept (section 14.1) header fields in order to provide
   open and extensible data typing and type negotiation.

       media-type     = type "/" subtype *( ";" parameter )
       type           = token
       subtype        = token

   Parameters MAY follow the type/subtype in the form of attribute/value
   pairs (as defined in section 3.6).

   The type, subtype, and parameter attribute names are case-
   insensitive. Parameter values might or might not be case-sensitive,
   depending on the semantics of the parameter name.
>>>>>>>>

"charset" is a parameter attribute name, and therefore case-insensitive.
Section 3.7 is clearly referenced from Section 14.17:

>>>>>>>>
14.17 Content-Type

   The Content-Type entity-header field indicates the media type of the
   entity-body sent to the recipient or, in the case of the HEAD method,
   the media type that would have been sent had the request been a GET.

       Content-Type   = "Content-Type" ":" media-type

   Media types are defined in section 3.7. An example of the field is

       Content-Type: text/html; charset=ISO-8859-4

   Further discussion of methods for identifying the media type of an
   entity is provided in section 7.2.1.
>>>>>>>>

Olivier said that "the rest of HTTP constructs" are case-sensitive,
but this is not true. Methods such as GET and PUT are case-sensitive,
but most other constructs (header field names, for example) are not,
because they were taken over from email, where they are also
case-insensitive.
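
To make the consequence for a consumer such as the validator concrete,
here is a minimal Python sketch of reading the charset from a
Content-Type value along the lines of RFC 2616 section 3.7. The function
names parse_media_type and charset_of are invented for this example, and
the parsing is deliberately naive: it splits on ";" and so does not
handle a quoted parameter value that itself contains a semicolon.

```python
def parse_media_type(value):
    """Split a Content-Type value into (media type, parameters).

    The type, subtype and parameter attribute names are lowercased,
    since RFC 2616 section 3.7 says they are case-insensitive.
    Parameter values are left untouched, because their case
    sensitivity depends on the parameter.
    """
    parts = [p.strip() for p in value.split(";")]
    media_type = parts[0].lower()
    params = {}
    for part in parts[1:]:
        if "=" not in part:
            continue  # skip malformed or empty parameters
        name, _, val = part.partition("=")
        params[name.strip().lower()] = val.strip().strip('"')
    return media_type, params

def charset_of(value):
    """Return the charset label in lowercase, or None if absent.

    Lowercasing the value is safe here because HTTP character sets
    are identified by case-insensitive tokens.
    """
    _, params = parse_media_type(value)
    charset = params.get("charset")
    return charset.lower() if charset is not None else None

if __name__ == "__main__":
    print(charset_of("text/html; charset=ISO-8859-4"))
    print(charset_of("Text/HTML; CHARSET=iso-8859-4"))
```

Both calls in the usage part give the same label, so "CHARSET=" and
"charset=" are treated alike, as are "ISO-8859-4" and "iso-8859-4".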


Regards,     Martin.

At 04:33 07/08/08, McDonald, Ira wrote:
>
>Hi,
>
>Quoting HTTP/1.1 (RFC 2616), page 22:
>
>>> "HTTP character sets are identified by case-insensitive tokens. The
>    complete set of tokens is defined by the IANA Character Set registry
>    [19]."
>
>And the normative IANA Charset Registration Procedures (RFC 2978),
>page 4 says:
>
>  "Finally, charsets being registered for use with the "text" media type
>   MUST have a primary name that conforms to the more restrictive syntax
>   of the charset field in MIME encoded-words [RFC-2047, RFC-2184] and
>   MIME extended parameter values [RFC-2184].  A combined ABNF
>   definition for such names is as follows:
>
>     mime-charset = 1*mime-charset-chars
>     mime-charset-chars = ALPHA / DIGIT /
>                "!" / "#" / "$" / "%" / "&" /
>                "'" / "+" / "-" / "^" / "_" /
>                "`" / "{" / "}" / "~"
>>>   ALPHA        = "A".."Z"    ; Case insensitive ASCII Letter
>     DIGIT        = "0".."9"    ; Numeric digit"
>
>Any use of IANA charset tags in any standard that is case 
>sensitive is broken.
>
>Cheers,
>- Ira - editor of IANA Charset MIB (RFC 3808)
>
>Ira McDonald (Musician / Software Architect)
>Chair - Linux Foundation Open Printing WG
>Blue Roof Music / High North Inc
>PO Box 221  Grand Marais, MI  49839
>phone: +1-906-494-2434
>email: imcdonald@sharplabs.com
>
>-----Original Message-----
>From: www-international-request@w3.org
>[mailto:www-international-request@w3.org]On Behalf Of David Dorward
>Sent: Tuesday, August 07, 2007 2:59 AM
>To: Ernest Unrau
>Cc: www-validator Community; www-international@w3.org
>Subject: Re: Validator case-sensitive bug for CHARSET?
>
>
>
>On 7 Aug 2007, at 08:11, Ernest Unrau wrote:
>> No HTML tags are case-sensitive, but it may indeed be that the CHARSET
>> parameter must be case sensitive since I'm told that the META tags are
>> mimicking HTML headers. Perhaps the servers that parse these headers are
>> also case sensitive? But one would think that validation would fail on
>> other META tags also.
>
>There aren't any other meta tags that provide information needed in  
>order to parse a document, so that isn't the case.
>
>-- 
>David Dorward
>http://dorward.me.uk/
>http://blog.dorward.me.uk/


#-#-#  Martin J. Dürst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@it.aoyama.ac.jp     
Received on Wednesday, 8 August 2007 01:34:25 GMT
