- From: olivier Thereaux <ot@w3.org>
- Date: Tue, 7 Aug 2007 20:38:48 +0900
- To: invalid@csc.jp, www-validator Community <www-validator@w3.org>
On Aug 7, 2007, at 17:48 , invalid@csc.jp wrote:
> <blockquote cite="http://www.ietf.org/rfc/rfc2616.txt">
> 3.7 Media Types
>
>    HTTP uses Internet Media Types [17] in the Content-Type (section
>    14.17) and Accept (section 14.1) header fields in order to provide
>    open and extensible data typing and type negotiation.
>
>        media-type     = type "/" subtype *( ";" parameter )
>        type           = token
>        subtype        = token
>
>    Parameters MAY follow the type/subtype in the form of attribute/value
>    pairs (as defined in section 3.6).
>
>    The type, subtype, and parameter attribute names are case-insensitive.
>    Parameter values might or might not be case-sensitive, depending on
>    the semantics of the parameter name. (...)
> </blockquote>

Thanks, that's the info I was looking for.

So as far as HTTP (and thus Http-Equiv meta in HTML) is concerned
> <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1">
is equivalent to
> <META HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=ISO-8859-1">
and to
> <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charSet=ISO-8859-1">
(etc.) and Ernest's test cases are valid.

I looked at the validator code, and for that part of the content
detection, we use the module by Bjoern called HTML::Encoding.
-> http://search.cpan.org/src/BJOERN/HTML-Encoding-0.53/lib/HTML/Encoding.pm
   -> sub encoding_from_meta_element()
   -> sub encoding_from_content_type()

encoding_from_content_type relies on the tokenization of the HTTP header
from sub split_header_words() in HTTP::Headers::Util (itself in
libwww-perl)

I'm not convinced the "bug" is in HTML::Encoding. HTML::Encoding looks
for the "charset" key of the tokenized HTTP header, and it's not really
reasonable to expect it to look for CHARSET, and charSet, etc.

I guess, from the bit of the spec quoted above, the tokenization should
probably convert the media type parameters to lower case, hence when
finding

    Content-Type: foo/bar; ParaMeter=value

    @values = split_header_words($h->header("Content-Type"));

should return

    ['foo/bar' => undef, parameter => 'value']

(Bjoern and Gisle in Bcc in this mail, and will forward this mail to
cpan bug report for LWP.)

-- 
olivier
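
To make the proposal above concrete, here is a minimal sketch of a tokenizer that lower-cases parameter names while leaving values untouched. It is written in Python purely for illustration, and it is not the code of HTML::Encoding or HTTP::Headers::Util. The function name split_media_type is hypothetical, and the parsing is deliberately simplified (it assumes no ";" appears inside a quoted value).

```python
def split_media_type(header_value):
    """Split a Content-Type value into (media_type, params).

    Hypothetical helper, not an existing library function.
    Parameter names are lower-cased because RFC 2616 section 3.7 says
    they are case-insensitive; parameter values are left as they are.
    Simplification: a ';' inside a quoted value is not handled.
    """
    parts = [p.strip() for p in header_value.split(";")]
    # The type and subtype are case-insensitive as well.
    media_type = parts[0].lower()
    params = {}
    for part in parts[1:]:
        if not part:
            continue  # tolerate a trailing or doubled ';'
        name, sep, value = part.partition("=")
        value = value.strip()
        # Remove the surrounding double quotes of a quoted value.
        if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
            value = value[1:-1]
        # A parameter without '=' is stored with None as its value.
        params[name.strip().lower()] = value if sep else None
    return media_type, params


# The three META variants from the test cases all yield the same key.
for content in (
    "text/html; charset=ISO-8859-1",
    "text/html; CHARSET=ISO-8859-1",
    "text/html; charSet=ISO-8859-1",
):
    media_type, params = split_media_type(content)
    print(media_type, params.get("charset"))
    # each iteration prints: text/html ISO-8859-1
```

With the names normalized at tokenization time, a caller only ever has to look up the lower-case "charset" key, which is the behaviour HTML::Encoding already assumes.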
Received on Tuesday, 7 August 2007 11:38:08 UTC