- From: olivier Thereaux <ot@w3.org>
- Date: Tue, 7 Aug 2007 20:38:48 +0900
- To: invalid@csc.jp, www-validator Community <www-validator@w3.org>
On Aug 7, 2007, at 17:48 , invalid@csc.jp wrote:
> <blockquote cite="http://www.ietf.org/rfc/rfc2616.txt">
> 3.7 Media Types
>
> HTTP uses Internet Media Types [17] in the Content-Type (section
> 14.17) and Accept (section 14.1) header fields in order to provide
> open and extensible data typing and type negotiation.
>
> media-type = type "/" subtype *( ";" parameter )
> type = token
> subtype = token
>
> Parameters MAY follow the type/subtype in the form of attribute/value
> pairs (as defined in section 3.6).
>
> The type, subtype, and parameter attribute names are case-
> insensitive. Parameter values might or might not be case-sensitive,
> depending on the semantics of the parameter name. (...)
> </blockquote>
Thanks, that's the info I was looking for.
So, as far as HTTP (and thus the http-equiv meta element in HTML) is concerned,
> <META HTTP-EQUIV="Content-Type" CONTENT="text/html;
> charset=ISO-8859-1">
is equivalent to
> <META HTTP-EQUIV="Content-Type" CONTENT="text/html;
> CHARSET=ISO-8859-1">
and to
> <META HTTP-EQUIV="Content-Type" CONTENT="text/html;
> charSet=ISO-8859-1">
(etc.) and Ernest's test cases are valid.
I looked at the validator code, and for that part of the content
detection, we use the module by Bjoern called HTML::Encoding.
-> http://search.cpan.org/src/BJOERN/HTML-Encoding-0.53/lib/HTML/Encoding.pm
-> sub encoding_from_meta_element()
-> sub encoding_from_content_type()
encoding_from_content_type relies on the tokenization of the HTTP
header done by sub split_header_words() in HTTP::Headers::Util (itself
part of libwww-perl).
I'm not convinced the "bug" is in HTML::Encoding. HTML::Encoding
looks for the "charset" key in the tokenized HTTP header, and it's
not really reasonable to expect it to also look for CHARSET, charSet,
etc.
I guess, from the bit of the spec quoted above, that the tokenization
should probably convert the media type parameter names (not their
values) to lower case. Hence, given the header
Content-Type: foo/bar; ParaMeter=value
@values = split_header_words($h->header("Content-Type"));
should return
['foo/bar' => undef, parameter => 'value']
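To illustrate the normalisation suggested above, here is a minimal sketch. It is written in Python for brevity and is not the Perl code from libwww-perl or HTML::Encoding; the function name split_media_type is hypothetical. It assumes a simplified grammar in which ";" and "=" never appear inside quoted strings, so it is not a full RFC 2616 tokenizer.

```python
def split_media_type(value):
    """Split a Content-Type value into (media_type, params).

    Simplified sketch: does not handle ';' or '=' inside quoted strings.
    """
    parts = [p.strip() for p in value.split(";")]
    # The type and subtype are case-insensitive, so normalise them.
    media_type = parts[0].lower()
    params = {}
    for part in parts[1:]:
        if not part:
            continue
        name, _, val = part.partition("=")
        # Only the attribute name is lowercased. The value is left as-is,
        # because its case sensitivity depends on the parameter.
        params[name.strip().lower()] = val.strip().strip('"')
    return media_type, params


for header in (
    "text/html; charset=ISO-8859-1",
    "text/html; CHARSET=ISO-8859-1",
    "text/html; charSet=ISO-8859-1",
):
    print(split_media_type(header))
```

With the names normalised at tokenization time, a consumer only ever has to look up the "charset" key, whichever spelling the document used.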
(Bjoern and Gisle are Bcc'd on this mail, and I will forward it to the
CPAN bug report for LWP.)
--
olivier
Received on Tuesday, 7 August 2007 11:38:08 UTC