
Re: Validator case-sensitive bug for CHARSET?

From: olivier Thereaux <ot@w3.org>
Date: Tue, 7 Aug 2007 14:46:54 +0900
Message-Id: <C0CC49DD-2B51-44E9-BB30-BE4411240B04@w3.org>
Cc: www-validator Community <www-validator@w3.org>, www-international@w3.org
To: Ernest Unrau <ejunrau@mts.net>

Hello Ernest, all,

On Aug 5, 2007, at 04:05 , Ernest Unrau wrote:
> Specifically, the validator is unable to detect the character encoding
> if "CHARSET" is uppercased in the CONTENT field (see below). It will
> detect it automatically if this parameter is lowercased.

This is the first time I have run into this issue. Looking at the HTTP
specification (which HTML normatively references for http-equiv meta
information), I was unable to find whether the "charset=" string is
case-sensitive or not. Lacking any mention, I will assume that it is
case-sensitive, as are the rest of the HTTP constructs.

I have added an entry in bugzilla to track the issue:
http://www.w3.org/Bugs/Public/show_bug.cgi?id=4917

> If indeed this parameter must be lowercased, I would suggest the
> validator should return some help for this problem. I have seen some
> correspondence on your site noting problems with the doctype, but did
> not find any that specifically identified where the problem occurs.

I agree. The validator should probably be lenient in its detection of
the charset parameter in http-equiv, but should issue a warning if
the case is wrong. We are, however, lacking documentation on this.
The otherwise excellent document
http://www.w3.org/International/O-charset
discusses this use of <meta> but does not mention case.
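A minimal sketch of the "lenient detection plus warning" behaviour described above, in Python. The function name `detect_meta_charset` and the warning wording are hypothetical; this is not the validator's actual code. For reference, RFC 2616 section 3.7 states that media-type parameter attribute names are case-insensitive, and section 3.4 says character set names are case-insensitive tokens, which is why the value is lowercased before it is returned and the warning is best read as advisory.

```python
import re
from typing import List, Optional, Tuple

# Case-insensitive match for "charset=<value>" anywhere in a CONTENT value.
# \b stops it matching inside a longer name; a ";" before the parameter is
# not required, so "text/html charset=..." is accepted as well.
_CHARSET_RE = re.compile(r"""\bcharset\s*=\s*["']?([^\s;"']+)""", re.IGNORECASE)


def detect_meta_charset(content: str) -> Tuple[Optional[str], List[str]]:
    """Return (charset, warnings) for an http-equiv Content-Type CONTENT value.

    Hypothetical helper: detection ignores case, but a warning is added
    when the parameter name is not written as lowercase "charset".
    """
    warnings: List[str] = []
    match = _CHARSET_RE.search(content)
    if match is None:
        return None, warnings
    # The parameter name exactly as written, e.g. "CHARSET" or "Charset".
    written_name = match.group(0).split("=", 1)[0].strip()
    if written_name != "charset":
        warnings.append(
            f'parameter name "{written_name}" is not lowercase; '
            'consider writing "charset="'
        )
    # Charset names are case-insensitive tokens, so normalise the value.
    return match.group(1).lower(), warnings


if __name__ == "__main__":
    for value in (
        "text/html; charset=ISO-8859-1",
        "text/html;CHARSET=iso-8859-1",
    ):
        print(repr(value), "->", detect_meta_charset(value))
```

With this approach all eight CONTENT variations quoted below yield `iso-8859-1`; only the uppercase ones also carry a warning.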


> Testing variations of the CONTENT field, these constructions work:
>
>   <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1">
>   <META HTTP-EQUIV="Content-Type" CONTENT="text/html;charset=ISO-8859-1">
>   <META HTTP-EQUIV="Content-Type" CONTENT="text/html charset=ISO-8859-1">
>
> These constructions don't work:
>
>   <META HTTP-EQUIV="Content-Type" CONTENT="text/html CHARSET=ISO-8859-1">
>   <META HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=ISO-8859-1">
>   <META HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=iso-8859-1">
>   <META HTTP-EQUIV="Content-Type" CONTENT="text/html;CHARSET=ISO-8859-1">
>   <META HTTP-EQUIV="Content-Type" CONTENT="text/html;CHARSET=iso-8859-1">

Could you make at least a few of these into test documents?
* very minimal HTML documents
* encoded as iso-8859-1
* using one of these constructs
* including some non-ASCII characters (this will be a good test of the
detection)
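Test documents matching the list above could be generated with a short script along these lines. It is a sketch assuming Python 3; the HTML 4.01 Strict doctype, the sample sentence and the `charset-test-NN.html` file names are illustrative choices, not anything the validator requires.

```python
from pathlib import Path
from typing import List

# The CONTENT values reported in this thread: the first three were
# detected by the validator, the last five were not.
CONSTRUCTS: List[str] = [
    "text/html; charset=ISO-8859-1",
    "text/html;charset=ISO-8859-1",
    "text/html charset=ISO-8859-1",
    "text/html CHARSET=ISO-8859-1",
    "text/html; CHARSET=ISO-8859-1",
    "text/html; CHARSET=iso-8859-1",
    "text/html;CHARSET=ISO-8859-1",
    "text/html;CHARSET=iso-8859-1",
]

# Minimal HTML 4.01 page; the <P> holds characters that exist in
# ISO-8859-1 but not in ASCII (é, ï, ü, £).
TEMPLATE = """<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
  "http://www.w3.org/TR/html4/strict.dtd">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="{content}">
<TITLE>charset detection test</TITLE>
</HEAD>
<BODY>
<P>Non-ASCII text: café, naïve, Müller, £5</P>
</BODY>
</HTML>
"""


def build_test_document(content: str) -> bytes:
    """Fill in the CONTENT value and encode the whole page as ISO-8859-1."""
    return TEMPLATE.format(content=content).encode("iso-8859-1")


def write_test_documents(out_dir: Path) -> List[Path]:
    """Write one file per construct (hypothetical names charset-test-NN.html)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths: List[Path] = []
    for i, content in enumerate(CONSTRUCTS, start=1):
        path = out_dir / f"charset-test-{i:02d}.html"
        path.write_bytes(build_test_document(content))
        paths.append(path)
    return paths
```

Calling `write_test_documents(Path("charset-tests"))` writes one file per construct. Because each page is encoded as ISO-8859-1, a wrong encoding guess by the validator should show up as errors on the non-ASCII bytes.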

Thank you
-- 
olivier
Received on Tuesday, 7 August 2007 05:46:15 GMT
