Validator case-sensitive bug for CHARSET? from Ernest Unrau on 2007-08-04 (www-validator@w3.org from August 2007)

From: Ernest Unrau <ejunrau@mts.net>
Date: Sat, 04 Aug 2007 14:05:06 -0500
To: www-validator@w3.org
Message-ID: <yam10807.320.121097416@smtp.mts.net>

Hello,

I have encountered what appears to be an upper/lower case bug in your HTML
validator parser at http://validator.w3.org/

Specifically, the validator is unable to detect the character encoding if
"CHARSET" is uppercased in the CONTENT field (see below). It will detect it
automatically if this parameter is lowercased.

I discovered this when attempting to validate my pages at
http://www.mts.net/~ejunrau/kronsfeld

If this parameter must indeed be lowercased, I would suggest that the
validator return a more helpful message for this problem. I have seen some
correspondence on your site noting problems with the doctype, but did not
find any that specifically identified where the problem occurs.

The doctype and the Content-Type META element must be formatted in this
fashion in order for the parser to automatically recognize the document
type and charset:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<HTML>
<HEAD>
  <META HTTP-EQUIV="content-type" CONTENT="text/html; charset=ISO-8859-1">

  ...etc.


Testing variations of the CONTENT field, these constructions work:

  <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1">
  <META HTTP-EQUIV="Content-Type" CONTENT="text/html;charset=ISO-8859-1">
  <META HTTP-EQUIV="Content-Type" CONTENT="text/html charset=ISO-8859-1">

These constructions don't work:

  <META HTTP-EQUIV="Content-Type" CONTENT="text/html CHARSET=ISO-8859-1">
  <META HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=ISO-8859-1">
  <META HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=iso-8859-1">
  <META HTTP-EQUIV="Content-Type" CONTENT="text/html;CHARSET=ISO-8859-1">
  <META HTTP-EQUIV="Content-Type" CONTENT="text/html;CHARSET=iso-8859-1">
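
For comparison, HTTP treats media type parameter names such as "charset"
as case-insensitive, so a parser could accept every construction listed
above. The following is only a minimal sketch in Python of such a match; it
is not the validator's actual code, and the helper name extract_charset and
the pattern are assumptions made for illustration:

```python
import re

# Hypothetical pattern, not taken from the validator: match the "charset"
# parameter name without regard to case, then capture its value.
CHARSET_RE = re.compile(r"charset\s*=\s*([A-Za-z0-9._:-]+)", re.IGNORECASE)

def extract_charset(content):
    """Return the charset named in a CONTENT value, lowercased, or None."""
    match = CHARSET_RE.search(content)
    return match.group(1).lower() if match else None

# The eight CONTENT values tested in this message.
samples = [
    "text/html; charset=ISO-8859-1",
    "text/html;charset=ISO-8859-1",
    "text/html charset=ISO-8859-1",
    "text/html CHARSET=ISO-8859-1",
    "text/html; CHARSET=ISO-8859-1",
    "text/html; CHARSET=iso-8859-1",
    "text/html;CHARSET=ISO-8859-1",
    "text/html;CHARSET=iso-8859-1",
]

for value in samples:
    print(value, "->", extract_charset(value))
```

With a case-insensitive match of this kind, all eight values yield the same
charset name, regardless of the separator or the case of "CHARSET".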


If CHARSET is uppercased, the validator falls back to UTF-8 as the document
encoding and returns this message:

****SNIP****

This Page Is Tentatively Valid HTML 4.01 Transitional

 Result:       Tentatively passed validation 
 File:         about.html 
 Encoding:     utf-8 
 Doctype:      HTML 4.01 Transitional 
 Root Element: HTML  

 Important Warnings

 The validator has found the following problem(s) prior to validation, which
should be addressed in priority:

 No Character Encoding Found! Falling back to UTF-8.

****END SNIP****
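
The fallback behaviour shown above can be illustrated with a small Python
sketch. This is a guess at the general shape of such a check, not the
validator's implementation: the class MetaCharsetFinder and the function
detect_encoding are hypothetical names, and only the META element is
consulted here (a real checker would also look at the HTTP headers).

```python
import re
from html.parser import HTMLParser

# Hypothetical pattern: the parameter name "charset" is matched in any case.
CHARSET_RE = re.compile(r"charset\s*=\s*([A-Za-z0-9._:-]+)", re.IGNORECASE)

class MetaCharsetFinder(HTMLParser):
    """Record the charset from the first Content-Type META element seen."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values keep their case.
        if tag != "meta" or self.charset is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("http-equiv") or "").lower() != "content-type":
            return
        match = CHARSET_RE.search(attributes.get("content") or "")
        if match:
            self.charset = match.group(1).lower()

def detect_encoding(html_text, fallback="utf-8"):
    """Return (encoding, warning); warning is None when a charset was found."""
    finder = MetaCharsetFinder()
    finder.feed(html_text)
    if finder.charset is None:
        warning = f"No Character Encoding Found! Falling back to {fallback.upper()}."
        return fallback, warning
    return finder.charset, None

page = '<HTML><HEAD><META HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=ISO-8859-1"></HEAD></HTML>'
print(detect_encoding(page))
```

In this sketch the uppercased CHARSET is recognized, and the fallback
warning appears only when no charset parameter is present at all.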

Kind regards

-- 

Ernest Unrau
Morden, Manitoba
CANADA
E-mail: ejunrau@mts.net

Received on Monday, 6 August 2007 11:58:58 UTC