- From: Ernest Unrau <ejunrau@mts.net>
- Date: Sat, 04 Aug 2007 14:05:06 -0500
- To: www-validator@w3.org
Hello,

I have encountered what appears to be an upper/lower case bug in the HTML validator's parser at http://validator.w3.org/

Specifically, the validator is unable to detect the character encoding if "CHARSET" is uppercased in the CONTENT field (see below). It detects the encoding automatically if this parameter is lowercased. I discovered this when attempting to validate my pages at http://www.mts.net/~ejunrau/kronsfeld

If this parameter must indeed be lowercased, I would suggest that the validator return some help for this problem. I have seen some correspondence on your site noting problems with the doctype, but did not find any that specifically identified where the problem occurs.

The doctype and the META content field must be formatted in this fashion in order for the parser to automatically recognize the document type and charset:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<HTML>
<HEAD>
<META HTTP-EQUIV="content-type" CONTENT="text/html; charset=ISO-8859-1">
...etc.

Testing variations of the CONTENT field, these constructions work:

<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1">
<META HTTP-EQUIV="Content-Type" CONTENT="text/html;charset=ISO-8859-1">
<META HTTP-EQUIV="Content-Type" CONTENT="text/html charset=ISO-8859-1">

These constructions don't work:

<META HTTP-EQUIV="Content-Type" CONTENT="text/html CHARSET=ISO-8859-1">
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=ISO-8859-1">
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=iso-8859-1">
<META HTTP-EQUIV="Content-Type" CONTENT="text/html;CHARSET=ISO-8859-1">
<META HTTP-EQUIV="Content-Type" CONTENT="text/html;CHARSET=iso-8859-1">

If CHARSET is uppercased, the validator falls back to document encoding UTF-8 and returns this message:

****SNIP****
This Page Is Tentatively Valid HTML 4.01 Transitional

Result: Tentatively passed validation
File: about.html
Encoding: utf-8
Doctype: HTML 4.01 Transitional
Root Element: HTML

Important Warnings

The validator has found the following problem(s) prior to validation, which should be addressed in priority:

No Character Encoding Found! Falling back to UTF-8.
****END SNIP****

Kind regards
-- 
Ernest Unrau
Morden, Manitoba
CANADA
E-mail: ejunrau@mts.net
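
As a postscript, here is a minimal sketch of the behaviour I would have expected, namely matching the "charset" parameter name without regard to case. I do not know how the validator actually parses the CONTENT value, so the helper name find_charset and the regular expression are purely my own assumptions for illustration, not the validator's code.

```python
import re

# Hypothetical helper (not taken from the validator's source): extract the
# charset parameter from a META CONTENT value, ignoring the case of the
# parameter name "charset".
_CHARSET_RE = re.compile(r'charset\s*=\s*["\']?([^\s;"\']+)', re.IGNORECASE)


def find_charset(content):
    """Return the charset named in a CONTENT value, lowercased, or None."""
    match = _CHARSET_RE.search(content)
    return match.group(1).lower() if match else None


# The CONTENT values from the constructions reported in this message.
samples = [
    "text/html; charset=ISO-8859-1",
    "text/html;charset=ISO-8859-1",
    "text/html charset=ISO-8859-1",
    "text/html CHARSET=ISO-8859-1",
    "text/html; CHARSET=ISO-8859-1",
    "text/html; CHARSET=iso-8859-1",
    "text/html;CHARSET=ISO-8859-1",
    "text/html;CHARSET=iso-8859-1",
]

for value in samples:
    print(value, "->", find_charset(value))
```

With a case-insensitive match like this, all eight variations above would yield the same encoding, and a value with no charset parameter at all would still trigger the existing "No Character Encoding Found" warning.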
Received on Monday, 6 August 2007 11:58:58 UTC