
Re: [BUG] Character encoding not detected correctly with SGML SHORTTAG

From: Terje Bless <link@pobox.com>
Date: Tue, 2 Jul 2002 22:55:37 +0200
To: Jarno.Elovirta@nokia.com
cc: www-validator@w3.org
Message-ID: <r01050300-1015-2A43BDD08DFE11D688E700039300CF5C@[192.168.1.7]>

Jarno.Elovirta@nokia.com wrote:

>Character encoding is currently detected incorrectly when the document
>uses SGML SHORTTAG constructs. The following document is a valid SGML
>document and parses without errors (using SP 1.3.4):
>
><!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0 Strict//EN"><TITLE/test
>document/<META 
>http-equiv=Content-Type content="text/html;charset=ISO-8859-1"<P>

Well, it's certainly a bug in the sense that our heuristics fail when
presented with HTML that uses SHORTTAG. *BUT*, as Nick says, SHORTTAG
itself is a bug in the SGML Declaration for HTML, IMO. So is the use of
in-band encoding information (still only IMO).
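A minimal Python sketch of how a charset heuristic can be fooled by the
document above follows. This is not the validator's code. The two regex
tokenizers, the `meta_charset` helper, and the simplified SHORTTAG rule
(an unclosed start tag ends where the next "<" begins) are assumptions
made for illustration only.

```python
import re
from typing import Optional

# Hypothetical tag tokenizers for illustration only; neither is the
# validator's actual code.

# Tag-soup style: a start tag runs from "<name" to the next ">".
SOUP_TAG = re.compile(r'<([A-Za-z][A-Za-z0-9]*)([^>]*)>')

# SHORTTAG-tolerant (simplified): an unclosed start tag also ends where the
# next "<" begins. A quoted attribute value containing "<" would be cut
# short, and NET-enabling tags such as <TITLE/.../ are not really modelled.
SHORT_TAG = re.compile(r'<([A-Za-z][A-Za-z0-9]*)([^<>]*)(?:>|(?=<))')

CHARSET = re.compile(r'charset\s*=\s*([A-Za-z0-9._:-]+)', re.I)


def meta_charset(doc: str, tag_re: re.Pattern) -> Optional[str]:
    """Return the charset from the first META tag found by tag_re, if any."""
    for m in tag_re.finditer(doc):
        if m.group(1).lower() == 'meta':
            found = CHARSET.search(m.group(2))
            if found:
                return found.group(1)
    return None


# The SHORTTAG document from the bug report.
DOC = ('<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0 Strict//EN"><TITLE/test\n'
       'document/<META \n'
       'http-equiv=Content-Type content="text/html;charset=ISO-8859-1"<P>\n')

# Tag soup: everything from "<TITLE" to the ">" of "<P>" is one TITLE tag,
# so the META element is never seen.
print(meta_charset(DOC, SOUP_TAG))   # None
print(meta_charset(DOC, SHORT_TAG))  # ISO-8859-1
```

The tolerant version only papers over the two constructs in this one
example; a real fix would need to follow the SGML declaration rather than
a regex.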

Thanks for the report though. I'm going to look into how/if we can improve
the charset detection, and file this as a "known issue". A future version
of the validator will probably warn about the use of SHORTTAG, due to its
many problems.

Thanks for the feedback, and please do let us know if you find any more
such issues!

-- 
>For all I know they probably have a standard for
>which direction to put the thread on a bolt.

That would be ISO 261:1973.         -- John Cowan
Received on Tuesday, 2 July 2002 16:55:14 GMT
