Re: [BUG] Character encoding not detected correctly with SGML SHORTTAG

Jarno.Elovirta@nokia.com wrote:

>Character encoding is currently detected erroneously when the document
>uses SGML SHORTTAG constructs. The following document is a valid SGML
>document and parses without errors (using SP 1.3.4):
>
><!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0 Strict//EN"><TITLE/test
>document/<META 
>http-equiv=Content-Type content="text/html;charset=ISO-8859-1"<P>

Well, it's certainly a bug in the sense that our heuristics fail when
presented with HTML that uses SHORTTAG. *BUT*, as Nick says, SHORTTAG is
itself a bug, IMO, in the SGML Declaration for HTML, as is the use of
in-band encoding information (still only IMO).

Thanks for the report though. I'm going to look into how/if we can improve
the charset detection, and I'll file it as a "known issue". A future version
of the validator will probably warn about the use of SHORTTAG though, due
to its many problems.
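
To make the failure mode concrete, here is a rough sketch in Python of a
charset sniffer that tolerates an unclosed META tag, i.e. one terminated by
the "<" of the next tag rather than by ">". This is not the validator's
actual code; the function name and the regular expressions are made up for
illustration, and it does not attempt to handle "<" or ">" inside quoted
attribute values.

```python
import re

# Hypothetical sketch, not the validator's real heuristic.
# Capture the attributes of a META tag. The tag may end with ">" or,
# when the tag is left unclosed, with the "<" that opens the next tag.
META_RE = re.compile(r'<meta\b([^<>]*)(?=[<>])', re.IGNORECASE)

# Only META elements with http-equiv=Content-Type are of interest.
HTTP_EQUIV_RE = re.compile(
    r'http-equiv\s*=\s*["\']?content-type\b', re.IGNORECASE
)

# Pull the charset parameter out of the content attribute.
CHARSET_RE = re.compile(
    r'charset\s*=\s*["\']?([A-Za-z0-9._:-]+)', re.IGNORECASE
)


def sniff_meta_charset(text):
    """Return the charset named in a Content-Type META tag, or None."""
    for match in META_RE.finditer(text):
        attrs = match.group(1)
        if HTTP_EQUIV_RE.search(attrs):
            charset = CHARSET_RE.search(attrs)
            if charset:
                return charset.group(1)
    return None


# The document from the report, with its unclosed META tag.
sample = (
    '<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0 Strict//EN"><TITLE/test\n'
    'document/<META \n'
    'http-equiv=Content-Type content="text/html;charset=ISO-8859-1"<P>'
)
print(sniff_meta_charset(sample))
```

The lookahead is what accepts either terminator; a pattern that insists on
a closing ">" for the META tag would miss the declaration in the reported
document.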

Thanks for the feedback, and please do let us know if you find any more
such issues!

-- 
>For all I know they probably have a standard for
>which direction to put the thread on a bolt.

That would be ISO 261:1973.         -- John Cowan

Received on Tuesday, 2 July 2002 16:55:14 UTC