[BUG] Character encoding not detected correctly with SGML SHORTTAG

From: <Jarno.Elovirta@nokia.com>
Date: Tue, 2 Jul 2002 13:43:59 +0300
Message-ID: <E392EEA75EC5F54AB75229B693B1B6A70E261C@esebe018.NOE.Nokia.com>
To: <www-validator@w3.org>

Hi,

The character encoding is currently detected incorrectly when the document uses SGML SHORTTAG constructs. The following document is a valid SGML document and parses without errors (using SP 1.3.4):

  <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0 Strict//EN"><TITLE/test document/<META http-equiv=Content-Type content="text/html;charset=ISO-8859-1"<P>

However, the W3C Validator fails to read the character encoding information from the META element and issues a warning. The following document is identical, except that the META start tag is closed explicitly with ">" instead of being left unclosed (a SHORTTAG construct):

  <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0 Strict//EN"><TITLE/test document/<META http-equiv=Content-Type content="text/html;charset=ISO-8859-1"><P>

This passes validation without warnings. Both documents have exactly the same parse tree:

    AVERSION CDATA -//IETF//DTD HTML 2.0 Strict//EN
  <HTML>
    <HEAD>
      <TITLE>
         test document
      </TITLE>
        AHTTP-EQUIV TOKEN CONTENT-TYPE
        ACONTENT CDATA text/html;charset=ISO-8859-1
      <META>
      </META>
    </HEAD>
    <BODY>
      <P>
      </P>
    </BODY>
  </HTML>
  C
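
I don't know how the Validator actually extracts the charset, so the following is only a guess at the cause: a minimal Python sketch, assuming the detection is a pattern-based prescan that expects every META start tag to end with ">". The pattern names and the sniff_charset helper are hypothetical and are not taken from the Validator's source.

```python
import re

# Hypothetical naive prescan: assumes a META start tag always ends with ">".
NAIVE = re.compile(r'<META\b[^<>]*\bcharset=([\w.:-]+)[^<>]*>', re.I)

# Tolerant variant: with SHORTTAG, an unclosed start tag may also be
# terminated by the "<" that opens the next tag, so accept either delimiter.
TOLERANT = re.compile(r'<META\b[^<>]*\bcharset=([\w.:-]+)[^<>]*(?=[<>])', re.I)


def sniff_charset(pattern, document):
    """Return the charset found by the given prescan pattern, or None."""
    match = pattern.search(document)
    return match.group(1) if match else None


DOCTYPE = '<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0 Strict//EN">'
META = '<META http-equiv=Content-Type content="text/html;charset=ISO-8859-1"'

# The two test documents from this report.
unclosed = DOCTYPE + '<TITLE/test document/' + META + '<P>'
closed = DOCTYPE + '<TITLE/test document/' + META + '><P>'

for name, doc in (("unclosed", unclosed), ("closed", closed)):
    print(name, sniff_charset(NAIVE, doc), sniff_charset(TOLERANT, doc))
```

Under that assumption the naive pattern finds the charset only in the second document, while the tolerant one finds it in both. Taking the attribute values from the SGML parser's output instead of prescanning the raw text would presumably avoid the problem altogether.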

Cheers,

Jarno
Received on Tuesday, 2 July 2002 06:44:01 GMT
