[BUG] Character encoding not detected correctly with SGML SHORTTAG

Hi,

The character encoding is currently detected incorrectly when the document uses SGML SHORTTAG constructs. The following document is a valid SGML document and parses without errors (using SP 1.3.4):

  <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0 Strict//EN"><TITLE/test document/<META http-equiv=Content-Type content="text/html;charset=ISO-8859-1"<P>

However, the W3C Validator fails to read the character encoding information from the META element and issues a warning. The following is the same document, except that the META start tag is closed with an explicit ">" instead of relying on SHORTTAG:

  <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0 Strict//EN"><TITLE/test document/<META http-equiv=Content-Type content="text/html;charset=ISO-8859-1"><P>

This one passes validation without warnings. Both documents have exactly the same parse tree:

    AVERSION CDATA -//IETF//DTD HTML 2.0 Strict//EN
  <HTML>
    <HEAD>
      <TITLE>
         test document
      </TITLE>
        AHTTP-EQUIV TOKEN CONTENT-TYPE
        ACONTENT CDATA text/html;charset=ISO-8859-1
      <META>
      </META>
    </HEAD>
    <BODY>
      <P>
      </P>
    </BODY>
  </HTML>
  C
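
Since both forms yield the same parse tree, a charset prescan should arguably treat them alike. I do not know how the validator actually looks for the META element, so the following is only a rough Python sketch of one way to tolerate the unclosed start tag. The function name sniff_meta_charset and the regular expressions are my own assumptions, not the validator's code, and a regex is of course no substitute for a real SGML parser.

```python
import re

# Hypothetical helper, not taken from the validator.
# With SHORTTAG, a start tag may be ended either by '>' or by the '<' that
# opens the next tag, so the lookahead accepts both. Quoted attribute values
# are skipped as a unit so that '<' or '>' inside them does not end the tag.
META_TAG = re.compile(
    r'''<META\b((?:"[^"]*"|'[^']*'|[^<>"'])*)(?=[<>])''',
    re.IGNORECASE,
)
HTTP_EQUIV = re.compile(r'''http-equiv\s*=\s*["']?content-type''', re.IGNORECASE)
CHARSET = re.compile(r'''charset\s*=\s*["']?([A-Za-z0-9._:-]+)''', re.IGNORECASE)


def sniff_meta_charset(document):
    """Return the charset from the first META Content-Type, or None."""
    for match in META_TAG.finditer(document):
        attributes = match.group(1)
        if HTTP_EQUIV.search(attributes):
            found = CHARSET.search(attributes)
            if found:
                return found.group(1)
    return None


PROLOG = '<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0 Strict//EN"><TITLE/test document/'
META = '<META http-equiv=Content-Type content="text/html;charset=ISO-8859-1"'

unclosed = PROLOG + META + '<P>'   # META start tag ended by the next '<'
closed = PROLOG + META + '><P>'    # META start tag ended by an explicit '>'

print(sniff_meta_charset(unclosed))
print(sniff_meta_charset(closed))
```

With this sketch both test documents from this report should give the same result, which is the behaviour I would expect from the validator as well.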

Cheers,

Jarno

Received on Tuesday, 2 July 2002 06:44:01 UTC