
Are the public HTML DTDs valid XML?

From: Ken Klose <ken.klose@imedium.com>
Date: Thu, 6 Dec 2001 15:08:37 -0500 (EST)
Message-ID: <003d01c17e91$b12c6940$5601a8c0@optonline.net>
To: <www-html@w3.org>
I'm trying to use Xerces (Java) to parse the simple HTML document below.
I've tried both versions, 1.4.4 and 2.0.0b3.

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
   "http://www.w3.org/TR/html4/strict.dtd">
<HTML>
   <HEAD>
      <TITLE>My first HTML document</TITLE>
   </HEAD>
   <BODY>
      Hello world!
   </BODY>
</HTML>

Both report a similar error: "[Fatal Error] strict.dtd:81:5: The declaration
for the entity "ContentType" must end with '>'".  Looking at the referenced
DTDs, http://www.w3.org/TR/html4/strict.dtd and
http://www.w3.org/TR/html4/HTMLlat1.ent, I see numerous ENTITY declarations
with comments intermingled, such as:

<!ENTITY % ContentType "CDATA"
    -- media type, as per [RFC2045]
    -->

Is this intermingling valid?  If so, why would Xerces barf on it?  The XML
1.0 spec (http://www.w3.org/TR/2000/REC-xml-20001006) says in section
2.5, Comments, that "[comments] may appear within the document type
declaration at places allowed by the grammar", but the grammar for entity
declarations defined in section 4.2 does not allow comments between the
opening <! and the closing >.
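
To take the network and the full HTML DTD out of the picture, the same
declaration can be fed to a parser from an internal DTD subset. The sketch
below is only an illustration under some assumptions: it uses the standard
JAXP API rather than the Xerces classes directly, and the class name
EntityCommentCheck, the helper parses(), and the root element "r" are made
up for the example. It compares the "-- ... --" form copied from strict.dtd
(SGML comment syntax inside the declaration) with a variant that moves the
comment into a separate <!-- ... --> before the declaration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of Xerces or JAXP.
class EntityCommentCheck {

    // Returns true if the document is accepted, false if the parser
    // rejects it with a SAXException (for example, a fatal error).
    static boolean parses(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors and stays quiet otherwise,
            // so nothing is printed to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Declaration as it appears in strict.dtd: the comment sits inside
    // the <!ENTITY ...> declaration itself.
    static final String SGML_STYLE =
        "<!DOCTYPE r [\n"
        + "<!ENTITY % ContentType \"CDATA\"\n"
        + "    -- media type, as per [RFC2045]\n"
        + "    -->\n"
        + "]>\n"
        + "<r/>";

    // Same declaration with the comment moved out into its own
    // <!-- ... --> in front of the declaration.
    static final String XML_STYLE =
        "<!DOCTYPE r [\n"
        + "<!-- media type, as per [RFC2045] -->\n"
        + "<!ENTITY % ContentType \"CDATA\">\n"
        + "]>\n"
        + "<r/>";

    public static void main(String[] args) {
        System.out.println("SGML-style comment accepted: " + parses(SGML_STYLE));
        System.out.println("XML-style comment accepted:  " + parses(XML_STYLE));
    }
}
```

If my reading of section 4.2 is right, a conforming XML parser should reject
the first document and accept the second, which would suggest the problem
lies in the comment syntax of the DTD rather than in Xerces.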

Any thoughts?

Thanks,
Ken Klose
Received on Thursday, 6 December 2001 16:32:14 GMT
