I'm trying to use Xerces (Java) to parse the simple HTML document below. I've tried both versions 1.4.4 and 2.0.0b3.

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<HTML>
<HEAD>
<TITLE>My first HTML document</TITLE>
</HEAD>
<BODY>
Hello world!
</BODY>
</HTML>

Both give a similar error:

"[Fatal Error] strict.dtd:81:5: The declaration for the entity "ContentType" must end with '>'".

Looking at the referenced DTDs http://www.w3.org/TR/html4/strict.dtd and http://www.w3.org/TR/html4/HTMLlat1.ent I see numerous ENTITY declarations with comments intermingled, such as:

<!ENTITY % ContentType "CDATA" -- media type, as per [RFC2045] -->

Is this intermingling valid? If so, why would Xerces barf on it?

The XML 1.0 spec (http://www.w3.org/TR/2000/REC-xml-20001006) mentions in section 2.5 Comments that "[comments] may appear within the document type declaration at places allowed by the grammar", but the grammar for entity declarations defined in section 4.2 does not include comments between the opening <! and the closing >.

Any thoughts?

Thanks,
Ken Klose

Received on Thursday, 6 December 2001 16:32:14 GMT
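
The behaviour can probably be reproduced without fetching the W3C DTD at all. The following is only a minimal sketch, under these assumptions: it uses the JAXP parser bundled with the JDK (not Xerces 1.4.4 or 2.0.0b3 specifically), it places the declaration in an internal DTD subset rather than an external one, and the class name EntityCommentCheck, the helper accepts and the root element r are made up for illustration. It feeds the parser the same entity declaration with and without the "-- ... --" comment and reports whether each variant is accepted.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class EntityCommentCheck {

    // Hypothetical helper: wraps the given declarations in an internal DTD
    // subset and returns true if the XML parser accepts the document.
    public static boolean accepts(String internalSubset) throws Exception {
        String xml = "<!DOCTYPE r [\n" + internalSubset + "\n]>\n<r/>";
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler ignores warnings and recoverable errors but still
        // rethrows fatal errors, so nothing extra is printed to stderr.
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXParseException e) {
            // A well-formedness problem in the DTD ends up here.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Entity declaration written the way the XML grammar expects it.
        String plain = "<!ENTITY % ContentType \"CDATA\">";
        // The same declaration with the comment style used in strict.dtd.
        String withComment =
                "<!ENTITY % ContentType \"CDATA\" -- media type, as per [RFC2045] -->";

        System.out.println("plain declaration accepted:  " + accepts(plain));
        System.out.println("with comment accepted:       " + accepts(withComment));
    }
}
```

If the parser follows the entity declaration grammar in section 4.2 of the XML 1.0 spec, the first variant should be accepted and the second rejected, which would match the error reported for strict.dtd.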