- From: Matthew Idso <MatthewI@soma.com>
- Date: Thu, 28 Oct 1999 13:25:12 -0400 (EDT)
- To: "'Mike Brown'" <mbrown@netignite.com>, "'www-html@w3.org'" <www-html@w3.org>
- Cc: Matthew Idso <MatthewI@soma.com>, "'nir@nirdagan.com'" <nir@nirdagan.com>
The problem is with the XHTML DTD. XHTML is an application of XML, and therefore requires what is called well-formedness. See the W3C page:

http://www.w3.org/TR/REC-xml#sec-entity-decl

You will find the following:

  <!ENTITY lt     "&#38;#60;">
  <!ENTITY gt     "&#62;">
  <!ENTITY amp    "&#38;#38;">
  <!ENTITY apos   "&#39;">
  <!ENTITY quot   "&#34;">

The declarations in the XHTML DTD are not in the above form as required for XML well-formedness.

Matt Idso

-----Original Message-----
From: Mike Brown [mailto:mbrown@netignite.com]
Sent: Thursday, October 28, 1999 10:07 AM
To: 'www-html@w3.org'
Cc: 'MatthewI@soma.com'; 'nir@nirdagan.com'
Subject: XHTML DTD revisited: entity declarations and the MSXML/XJParser

Apologies if this has been discussed before. This is in reference to Message-Id: <199909151443.KAA00876@dark.brown.edu>, archived at http://lists.w3.org/Archives/Public/www-html/1999Sep/0026.html ... the original poster complained about IE5 choking on an XHTML DTD. The respondent speculated that the problem was not in the DTD.

I ran into the same behavior in IE5 recently and discovered that the actual cause is related to the XML parser that IE5 uses, and perhaps to some redundancy in the DTD. To demonstrate the behavior, simply attempt to view this XHTML document in IE5:

  <?xml version="1.0" encoding="utf-8"?>
  <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/transitional.dtd">
  <html>
    <head>
      <title>hello</title>
    </head>
    <body>
      <p>hello world</p>
    </body>
  </html>

The problem occurs when parsing the DTD.
The Datachannel-Microsoft XML Parser that IE5 uses treats the following characters *and* their numeric character references specially:

  < > & " '

The parser can be obtained independently at http://msdn.microsoft.com/downloads/tools/xmlparser/xmlparser.asp (COM) or http://msdn.microsoft.com/xml/IE4/jparser.asp (Java), or as the Datachannel XJParser from http://xdev.datachannel.com. This reference explains what's going on, although I don't fully understand it:

http://xdev.datachannel.com/downloads/xjparser/documentation/#pgfId-1001590

The consequence appears to be that if a DTD contains <!ENTITY foo "&#38;">, the &#38; is going to be treated as the beginning of an entity reference. Similarly, &#60; and &#62; (angle brackets) are going to be treated as the beginning and end of tags. Single and double quotes seem to be unaffected by this behavior.

The XHTML DTDs refer to a set of entity declarations that include the following:

  <!ENTITY amp "&#38;"> <!-- ampersand, U+0026 ISOnum -->
  <!ENTITY lt "&#60;"> <!-- less-than sign, U+003C ISOnum -->
  <!ENTITY gt "&#62;"> <!-- greater-than sign, U+003E ISOnum -->

The parser will not allow &amp;, &lt; or &gt; to be redefined anyway, so simply removing these declarations will allow the parser to function.

The other "solution" is to replace the & in the entity reference with &#38; like this:

  <!ENTITY amp "&#38;#38;"> <!-- ampersand, U+0026 ISOnum -->
  <!ENTITY lt "&#38;#60;"> <!-- less-than sign, U+003C ISOnum -->
  <!ENTITY gt "&#38;#62;"> <!-- greater-than sign, U+003E ISOnum -->

http://msdn.microsoft.com/xml/general/xmlfaq.asp#issues-Entities suggests using a DTD that defines HTML entities:

http://msdn.microsoft.com/xml/general/htmlentities.dtd

... Take a look at this DTD and you will see that they are using both solutions: & > and < are not being redefined at all, but they are defining the unnecessary & > and < by putting & in the replacement text.

So I have the following questions:

1. Is the MS/Datachannel XML parser violating XML 1.0 by not allowing &amp; &lt; or &gt; to be redefined? (I would think not, as their immutability is crucial to the operation of an XML parser.)

2. Is the MS/Datachannel XML parser violating XML 1.0 by treating &#38; &#60; and &#62; in entity replacement text as if they were markup?

3. What does "&#38;#38;" as replacement text mean -- 1 character '&', or 4 characters '&", or 9 characters '&#38;#38;'? Is the suggested approach of using "&#38;#38;" as the entity replacement text wrong?

4. Is it redundant/unnecessary to have entity declarations for these characters in XHTML at all, since XHTML is XML, and thus must have those immutable entities defined by default?

-Mike
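
Questions 3 and 4 can be probed with a small experiment. What follows is only a sketch, and it uses Python's standard-library xml.etree.ElementTree (built on expat), not the MS/Datachannel parser discussed in this thread, so it shows how one other XML 1.0 parser behaves and nothing more. The entity name "e", the element name "p", and the helper expand() are made up for illustration. The entities are declared in an internal DTD subset so that nothing is fetched over the network.

```python
import xml.etree.ElementTree as ET

# Hypothetical entity "e" declared with a doubly escaped ampersand,
# the form the XML 1.0 Recommendation uses for its predefined entities.
DOUBLE_ESCAPED = '<!DOCTYPE p [<!ENTITY e "&#38;#38;">]><p>a &e; b</p>'

# The same entity with a singly escaped ampersand. The character
# reference is expanded when the declaration is read, which leaves a
# bare "&" as the replacement text.
SINGLE_ESCAPED = '<!DOCTYPE p [<!ENTITY e "&#38;">]><p>a &e; b</p>'

# No DTD at all: only the five predefined entities are referenced.
PREDEFINED = "<p>&lt;&gt;&amp;&apos;&quot;</p>"


def expand(document):
    """Return the root element's text, or None if the parser rejects it."""
    try:
        return ET.fromstring(document).text
    except ET.ParseError:
        return None


if __name__ == "__main__":
    for label, doc in (
        ("double escaped", DOUBLE_ESCAPED),
        ("single escaped", SINGLE_ESCAPED),
        ("predefined only", PREDEFINED),
    ):
        print(label, "->", repr(expand(doc)))
```

With this parser, the doubly escaped declaration should expand to a single '&' character in the document, the singly escaped one should be rejected once the entity is referenced in content, and the five predefined entities should resolve without any declarations. Whether MSXML or XJParser behaves the same way is exactly what the questions above ask, and this sketch does not answer that.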
Received on Thursday, 28 October 1999 13:31:42 UTC