- From: Michael Sperberg-McQueen <U35395@UICVM.UIC.EDU>
- Date: Sun, 01 Dec 96 11:04:32 CST
- To: Liam Quin <lee@sq.com>, W3C SGML Working Group <w3c-sgml-wg@w3.org>
On Thu, 28 Nov 1996 16:17:22 -0500 Liam Quin said: >If SGML can be changed to disable RS/RE processing, or if Charles' >kludge with SHORTREF works, perhaps we're OK. We do have to get the >same whitespace behaviour with and without a DTD, so element content >and data content is not a distinction XML can make, I think. I think I agree that simplifying RE/RS handling would be good, but in the interests of clarity let me ask: Do you mean we have to make the DTD irrelevant to white-space handling * because that was decided at some point, or * as a direct consequence of some other decision, or * because that would be a cleaner design? I agree that it would be desirable to make rules which allow processors to skip the DTD and still get a wholly correct parse tree, er, grove, but I don't think that's possible, since a processor cannot resolve entity references or provide correct default attribute values without reading the DTD. I also believe that if XML makes no distinction between element content and mixed content, it will be impossible to build correct XML processors on top of SGML parsers. It might be worthwhile to distinguish clearly among (a) full validation, which checks the instance against the DTD and provides the correct grove, (b) full parsing, which doesn't validate but does provide character-for-character the correct grove, and which requires at least some attention to the DTD, and (c) minimal or partial parsing, which doesn't require a DTD and which is guaranteed (in XML) to vary from the correct grove only in specific ways: (i) it will have extra white space in element content, (ii) it won't provide correct values for defaulted attributes, (iii) it won't have information about attribute types like ID/IDREF. (iv) it won't resolve references to entities not hardcoded into the processor. I'm not sure (iii) is part of the standard grove output on which we want to model the XML processor's output; if it isn't, then that's one difference fewer. -C. M. Sperberg-McQueen
Received on Sunday, 1 December 1996 12:26:23 UTC