- From: Michael Sperberg-McQueen <U35395@UICVM.UIC.EDU>
- Date: Mon, 02 Dec 96 15:16:29 CST
- To: "David G. Durand" <dgd@cs.bu.edu>, W3C SGML Working Group <w3c-sgml-wg@w3.org>
On Mon, 2 Dec 1996 16:02:35 -0500 David G. Durand said:

>First, we have a compromise on this issue that is far from perfect,
>but took us at least a month of intense debate to reach: why are we
>re-opening it? Let's leave not-so-well-enough alone! I'm wistful for
>a better (different) solution, but given that everyone (including me)
>seems to find the compromise livable, shouldn't we leave it in place?

I think some people are interested in re-opening it because Charles seems to have found a way to get RE Delenda Est behavior out of SGML processors. At least, it's the behavior I thought was intended for REDE: record ends are always passed through in mixed content, and all white space is eaten in element content. Since many people gave up on REDE solely for compatibility reasons, Charles's new proposal seems to open the possibility of getting something materially better than the compromise we have now.

>At 11:04 AM 12/1/96, Michael Sperberg-McQueen wrote:
>>I think I agree that simplifying RE/RS handling would be good, but
>>in the interests of clarity let me ask: Do you mean we have to make
>>the DTD irrelevant to white-space handling
>>
>> * because that was decided at some point, or

> Yes.

When was that? I don't remember that decision, and I'd like to refresh my memory on the wording. Actually, I rather suspect that Lee was just failing to distinguish his own opinions from logical necessities and from decisions made in the design to date, and since he has scolded others on that account, I thought I'd ask to make sure.

>> * as a direct consequence of some other decision, or

> Yes, as well: if this is not the case, then applications will produce
>different results depending on whether a DTD is present or not.

Yes, they will. How is this incompatible with any decision we've made? I don't remember a decision saying they can't produce different results, only a series of strenuous attempts to minimize the set of differences. It didn't go to zero, and won't while XML has attribute defaults and entities. Since I'd rather have attribute defaults and entities than lose them, this seems like a good deal to me.

The goal is to let some useful applications skip the DTD, if they can. Right now, very few useful applications can skip the DTD, because most useful applications are guided in their actions by element boundaries, and these cannot be reliably detected without knowing which elements have declared content. In XML, there are no CDATA or RCDATA elements, and EMPTY elements are detectable without a DTD. The only remaining types of information for which a DTD is essential are:

  - content models (for validation)
  - attribute types and default values (for recognition of IDREFs etc. and
    for full provision of all attributes which apply to each element instance)
  - entity declarations (for resolution of entity references)
  - element vs. mixed content (for elimination of white space in element content)

I can think of useful applications which require this information; I can also think of useful applications which don't. Removing the last item from the list doesn't add or shift any applications I can think of.

>> * because that would be a cleaner design?

> I dunno, none of the proposals we've considered leads to a "clean
>design" other than making all whitespace significant (a policy that has
>been vetoed for compatibility reasons).

And brought up again for further discussion precisely because, as Charles explained, we may find a compatible way to do it - at least for mixed content.
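To make that concrete, here is a minimal sketch of the REDE behavior I have in mind (the element names and content models are invented purely for illustration):

  <!DOCTYPE chapter [
    <!ELEMENT chapter (title, para+)>     <!-- element content -->
    <!ELEMENT title   (#PCDATA)>
    <!ELEMENT emph    (#PCDATA)>
    <!ELEMENT para    (#PCDATA | emph)*>  <!-- mixed content -->
  ]>
  <chapter>
    <title>An example</title>
    <para>Some text with an
  <emph>emphasized</emph> phrase.</para>
  </chapter>

Under the REDE rule described above, the record ends and indentation between <chapter>, <title>, and <para> occur in element content and are eaten; the record end before <emph> occurs in mixed content and is passed through to the application. A processor which skips the DTD cannot tell the two cases apart, which is exactly what is at issue.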
>>I agree that it would be desirable to make rules which allow
>>processors to skip the DTD and still get a wholly correct parse
>>tree, er, grove, but I don't think that's possible, since a
>>processor cannot resolve entity references or provide correct
>>default attribute values without reading the DTD.
>
>We already determined that resolving entity references is not
>required: the XML data structure cannot completely hide entity
>references from applications in the way that SGML parsers do.
>Default attributes at least have fixed values, so they must offer
>relatively little to an application that doesn't care about DTDs.

I don't say these things are earth-shattering; I do think they constitute differences between the parse tree produced by processors which read the DTD and those which don't. The parse trees aren't the same, and tweaking the rules for white-space handling in element content won't make them the same. So I don't see why we should depart from SGML by saying that white space should be preserved in element content. If that were the only difference between DTD-less parsing and DTD-aware parsing, we might have something to discuss, but it's not and we don't.

>>I also believe that if XML makes no distinction between element
>>content and mixed content, it will be impossible to build correct
>>XML processors on top of SGML parsers.

>We did determine, I think, that some differences would exist between
>the SGML and XML "data streams" in the case of RE near markup. This
>was deemed acceptable compared to the horrendously complex SGML
>status quo.

Not all REs near markup; only REs in pathological positions, and they won't appear or disappear, they'll only be moved from their actual position by SGML parsers, and left alone by XML processors.

>> discussion of validation, production of correct ESIS, and
>> simple parsing with almost-correct ESIS

>This is far too complex: two kinds of parsing are enough.

Which two do you choose? I suggested we distinguish the three concepts of validation, generation of correct ESIS, and minimal parsing because we seem to be using all three of these distinct concepts while pretending to ourselves that there are only two concepts involved. If one of these three is unnecessary, by all means tell us which it is so we can do without it.

Michael