- From: Michael Sperberg-McQueen <U35395@UICVM.UIC.EDU>
- Date: Tue, 22 Oct 96 08:25:14 CDT
- To: W3C SGML Working Group <w3c-sgml-wg@w3.org>
[Medium-long note. Executive summary: the discussion of 'implicit' DTDs has served its purpose and can be ended without damage to the process of designing and documenting XML 1.0.]

The discussion of explicit and implicit DTDs has confused me a lot, mostly because the replies seem to be about topics very different from the postings to which they are ostensibly replying.

A few rounds back, Len Bullard asked about the point of this debate. For what it's worth, this is my understanding of the original point:

- the ERB asked what to do about references to undeclared entities.
- Charles Goldfarb suggested they should be treated as in 8879, i.e. as errors.

Along the way, he referred to the fact that any document can be viewed as being of some type, and that if that document type is not explicitly defined, one can nevertheless regard it as having some definition implicit in its usage.

Unfortunately, Charles used the term 'DTD' for the set of rules governing the use of a document type -- not surprising, since that is how 8879 defines it -- and many others interpreted it as referring to the set of explicit declarations provided -- again not surprising, since that is a very widespread usage despite what 8879 says. The difference between the set of rules governing X and the set of explicitly formulated rules governing X is, needless to say, the set of implicit rules governing X. And that's the beginning of our rabbit trail.

Implicit declarations are a reasonably common phenomenon in many formal languages (the rule in C that an undeclared identifier is assumed to be a variable or a function of type int is an easy example); the charges that they involve mysticism or magic seem way overblown to me. It is easy to define the behavior of an XML processor working with an empty or incomplete set of declarations as being governed by an *implicit* set of declarations -- I can say that, because I have already sketched out language that does so.

An XML processor which has no explicit markup declarations has no warrant to flag any errors except those which violate some basic rule of XML, like element nesting or attribute quoting.

That is, an XML processor which has no explicit markup declarations must behave pretty much identically to an XML processor which has explicit markup declarations in which every element is declared <!ELEMENT foo - - ANY> and every attribute is declared ATTNAME CDATA #IMPLIED. (This is the form which Lou Burnard and I call the Waterloo DTD, in honor of the Waterloo Centre for the Oxford English Dictionary, though Tim Bray has pointed out with some edge that no such DTD was ever used at Waterloo. The responsibility for the name must be borne by Lou and myself.) Note that here I part company with Charles, since he posits a maximally constraining DTD and I posit a maximally permissive DTD. As will be seen below, this turns out to make no difference at all.

It seems to me that the only point of interest here is whether explicit reference to some notion of an implicit DTD is useful or not, in explaining the behavior of a processor faced with incomplete or nonexistent declarations. It seems Charles and I agree in our instincts that it is, or could be. It seems clear from the confused reactions of others that our instincts are wrong in this case: the idea is simple, but the obvious way of expressing it conveys something other than that simple idea to many readers who ought, if possible, to be able to read the XML spec with comprehension, if not always with the highest pitch of aesthetic pleasure. That's an argument against using the notion in the documentation.

David Durand also pointed out to me that *requiring* a processor to construct, or behave as if it had constructed, element declarations of the form <!ELEMENT foo - - ANY> could be construed as *forbidding* processors from generating a more constraining, and thus more useful, DTD.
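
The maximally permissive declarations described above (every element ANY, every attribute CDATA #IMPLIED) can be illustrated with a minimal sketch. It is only an illustration, not proposed spec language: the function name waterloo_dtd is hypothetical, Python's standard-library parser stands in for "an XML processor", the declarations are written in the 8879-style form used in this note, and namespaces are ignored.

```python
import xml.etree.ElementTree as ET


def waterloo_dtd(xml_text):
    """Return maximally permissive declarations for a well-formed document.

    Hypothetical helper for illustration only: every element seen is
    declared ANY, and every attribute seen is declared CDATA #IMPLIED.
    """
    # fromstring raises ET.ParseError if the text is not well-formed,
    # which is the only kind of error flagged here.
    root = ET.fromstring(xml_text)

    # Map each element name to the set of attribute names used on it.
    attrs_by_element = {}
    for elem in root.iter():
        attrs_by_element.setdefault(elem.tag, set()).update(elem.attrib)

    lines = []
    for name in sorted(attrs_by_element):
        lines.append(f"<!ELEMENT {name} - - ANY>")
        for attr in sorted(attrs_by_element[name]):
            lines.append(f"<!ATTLIST {name} {attr} CDATA #IMPLIED>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(waterloo_dtd('<doc id="a"><p n="1">x</p><p>y</p></doc>'))
```

Because the declarations are derived from the instance itself and constrain nothing, the instance can never violate them; a processor that skipped this step entirely would accept exactly the same documents.
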
If we want XML processors to be able to generate DTDs from sets of instances (the way OCLC's Fred does), and to compete on the quality of the DTDs they can generate (and I certainly want that), then we don't want to forbid such competition. And competition is indeed a good idea here, since, as has been pointed out, it's not always clear which of the many possible explicit DTDs is the most useful for further work with the document in question, or other documents.

David (and, independently, Jon Bosak) also pointed out that if an XML processor is required to treat a well-formed XML document with no explicit DTD as legal, then (a) it doesn't matter whether it generates, internally, a maximally constraining DTD, or a maximally permissive DTD such as the Waterloo DTD described above, or some intermediate DTD, or even whether it generates an identifiable DTD data structure at all, and (b) we couldn't tell what it does even if it did matter, because the resulting behavior (accept a well-formed document as legal) is the same in all cases. So it's misleading to *define* the particular implicit DTD which a processor is supposed to assume.

These arguments, coupled with the fact that no one but Charles and me seems to think the appeal to an 'implicit' DTD makes any sense, have led me to conclude that there is no gain in using the concept of an implicit DTD in describing XML processor behavior. Since no one seems to be arguing the contrary (Charles is arguing, quite rightly, that the notion itself is not internally contradictory, but that's not in itself an argument for using it in the documentation), the question of implicit DTDs can, I think, be put to rest.

As an editorial, not a technical matter, the editors of the XML spec have no intention of appealing to the notion of an implicit DTD.
The technical aspects of the question are not relevant, since the required behavior can be explained with or without such an appeal, and it's clear that the notion of implicit DTDs will confuse, not clarify, the issues for some readers.

-C. M. Sperberg-McQueen
Received on Tuesday, 22 October 1996 10:11:59 UTC