
Re: C.4 Undeclared entities?



[Medium-long note.  Executive summary:  the discussion of 'implicit'
DTDs has served its purpose and can be ended without damage to
the process of designing and documenting XML 1.0.]


The discussion of explicit and implicit DTDs has confused me a
lot, mostly because the replies seem to be about topics very
different from the postings to which they are ostensibly replying.

A few rounds back, Len Bullard asked about the point of this debate.
For what it's worth, this is my understanding of the original point:

  - the ERB asked what to do about references to undeclared entities.
  - Charles Goldfarb suggested they should be treated as in 8879,
    i.e. as errors.  Along the way, he referred to the fact that
    any document can be viewed as being of some type, and that
    if that document type is not explicitly defined, one can
    nevertheless regard it as having some definition implicit in
    its usage.

Unfortunately, Charles used the term 'DTD' for the set of rules
governing the use of a document type -- not surprising, since that is
how 8879 defines it -- and many others interpreted it as referring to
the set of explicit declarations provided -- again not surprising,
since that is a very widespread usage despite what 8879 says.  The
difference between the set of rules governing X and the set of
explicitly formulated rules governing X is, needless to say, the
set of implicit rules governing X.  And that's the beginning of our
rabbit trail.

Implicit declarations are a reasonably common phenomenon in many
formal languages (the rule in classic C that a function called
without a prior declaration is assumed to return int is an easy
example); the charges that they involve mysticism or magic seem way
overblown to me.  It is easy to define the behavior of an XML
processor working with an empty or incomplete set of declarations as
being governed by an *implicit* set of declarations -- I can say
that, because I have already sketched out language that does so.

An XML processor which has no explicit markup declarations has no
warrant to flag any errors except those which violate some basic rule
of XML, like element nesting or attribute quoting.  That is, an XML
processor which has no explicit markup declarations must behave
pretty much identically to an XML processor which has explicit markup
declarations in which every element is declared

  <!ELEMENT foo - - ANY >

and every attribute is declared ATTNAME CDATA #IMPLIED.  (This is the
form which Lou Burnard and I call the Waterloo DTD, in honor of the
Waterloo Centre for the Oxford English Dictionary, though Tim Bray
has pointed out with some edge that no such DTD was ever used at
Waterloo.  The responsibility for the name must be borne by Lou and
myself.)  Note that here I part company with Charles, since he
posits a maximally constraining DTD and I posit a maximally
permissive DTD.  As will be seen below, this turns out to make no
difference at all.
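
For concreteness, here is a rough sketch of how such a maximally
permissive set of declarations could be derived from a well-formed
instance.  It is only an illustration, not anything the spec
prescribes: the function name waterloo_declarations is made up for
this note, Python's standard xml.etree.ElementTree parser stands in
for "an XML processor", and the declarations are emitted in the form
quoted above.

```python
import xml.etree.ElementTree as ET


def waterloo_declarations(xml_text):
    """Return permissive declarations for every element type and
    attribute name seen in a well-formed document (hypothetical helper)."""
    # fromstring raises ET.ParseError if the text is not well-formed.
    root = ET.fromstring(xml_text)

    # Map each element type name to the attribute names seen on it.
    attrs_by_element = {}
    for elem in root.iter():
        attrs_by_element.setdefault(elem.tag, set()).update(elem.attrib)

    lines = []
    for name in sorted(attrs_by_element):
        # Every element type gets content model ANY ...
        lines.append(f"<!ELEMENT {name} - - ANY>")
        # ... and every attribute is CDATA #IMPLIED.
        for att in sorted(attrs_by_element[name]):
            lines.append(f"<!ATTLIST {name} {att} CDATA #IMPLIED>")
    return lines


if __name__ == "__main__":
    for line in waterloo_declarations('<a x="1"><b/><b y="2"/></a>'):
        print(line)
```

Nothing in a well-formed document can violate declarations built this
way, which is the sense in which they are maximally permissive.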

It seems to me that the only point of interest here is whether
explicit reference to some notion of an implicit DTD is useful or
not, in explaining the behavior of a processor faced with incomplete
or nonexistent declarations.

It seems Charles and I agree in our instincts that it is, or could
be.  It seems clear from the confused reactions of others that our
instincts are wrong in this case:  the idea is simple, but the
obvious way of expressing it conveys something other than that simple
idea to many readers who ought, if possible, to be able to read the
XML spec with comprehension, if not always with the highest pitch of
aesthetic pleasure.  That's an argument against using the notion in
the documentation.

David Durand also pointed out to me that *requiring* a processor to
construct, or behave as if it had constructed, element declarations
of the form <!ELEMENT foo - - ANY> could be construed as *forbidding*
processors from generating a more constraining, and thus more useful,
DTD.  If we want XML processors to be able to generate DTDs from sets
of instances (the way OCLC's Fred does), and to compete on the
quality of the DTDs they can generate (and I certainly want that),
then we don't want to forbid such competition.  And competition is
indeed a good idea here, since, as has been pointed out, it's not
always clear which of the many possible explicit DTDs is the most
useful for further work with the document in question, or other
documents.

David (and, independently, Jon Bosak) also pointed out that if an XML
processor is required to treat a well-formed XML document with no
explicit DTD as legal, then (a) it doesn't matter whether it
generates, internally, a maximally constraining DTD, or a maximally
permissive DTD such as the Waterloo DTD described above, or some
intermediate DTD, or even whether it generates an identifiable DTD
data structure at all, and (b) we couldn't tell what it does even if
it did matter, because the resulting behavior (accept a well-formed
document as legal) is the same in all cases.  So it's misleading
to *define* the particular implicit DTD which a processor is supposed
to assume.
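
The point made by David and Jon can be sketched in a few lines.  The
three functions below are hypothetical stand-ins for three processors:
one builds no DTD structure at all, one acts as if every element were
declared ANY, and one derives a tightly constraining model from the
document itself (my own simplification: it records the exact child
sequences seen for each element type).  Python's xml.etree.ElementTree
is used only as a convenient well-formedness checker.

```python
import xml.etree.ElementTree as ET


def parse_or_none(text):
    """Return the root element, or None if the text is not well-formed."""
    try:
        return ET.fromstring(text)
    except ET.ParseError:
        return None


def accept_no_dtd(text):
    # Builds no DTD data structure: well-formedness is the only test.
    return parse_or_none(text) is not None


def accept_permissive(text):
    # Acts as if every element type seen were declared ANY.
    root = parse_or_none(text)
    if root is None:
        return False
    declared_any = {e.tag for e in root.iter()}
    return all(e.tag in declared_any for e in root.iter())


def accept_constraining(text):
    # Derives, per element type, the set of child sequences observed,
    # then checks the same document against that model.
    root = parse_or_none(text)
    if root is None:
        return False
    model = {}
    for e in root.iter():
        model.setdefault(e.tag, set()).add(tuple(c.tag for c in e))
    return all(tuple(c.tag for c in e) in model[e.tag] for e in root.iter())


if __name__ == "__main__":
    for doc in ['<a><b/><c/></a>', '<a><b></a>', '<a x="1">text<b/></a>']:
        print(accept_no_dtd(doc), accept_permissive(doc),
              accept_constraining(doc))
```

Because any model derived from the document is necessarily satisfied by
that document, the three functions return the same answer for every
input, and an outside observer cannot tell them apart.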

These arguments, coupled with the fact that no one but Charles and
me seems to think the appeal to an 'implicit' DTD makes any sense, have
led me to conclude that there is no gain in using the concept of an
implicit DTD in describing XML processor behavior.  Since no one
seems to be arguing the contrary (Charles is arguing, quite rightly,
that the notion itself is not internally contradictory, but that's
not in itself an argument for using it in the documentation), the
question of implicit DTDs can, I think, be put to rest.

As an editorial, not a technical, matter, the editors of the XML spec
have no intention of appealing to the notion of an implicit DTD.
The technical aspects of the question are not relevant, since the
required behavior can be explained with or without such an appeal,
and it's clear that the notion of implicit DTDs will confuse, not
clarify, the issues for some readers.

-C. M. Sperberg-McQueen

