XML and required DTDs

If I understand them correctly, it seems that Paul Grosso, Len
Bullard, and Robin Cover have all raised the same concern, by
pointing to situations in which it's necessary to have a DTD, whether
for authoring, for contractual purposes, or for other applications.

Let's distinguish four cases:

a DTDs Required for Parsing.

Declarations are always required, and in practice all applications must
process them on each run, since it's impossible to parse the document
correctly without knowing which elements are EMPTY, which are CDATA,
etc.  (It's also possible to cache the essential information in some
other form, but that's just a standard store-for-compute tradeoff.)
This, roughly, is the situation with 8879:1986.

b DTDs Required for Validation

Declarations are always required, but the language is so constructed
that the document can at least be parsed correctly* without reading the
DTD.r declarations need not always be read.  I.e. it's possible to do
some kinds of useful work even without reading all the declarations.
This, roughly, is what SGML would be like if the ETAGC proposal is
adopted (at least for Minimal SGML documents without references to
external entities).

     * (within some limits -- element boundaries and content
     will be correctly identified; some non-significant white
     space may be preserved unnecessarily)

c DTDs Optional

Declarations are always allowed, but not always required; the system
makes certain default assumptions if no declarations are provided.  This
is the approach taken by PSGML.  (Or C, if a programming-language
analogy is useful.  In programming, I don't find this helpful at all,
but Tim and others have suggested plausibly that it may be useful in
XML.)

d DTDs Forbidden

Declarations are never allowed; the system makes certain assumptions
about things, and your usage had better agree.  I don't know of any
serious markup languages that do this; the only analogy I can think of
is Basic, in the form that requires names of string variables to end in
$ and so on.

----

Cover, Grosso, and Bullard seem to be arguing against (d), but I'm not
sure whom they are arguing against.  My own reading of the goals
document is that (b) or (c) should apply -- or at least, that (a) is not
what's wanted here.  I don't think anyone is actually in favor of (d),
and if the current phrasing of the goals statement gives readers that
impression, then it needs to change.  Can someone who finds the current
phrasing confusing suggest a less confusing alternative?

It does, on the other hand, seem to me that we haven't got a clear
consensus on the choice between (b) and (c).

Myself, I lean toward (c), because it makes startup for light-weight
applications or experimentation a bit easier.  Applications which need
declarations will always be able to use them -- and XML processors (like
good C compilers) can provide an option to warn, or treat as errors, any
use of undeclared element types.  Note that choosing (c) is not the same
as saying validating editors are impossible, as Paul Grosso seems to
suggest -- any more than C compilers can't do syntax checking, just
because variables can be declared implicitly.

A validating editor just doesn't belong to the class of applications for
which declarations are inessential.  When presented with an XML document
which has no DTD, it might (a) warn about missing declarations, (b)
silently assume <!ELEMENT foo - - ANY>, or (c) something else entirely.

-C. M. Sperberg-McQueen
 ACH / ACL / ALLC Text Encoding Initiative
 University of Illinois at Chicago
 tei@uic.edu

Received on Friday, 13 September 1996 15:51:36 UTC