W3C home > Mailing lists > Public > w3c-sgml-wg@w3.org > October 1996

Re: B.2 DTD required or optional?

From: Henry S. Thompson <ht@cogsci.ed.ac.uk>
Date: Tue, 15 Oct 96 10:00:45 BST
Message-Id: <9324.9610150900@grogan.cogsci.ed.ac.uk>
To: Tim Bray <tbray@textuality.com>
Cc: W3C SGML Working Group <w3c-sgml-wg@w3.org>
Tim Bray writes
>  One concept is that the XML spec should define two separate characteristics
>  that a document can have, well-formedness and validity.
>  A. A "Well-Formed XML Document" has 
>  1. syntactically distinguishable elements, attributes, comments, and PI's
>  2. one root element
>  3. all elements properly nested within each other
>  4. no text outside the root element
>  Such a document may or may not contain markup declarations.  If it does,
>  well-formedness also allows
>  5. empty elements, if they're declared
>  6. entity references, if they're declared
>  B. A "Valid XML document" means there's a full DTD and it's been validated
>     in the familiar SGML sense.

I think this is the right approach, with a moderate extension of the
list of properties of well-formedness, to allow for defaulted and CONREF
attributes, possibly mixed content model (note this is challenge to
the 'just use the normal declaration' strategy, although I suppose
<!ELEMENT foo - - ((#PCDATA|foo)*)>
would do.), possibly other things.

To recapitulate something I said a while back, if the declarations are
NOT there, a conforming application can do whatever it likes,
e.g. with respect to empty elements (which can and should on this
account remain as simply <E>) report an error after treating the
element as normal, but if the declarations ARE there a conforming
application must make use of them.

What this amounts to for specification/documentation purposes is
identifying a core and a set of (independent) extensions.  The core is
unequivocally processable without any declarations, but for each
extension type there is a required accompanying declaration type.

Oh yes, I agree with Martin that the single root requirement is too
strong, all we need is 'all text in elements'.


I like the idea of propaganda in sigs -- here's mine:

Reference concrete syntax means just that:  NO NEW SYNTAX
Received on Tuesday, 15 October 1996 05:00:57 UTC

This archive was generated by hypermail 2.4.0 : Friday, 17 January 2020 20:25:04 UTC