
Re: Empty elements (and processing without a DTD)



A few nits re. EMPTY:

At 06:20 PM 09/11/96 CDT, Michael Sperberg-McQueen wrote:

>2 The Elephant's Child, or, the Insatiable Content Model (Clark).  James
>has already mentioned [in the ERB discussions] this way of ensuring that
>an element is always empty, even though it is not necessarily declared
>using the EMPTY keyword.  E.g.
>
>  <!ELEMENT blort - - (nonesuch?) >

I don't think we have the same 'Con' in XML as in SGML, namely having to
reserve a name. The Elephant's Child is a way to obviate the syntactic
construct that SGML calls "EMPTY declared content" by using a different SGML
construct to get the desired effect. There are other SGML ways that
accomplish this that also avoid the reserved-name issue, like:

  <!ELEMENT blort - - (blort?) -(blort)>

But my main point is that doing The Elephant's Child in XML simply amounts
to saying there is no such thing as EMPTY declared content. It is always possible
(easy, in fact) to construct an *SGML* DTD which has this effect (even if we
don't specify how, or at least don't specify a name like 'nonesuch') -- so
we have compatibility in the important sense that one can always construct
an SGML DTD under which the XML document instance will parse to the same
structure it is defined to have in XML.
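
To make that concrete, here is a rough sketch of the declaration-generating
half of such a filter. It only reproduces the self-referencing form shown
above; the function names are made up for illustration, and it assumes the
set of empty element names has already been collected from the instance.

```python
def elephants_child_decl(name):
    """Build one declaration in the self-referencing form quoted above.

    Hypothetical helper: it formats the string and validates nothing.
    """
    return f"<!ELEMENT {name} - - ({name}?) -({name})>"


def declarations_for(empty_names):
    """Emit one declaration per distinct empty element name, sorted."""
    return "\n".join(elephants_child_decl(n) for n in sorted(set(empty_names)))


if __name__ == "__main__":
    print(declarations_for(["blort", "art"]))
```

No reserved name like 'nonesuch' is needed, since each declaration refers
only to the element's own name.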

>
>3 The Trapeze Act -- working *with* a net (Huitfeldt, DeRose).  The
...
>(This is slightly different from Steve's proposal, which involved
>two slashes, not one.)

The mover accepts the amendment as expressing the intent of the original
motion. 

...
>  element  : starttag content endtag
>           | emptytag
>           ;
>  starttag : STAGO NAME attspecs TAGC
>           ;
>  emptytag : STAGO NAME attspecs NET
>           ;

You can improve "The Trapeze Act -- working *with* a net" [good heavens, I
just now got the joke] by choosing some string other than "/" to go in these
(XML!!!!) productions. Changing NET to that string in the *SGML* declaration
used to obtain XML compatibility still works just fine, but you get a more
intuitive appearance, such as "<ART/>" for empties.
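
As an illustration of why this parses without a DTD, here is a small sketch
of the quoted productions. The delimiter strings (NET taken to be "/>"), the
name pattern, and the tuple-based tree are assumptions of mine, not anything
that has been agreed on; attribute specs are skipped over, not parsed.

```python
import re

# Assumed delimiter strings; NET = "/>" is only one possible choice.
STAGO, ETAGO, TAGC, NET = "<", "</", ">", "/>"

NAME = r"[A-Za-z][\w.-]*"
TAG = re.compile(
    # endtag  : ETAGO NAME TAGC
    r"(?P<etag>" + re.escape(ETAGO) + r"(?P<ename>" + NAME + r")\s*"
    + re.escape(TAGC) + r")"
    # starttag: STAGO NAME attspecs TAGC   |   emptytag: STAGO NAME attspecs NET
    r"|(?P<stag>" + re.escape(STAGO) + r"(?P<name>" + NAME + r")"
    r"(?P<atts>[^<>]*?)(?P<close>" + re.escape(NET) + r"|" + re.escape(TAGC) + r"))"
)


def parse(text):
    """Return a list of (name, children) tuples; character data stays as str."""
    root = ("#root", [])
    stack = [root]
    pos = 0
    for m in TAG.finditer(text):
        if m.start() > pos:
            stack[-1][1].append(text[pos:m.start()])
        pos = m.end()
        if m.group("etag"):
            if len(stack) == 1 or stack[-1][0] != m.group("ename"):
                raise ValueError("unexpected end tag: " + m.group("ename"))
            stack.pop()
        else:
            node = (m.group("name"), [])
            stack[-1][1].append(node)
            # An empty tag is recognised by its closing delimiter alone,
            # so no declaration is consulted and nothing is pushed.
            if m.group("close") != NET:
                stack.append(node)
    if pos < len(text):
        stack[-1][1].append(text[pos:])
    if len(stack) != 1:
        raise ValueError("unclosed element: " + stack[-1][0])
    return root[1]
```

The point is that the closing delimiter of the tag tells the parser whether
to expect content, so EMPTY never has to be declared for parsing purposes.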

>
>7 Cooked Bits.  A variant of the Raw Bits (parse it and like it!)
>approach would be to provide a different syntax for the declarations, to
>make them really, really easy to parse.  Since we already agree that
>we'll need a filter to take an XML document and prepare an SGML
>declaration and prolog for it, we do have this as an option.  It has
>been suggested, for example, that DTDs are structured information, and
>might usefully be represented using the same techniques we suggest for
>other structured information, namely SGML / XML document instances.
>This has the beneficial side effect of reducing the size and complexity
>of the context-free grammar by about half.

This isn't so bad, especially given that we'll likely end up having a
simplified DTD-like thing for some subset of entity dcls, notation dcls,
attribute defaulting dcls, and for something akin to content models to
enable validation (hopefully NOT to enable parsing, which should be possible
without such). The theory would be "declare at least what's obviously needed
-- and that's in fact all you need to declare". Namely, if you're going to
reference an entity, you had better declare it; etc.