Re: B.10 Empty elements?

lee@sq.com wrote:

> So we can use <e/> and understand that existing SGML parsers will break,
> but we'll try and fix that by changing SGML and hope that at least some
> of the commercial SGML parsers are updated -- maybe most of them.
> (or can this be done without breaking things?  I don't see how, although
> if James said it could, he is right, and I have just forgotten!)
> (oh, OK, a shortref or datatag mapping to > perhaps?)


Full-featured SGML parsers (i.e., SP) can be tricked into 
accepting that syntax by specifying "/>" as the NET delimiter
and setting SHORTTAG YES in the SGML declaration.
Thus this:  <e/>  gets parsed as

    STAGO, generic identifier, NET,

and since 'e' is (presumably) an EMPTY element, the
closing NET that would normally end the element is not necessary.
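To make that token sequence concrete, here is a rough Python sketch of
how a start-tag splits up once NET is redefined.  It is a toy, not SP:
the function name, the simplified name pattern, and the restriction to
a bare start-tag with no attributes are all assumptions made for
illustration.

```python
import re

# Simplified name pattern; a real parser takes this from the SGML declaration.
NAME = re.compile(r"[A-Za-z][A-Za-z0-9.-]*")

def tokenize_start_tag(text, net="/>"):
    """Split a bare NET-enabling start-tag into (role, string) pairs.

    Handles only STAGO + generic identifier + NET, with no attributes.
    Returns the token list and whatever text follows the tag.
    """
    stago = "<"
    if not text.startswith(stago):
        raise ValueError("expected STAGO")
    pos = len(stago)
    m = NAME.match(text, pos)
    if m is None:
        raise ValueError("expected a generic identifier")
    gi = m.group(0)
    pos = m.end()
    if not text.startswith(net, pos):
        raise ValueError("expected NET")
    pos += len(net)
    return [("STAGO", stago), ("GI", gi), ("NET", net)], text[pos:]

tokens, rest = tokenize_start_tag("<e/>")
# tokens == [("STAGO", "<"), ("GI", "e"), ("NET", "/>")], rest == ""
```

Calling the same function with net="/" (the reference delimiter)
consumes only "<e/" and leaves ">" behind as ordinary text.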

I think the syntax is OK from an aesthetic standpoint, but
there are a *lot* of things that can go wrong....
First off it requires SHORTTAG YES, which means that all
the other SHORTTAG features (that we have already decided
we do *not* want) get enabled.

More importantly, if somebody feeds an XML document to
an SGML parser and forgets to supply the right SGML declaration --
or tries to use SGMLS for that matter, which won't let
you change the reference concrete syntax (RCS) delimiter
strings -- she'll end up with a spurious ">" after every
EMPTY element.
Or if somebody (naively or accidentally) types <e/>
for an element that is not in fact EMPTY, his parser
stack will get severely out of whack.
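Both failure modes can be reproduced with a toy event parser.  The
sketch below is an illustration only: parse(), its event tuples, and
the tiny tag subset it accepts (<gi>, </gi>, and <gi followed by NET,
with no attributes) are assumptions, and it expects well-formed tag
syntax rather than checking for it.

```python
import re

NAME = re.compile(r"[A-Za-z][A-Za-z0-9.-]*")  # simplified name pattern

def parse(text, net, empty=frozenset()):
    """Return (events, still_open) for a tiny SGML-like subset."""
    events, stack, data = [], [], []
    pos = 0

    def flush():
        # Emit any pending character data as one event.
        if data:
            events.append(("data", "".join(data)))
            data.clear()

    while pos < len(text):
        if text.startswith("</", pos):               # ordinary end-tag
            flush()
            m = NAME.match(text, pos + 2)
            gi = m.group(0)
            pos = m.end() + 1                        # assume ">" follows
            if stack and stack[-1][0] == gi:
                stack.pop()
                events.append(("end", gi))
            else:
                open_gi = stack[-1][0] if stack else None
                events.append(
                    ("error", "</%s> seen while <%s> is open" % (gi, open_gi)))
        elif text[pos] == "<":                       # start-tag
            flush()
            m = NAME.match(text, pos + 1)
            gi = m.group(0)
            pos = m.end()
            net_enabled = text.startswith(net, pos)
            pos += len(net) if net_enabled else 1    # else assume ">"
            events.append(("start", gi))
            if gi in empty:
                events.append(("end", gi))           # EMPTY: closed at once
            else:
                stack.append((gi, net_enabled))
        elif stack and stack[-1][1] and text.startswith(net, pos):
            flush()                                  # null end-tag
            events.append(("end", stack.pop()[0]))
            pos += len(net)
        else:
            data.append(text[pos])
            pos += 1
    flush()
    return events, [gi for gi, _ in stack]

# Wrong declaration: NET is still "/", so ">" leaks through as data.
leak, _ = parse("<p><e/></p>", net="/", empty={"e"})
# NET is "/>", but e is not EMPTY: it stays open and </p> no longer matches.
broken, still_open = parse("<p><e/>text</p>", net="/>")
```

In the first call the event list contains ("data", ">") right after
the end of e; in the second, the list ends with an error event and
both p and e are still on the stack.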

My current preference is <@e>, which requires @ being added to
NMSTART and an XML application convention for naming EMPTY
elements.  This would be much more robust.  (SGMLS won't let you
do this either, but it could be hacked to do so more easily than
it could be made to accept the "NET=/>" trick.  SP can handle
either solution.  I don't know about other parsers.)
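As a rough sketch of what the naming convention buys, assume that any
element whose name starts with "@" is EMPTY.  The toy Python below
decides emptiness from the name alone; TOKEN, parse(), and the
simplified name characters are hypothetical, and stray "<" characters
that do not form a tag are silently skipped.

```python
import re

# "@" is admitted as a name start character, as the proposal requires.
TOKEN = re.compile(r"<(/?)([@A-Za-z][A-Za-z0-9.-]*)>|([^<]+)")

def parse(text):
    """Return (events, still_open); names starting with '@' are EMPTY."""
    events, stack = [], []
    for m in TOKEN.finditer(text):
        slash, gi, data = m.groups()
        if data is not None:
            events.append(("data", data))
        elif slash:
            # End-tag: must match the innermost open element.
            if not stack or stack.pop() != gi:
                raise ValueError("unexpected end-tag </%s>" % gi)
            events.append(("end", gi))
        else:
            events.append(("start", gi))
            if gi.startswith("@"):
                events.append(("end", gi))   # EMPTY by convention
            else:
                stack.append(gi)
    return events, stack

events, still_open = parse("<p>a<@br>b</p>")
# events == [("start", "p"), ("data", "a"), ("start", "@br"),
#            ("end", "@br"), ("data", "b"), ("end", "p")]
```

The tags keep the ordinary STAGO ... TAGC shape, so nothing here
depends on SHORTTAG or on redefining NET.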


--Joe English

  jenglish@crl.com

Received on Wednesday, 23 October 1996 20:16:02 UTC