Empty endtags (Was: short-tag considered unhealthy) from Arjun Ray on 1996-09-15 (w3c-sgml-wg@w3.org from September 1996)

From: Arjun Ray <aray@nmds.com>
Date: Sun, 15 Sep 1996 03:01:29 -0400
To: W3C SGML Working Group <w3c-sgml-wg@w3.org>
Message-Id: <1.5.4.32.19960915070129.0031a650@www.nmds.com>
[Apologies in advance to this list for a somewhat lengthy message]

:Michael Sperberg-McQueen:
| On the other hand -- there are one or two uses of SHORTTAG that I don't
| think complicate parsing all that much, and might be retained:
|
|  - empty end-tags

:Tim Bray:
| Please, no; these save a tiny number of bytes, make it harder for
| both humans and computers to understand, and for people who don't
| already know SGML, have to be explained.  Also, on purely CS-theory
| grounds, they push an XML parser over the edge from a pure automaton
| to something that has to keep a stack.  OK, the cost of keeping a stack
| is not high, but neither is the benefit of using </>.

:Bill Lindsey:
| Someone has suggested that it makes XML processors a little easier
| to write since they won't need a stack.  I don't see what kind of
| useful processing you can do with structured data while not
| keeping track of its context in the structure.  You'll want a
| stack with or without named end tags.

I believe this is true. Automata without stacks are limited to Type 3
languages -- basically regular expressions at the syntactic level. But
matching start-tag GIs with end-tag GIs is essentially embedding, which
Type 3 languages can't handle[1]. In general the abstract representation
of a document will be a tree (of some sort), and constructing it requires
at least the power of context-free methods. Also, I believe that the SGML
tendency to conflate parsing and validation is at work here. Validation
isn't a parsing problem. It's a recognition problem, for which in many
cases only an FSA is needed. But that shouldn't make the parsability of
an XML instance depend on stackless automata. (Apologies to the Perl
faction.)
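
To make the stack point concrete, here is a minimal Python sketch. It is
not a real SGML or XML parser: the token pattern, the parse() function and
the (name, children) tuple shape are all assumptions made for illustration,
and attributes, entities and empty elements are ignored. It builds the tree
with one explicit stack, and the only difference between </name> and </> is
a single name comparison.

```python
import re

# Hypothetical, simplified syntax: <name> opens, </name> or </> closes,
# anything else up to the next "<" is character data.
TOKEN = re.compile(r"<(/?)([A-Za-z][\w.-]*)?>|([^<]+)")

def parse(text):
    """Return a ("#root", children) tree built with an explicit stack."""
    root = ("#root", [])
    stack = [root]                      # innermost open element is stack[-1]
    for match in TOKEN.finditer(text):
        slash, name, chars = match.groups()
        if chars is not None:           # character data
            stack[-1][1].append(chars)
        elif not slash:                 # start-tag
            if name is None:
                raise ValueError("empty start-tags are not handled here")
            node = (name, [])
            stack[-1][1].append(node)
            stack.append(node)
        else:                           # end-tag, named or empty
            if len(stack) == 1:
                raise ValueError("end-tag with no open element")
            open_name = stack.pop()[0]
            # The GI on the end-tag contributes only this check.
            if name is not None and name != open_name:
                raise ValueError(f"</{name}> does not close <{open_name}>")
    if len(stack) != 1:
        raise ValueError("unclosed element at end of input")
    return root
```

Both parse("<a>x<b>y</b></a>") and parse("<a>x<b>y</></a>") go through the
same push and pop steps; the stack is needed in either case.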

So, I'd like to introduce a reconsideration of empty end-tags. I'll
argue that they're necessary -- and beneficial.

:Bill Lindsey:
| I'm interested in the question of what value there is in requiring
| un-minimized end tags when omittag is not allowed.  Requiring
| redundant information and allowing humans into the process of
| maintaining it will guarantee errors.

Thank you. The idea is to have a *popular* language, I believe. But for
this to happen, we have to consider the possibility of unsophisticated
users "getting involved". They may come to XML from "outside", without
the benefit of any structured program of appropriate training. Here, the
critical consideration is that (bright) beginners should not be led to
draw fundamentally wrong conclusions, nor should they be required to
suspend any obvious impulse to declare "Gee, that's kinda stoopid".

<aside temptation=irresistible>
User: why do I have to say compact="compact"?
Guru: Because Those Are The Rules. They're Good For You.
User: [bleep]
</>

The basic problem with unminimized endtags has been demonstrated, I dare
say conclusively, by the HTML experience. Quite unnecessarily, they give
a "tag-souper" the *option* to insert them anywhere he pleases, rather
than where they must go. The enormous benefit of "anonymous right braces"
is that their syntactic interpretation is unambiguous: the user either
gets it right or wrong. This has clear-cut pedagogical value. (Indeed,
GI-laden endtags can be an argument *for* partially overlapping elements,
because then such a syntactic device would be necessary! Without those
GIs, the user *can't* get overlaps even if he tries.) In other words,
the essential argument for empty endtags isn't the keystrokes they save,
but the illusions (i.e. "mistakes" that require *theoretical* explanation)
they prevent.
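
A small Python sketch of the claim, under stated assumptions: tokens are
already split into a flat list, and the check() function, its error
strings and its pop-anyway recovery are invented for illustration. With
named end-tags a user can write an overlap; with </> the only possible
mistakes are too many or too few closes.

```python
def check(tokens):
    """Return a list of error strings for a flat list of tag tokens.

    "<x>" opens x, "</x>" closes by name, "</>" closes anonymously.
    """
    errors = []
    stack = []                                   # names of open elements
    for tok in tokens:
        if tok == "</>":
            if stack:
                stack.pop()                      # always closes the innermost
            else:
                errors.append("unbalanced: </> with nothing open")
        elif tok.startswith("</"):
            name = tok[2:-1]
            if not stack:
                errors.append(f"unbalanced: </{name}> with nothing open")
            else:
                if stack[-1] != name:
                    # Only a named end-tag can express this kind of mistake.
                    errors.append(
                        f"overlap: </{name}> while <{stack[-1]}> is innermost"
                    )
                stack.pop()                      # assumed recovery: pop anyway
        else:
            stack.append(tok[1:-1])
    errors.extend(f"unbalanced: <{n}> never closed" for n in reversed(stack))
    return errors
```

check(["<b>", "<i>", "</b>", "</i>"]) reports two overlap errors, while the
same input written with "</>" twice reports none, because the overlap
cannot even be stated.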

Finally, with OMITTAG disallowed and a solution to the lexical problem
with empty elements (e.g. STAGC), there's no reason that empty endtags
shouldn't be the rule rather than the exception: the GI is unambiguously
just lexical overhead. In the more general case where OMITTAG isn't
disallowed, an *explicit* GI could function as the trigger for the
simultaneous close of more than one open element. (However, writing a
parser for this may require a little more than just LL or LR methods.)
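
One way to picture the "explicit GI as trigger" idea is the Python sketch
below. It is an assumption-laden illustration, not SGML's actual OMITTAG
rules: it ignores content models entirely, and closes() and its return
shape are hypothetical. An empty end-tag closes exactly one element; a
named end-tag closes everything up to and including the nearest open
element with that name.

```python
def closes(tokens):
    """For each end-tag in tokens, list the elements it closes, innermost first."""
    stack = []                           # names of open elements
    result = []
    for tok in tokens:
        if tok == "</>":
            if not stack:
                raise ValueError("</> with nothing open")
            result.append([stack.pop()])
        elif tok.startswith("</"):
            name = tok[2:-1]
            if name not in stack:
                raise ValueError(f"</{name}> matches no open element")
            closed = []
            while True:                  # pop until the named element is closed
                top = stack.pop()
                closed.append(top)
                if top == name:
                    break
            result.append(closed)
        else:
            stack.append(tok[1:-1])
    return result
```

For example, closes(["<list>", "<item>", "<p>", "</list>"]) yields
[["p", "item", "list"]]: the single named end-tag closes three elements.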

In any case, as a counterargument, could anyone demonstrate -- without
resorting to the known empty element problem -- how a GI on an endtag
actually simplifies parsing? (e-mail is fine if my heresies are getting
seriously off-topic. Thanks.)



Regards,
Arjun

[1] See, e.g., Gyorgy Revesz, _Introduction to Formal Languages_,
    Dover 1991 (reprint), ISBN 0-486-66697-2, Theorems 3.10, 6.1, 6.2.
Received on Sunday, 15 September 1996 02:59:53 UTC