- From: Arjun Ray <aray@nmds.com>
- Date: Sun, 15 Sep 1996 03:01:29 -0400
- To: W3C SGML Working Group <w3c-sgml-wg@w3.org>
[Apologies in advance to this list for a somewhat lengthy message]

:Michael Sperberg-McQueen:
| On the other hand -- there are one or two uses of SHORTTAG that I don't
| think complicate parsing all that much, and might be retained:
|
| - empty end-tags

:Tim Bray:
| Please, no; these save a tiny number of bytes, make it harder for
| both humans and computers to understand, and for people who don't
| already know SGML, have to be explained. Also, on purely CS-theory
| grounds, they push an XML parser over the edge from a pure automaton
| to something that has to keep a stack. OK, the cost of keeping a stack
| is not high, but neither is the benefit of using </>.

:Bill Lindsey:
| Someone has suggested that it makes XML processors a little easier
| to write since they won't need a stack. I don't see what kind of
| useful processing you can do with structured data while not
| keeping track of its context in the structure. You'll want a
| stack with or without named end tags.

I believe this is true. Automata without stacks are limited to Type 3
languages -- basically regular expressions at the syntactic level. But
matching start-tag GIs with end-tag GIs is essentially embedding, which
Type 3 languages can't handle[1]. In general the abstract representation
of a document will be a tree (of some sort), and constructing it will
require the power of at least context-free (CF) methods.

Also, I believe that the SGML tendency to conflate parsing and validation
is at work here. Validation isn't a parsing problem. It's a recognition
problem, for which in many cases only an FSA is needed. But this shouldn't
predicate the parsability of an XML instance on stackless automata.
(Apologies to the Perl faction.)

So, I'd like to introduce a reconsideration of empty end-tags. I'll argue
that they're necessary -- and beneficial.

:Bill Lindsey:
| I'm interested in the question of what value there is in requiring
| un-minimized end tags when omittag is not allowed.
| Requiring redundant information and allowing humans into the process of
| maintaining it will guarantee errors.

Thank you. The idea is to have a *popular* language, I believe. But for
this to happen, we have to consider the possibility of unsophisticated
users "getting involved". They may come to XML from "outside", without the
benefit of any structured program of appropriate training. Here, the
critical consideration is that (bright) beginners should not be led to
draw fundamentally wrong conclusions, nor should they be required to
suspend any obvious impulse to declare "Gee, that's kinda stoopid".

<aside temptation=irresistible>
User: why do I have to say compact="compact"?
Guru: Because Those Are The Rules. They're Good For You.
User: [bleep]
</>

The basic problem with unminimized end-tags has been demonstrated, I dare
say conclusively, by the HTML experience. Quite unnecessarily, they give a
"tag-souper" the *option* to insert them anywhere he pleases, rather than
where they must go. The enormous benefit of "anonymous right braces" is
that their syntactic interpretation is unambiguous: the user either gets
it right or wrong. This has clear-cut pedagogical value. (Indeed, GI-laden
end-tags can be an argument *for* partially overlapping elements, because
then such a syntactic device would be necessary! Without those GIs, the
user *can't* get overlaps even if he tries.)

In other words, the essential argument for empty end-tags isn't the
keystrokes they save, but the illusions (i.e. "mistakes" that require
*theoretical* explanation) they prevent.

Finally, with OMITTAG disallowed and a solution to the lexical problem with
empty elements (e.g. STAGC), there's no reason that empty end-tags
shouldn't be the rule rather than the exception: the GI is unambiguously
just lexical overhead. In the more general case where OMITTAG is allowed,
an *explicit* GI could function as the trigger for the simultaneous close
of more than one open element.
(However, writing a parser for this may require a little more than just
LL or LR methods.)

In any case, as a counterargument, could anyone demonstrate -- without
resorting to the known empty element problem -- how a GI on an end-tag
actually simplifies parsing?

(E-mail is fine if my heresies are getting seriously off-topic. Thanks.)

Regards,
Arjun

[1] See, e.g., Gyorgy Revesz, _Introduction to Formal Languages_, Dover,
1991 (reprint), ISBN 0-486-66697-2, Theorems 3.10, 6.1, 6.2.
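To make the stack-versus-GI question above concrete, here is a minimal sketch in Python. It assumes a hypothetical toy tag syntax (not SGML or XML), and the names `parse`, `Element`, `shape` and the `implied_close` flag are invented for illustration. The parser keeps an explicit stack of open elements in every case. `</>` pops the innermost element. With `implied_close=False` (roughly the OMITTAG-disallowed case) a named end-tag must match the top of the stack, so its GI acts only as a redundancy check. With `implied_close=True` a named end-tag closes every open element down to the one it names, which is only a crude stand-in for OMITTAG; real SGML consults the DTD to decide which end-tags may be omitted.

```python
import re
from dataclasses import dataclass, field

# Hypothetical toy syntax for illustration only (not SGML or XML):
#   <gi>   opens an element
#   </gi>  closes a named element
#   </>    empty end-tag: closes the innermost open element
#   text   anything up to the next '<'
TOKEN = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)?>|([^<]+)")


@dataclass
class Element:
    gi: str
    children: list = field(default_factory=list)


def parse(text, implied_close=False):
    """Build a tree using an explicit stack of open elements.

    implied_close=False: a named end-tag must close the innermost open
    element, so its GI is only checked against the top of the stack.
    implied_close=True: a named end-tag closes every open element down to
    and including the one it names (a crude stand-in for OMITTAG).
    """
    root = Element("#root")
    stack = [root]  # needed in every branch below, named end-tags or not
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unrecognized markup at offset {pos}")
        pos = m.end()
        slash, gi, data = m.groups()
        if data is not None:  # character data
            stack[-1].children.append(data)
        elif not slash:  # start-tag
            if gi is None:
                raise SyntaxError("empty start-tag <> is not handled here")
            element = Element(gi)
            stack[-1].children.append(element)
            stack.append(element)
        elif len(stack) == 1:
            raise SyntaxError("end-tag with no open element")
        elif gi is None:  # </> pops the innermost open element
            stack.pop()
        elif implied_close:
            if gi not in (e.gi for e in stack[1:]):
                raise SyntaxError(f"</{gi}> matches no open element")
            while stack.pop().gi != gi:
                pass  # inner elements are closed implicitly
        elif stack[-1].gi != gi:  # the GI's only job: a consistency check
            raise SyntaxError(f"</{gi}> does not close <{stack[-1].gi}>")
        else:
            stack.pop()
    if len(stack) != 1:
        raise SyntaxError(f"unclosed element <{stack[-1].gi}>")
    return root


def shape(node):
    """Compact nested-tuple view of a tree, for printing and comparison."""
    if isinstance(node, str):
        return node
    return (node.gi, [shape(child) for child in node.children])


if __name__ == "__main__":
    print(shape(parse("<list><item>one</><item>two</></>")))
    # ('#root', [('list', [('item', ['one']), ('item', ['two'])])])
    print(shape(parse("<list><item>one</list>", implied_close=True)))
    # ('#root', [('list', [('item', ['one'])])])
```

In this sketch the anonymous and the named forms of `<list><item>one</item><item>two</item></list>` produce the same tree, and the stack is maintained either way. In strict mode an overlap such as `<b><i>x</b></i>` is rejected at `</b>`, whereas with `</>` alone an overlap cannot even be written. It is only an illustration under the stated assumptions, not a claim about how an SGML or XML parser must be built.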
Received on Sunday, 15 September 1996 02:59:53 UTC