- From: William D. Lindsey <blindsey@bdmtech.com>
- Date: Sat, 14 Sep 1996 18:28:33 -0600
- To: w3c-sgml-wg@w3.org
I'm interested in the question of what value there is in requiring un-minimized end tags when omittag is not allowed. Requiring redundant information and allowing humans into the process of maintaining it will guarantee errors.

Someone has suggested that it makes XML processors a little easier to write since they won't need a stack. I don't see what kind of useful processing you can do with structured data while not keeping track of its context in the structure. You'll want a stack with or without named end tags.

I decided to play around with NET to see if I could come up with an SGML tagging scheme which eliminates named end tags while (in my opinion) improving legibility and programmer-friendliness. I apologize in advance to anyone whose sensibilities are offended by this kludgery.

We can do this by making some adjustments to the concrete syntax. Define NET such that it is the right-hand side of a commonly recognized character pair, for instance "]", ">", "}" or ")". We'll define some other delimiters while we're at it, taking care to match the number of left- and right-hand characters in the delimiter strings. Since we're all learning Scheme these days, I'll use parentheses in this example. SHORTREF must be disabled, along with OMITTAG and CONCUR.

    NET   ")"
    STAGO "(("
    ETAGO "(/"
    TAGC  ")/)"  <!-- needs two close parens, but must be lexically
                      distinct from two NETs -->

This results in a syntax for which any text editor with paren matching capability can be used to assist in navigating an instance. Some of them, e.g. Emacs lisp-mode, can be used for pretty-printing as well. My feeling is that it also helps make the element structure lexically evident:

    ((gi attr="val") Here is the content.)

    ((again) We can still use un-minimized end tags. (/again)/)

[ Thanks and apologies to Arjun Ray for this idea. ]

Here's a common tag-souper's mistake:

    <p> foo <bold> stuff <ital> more stuff</bold> what's this?</ital></p>

Compared to:

    ((p) foo ((bold) stuff ((ital) more stuff) what's this?))

An interesting by-product of this delimiter scheme is an alternate version of Huitfeldt's and DeRose's Trapeze Act solution to recognizing EMPTY elements in the absence of a DTD. We switch the requirement of using the NET-enabling start tag from the EMPTY element tag to all non-empty element tags. That way, NET can be used to terminate every element. This has the same advantages and disadvantages as the Trapeze Act. I believe it also has the same effect as Charles Goldfarb's proposed [ S | E ]TAGC revision to 8879.

Here is the example from Michael Sperberg-McQueen's summary of the EMPTY element problem. "blort" has declared content EMPTY.

    ((p) foo ((blort)/) bar )
    <!-- use of TAGC tells XML processor that blort is empty -->

    ((p) foo ((noblort) bar ))
    <!-- <p> foo <noblort> bar </noblort></p> -->

    ((p) foo ((noblort)) bar )
    <!-- <p> foo <noblort></noblort> bar </p> -->

    ((p) foo ((blort) bar))
    <!-- This should be an error. It tries to put data in an
         element declared EMPTY
         <p> foo <blort> bar </blort></p> -->

    ((p) foo ((blort)) bar)
    <!-- So should this -->

    ((p) foo ((noblort)/) bar)
    <!-- Hmmm, XML would think this is empty, while an
         un-XML-fettered SGML parser would think " bar " is
         noblort's content. An XML validator with a DTD should
         report this as an error -->

While I won't be shocked to learn that I've overlooked an obvious flaw, I do hope the above encourages thought in two areas:

1) Are un-minimized end tags really helpful in light of all the other proposed restrictions for XML?

2) If we're going to standardize on a concrete syntax, is the reference concrete syntax the best possible?

Thanks for your patience.

-Bill
--
William D. Lindsey
blindsey@bdmtech.com
+1 (303) 672-8954
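
For anyone who wants to experiment with the delimiter scheme described above, here is a minimal Python sketch that rewrites the parenthesized syntax as conventional angle-bracket tags. It is only an illustration under stated assumptions, not a real SGML parser: the function name to_angle_tags is hypothetical, a literal ")" is assumed never to occur in character data or attribute values, no DTD is consulted, and an element closed with TAGC is written out as a bare start tag. It does show the stack doing the work that named end tags would otherwise do.

```python
def to_angle_tags(text):
    """Rewrite the parenthesized syntax as conventional <gi>...</gi> tags.

    Assumed delimiters: NET ")", STAGO "((", ETAGO "(/", TAGC ")/)".
    Assumption: ")" never appears in character data or attribute values.
    """
    out = []     # pieces of the converted output
    stack = []   # generic identifiers of the currently open elements
    i = 0
    while i < len(text):
        if text.startswith("((", i):                  # STAGO: start tag
            close = text.find(")", i + 2)
            if close == -1:
                raise ValueError("unterminated start tag")
            tag = text[i + 2:close]                   # GI plus any attributes
            if not tag.strip():
                raise ValueError("start tag without a generic identifier")
            out.append("<" + tag + ">")
            if text.startswith(")/)", close):         # TAGC: element is empty
                i = close + 3
            else:                                     # NET-enabling start tag
                stack.append(tag.split()[0])
                i = close + 1
        elif text.startswith("(/", i):                # ETAGO: named end tag
            close = text.find(")/)", i + 2)
            if close == -1:
                raise ValueError("unterminated end tag")
            name = text[i + 2:close].strip()
            if not stack or stack.pop() != name:
                raise ValueError("mismatched end tag: " + name)
            out.append("</" + name + ">")
            i = close + 3
        elif text[i] == ")":                          # NET: close innermost element
            if not stack:
                raise ValueError("NET with no open element")
            out.append("</" + stack.pop() + ">")
            i += 1
        else:                                         # ordinary character data
            out.append(text[i])
            i += 1
    if stack:
        raise ValueError("unclosed elements: " + ", ".join(stack))
    return "".join(out)
```

Feeding it the tag-soup example, to_angle_tags("((p) foo ((bold) stuff ((ital) more stuff) what's this?))") returns "<p> foo <bold> stuff <ital> more stuff</ital> what's this?</bold></p>". Since each NET can only close the innermost open element, the overlapping bold/ital mistake cannot be expressed in this syntax at all.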
Received on Saturday, 14 September 1996 20:29:07 UTC