- From: Kent Pitman <kmp@harlequin.com>
- Date: Wed, 6 May 98 11:59:16 EDT
- To: tbray@textuality.com
- Cc: xml-editor@w3.org
Date: Wed, 06 May 1998 08:26:27 -0700
From: Tim Bray <tbray@textuality.com>

At 03:59 AM 5/6/98 EDT, Kent M Pitman wrote:
>The XML 1.0 specification seems to go out of its way to make a CDStart [19]
>appear as a single token '<![CDATA[' even though both common sense and
>the SGML specification (Section 10.4 Marked Section Declaration, definitions
>[93] and [97] and [100]) would lead one to expect that all marked section
>declarations are uniformly treated and permit

Yes, it is a deliberate matter of design that CDATA marked sections effectively have a 9-char start delimiter and 3-char end delimiter. They are the only kind of marked section that can appear outside of the DTD, so the argument from parallelism with include/ignore loses force. Once again, a nod in the direction of making lightweight non-validating processors easy.
-Tim

I'm really not impressed by this answer.

First, you've made a parser design that presumes an implementation strategy for said lightweight parsers. I have written lightweight parsers, but I could never bring myself to write a parser that treated "<![CDATA[" as a single token. It is NOT a "natural concept" to have a token made up of so bizarre a set of characters, and it cries out to have students ask, "How on earth was that chosen?" And once you have learned that this is an SGML subset and that SGML has more flexibility, you can't help but code in a little flexibility so you aren't slammed when the committee finally gets some sense and extends it to what it should have been in the first place. (Program design based around 'accidental truth' rather than 'grand truth' is fragile. It's like happening to notice that all operator names have a string length that's a prime number and designing some lookup table around it: it just awaits the day someone makes an operator name that's not, and breaks things.)
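A minimal sketch of the "little flexibility" described above, assuming a hand-written Python lexer. The names `lex_marked_section_start` and `is_xml10_cdstart` are hypothetical, not taken from any spec or library. Rather than hard-coding the 9-character string '<![CDATA[', it tokenizes every marked-section start the SGML way ('<![', optional whitespace, a keyword, optional whitespace, '[') and then applies XML 1.0's stricter CDStart rule as a separate check. Keyword syntax is deliberately simplified to a plain alphabetic word or a parameter-entity reference; real XML names allow more characters.

```python
import re

# Hypothetical lexer: every marked-section start is treated uniformly, as in
# SGML, as '<![' S? keyword S? '['. The keyword is simplified to either a
# plain alphabetic word (CDATA, INCLUDE, IGNORE, ...) or a parameter-entity
# reference such as %foo;. S is XML whitespace: space, tab, CR, LF.
_MS_START = re.compile(
    r"<!\[([ \t\r\n]*)(%[A-Za-z_][\w.-]*;|[A-Za-z]+)([ \t\r\n]*)\["
)


def lex_marked_section_start(text, pos=0):
    """Return (keyword, end_pos, has_whitespace), or None if no match at pos."""
    m = _MS_START.match(text, pos)
    if m is None:
        return None
    has_ws = bool(m.group(1) or m.group(3))
    return m.group(2), m.end(), has_ws


def is_xml10_cdstart(text, pos=0):
    """XML 1.0 accepts only the exact 9-character token '<![CDATA['."""
    tok = lex_marked_section_start(text, pos)
    return tok is not None and tok[0] == "CDATA" and not tok[2]


if __name__ == "__main__":
    print(lex_marked_section_start("<![CDATA[x]]>"))        # ('CDATA', 9, False)
    print(lex_marked_section_start("<![ %foo; [ ... ]]>"))  # ('%foo;', 11, True)
    print(is_xml10_cdstart("<![ CDATA [x]]>"))              # False
```

With this structure, a later change to the whitespace rule touches one check rather than the tokenizer itself.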
And there is every reason to believe that W3 will add featurism later, since the average size of your specs (e.g., CSS and HTML) is growing by factors of 5 and 10 in their second-round versions... (sigh)

Second, my complaint is not the choice but the inconsistency of the choice. If you value lightweight parsers enough to make the design decision that way, all you have to do is propagate your design choice back into the rest of the language in a regular fashion in order to have been consistent. You can remove the option of whitespace in conditionals to "fix" my cited problem with no damage to the lightweight case. Just let people write:

<![%foo;[

where they now write:

<![ %foo; [

And don't tell me it's this way so that people can write:

<![ %foo; [

because the same argument can be made for

<![CDATA [

and you've disallowed it there.

Third, I'd be thrilled to see XML be a language which had NO DTD part at all. I think the DTD part buys it nothing and that a wholly adequate XML spec could be done without it. Moreover, I think if the lightweight thing carries any weight at all, it should have a whole spec all its own, separate from the XML spec that permits a DTD, just to make it clear how simple it really is. Because right now, I tell you, the DTD stuff takes up most of the spec and is still a serious entry barrier (not as much as for SGML, but still serious) to both implementors AND users... and needlessly so.

But given that the DTD part is there, I don't understand allowing the lightweight side to carry the day--since it's got an incomplete view of the world. Without thinking very hard about it, I bet I can show you a dozen other decisions that did not fall in favor of lightweight, so I don't believe "prefer the lightweight version" is a true design criterion. To raise it only where convenient seems a "cop out" to me. If you tell me it really was uniformly applied as such, I'll be happy to start sending bug reports where I don't think you succeeded.
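The uniform rule proposed in the second point (no whitespace anywhere in a marked-section start) can be sketched the same way. This is a hypothetical illustration of the proposal, not XML 1.0 as published, which permits optional whitespace around the keyword in INCLUDE/IGNORE conditional sections. The function name `uniform_marked_section_start` and the simplified keyword pattern are assumptions.

```python
import re

# Hypothetical rule (the proposal above, NOT XML 1.0 as published): every
# marked-section start is '<![' keyword '[' with no whitespace anywhere, so
# CDATA sections and conditional sections share one fixed shape.
_KEYWORD = r"(?:CDATA|INCLUDE|IGNORE|%[A-Za-z_][\w.-]*;)"
_UNIFORM_MS_START = re.compile(r"<!\[(" + _KEYWORD + r")\[")


def uniform_marked_section_start(text, pos=0):
    """Return (keyword, end_pos), or None if there is no strict match at pos."""
    m = _UNIFORM_MS_START.match(text, pos)
    return (m.group(1), m.end()) if m else None


if __name__ == "__main__":
    # Strict forms match; forms with internal whitespace are rejected.
    for s in ["<![CDATA[", "<![%foo;[", "<![INCLUDE[", "<![ %foo; [", "<![CDATA ["]:
        print(repr(s), uniform_marked_section_start(s))
```

Under such a rule, a lightweight processor would need one fixed-shape check for CDATA sections and conditional sections alike, which is the consistency being argued for.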
--Kent

- - - - -
Disclaimer: These opinions are my own and do not necessarily reflect the official position of any company or organization with which I may be affiliated.
Received on Wednesday, 6 May 1998 12:00:37 UTC