XML 1.0 - query - S not allowed in CDStart ??

   Date: Wed, 06 May 1998 08:26:27 -0700
   From: Tim Bray <tbray@textuality.com>

   At 03:59 AM 5/6/98 EDT, Kent M Pitman wrote:
   >The XML 1.0 specification seems to go out of its way to make a CDStart [19]
   >appear as a single token '<![CDATA[' even though both common sense and
   >the SGML specification (Section 10.4 Marked Section Declaration, definitions
   >[93] and [97] and [100]) would lead one to expect that all marked section 
   >declarations are uniformly treated and permit 

   Yes, it is a deliberate matter of design that CDATA marked sections 
   effectively have a 9-char start delimiter and 3-char end delimiter.
   They are the only kind of marked section that can appear outside of
   the DTD, so the argument from parallelism with include/ignore loses
   force.  Once again, a nod in the direction of making lightweight 
   non-validating processors easy. -Tim

I'm really not impressed by this answer.

First, you've made a parser design that presumes an implementation
strategy for said lightweight parsers.  I have written lightweight parsers,
but I could never bring myself to write a parser that treated "<![CDATA["
as a single token.  It is NOT a "natural concept" to have a token that is
made up of so substantially bizarre a set of characters, and it cries out
to have students ask "how on earth was that chosen?"  And once having learned
that this is an SGML subset that has more flexibility, you can't help but 
code in a little flexibility so you aren't slammed when the committee finally
gets some sense and extends it to what it should have been in the first place.
(Program design based around 'accidental truth' rather than 'grand truth' is
fragile--it's like happening to note that all operator names have a string
length that's a prime number and designing some lookup table around it--it
just awaits the day someone makes an operator name that's not and breaks 
things.)  And there is every reason to believe that W3 will add featurism
later, since the average size of your specs (e.g., CSS and HTML) is growing
by factors of 5 and 10 in the second-round versions... (sigh)

Second, my complaint is not the choice but the inconsistency of the choice.
If you value lightweight parsers enough to make the design decision that
way, all you have to do is propagate your design choice back into the
rest of the language in a regular fashion in order to have been consistent.
You can remove the option of whitespace in conditionals to "fix" my cited 
problem with no damage to the lightweight case.  Just let people write:
 <![%foo;[
where they now write:
 <![ %foo; [
And don't tell me it's this way so that people can write:
 <![ %foo;
 [
because the same argument can be made for 
 <![CDATA
 [
and you've disallowed it there.
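
To make the asymmetry concrete, here is a minimal Python sketch; it is an
illustration only, not code from the spec or from any parser mentioned in
this thread. It assumes the XML 1.0 productions CDStart ::= '<![CDATA[' and
the conditional-section opener '<![' S? 'INCLUDE' S? '['. The names S_OPT,
CDSTART, COND_START, LENIENT_CDSTART and opens_with are hypothetical, and
LENIENT_CDSTART describes a form XML 1.0 does NOT accept.

```python
import re

# XML 1.0 white space (production S): space, tab, CR, LF; optional here.
S_OPT = r"[ \t\r\n]*"

# CDStart as XML 1.0 defines it: one fixed nine-character literal.
CDSTART = re.compile(r"<!\[CDATA\[")

# Conditional-section opener as XML 1.0 defines it: '<![' S? keyword S? '['.
# Simplifying assumption: only the literal keywords are matched, not a
# parameter-entity reference such as %foo;.
COND_START = re.compile(r"<!\[" + S_OPT + r"(INCLUDE|IGNORE)" + S_OPT + r"\[")

# Hypothetical "regular" CDStart with the same optional white space.
# This is NOT valid XML 1.0; it only shows what uniform treatment would mean.
LENIENT_CDSTART = re.compile(r"<!\[" + S_OPT + r"CDATA" + S_OPT + r"\[")


def opens_with(pattern, text):
    """Return True if text begins with the given opener pattern."""
    return pattern.match(text) is not None


if __name__ == "__main__":
    for sample in ("<![CDATA[", "<![CDATA\n[", "<![ INCLUDE ["):
        print(repr(sample),
              opens_with(CDSTART, sample),
              opens_with(COND_START, sample),
              opens_with(LENIENT_CDSTART, sample))
```

Either rule is a one-line pattern; the sketch only shows that the two
openers are specified differently, not that one is harder to implement.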

Third, I'd be thrilled to see XML be a language which had NO DTD part
at all.  I think the DTD part buys it nothing and that an XML spec
that was wholly adequate could be done without it.  Moreover, I think
if the lightweight thing carries any weight at all, it should have a
whole spec all its own, separate from the XML spec that permits a DTD,
just to make it clear how simple it really is.  Because right now I
tell you the DTD stuff takes up most of the spec and is still a serious
entry barrier (not as much as for SGML, but still serious) to both
implementors AND users... and needlessly so.  But given that the DTD
part is there, I don't understand allowing the lightweight side to
drive the day--since it's got an incomplete view of the world.
Without thinking very hard about it, I bet I can show you a dozen
other decisions that did not fall in favor of lightweight so I don't
believe "prefer the lightweight version" is a true design criterion.
To raise it only where convenient seems a "cop out" to me.  If you
tell me it really was uniformly applied as such, I'll be happy to
start sending bug reports where I don't think you succeeded.
 --Kent

- - - - -
Disclaimer: These opinions are my own and do not necessarily reflect
 the official position of any company or organization with which I 
 may be affiliated.

Received on Wednesday, 6 May 1998 12:00:37 UTC