XML 1.0 - reading confusion - parsed vs unparsed

   Date: Sat, 25 Apr 1998 08:12:27 -0700
   From: Tim Bray <tbray@textuality.com>

   At 11:43 AM 4/24/98 EDT, Kent M Pitman wrote:
   >The introductory text in section 4, Physical Structures, is very
   >confusing.  It uses a meaning for "parsed" which is alien to any
   >meaning of "parsed" that I am familiar with.

   Well, I must say that I'm impressed at the intensity you've been putting
   into reading the XML spec.  I am sorry that you find it so disappointing.

Well, I emphasize that one of the disappointments was that it came so close
to repairing a problem that had bugged me so much about SGML.  You kind of
set up the expectation by showing that it was within grasp of fixing... so
take at least a little of my comment as a compliment in that it's obvious
you have some people who were working hard on simplification.  I just wanted
to weigh in strongly on the idea that certain simplifications matter in very
material ways.

   I'll try to find the time in the near future to address the points
   you raise, but before that, a couple of meta-points are in order:

No problem.  I'm happy to see responses but I'm not blocked in any way
by any failure on your part to respond; mostly I'm just happy to know
they're filed where they can be discussed by your group and action
taken if/when it is ever appropriate to take action again.  I just
wanted to get my thoughts into the pipeline while I was thinking about
them.  I had done a careful compare of the old draft I had and the new
document and that seemed the appropriate time...

   First, XML 1.0 is effectively frozen now and will not be changing.  Yes,
   there are shortcomings, but at some point we had to draw the line and
   ship, so on Feb. 10th, we did.

   Second, others, who find the spec less unsatisfying than you do, have
   charged in and implemented a wide variety of parsers and tools in a
   variety of languages; so far, they seem to offer very high 
   interoperability (it helps having James Clark in the field) - thus
   for most developers, details of syntax can generally be ignored and
   outsourced to the XML processor authors.

   Having said that, all your input has gone in my "errata" file and
   will be considered carefully when, if ever, we do another revision of
   the XML spec.

Yeah, I'm Project Editor for J13 (formerly X3J13), the committee that
produced ANSI Common Lisp.  I'm familiar with the problems of the
standards cycle and have just such a file myself for exactly the same
reason.  My comments will keep.

   Finally, as to your specific point regarding "parsed" and "unparsed" -
   the committee kicked around lots of options.  In earlier drafts we
   had used "text" and "binary" but that was unsatisfactory since "binary"
   might in fact be text.  In fact, the only distinguishing characteristic
   of "binary" entities (what SGML calls "data" entities) is that they are
   not read and parsed by the XML processor.  So the correct label should
   be "NotToBeReadByTheXMLProcessor", for which "unparsed" seemed to us
   an acceptable contraction.  Then for symmetry, the other kind is called
   "parsed".  I agree with you that there are other usages of the word 
   "parsed", but I do feel that our usage is legitimate and unsurprising.

I guess the essence of my point was really that nowhere in the
discussion of "parsed" does it SAY that "parsed" means "by XML".
Honestly, I had to read this section way too many times before I
figured out what it meant, and some extra verbiage would have helped
because the concepts I finally figured out it was offering me were not
as complex as I had feared. Even just a single sentence that says
``Some documents are intended to be parsed by XML; we'll call those
"parsed".'' would help a lot.  I don't care what formal terms you
create--if you see the Common Lisp HyperSpec(TM), at
 http://www.harlequin.com/books/HyperSpec/FrontMatter/
you'll notice I have a glossary of about 70 printed pages of English
terms that I hijacked for my own use in describing ANSI Common Lisp.
But I do offer definitions so that people don't get confused between the
common and formal meanings.  I think it's the absence of the phrase
"not to be read by the XML processor" for "unparsed", etc. that got me.

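As a rough illustration of the parsed/unparsed distinction, here is a
minimal Python sketch using the standard xml.parsers.expat module.  The
sample document, the entity names (greeting, logo), the gif notation,
and the classify_entities helper are all made up for the example; the
only assumption about the rule itself is the one stated above, that an
entity declared with NDATA is the kind the XML processor does not read.

```python
import xml.parsers.expat

# Hypothetical sample document: one internal (parsed) entity and one
# entity declared with NDATA (unparsed).
DOC = """<?xml version="1.0"?>
<!DOCTYPE memo [
  <!NOTATION gif SYSTEM "image/gif">
  <!ENTITY greeting "Hello, <b>world</b>">
  <!ENTITY logo SYSTEM "logo.gif" NDATA gif>
  <!ATTLIST memo pic ENTITY #IMPLIED>
]>
<memo pic="logo">&greeting;</memo>
"""


def classify_entities(document):
    """Return ({entity name: 'parsed' | 'unparsed'}, [element names seen])."""
    kinds = {}
    elements = []

    def on_entity_decl(name, is_parameter_entity, value, base,
                       system_id, public_id, notation_name):
        # A notation name is present only for NDATA (unparsed) entities.
        kinds[name] = "unparsed" if notation_name is not None else "parsed"

    def on_start_element(name, attrs):
        elements.append(name)

    parser = xml.parsers.expat.ParserCreate()
    parser.EntityDeclHandler = on_entity_decl
    parser.StartElementHandler = on_start_element
    parser.Parse(document, True)
    return kinds, elements


if __name__ == "__main__":
    kinds, elements = classify_entities(DOC)
    print(kinds)
    print(elements)
```

Here the processor reads into greeting and reports the b element found
inside its replacement text, while logo is only declared by name and
notation; the processor never looks at the contents of logo.gif.
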
   BTW, can we infer from your close attention that Harlequin is going
   to do something interesting with XML?

I can't speak for the company about what the company will do.  I can
observe that we do a heavy amount of business in digital printing and
publishing--we make a high-end PostScript RIP that supports many major
publishers in publishing document content.  At this point we're
"tracking XML seriously".  Whether we make any products out of it
will, I imagine, depend on customer demand.  My sense is that the
industry is warming to XML, but also that it's too early to tell for
sure.  And as I'm `just' a technology developer, I don't have a say in
what specific products we release.  However, I think it's safe to say
that if our customers ask for it, we'll support it.

Received on Saturday, 25 April 1998 13:14:42 UTC