Re: RS/RE: basic questions from Michael Sperberg-McQueen on 1996-10-02 (w3c-sgml-wg@w3.org from October 1996)

From: Michael Sperberg-McQueen <U35395@UICVM.CC.UIC.EDU>
Date: Wed, 02 Oct 96 14:29:14 CDT
To: W3C SGML Working Group <w3c-sgml-wg@w3.org>
Message-Id: <199610022016.QAA20432@www10.w3.org>
On Wed, 2 Oct 1996 14:37:19 -0400 Eliot Kimber said:
>At 01:44 PM 10/2/96 -0400, David G. Durand wrote:
>
>>My argument against quoting is that SGML compatibility should not
>>be _more_ important than user utility (and familiarity is a
>>significant component of utility for most busy people who aren't
>>toolsmiths).
>
>David has, I think, crisply defined the key to this issue.  Either

Agreed.  Except that he concedes too much.  Familiarity is important
for getting the toolsmiths on board, too.

>XML is completely compatible with SGML or it isn't.  A review of the
>ERB's stated principles shows that SGML compatibility gets higher
>priority over ease of entry.

Yes.  Ease of entry, however, is not quite what David mentions, and not
quite the same as user utility and user acceptance.  So I don't think
the goals statement actually provides an unambiguous answer to this
question, even if we believe that we need absolutely full compatibility
or nothing.  (I for one think almost-compatible is a lot better than
nothing!  XML with incompatible RE rules is, for example, much better
than recommending that the world switch to Word Perfect binary
format....)

>Whether this priority is the correct one or whether it should apply
>in this case is still open for debate, but I think it's clear, as
>James pointed out what seems like years ago that the only real
>solution to the RE problem that preserves SGML compatibility is to
>eliminate mixed content, which means quoting data.

I don't think that was quite what James said.  Also, I don't think it's
quite true beyond all cavil or doubt.  James pointed out that
prohibiting certain things in mixed content (things like subelements,
comments, and PIs) would render the rules trivial.  He also pointed out
several other possible approaches which would preserve strict
compatibility, as well as some that would come close.

Quoting is one approach to trying to make the agonizingly painful
restrictions on mixed content less agonizing and less painful.  It is an
extraordinarily clever use of shortref, and my hat is off to Charles for
imagining it.  The dean of SGML hackers (in the non-pejorative sense!)
can still show the rest of us a thing or two.

But I expect that it would be the death of XML to require it.  It would
be far better, I think, to adopt James's compromise solution of
white-space stripping and RE-merging, which also achieves SGML
compatibility in all non-pathological cases, or to treat RE as we treat
any other white space (roughly speaking: significant outside of angle
brackets, a separator inside them), which breaks
SGML compatibility in most non-pathological cases, but is very simple to
use and understand, and which would *never* make a difference in any
SGML application I've ever used in real life.  (I.e. all the application
code I actually work with in practice is already written to assume it's
got to recover when it's passed an unwanted RE or two.  Full disclosure
requires that I admit I do have some code that does terrible things
when confronted with leading blanks at times it doesn't expect them.
But that's not relevant here, because 8879 doesn't deal with that.
James's compromise solution, to its credit, does.)
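
To make the contrast concrete, here is a minimal sketch of the two
policies as they might apply to one run of character data between two
pieces of markup.  It is an illustration under stated assumptions, not
the text of either proposal: the exact rules of the stripping-and-merging
compromise are not reproduced in this message, the function names are
hypothetical, and "\n" stands in for a record boundary.

```python
import re


def strip_and_merge(data: str) -> str:
    """One assumed reading of 'white-space stripping and RE-merging'.

    Blanks and tabs adjacent to a line break are stripped, runs of line
    breaks are merged, line breaks touching the surrounding markup are
    dropped, and each remaining line break becomes a single space.
    """
    # Collapse each line break, plus the white space around it, to one "\n".
    merged = re.sub(r"[ \t]*\n[ \t\n]*", "\n", data)
    # Drop line breaks at either edge (next to markup); join the rest.
    return merged.strip("\n").replace("\n", " ")


def keep_as_data(data: str) -> str:
    """The simpler policy: outside of tags, a line break is ordinary data."""
    return data


if __name__ == "__main__":
    chunk = "\n  First line   \n   second line\n"
    print(repr(strip_and_merge(chunk)))
    print(repr(keep_as_data(chunk)))
```

Under the second policy the application receives every line break and
decides for itself what to do with it, which is the recovery work the
application code described above already performs.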

SGML already *has* delimiters between markup and data.  Do we really
need a second set of delimiters for *white space*, for heaven's sake?
Say it ain't so!

>I certainly agree that quoting data will make authoring *by hand*
>more difficult (I hated it the first time I tried it), and we do
>have to be sensitive to the marketing implications of requiring it,
>but I feel very strongly that the cost of not having SGML
>compatibility in this case is much greater than the cost of
>authoring.

We agree that the tradeoff involves authoring vs. strict compatibility,
but I think it's more than authoring.  The closer we come to making XML
documents look familiar to today's users of SGML and SGML applications,
including HTML, the greater the acceptance of XML not only among users
but also among toolsmiths.  (It's for this reason that I think Bill
Lindsey's very smart NET tricks, though like Charles's quoting proposal
they are a tour de force of ingenuity, are not the politically savvy
route for XML to take.  I read Scheme books in the evenings, and in
the mornings, with a sigh, write code in C.)

We seem to disagree primarily in assessing the relative costs.  It's an
empirical question, but it's going to be hard to test both
possibilities, since if we don't get it right the first time, no one in
their right mind will pay attention to us the second time.

In this case, I think the cost of the quote proposal, like the NET
tricks proposal, is massive resistance (= massive indifference) on the
part of tool makers.

We need a workable solution to the RE-simplicity / 8879-conformance
tradeoff.  There are several on the table I could live with; the
white-space-stripping + RE-merger proposal that came out of the ERB last
week seems the best to me, but I'm not dogmatic about it.  I *am*
dogmatic on the proposals to forbid comments and PIs inside of mixed
content:  I'd rather live with our current situation (full 8879 or
nothing) than explain to incredulous users why comments are not allowed
everywhere.  And if I'm right about the user reaction to the quoting
proposal, adopting it will mean we do continue to live with the current
situation.

Quoting, in short, seems to me to fail the Stoopid test.  And failing
the Stoopid test is not the way to call users and toolsmiths to the
banner of XML as a Better Way for a Better Net.

Michael Sperberg-McQueen

&disclaimer; &serious-disclaimer; &really-serious-disclaimer;

&no-really-i-mean-it-disclaimer;
Received on Wednesday, 2 October 1996 16:16:43 UTC