Re: Final words, I think, on error handling

On Wed, 7 May 1997 01:48:06 -0400 Paul Prescod said:
>You forgot one tolerant point and I would like you to address it please.
>It falls naturally out of the draconian:
>
>Tim Bray wrote:
>> I think I am speaking fairly for the draconians when I say that from
>> our point of view, it works because
>>  - well-formedness is so easy that it isn't a significant burden on
>>    anyone,
>
>Well-formedness is such a small step on the way to a useful document
>that it isn't of particular *value* to anyone: so why all the fuss? How
>many applications will be able to read a well-formed XML document
>and do something useful with it?

Well, it seems to me that the fuss is not entirely irrational.  (I was
in the minority, but I think the draconians were fairly clear-headed on
this point.)

It is very important, for the long-term health of XML, that with
regard to error tolerance and error detection we adopt something more
like the culture of SGML (there is a spec, and if your document has
errors, you had better fix them pronto, because otherwise your software
may break and you will in any case be laughed to scorn and possibly
ridden out of the next SGML 'XX conference on a rail) than like the
culture of HTML as it has developed (where error recovery is in some
cases just another name for buggy software not noticing the errors).

In the case of SGML, the banner of Validity has helped everyone a lot.
(It has a downside, too, but on balance I'd say the introduction of the
notion of formal validity is a major advance in document processing, and
one of the ways SGML is a step forward vis-a-vis GML and Scribe and so on.)
Note that almost everything Paul says about how little WF buys for us is
also true of validity.  I can have a perfectly valid SGML document that
is ugly as sin, abuses its tags in a way that would make any decent
document blush, mixes incompatible values of attributes which ought to
be compatible with each other, and points to locations that don't exist,
in documents whose file names are mistyped, on nodes that are no longer
part of the network and maybe never were.

A valid document, that is, is not the same as a correct one.  And yet,
we don't regard validity checking as pointless.  And so SGML documents
*tend* to be cleaner than non-SGML documents, even though many types
of dirt are not detected by validation.

Without some minimal standard that works for XML the way validity works
for SGML ('works' in this sense of encouraging a certain kind of culture
among users and programmers), we are indeed in serious danger of
launching another race to the bottom, and replicating the current
situation in HTML, where people can say with a straight face that
it doesn't matter what's in the HTML spec or DTD, all that counts is
whether the major browsers handle a given construct.

Requiring XML processors to go on strike when they encounter ill-formed
input is an important symbolic gesture that says "Come to XML, all ye
who labor and are tired of dirty data."  It draws a line in the sand
and delimits a class of data so hopelessly messed up that there is
nothing useful to be done with it but issue error messages.
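For anyone who wants to see what "going on strike" looks like from the
application's side, here is a minimal sketch.  It assumes Python's
standard-library ElementTree parser, which nothing in this thread or the
draft spec prescribes; the helper names are hypothetical.  The parser
makes no attempt at repair: it raises a fatal error and hands back no
document at all.

```python
import xml.etree.ElementTree as ET

def parse_draconian(text):
    """Return the root element, or raise ET.ParseError on any
    well-formedness error.  No error recovery is attempted."""
    return ET.fromstring(text)

def is_well_formed(text):
    """True if the parser accepts the input, False if it refuses it."""
    try:
        parse_draconian(text)
        return True
    except ET.ParseError:
        return False

samples = {
    "well-formed": "<memo><to>Paul</to></memo>",
    "unclosed tag": "<memo><to>Paul</memo>",
    "unquoted attribute": "<memo date=1997-05-07/>",
    # Well-formed, but three empty addressees make no sense as a memo:
    # WF, like validity, says nothing about correctness.
    "well-formed nonsense": "<memo><to/><to/><to/></memo>",
}

for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```

The last sample is accepted, which is the point made above about
validity: the line in the sand screens out one class of dirt, not all
of it.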

Where we draw the line matters, to be sure.  But *that* we draw such
a line may matter, in the long run, even more, because it establishes
that formal correctness counts, too, not just pretty pictures and how
it looks on a 21-inch monitor.

>We can be totally draconian when it comes to well-formedness and the
>Web will be just as messy, nasty a place tomorrow

Here I disagree.  If there are 10 flies in the soup today, then
insisting on well-formedness may not give us fly-free soup tomorrow.
But I think going from ten to five -- or even seven -- is well worth
everyone's while.  At the very least, it calls everyone's attention to
the notion of flies in the soup, and to the notion of screens that keep
certain kinds of fly *out* of the soup.

And on the whole, I think we can safely say that that is progress.

-C. M. Sperberg-McQueen

Received on Wednesday, 7 May 1997 19:53:49 UTC