RE: Error handling: yes, I did mean it from Terry Allen on 1997-04-30 (w3c-sgml-wg@w3.org from April 1997)

From: Terry Allen <tallen@sonic.net>
Date: Tue, 29 Apr 1997 23:07:34 -0700
To: jeanpa@microsoft.com, w3c-sgml-wg@w3.org
Message-Id: <199704300607.XAA19587@bolt.sonic.net>
Jean Paoli writes:
| In short, we, at Microsoft (I discussed this with a lot of persons), 
| think that we must start with a fresh start with this new great format 
| and define very precisely the error handling strategy for XML.

Good to hear from you, and on this topic.

| XML is a new open format and writing an XML parser is easy and should
| stay easy.

I would like to see a defense of this position.  The more I think about
it, the more I wonder whether a handful of parsers might not satisfy
global demand.  But I don't think it matters here.  The concern of
vendors (as Tim has articulated it) is not to increase the cost of 
error recovery to their own parsers.  This is entirely reasonable.

| This is why my position for parsing XML would be :
| "Error handling in XML should obey to public rules and not to some
| random heuristics".
| (I know this seems unbelievable to have to state such things, but hey!,
| we know what happens on the WWW)
| 
| The spec already says that all reportable errors should be reported
| (under control of a user-settable option). I think vendors and
| implementors should compete on the quality of error reporting.
| 
| This is great.

Yes, but also underspecified.  Who is the user?  If the user is a
software program, what is the API?  How can the user determine the
abilities of the parser?  If the user is a software program, must
it offer the same options to the human user?

Instead of describing error handling as a parser issue, why not
describe it as an application issue (except for the reluctance
to talk about applications at all)?  

If the parser reports a fatal error and dies[1], and the application
chops off the already parsed part of the instance and sends the
rest back to the parser for a new parse, and so on iteratively
until the end of the instance has been reached and the results
catted together, what is the difference between what has resulted
and error recovery conducted entirely within the parser?  Users buy 
applications (results), not bare parsers (process); any strictures on 
parsers may be evaded by keeping parsers ignorant of the crimes they 
are committing.
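
To make the scenario concrete, here is a minimal sketch in Python.
Everything in it is hypothetical: the toy record grammar, the names
strict_parse, recovering_app and FatalError, and the rule of resuming
at the next "<".  It is not an XML parser, only an illustration of an
application doing recovery around a parser that dies at the first error.

```python
import re

# Toy grammar (an assumption for illustration): a flat run of
# <name>text</name> records separated by whitespace.
RECORD = re.compile(r"<(\w+)>([^<]*)</\1>")


class FatalError(Exception):
    """Raised by the strict parser at the first error; carries the offset."""

    def __init__(self, offset):
        super().__init__(f"fatal error at offset {offset}")
        self.offset = offset


def strict_parse(text, on_record):
    """Report records through on_record until the first error, then die."""
    pos = 0
    while True:
        # Skip whitespace between records.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            return
        match = RECORD.match(text, pos)
        if match is None:
            raise FatalError(pos)
        on_record(match.group(1), match.group(2))
        pos = match.end()


def recovering_app(text):
    """Application-level recovery: chop off what was parsed, resend the rest."""
    records = []
    rest = text
    while True:
        try:
            strict_parse(rest, lambda name, body: records.append((name, body)))
            return records
        except FatalError as err:
            # Resume at the next "<" after the point of death (hypothetical rule).
            resume = rest.find("<", err.offset + 1)
            if resume == -1:
                return records
            rest = rest[resume:]


sample = "<bar>coasters</bar>\n<bear>fur, growl</bar>\n<porridge>oats</porridge>\n"
print(recovering_app(sample))
```

The parser here never recovers from anything, yet the application
returns the bar and porridge records and silently drops the broken one,
which is indistinguishable from recovery done inside the parser.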

| But I think it is *essential* to define what happens after the error(s)
| are reported.

[good reductive pursuit of this position deleted]

| If we do not do that, this means that incorrect documents are going to
| be published, that they will stay on the web because some tools are
| going to be able to display or process them without requiring that
| their author modifies them.
| This means that heuristics are going to be used because users are going
| to find them as they are invented by tool providers.
| Sounds familiar. I hope we will not go this way for XML.

We cannot play Canute.  XML is envisioned as the data format
for an unimaginable range of applications, and some of those will
benefit from error recovery.  Humans do error recovery almost
continuously (I know it's one of my specialties); why should
not their software?  And if it's useful, what chance have we
of forbidding it successfully?

[1].  On Sudden Death and Who's on First.  

Sudden Death.  Real life offers something of a parallel to the sudden death 
scenario.  Until the season before last (or so), American college
football had no overtime provision.  Hating ties, the governing body
decreed a novel overtime rule.  While the professionals play extra
15-minute periods under a first-to-score (sudden death) provision
up to a certain number of minutes (or in the case of championships, 
until one team suddenly wins, no matter how long it takes), the
college teams (if I have this straight) now play 15-minute periods
in which, if one team scores, the other gets a chance at tying 
again.  In the past UCLA-USC game, one of the most exciting in
this storied rivalry, and which I missed due to the mischance of being in 
Massachusetts at the time (if anyone has a videotape, please let me 
know!), three overtimes were played before UCLA, which had come back
from a seemingly insurmountable deficit, defeated its archrival.  

My point is that a policy of sudden death at "first error encountered"
only shifts the point of competition to what the first error is, and
differences in construing just what that is could lead to the very
competition that sudden euthanasia is meant to avoid.  "Oh,
well, I had it in my partially-processed-stack, so I just thought
I'd send it on in the body of the error message" would cover quite
a bit of error recovery.  Any ambiguity in the specification would
become a point of bitter contention.

Who's on First:  This is purely a matter of interest, and maybe
there's a flat answer I just don't know.  Does ISO 8879 or
XML-lang define the method of parsing in such detail that parsers
must begin at "the beginning" of a sequence of bytes and not
at "the end" or somewhere in the middle?  If the instance is

<foo>
<bar>coasters, beer mugs, ashtrays</bar>
<bear>fur, growl</bear>
<porridge>oats, cream, brown sugar</porridge>
</foo>

would it not be possible to construct a grove starting from
<bear> and exploring its context?  Or by parsing <porridge>,
then <bear>, then <bar>?  Or is it required that one start
with <foo>?  If not, the requirement that the parser
die at the first error does not ensure that various parsers
will die after having sprouted identical groves, which I take
to be the intended functional effect of the Draconian stance.
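
A small Python sketch of the worry, under loudly stated assumptions:
the checker below is a hypothetical toy, not an SGML or XML parser, and
I have deliberately broken the <bear> line of the instance above by
closing it with </bar>.  Whether any conforming parser may visit the
children in another order is exactly the open question.

```python
import re

# Toy check (an assumption for illustration): one element per line,
# well formed only if the start and end tag names agree.
ELEMENT = re.compile(r"<(\w+)>([^<]*)</(\w+)>")


class FatalError(Exception):
    """Raised at the first error; carries the names accepted before dying."""

    def __init__(self, seen):
        super().__init__("fatal error")
        self.seen = seen


def die_at_first_error(lines, order):
    """Visit lines in the given order and die at the first bad element."""
    seen = []
    for index in order:
        match = ELEMENT.fullmatch(lines[index])
        if match is None or match.group(1) != match.group(3):
            raise FatalError(seen)
        seen.append(match.group(1))
    return seen


# The children of <foo>, with an error introduced into the second one.
children = [
    "<bar>coasters, beer mugs, ashtrays</bar>",
    "<bear>fur, growl</bar>",
    "<porridge>oats, cream, brown sugar</porridge>",
]


def partial_grove(order):
    """Return whatever had been built when the parser died (or finished)."""
    try:
        return die_at_first_error(children, order)
    except FatalError as err:
        return err.seen


forward = partial_grove([0, 1, 2])
backward = partial_grove([2, 1, 0])
print(forward, backward)
```

Both runs die at the same broken element, but the front-to-back run has
seen only bar and the back-to-front run only porridge, so the two
"parsers" expire holding different partial groves.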


Regards,

  Terry Allen    Electronic Publishing Consultant    tallen[at]sonic.net
                   http://www.sonic.net/~tallen/
    Davenport and DocBook:  http://www.ora.com/davenport/index.html
          T.A. at Passage Systems:  terry.allen[at]passage.com
Received on Wednesday, 30 April 1997 02:05:01 UTC