- From: Terry Allen <tallen@sonic.net>
- Date: Tue, 29 Apr 1997 23:07:34 -0700
- To: jeanpa@microsoft.com, w3c-sgml-wg@w3.org
Jean Paoli writes:

| In short, we, at Microsoft (I discussed this with a lot of persons),
| think that we must start with a fresh start with this new great format
| and define very precisely the error handling strategy for XML.

Good to hear from you, and on this topic.

| XML is a new open format and writing an XML parser is easy and should
| stay easy.

I would like to see a defense of this position. The more I think about it, the more I wonder whether a handful of parsers might not satisfy global demand. But I don't think it matters here. The concern of vendors (as Tim has articulated it) is not to increase the cost of error recovery to their own parsers. This is entirely reasonable.

| This is why my position for parsing XML would be :
| "Error handling in XML should obey to public rules and not to some
| random heuristics".
| ( I know this seems unbelievable to have to state such things, but hey!,
| we know what happens on the WWW)
|
| The spec already says that all reportable errors should be reported
| (under control of a user-settable option). I think vendors and
| implementors should compete on the quality of error reporting.
|
| This is great.

Yes, but also underspecified. Who is the user? If the user is a software program, what is the API? How can the user determine the abilities of the parser? If the user is a software program, must it offer the same options to the human user?

Instead of describing error handling as a parser issue, why not describe it as an application issue (except for the reluctance to talk about applications at all)?

If the parser reports a fatal error and dies[1], and the application chops off the already parsed part of the instance and sends the rest back to the parser for a new parse, and so on iteratively until the end of the instance has been reached and the results catted together, what is the difference between what has resulted and error recovery conducted entirely within the parser?
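
To make the chop-and-reparse scenario above concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the toy grammar (bare tags without attributes, plus text) is not XML, and the names strict_parse, FatalError, and parse_with_restarts are hypothetical, not taken from any spec or real parser. The parser itself is strictly Draconian; all of the recovery lives in the application loop.

```python
import re

# Toy grammar (an assumption for illustration, not XML): <name>, </name>, or text.
TOKEN = re.compile(r"<(/?)([A-Za-z][\w.-]*)>|([^<]+)")


class FatalError(Exception):
    """Raised by the strict parser at the first error; carries its offset."""

    def __init__(self, offset, message):
        super().__init__(message)
        self.offset = offset


def strict_parse(text, emit):
    """Draconian toy parser: reports events through emit(), dies at first error."""
    stack = []
    pos = 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise FatalError(pos, "unparseable markup")
        closing, name, chars = match.groups()
        if chars is not None:
            emit(("text", chars))
        elif not closing:
            stack.append(name)
            emit(("start", name))
        elif stack and stack[-1] == name:
            stack.pop()
            emit(("end", name))
        else:
            raise FatalError(pos, "unexpected </%s>" % name)
        pos = match.end()
    if stack:
        raise FatalError(len(text), "unclosed <%s>" % stack[-1])


def parse_with_restarts(text):
    """Application-level loop: after each fatal error, chop and re-parse the rest."""
    events, errors = [], []
    start = 0
    while start < len(text):
        try:
            strict_parse(text[start:], events.append)
            break  # the remainder parsed cleanly
        except FatalError as err:
            where = start + err.offset
            errors.append(where)
            # Skip past the offending tag and hand the rest back to the parser.
            close = text.find(">", where)
            if close == -1:
                break
            start = close + 1
    return events, errors


sample = "<foo><bar>beer mugs</bar></oops><bear>growl</bear></foo>"
events, errors = parse_with_restarts(sample)
```

On this sample the parser dies twice (once at the stray </oops>, and once at the final </foo>, because each fresh parse has lost the context of the earlier one), yet the application still ends up with the start, text, and end events for both <bar> and <bear>. The parser never performed any recovery itself.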

Users buy applications (results), not bare parsers (process); any strictures on parsers may be evaded by keeping parsers ignorant of the crimes they are committing.

| But I think it is *essential* to define what happens after the error(s)
| are reported.

[good reductive pursuit of this position deleted]

| If we do not do that, this means that incorrect documents are going to
| be published, that they will stay on the web because some tools are
| going to be able to display or process them whithout requiring that
| their author modifies them.
| This means that heuristics are going to be used because users are going
| to find them as they are invented by tool providers.
|
| Sounds familiar. I hope we will not go this way for XML.

We cannot play Canute. XML is envisioned as the data format for an unimaginable range of applications, and some of those will benefit from error recovery. Humans do error recovery almost continuously (I know it's one of my specialties); why should not their software? And if it's useful, what chance have we of forbidding it successfully?

[1]. On Sudden Death and Who's on First.

Sudden Death. Real life offers something of a parallel to the sudden death scenario. Until the season before last (or so), American college football had no overtime provision. Hating ties, the governing body decreed a novel overtime rule. While the professionals play extra 15-minute periods under a first-to-score (sudden death) provision up to a certain number of minutes (or in the case of championships, until one team suddenly wins, no matter how long it takes), the college teams (if I have this straight) now play 15-minute periods in which, if one team scores, the other gets a chance at tying again.
In the past UCLA-USC game, one of the most exciting in this storied rivalry, and which I missed due to the mischance of being in Massachusetts at the time (if anyone has a videotape, please let me know!), three overtimes were played before UCLA, which had come back from a seemingly insurmountable deficit, defeated its archrival.

My point is that a policy of sudden death at "first error encountered" only shifts the point of competition to what the first error is, and differences in construing just what that is could lead to the very competition it is sought to avoid through sudden euthanasia. "Oh, well, I had it in my partially-processed-stack, so I just thought I'd send it on in the body of the error message" would cover quite a bit of error recovery. Any ambiguity in the specification would become a point of bitter contention.

Who's on First: This is purely a matter of interest, and maybe there's a flat answer I just don't know. Does ISO 8879 or XML-lang define the method of parsing in such detail that parsers must begin at "the beginning" of a sequence of bytes and not at "the end" or somewhere in the middle? If the instance is

    <foo>
    <bar>coasters, beer mugs, ashtrays</bar>
    <bear>fur, growl</bear>
    <porridge>oats, cream, brown sugar</porridge>
    </foo>

would it not be possible to construct a grove starting from <bear> and exploring its context? Or by parsing <porridge>, then <bear>, then <bar>? Or is it required that one start with <foo>? If not, the requirement that the parser die at the first error does not ensure that various parsers will die after having sprouted identical groves, which I take to be the intended functional effect of the Draconian stance.

Regards,

Terry Allen    Electronic Publishing Consultant    tallen[at]sonic.net
http://www.sonic.net/~tallen/
Davenport and DocBook:  http://www.ora.com/davenport/index.html
T.A. at Passage Systems:  terry.allen[at]passage.com

Received on Wednesday, 30 April 1997 02:05:01 UTC