Re: Error handling in XML

Tim Bray wrote:
> In recent discussions, some but not all at the recent WWW6 conference, it has
> become apparent that we have an opportunity, if we act now, to avoid one of
> the big problems that has caused HTML a lot of grief.  This is the area of
> error-handling.  HTML doesn't have any.  As a result, the browser and tool
> vendors are stuck on an endless treadmill of trying to enhance the system
> while at the same time handling any and all collections of bytes that Netscape
> 1.X did.  Get a couple of beers into anyone from the big N or the big M and
> you'll see some real tears over this.  In my former life as a Web indexer,
> I cried some of those tears myself.  So let's not let it happen again.

Agreed !
> The subject is violations of well-formedness.  Well-formedness should be easy
> for a document to attain.  In XML, documents will carry a heavy load of
> semantics and formatting, attached to elements and attributes, probably with
> significant amounts of indirection.  Can any application hope to
> accomplish meaningful work in this mode if the document does not even manage
> to be well-formed!?!?

I share PMR's opinion on this. An XML document is either well-formed
or else it is not an XML document - full stop.
> I suggest that we add language to section 5, "conformance", which says:
>  "An XML processor which encounters a violation of the constraints
>   of well-formedness must not thereafter pass any information about
>   text or markup to the application.  It must pass to the application
>   a notification of the first such violation encountered.  It MAY
>   thereafter, at user option, pass to the application information
>   about well-formedness violations encountered after the first."
> [or in English: you gotta tell the app about the first syntax botch you hit;
>  you're allowed to send the app more error messages, but you're not allowed
>  to send anything but error messages after you've detected an error]

Here I strongly disagree. We shouldn't forget all the different 
application scenarios of XML.

What about an XML editor ? Of course, I might be in a situation where
I am, temporarily, in a state where I have no well-formdness. Does it
that I will never be able to load my document again ? (The parser 
would stop passing along data after the first violation, but only 
keep complaining about my errors.) Would you suggest that I can not
save until the document is well-formed ? I might be in a hurry !
Editing is an incremental process, the outdocument of the process
*must* be an XML document, but before that (almost) anything goes. I
from past experience that often I had to switch off the automatic
document validation of, for instance FM+SGML, so that I could do
certain things in a convenient way.

Furthermore error messages only make sense if you know the context
of them. I want to show my user the output of the parser that
shows where exactly the error occured. You can provide the line-number,
of course, but you would have to go back to the original document,
load it non-parsed, so that you can exactly show the user where
the error occured.

> If we wanted to avoid phrasing this in terms of the actions of a processor
> (which might be a good idea in general for the spec) we could redefine
> "markup" and "character data" in such a way that they are considered not
> to exist in a document which is not well-formed.
> Some might argue that this violates the Internet creed: "Be conservative in
> what you supply, and liberal in what you accept."  
> I can live with that:
> the consequences of the second half of that creed have led to intolerable
> results in the quality and usability of the data on the Net.  Furthermore,
> if you want to serve up ill-formed dogshit, this will presumably remain
> possible, because: "text/html means never having to say you're sorry."

To accept anything should not be the issue of the parser but of the
application. The role of the parser is to provide the application
with enough information in a meaningful way, so that it can take
appropriate action.

So what I would like to focus this disucssion on is a taxonomy of
error events. If we can come up with a list of errors, the class
they belong to, and possible ways to handle these errors in terms
of actions of the application, I guess many people would be happy.

Best regards,
Norbert H. Mikula

= SGML, DSSSL, Intra- & Internet, AI, Java 
= mailto:nmikula@edu.uni-klu.ac.at 
= http://www.edu.uni-klu.ac.at/~nmikula