Error handling in XML

In recent discussions, some but not all of them at the WWW6 conference, it has 
become apparent that we have an opportunity, if we act now, to avoid one of 
the big problems that have caused HTML a lot of grief.  This is the area of 
error handling.  HTML doesn't have any.  As a result, the browser and tool 
vendors are stuck on an endless treadmill of trying to enhance the system 
while at the same time accepting any and all collections of bytes that Netscape 
1.X accepted.  Get a couple of beers into anyone from the big N or the big M and 
you'll see some real tears over this.  In my former life as a Web indexer,
I cried some of those tears myself.  So let's not let it happen again.

The subject is violations of well-formedness.  Well-formedness should be easy 
for a document to attain.  In XML, documents will carry a heavy load of 
semantics and formatting, attached to elements and attributes, probably with 
significant amounts of indirection.  Can any application hope to 
accomplish meaningful work with such a document if it does not even manage 
to be well-formed!?!?

I suggest that we add language to section 5, "conformance", which says:

 "An XML processor which encounters a violation of the constraints
  of well-formedness must not thereafter pass any information about
  text or markup to the application.  It must pass to the application
  a notification of the first such violation encountered.  It MAY 
  thereafter, at user option, pass to the application information
  about well-formedness violations encountered after the first."

[or in English: you gotta tell the app about the first syntax botch you hit; 
 you're allowed to send the app more error messages, but you're not allowed 
 to send anything but error messages after you've detected an error]
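To make the proposed rule concrete, here is a minimal sketch in Python of a 
filter sitting between a parser and its application.  It assumes a 
hypothetical error-recovering tokenizer that emits (kind, payload) events, 
where kind is "start", "end", "text" or "error"; the DraconianGate class, the 
run() helper and the report_later_errors option are illustrative names only, 
not part of any draft spec or existing library.

```python
from typing import Callable, List, Tuple

# (kind, payload); kind is "start", "end", "text" or "error".
Event = Tuple[str, str]


class DraconianGate:
    """Hypothetical filter enforcing the proposed conformance rule.

    Markup and text events reach the application only until the first
    well-formedness violation; after that, only error notifications pass.
    """

    def __init__(self, deliver: Callable[[Event], None],
                 report_later_errors: bool = False) -> None:
        self.deliver = deliver
        self.report_later_errors = report_later_errors  # the "user option"
        self.seen_error = False

    def feed(self, event: Event) -> None:
        kind = event[0]
        if kind == "error":
            if not self.seen_error:
                self.seen_error = True
                self.deliver(event)      # first violation: MUST be reported
            elif self.report_later_errors:
                self.deliver(event)      # later violations: MAY be reported
        elif not self.seen_error:
            self.deliver(event)          # text/markup only before any violation
        # Otherwise: text/markup after a violation is silently dropped.


def run(events: List[Event], report_later_errors: bool = False) -> List[Event]:
    """Feed a scripted event stream through the gate; return what the app saw."""
    received: List[Event] = []
    gate = DraconianGate(received.append, report_later_errors)
    for event in events:
        gate.feed(event)
    return received


# Events as a hypothetical error-recovering tokenizer might emit them.
stream: List[Event] = [
    ("start", "doc"),
    ("text", "hello"),
    ("error", "line 1: mismatched end tag"),
    ("text", "more"),
    ("error", "line 2: unclosed start tag"),
    ("end", "doc"),
]

print(run(stream))
# [('start', 'doc'), ('text', 'hello'), ('error', 'line 1: mismatched end tag')]
print(run(stream, report_later_errors=True))
# ...same three events, plus ('error', 'line 2: unclosed start tag')
```

The point of putting the check in a gate rather than in the application is 
that however much the tokenizer recovers internally, nothing but error 
reports crosses the boundary once a violation has been seen.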

If we wanted to avoid phrasing this in terms of the actions of a processor 
(which might be a good idea in general for the spec) we could redefine 
"markup" and "character data" in such a way that they are considered not 
to exist in a document which is not well-formed.

Some might argue that this violates the Internet creed: "Be conservative in 
what you supply, and liberal in what you accept."  I can live with that: 
the consequences of the second half of that creed have led to intolerable 
results in the quality and usability of the data on the Net.  Furthermore, 
if you want to serve up ill-formed dogshit, this will presumably remain
possible, because: "text/html means never having to say you're sorry."

Cheers, Tim Bray
tbray@textuality.com http://www.textuality.com/ +1-604-708-9592

Received on Saturday, 19 April 1997 01:37:01 UTC