Re: Error handling in XML

At 22:35 18/04/97 -0700, Tim wrote:
[...]
>1.X did.  Get a couple of beers into anyone from the big N or the big M and 
>you'll see some real tears over this.  

They have no-one but themselves to blame for this. The facts of the
situation were explained to them ad nauseam in person and over the
wires long before the Web passed the point of inflection on its growth 
curve.

Before we expend a large amount of effort on this, can someone confirm
that it stands some chance of being listened to? Forgive my scepticism
(and as I wasn't at WWW6 I didn't have the chance to hear it from the
horses' mouths): I know both N and M contain numbers of people who are
seriously committed to making a better shot at it this time round, but 
both organizations also contain rather more people who want to ship 
something slick that will do what the masses want: gobble bytes and 
never gag.

>The subject is violations of well-formedness.  Well-formedness should be easy 
>for a document to attain.  In XML, documents will carry a heavy load of 
>semantics and formatting, attached to elements and attributes, probably with 
>significant amounts of indirection.  Can any application hope to 
>accomplish meaningful work in this mode if the document does not even manage 
>to be well-formed!?!?

Joe and Jill Homepage are not likely to give the proverbial tinker's cuss
whether their documents are well-formed or not, if the browsers are as
forgiving and tolerant as those from N or M. The browsers are going to have
to offer significant new features to compensate for the penalty of having
the parser gag on ill-formed syntax. The vendors already know this, and
already have their own agenda for dealing with it. Are we singing from the
same score?

>[or in English: you gotta tell the app about the first syntax botch you hit; 
> you're allowed to send the app more error messages, but you're not allowed 
> to send anything but error messages after you've detected an error]

That should kill XML stone dead, all right :-)

It is absolutely the "right" thing to do, because plowing onwards after
a syntax error will likely just throw up another few thou "errors". I'm
just worried it's not going to wash with the people who have to market the
product (any marketeers out there?). 
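
As a rough illustration of the rule Tim paraphrases above (report the first
syntax error, optionally report more errors, but pass nothing except error
messages from then on), here is a minimal Python sketch. The tokenizer, the
process() function and the event names are all hypothetical, invented for
this example; it handles only bare tags and text, and is not a real XML
parser or anything proposed for the spec.

```python
import re

# Toy tokenizer (an assumption for this sketch): start tags, end tags,
# empty-element tags, or runs of text. Attributes, comments, entities
# and the rest of XML are deliberately ignored.
TOKEN = re.compile(r"<(/?)([A-Za-z_][\w.-]*)(/?)>|([^<]+)")


def process(text):
    """Yield (kind, value) events for the application.

    Until the first well-formedness error, events are "start", "end"
    and "text". After it, only "error" events are yielded.
    """
    stack = []      # names of currently open elements
    failed = False  # set once the first error has been reported
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            # Something starting with "<" that is not a tag we know.
            yield ("error", f"bad markup at offset {pos}")
            break  # this toy cannot resynchronise, so stop scanning
        closing, name, empty, data = m.groups()
        pos = m.end()
        if data is not None:
            if not failed:
                yield ("text", data)
        elif closing:
            if empty or not stack or stack[-1] != name:
                failed = True
                yield ("error", f"unexpected </{name}> at offset {m.start()}")
            else:
                stack.pop()
                if not failed:
                    yield ("end", name)
        else:
            if not empty:
                stack.append(name)
            if not failed:
                yield ("start", name)
                if empty:
                    yield ("end", name)
    # Anything still open at the end is also an error.
    for name in reversed(stack):
        yield ("error", f"unclosed <{name}>")


if __name__ == "__main__":
    for event in process("<a><b>hi</a>more</b></a>"):
        print(event)
```

Fed the string <a><b>hi</a>more</b></a>, this delivers the two start tags
and the text "hi", then an error for the mismatched </a>; the later text
"more" never reaches the application.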

One simple way out of this is for there to be editing software which is 
fully conformant (guarantees not to make syntax errors), and is free and 
so easy to get that no-one in their right mind would edit by hand any
more. But that's off our track.

[It's really a weird phenomenon: no-one expects a C compiler to gracefully
accept syntax errors, put them right as it sees fit, and carry on compiling.
But everyone expects a Web browser to handle HTML like N and M. Anyone
investigating the psychology of this?]

>If we wanted to avoid phrasing this in terms of the actions of a processor 
>(which might be a good idea in general for the spec) we could redefine 
>"markup" and "character data" in such a way that they are considered not 
>to exist in a document which is not well-formed.

Ah, you mean like <PLAINTEXT> ? :-)

>Some might argue that this violates the Internet creed: "Be conservative in 
>what you supply, and liberal in what you accept."  I can live with that: 
>the consequences of the second half of that creed have led to intolerable 
>results in the quality and usability of the data on the Net.  Furthermore, 
>if you want to serve up ill-formed dogshit, this will presumably remain
>possible, because: "text/html means never having to say you're sorry."

I agree 100%. I just really want someone to prove me wrong in my scepticism
before we start...

(Maybe you can do that in exchange for a pint on Tuesday, Tim?)

///Peter

Received on Saturday, 19 April 1997 07:28:41 UTC