Re: Error handling in XML from Peter Murray-Rust on 1997-04-19 (w3c-sgml-wg@w3.org from April 1997)

From: Peter Murray-Rust <Peter@ursus.demon.co.uk>
Date: Sat, 19 Apr 1997 13:46:43 GMT
To: w3c-sgml-wg@w3.org
Message-Id: <5805@ursus.demon.co.uk>
<?XML VERSION="1.0"?>
<FLAME/>
I hope this represents a WF and appropriate message :-) [I certainly didn't want
to be confrontational, and I deliberately used epsilon to take any value :-)]
So here are some (hopefully constructive) points to provide a meeting point.

In message <199704191012.LAA14198@mail.iol.ie> digitome@iol.ie (Digitome Ltd.) writes:
> [Peter Murray Rust]
> ><AXIOM NEGOTIABILITY="epsilon">
> >My basic tenet is that an XML document is either WF or it isn't.  If there is
> >one error, then the result is a null document.  If that isn't true then I think
> >we lose a large number of people who see XML as a robust and reliable way of
> >passing information.  
> ></AXIOM>
> 
> I do not think this is so. I think it is hugely important to say "this is
> not well formed"
> but overly spartan to then say "so fix it and try again!".
> 
OK.  But what happens to the result?  I think there are (at least) two possible
ways in which XML will be used:

human->editor/validator->XML(WWW)->parser/browser+stylesheet [->printer]

program ->XML(WWW)->parser(->robot->program)+

I have no problems with the first.  Indeed I will probably be as mad as anyone
if I download 100Kbyte of XML and the browser says 'sorry :-), goof-up on line
1.  Request better document from author'.  So it's clear there needs to be a 
mechanism for this.

But people are already starting to think of robotic applications of XML.  Here
we can see distributed servers with different components.  If a user agent
(= robot) gets a document (=data to be processed) and this is not WF, then
it cannot reliably do anything further with that information.  Guessing
what the author really meant is a recipe for disaster - we've all seen that with
HTML.   Remember that we are already designing into XML the ability to carry
large collections of links which have to be precise.  These will (I assume) be
routinely processed by machines, not humans. 

I can see the value of trying to recover document content *for human 
consumption*, but not for machine consumption.  The danger is that laxness
with the first invalidates the value of the second completely.  I'm particularly
concerned that if non-WF documents get output in apparently WF-form and then 
*automatically* passed to a second application, the original error warning
will be lost.  A document with 'recoverable errors' (if this exists) must
be indelibly stamped with this and any containing document must be aware of
it.  Rather like banknotes with 'specimen' stamped over them.

I appreciate the fact that it is very difficult to get XML fully specified,
and that different parsers will produce different output and that they may
converge rather slowly.  And there are undefined things like '++a + ++a' in C.
But it will be necessary to educate authors about the most difficult areas
(whitespace, parameter entities, etc.) rather than leaving these fuzzy and 
hoping.  When the ERB comes up with the final spec about (say) whitespace then
we have to implement it as faithfully as we can, compare notes if we differ, and
attempt to resolve the issue.

	P.

[I hope it didn't come across that I have anything other than total admiration
for James Clark's software, as I have stated before.  Indeed I couldn't be 
using SGML if it didn't exist.]


-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/
Received on Saturday, 19 April 1997 09:19:00 UTC