Re: Error handling: yes, I did mean it from Peter Murray-Rust on 1997-04-28 (w3c-sgml-wg@w3.org from April 1997)

From: Peter Murray-Rust <Peter@ursus.demon.co.uk>
Date: Mon, 28 Apr 1997 13:57:37 GMT
To: w3c-sgml-wg@w3.org
Message-Id: <6004@ursus.demon.co.uk>
In message <199704262345.TAA19868@www10.w3.org> Michael Sperberg-McQueen writes:
[...]
> 
> Otherwise, the arguments of the Draconian camp are all centered around

From the Encyclopedia Britannica [about Draco's code]
<Q>for nearly all crimes there was the same penalty of death</Q>
(Plut. <I>Solon</I>)
- documents?  authors? both?

Constructively:
[...]
> 
> Tim and others have, in the meantime, conceded that some applications
> can usefully attempt error recovery, and hope to salvage the Draconian
> Rule by suggesting that such applications should use programs which
> aren't 'XML processors' in the strict sense.  This amounts to saying
> "implementors can pick and choose which parts of XML to implement, and
> can keep themselves blameless even when flouting basic requirements of
> the spec, if only they call themselves XML Handlers or some other name
> instead of 'XML processors'".  I cannot think of a worse approach to
> the problem of ensuring uniform error reporting by XML software.

My main problem is that I see 'applications', 'parsers' or whatever being
complex aggregates of components.  If these components take different views
on error handling (or anything else for that matter) information can become
corrupted.  As a proto-implementor I think any graduated policy of 
error-handling is going to be complex.  

Assume that a user-agent consists of a parser+browser+transformation module.
Assume that these have been written by different groups, and that some 
run without human intervention.  Then all have to have the same view of error
processing.  If not, we can assume that the quality of the result is at
least as bad as, and probably worse than, the least Draconian component.
If that component does not interact with humans it may even be unclear
that an error is being 'laundered'.
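
As a rough sketch of that 'laundering', assuming hypothetical names (Doc,
parse, careful_transform, careless_transform) and a toy well-formedness
check that is nothing like a real XML parser: the flag survives the chain
only if every component copies it forward.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    text: str
    well_formed: bool = True  # hypothetical flag, not part of any XML API


def parse(raw: str) -> Doc:
    # Toy check only: unbalanced angle brackets count as "not well-formed".
    return Doc(raw, well_formed=raw.count("<") == raw.count(">"))


def careful_transform(doc: Doc) -> Doc:
    # Carries the flag forward to the next component.
    return Doc(doc.text.upper(), doc.well_formed)


def careless_transform(doc: Doc) -> Doc:
    # Forgets the flag: the default (True) silently launders the error.
    return Doc(doc.text.upper())


bad = parse("<P>unclosed <B text</P>")
print(careful_transform(bad).well_formed)
print(careless_transform(bad).well_formed)
```

One careless component in the middle is enough: everything downstream of
it sees a document that claims to be WF.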

At present we might have 
	document -> NXP -> JUMBO -> human -> output
For a WF document, the display represents the document faithfully and 
there is a 'Save as XML' button if required.
At present a non-WF document causes NXP to terminate, so JUMBO could only
process what it's got up to then.  It chooses not to (since elements
are likely to have semantics dependent on their context in the whole 
document.)  If the parser sends JUMBO a non-WF document, JUMBO has to
*at least* remember that the document was not WF.  Every action during that
session, and any persistent record of that session must *at least* have the
capability of remembering that the document or its products came from
a non-WF source.  Therefore JUMBO has to have:
	- a flag for *humans* which (IMO) requires action from the humans
		before it proceeds.  (Not just a 1-second banner at the 
		bottom saying the input was non-WF).
	- a different flag for automatic processing (JUMBO can be run in
		'batch' if required).
	- a stamp on saved documents stating that they came from non-WF
		documents.
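
The three requirements above might be sketched as follows.  Everything here
is an assumption made for illustration: the names (save_as_xml,
NotWellFormedError, STAMP), the use of an exception as the 'batch' flag, and
the use of an XML comment as the stamp.  None of it is taken from JUMBO.

```python
class NotWellFormedError(Exception):
    """Raised in batch mode, where there is no human to ask."""


# Hypothetical marker written into anything saved from a non-WF source.
STAMP = "<!-- derived from a non-well-formed source -->"


def save_as_xml(text, source_was_wf, batch=False, confirm=None):
    """Return the text to write, or None if the human declined to save."""
    if source_was_wf:
        return text
    if batch:
        # Automatic processing: fail loudly rather than guess.
        raise NotWellFormedError("input was not well-formed")
    # Interactive use: require an explicit decision, not a passing banner.
    question = "This came from a non-WF document; save anyway?"
    if confirm is None or not confirm(question):
        return None
    return STAMP + "\n" + text
```

Here confirm is any callable that puts the question to a human and returns
True or False; if none is supplied, nothing is saved.
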
It gets worse when the document is used to collect linked documents.  For
example:  'This document contains links to the current rates of exchange, ...'
In this case well-formed but incorrect documents could be retrieved automatically as part
of this process.  The collection might have to have a stamp saying
'the documents in here are WF but we aren't sure if they are the right ones'.

The point I'm making is that *as an implementor* this looks like a lot of
work and a lot of opportunities to go wrong.  I'm not saying it's not
possible, but it's some way from the CS grad student (who never makes 
mistakes anyway).  So IF the policy is to be non-Draconian, then it is
critical to enumerate the possibilities where it may go wrong.

[...]
> 
> If this is true, then what we need to do is to ensure that XML
> processors *always* allow the user to request error reports, even if the
                                ^^^^^^^^^^^^^^^
As always, we have to expect that the 'user' may be a robot, agent or other
piece of software and it doesn't understand 'looks all right'.  So it
needs a very clear indication not to try anything clever.  My problem
is that the same pieces of code in applications will be used by programs
as well as by humans.  Application programmers may not always realise this.

> software recovers from the errors in question.  That way, the user who
> says "program X displays my data all right, why don't you?" can be told
> "look, even program X says your document is ill-formed: look at it with
> error-checking turned on!"
> 
> As it happens, the xml-lang spec already requires this.  I don't think
> it can realistically or usefully require more, except perhaps that it
> should also explicitly require that the processor notify any down-stream
> app, as well as the 'end' user if any.  I don't think it should require
> less.

Agreed.  I'm just worried that the error flag drops off somewhere along
the way, especially if we haven't given it a vehicle.  For example, every
'Save as XML' button should tell the 'user' "This came from a corrupt 
document, do you *really* want to save it?"

A related concern is that some software might assume that missing elements
were relatively unimportant.  e.g.

<CHAPTER>
[10 bytes garbled here..., do you want to continue?...]
<P>It doesn't matter that ... [important announcement]...</P>
[ 10 more bytes of garbled data, continue to ignore errors?...]

This could look harmless to a human - 10 bytes is not much - but if they
were (say) <REVOKED  and </REVOKED  (the parser isn't clever enough to insert
the '>' and omits the malformed tags) it could be disastrous.
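
A toy illustration of that failure, assuming a hypothetical recovery
strategy (lenient_recover) that simply drops any tag which never got its
'>'.  The sample text is invented, and the regular expression is only a
stand-in for whatever a real lenient parser might do.

```python
import re

# A malformed tag here is a '<' followed by text that runs into another '<'
# (or the end of the input) before any '>' closes it.
MALFORMED_TAG = re.compile(r"<[^<>]*(?=<|$)")


def lenient_recover(raw: str) -> str:
    """Toy recovery: silently omit tags that are missing their '>'."""
    return MALFORMED_TAG.sub("", raw)


garbled = "<CHAPTER><REVOKED <P>The licence is valid.</P></REVOKED </CHAPTER>"
recovered = lenient_recover(garbled)
print(recovered)
```

The recovered text is well-formed and reads sensibly, but the REVOKED
wrapper that reversed its meaning has vanished without trace.
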
> 
> If we want the culture of XML usage to differ from that of HTML, we need
> to ensure that implementors pay attention to the requirement that they
> report reportable errors unless the user says not to.  We can do that by
> complaining unmercifully about any implementation that doesn't provide
> error reporting, and by pointing out -- correctly -- that it's not a
> conforming implementation of XML.

I agree completely with this.  On a practical level I doubt if the early
prototype tools (at least from amateurs like myself) will manage thorough 
error-handling in the way that is being required.  In my case I have only 
two options: Draconian or fuzzy.  For the sake of the cause I will have
to stick with the former :-)

	P.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/
Received on Monday, 28 April 1997 11:41:10 UTC