ERB votes on error handling from Tim Bray on 1997-05-07 (w3c-sgml-wg@w3.org from May 1997)

From: Tim Bray <tbray@textuality.com>
Date: Wed, 07 May 1997 11:32:50 -0700
To: w3c-sgml-wg@w3.org
Message-Id: <3.0.32.19970507113246.009fe210@pop.intergate.bc.ca>

The ERB met on May 7th.  All members were present in person or by
proxy.  The chief subject under discussion was error handling; I have
been asked to report on the discussion and results.  The arguments on both
sides have been exhaustively covered, and I won't repeat them.  There
were, however, a few new issues that came up in the course of the
meeting.

1. Well-formedness (WF-ness) may not be as easy to check as I have been
claiming - getting the grammar right for a complex ATTLIST inside an
INCLUDed marked section is nontrivial.

2. We have a strong political reality to deal with here in that for
the first time, the big browser manufacturers have noticed XML and
have together made a strong request: that error-handling be completely
deterministic, and that browsers not compete on the basis of excellence
in handling mangled documents.  It was observed that if they wanted
to do this, they could just do it; but then pointed out that this is
exactly why standards exist - to codify the desired practices shared
between competitors.  In any case, if we want XML to succeed on the
Web, it will be difficult to throw the first serious request from
M&N back in their faces.

3. In fact, everyone on the ERB substantially agrees with M&N's 
goal, in that we do not, ever, want an XML user-agent to encounter
a WF error and proceed as though everything were OK.  Our disagreements
centre on how to use the spec machinery to achieve this.

4. We're not worried that XML editors will silently recover from errors,
because they exist precisely to create and manipulate correct content and
to fix incorrect content.  "Read-only" XML processors are where the
problem lies, because their users have no incentive to prefer
error-free documents.

5. We considered an alternative proposal, which makes two major changes
to the XML spec by defining the concept of an XML-conformant application,
and the concept of a human user.  This proposal would require an 
XML-conformant application, when confronted with a WF error, to 
refuse to proceed until a human user had been notified of the error
and explicitly authorized error recovery.  After some discussion, this
proposal failed to win majority support.  Concerns included:
 - the radical changes to the spec
 - the fact that much parsing code is operating in multithreaded mode at 
   a very low level, and it may not be tractable to have to check for the 
   presence of a human
 - this seems to compromise a design goal of XML, that processors be
   lightweight and easy to send across the Net, because they will
   all start to carry around user-interaction and error-recovery code 
   for competitive reasons
 - it is not clear that the modal-approval model is achievable across
   the range of user interfaces where XML will likely be deployed
However, this proposal did get serious consideration, and quite likely
would have attracted significant numbers of votes from the Tolerants
in the crowd.

6. If it turns out that there are common classes of WF errors that are
bedeviling users, we should be willing to fix the language to address
the problem.

7. There are some detailed operational concerns about the draconian
model.  First, it allows processors to feed parsed info to the app
up to the point of error; but is this required, i.e. can a processor
refuse to cough up a single byte because the doc is non-WF?  Second,
it is important that the processor be able to feed the app raw
un-parsed text to aid in error repair - given that the processor knows
where it is in the entity tree, it's much easier for the processor
to do this than the app - and this should probably include portions
of the doc *before* the error.

8. It was pointed out that if we adopt the draconian policy, and then
at some later point decide that error recovery should be allowed in
some or all circumstances, we can relax it.  The reverse is not perceived 
to be true.
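
To illustrate the second concern in item 7 (this is an illustration only,
not something the ERB discussed): a minimal Python sketch, assuming the
standard-library expat binding stands in for the XML processor.  The
helper name raw_context and the 40-byte window are hypothetical.  It
returns the raw, unparsed bytes on either side of the point where the WF
error was detected, so the app can see text from *before* the error too.

```python
import xml.parsers.expat


def raw_context(data, before=40, after=40):
    """Return (raw bytes before the error, raw bytes from the error on),
    or None if the document parsed without a well-formedness error.

    `data` is the document as bytes; the window sizes are arbitrary.
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)  # True: this is the final chunk
    except xml.parsers.expat.ExpatError:
        # The processor knows where it stopped; the app would have to guess.
        pos = parser.ErrorByteIndex
        start = max(0, pos - before)
        return data[start:pos], data[pos:pos + after]
    return None


if __name__ == "__main__":
    print(raw_context(b"<a><b>hi</a>"))
```

The point of the sketch is only that the position is cheap for the
processor to supply; a real processor would also have to map it back
through the entity tree, which this does not attempt.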

So after all this, the vote:

The question is [note the special terms 'must' and 'may']:

1. The XML-lang spec should be modified (probably in the conformance
   section) to state that for well-formed documents, an XML processor
   must make available to the application, at a minimum, the character 
   data extracted from among the markup in the document, and a description 
   of the logical document structure expressed by the markup. 

2. The XML-lang spec should be modified to state:

   When an XML processor encounters a violation of a well-formedness 
   constraint, it must report this error to the application.  It may 
   continue processing the data to search for further errors, and report 
   such errors to the application.  In order to support correction of 
   errors, it may make the unprocessed text from the document, with 
   intermingled character data and markup, available to the application.

   Once such a violation is detected, however, the processor must not
   continue the process, described in [ref. to language in point 1],
   of passing character data extracted from markup, and description
   of the logical document structure expressed by the markup, to the 
   application.

Yes: Bosak, Bray, DeRose, Magliery, Maler, Paoli, Wood*
 No: Clark, Hollander, Kimber, Sperberg-McQueen

(* Lauren Wood was substituting for Peter Sharpe, with the approval
   of the Chair and ERB)
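
As a rough sketch of the behaviour that question 2 describes (an
illustration, not the proposed spec language): the Python below assumes
the standard-library expat binding as the processor, and the function
name parse_draconian and the event tuples are hypothetical.  Character
data and structure are passed along until a WF violation is detected;
the violation is then reported and nothing further is passed.  This
version stops at the first error and does not use the permission to keep
scanning for further errors.

```python
import xml.parsers.expat


def parse_draconian(text):
    """Return (events, error).

    events: structure and character data seen before any WF violation.
    error:  None for a well-formed document, else a dict describing
            the first violation.
    """
    events = []
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: events.append(("start", name))
    parser.EndElementHandler = lambda name: events.append(("end", name))
    # Character data may arrive in more than one chunk per text node.
    parser.CharacterDataHandler = lambda data: events.append(("text", data))
    try:
        parser.Parse(text, True)  # True: this is the final chunk
    except xml.parsers.expat.ExpatError as err:
        # "must report this error to the application" and
        # "must not continue" passing data and structure.
        error = {"message": str(err), "line": err.lineno, "column": err.offset}
        return events, error
    return events, None


if __name__ == "__main__":
    for doc in ("<a><b>hi</b></a>", "<a><b>hi</a>"):
        events, error = parse_draconian(doc)
        print(doc, events, error)
```

Whether the events gathered before the error should be handed over at
all is the open operational question from item 7; the sketch returns
them so the caller can decide.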

On a related point, the ERB agreed to put some application notes in 
the spec covering the points raised in items 4 and 7 above.

Cheers, Tim Bray
tbray@textuality.com http://www.textuality.com/ +1-604-708-9592
Received on Wednesday, 7 May 1997 14:34:34 UTC