- From: C M Sperberg-McQueen <cmsmcq@uic.edu>
- Date: Mon, 7 Dec 1998 12:58:39 -0600
- To: db@Eng.Sun.COM
- CC: xml-editor@w3.org, cmsmcq@uic.edu
>Date: Wed, 02 Dec 1998 14:12:46 -0800
>From: David Brownell <db@Eng.Sun.COM>

Thanks for your note. The following reply should be taken as
reflecting my personal views, not those of any WG, editorial team,
organization, project, or institution.

>I've noticed when testing against the spec that the definition of
>an "error" in section 1 is particularly useless when it comes to
>defining common behaviors:
>
>    Error: a violation of the rules of this specification;
>    results are undefined. Conforming software may detect
>    and report an error and may recover from it.
>
>The problem is that this definition promotes wide variations in
>handling errors: it permits parsers either to ignore the "error"
>entirely, or to treat it as a fatal error.

Correct; we expect parser developers to compete with each other in
part by providing the error behavior best suited to a particular
field of application. In some contexts that will mean dogged
perseverance, and in others it may mean crash-and-burn-quick, to
avoid burning cycles unnecessarily.

>Minimally, I'd suggest that this be tightened to require error
>reporting "at user option" (as for validity errors).

See below.

>It'd also be advantageous to preclude treating errors as "fatal"
>unless such treatment is specifically allowed in the spec. For
>example, in 4.3.3 it might be appropriate to permit processors
>to optionally report fatal errors when the encoding declaration
>is sufficiently broken.

I do not believe that errors in the encoding declaration (or in
external encoding information) are always detectable, or
distinguishable from other errors. If data arrives in ISO 8859-7
but is tagged as ISO 8859-1, it will be very difficult for any
software or hardware system to detect the error. If it arrives
tagged as some variant of ISO 8859, but in fact it's EBCDIC, how
will the system distinguish the actual error (in the encoding
declaration) from other errors (corrupt data, errors in the use of
delimiters, inclusion of illegal characters) which might produce
similar results?

>Why was this definition made so weak and fuzzy? Was it just that
>there wasn't much implementation experience on which to draw? If
>so, I think that there's plenty of experience now!

There was some diversity of opinion in the WG, and I believe that
diversity is reflected among the editors as well, but at least some
members of the WG believed that at least some things which are (or
should be) defined as errors in the spec are not necessarily
detectable, or are detectable only at the cost of unacceptably
limiting the possible implementation strategies. So we distinguished
errors which an implementation is required to detect from errors
which it is not required to detect, but which will generally produce
unexpected and probably undesired (i.e. incorrect) results.

Removing that distinction would have the drawback, in some cases, of
requiring software to do things which software is logically
incapable of doing -- or else of defining undetectable errors as
non-errors (e.g. saying that when an ASCII file contains an EBCDIC
encoding declaration, it is "not an error").

-C. M. Sperberg-McQueen
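
P.S. To make the "at user option" point concrete: in a SAX-style
interface, the choice of error behavior already rests with the
application, through the error handler it registers. A minimal
sketch in Java; the policy shown is one possibility among many, not
a recommendation:

    import org.xml.sax.ErrorHandler;
    import org.xml.sax.SAXException;
    import org.xml.sax.SAXParseException;

    // One possible policy: report everything, keep going after
    // recoverable errors, and let well-formedness violations
    // terminate the parse.
    public class ReportingErrorHandler implements ErrorHandler {
        public void warning(SAXParseException e) throws SAXException {
            System.err.println("warning: " + e.getMessage());
        }
        public void error(SAXParseException e) throws SAXException {
            System.err.println("error (recovering): " + e.getMessage());
        }
        public void fatalError(SAXParseException e) throws SAXException {
            throw e;    // give up on well-formedness violations
        }
    }

An application bent on dogged perseverance logs in error() and
carries on; one that prefers crash-and-burn-quick throws there as
well. Either is conforming, which is exactly the latitude the
definition was meant to leave.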
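
P.P.S. On detectability: the auto-detection described in Appendix F
of the Recommendation can tell an EBCDIC entity from an ASCII-family
one, because the four bytes encoding "<?xm" differ, but no comparable
trick can tell ISO 8859-1 from ISO 8859-7, since every byte sequence
is legal in both. A hypothetical illustration (the class and method
names are mine, not from any parser):

    // Guess the encoding family of an XML entity from its first
    // four bytes, following Appendix F of XML 1.0. (The UTF-16
    // byte-order-mark cases are omitted for brevity.)
    public class EncodingSniffer {
        public static String sniff(byte[] b) {
            if (b.length < 4)
                return "too short to tell";
            // "<?xm" in ASCII-family encodings: 3C 3F 78 6D
            if (b[0] == 0x3C && b[1] == 0x3F
                    && b[2] == 0x78 && b[3] == 0x6D)
                return "ASCII family (UTF-8, ISO 8859-x, ...)";
            // "<?xm" in EBCDIC: 4C 6F A7 94
            if ((b[0] & 0xFF) == 0x4C && (b[1] & 0xFF) == 0x6F
                    && (b[2] & 0xFF) == 0xA7 && (b[3] & 0xFF) == 0x94)
                return "EBCDIC family";
            // Nothing at this level can separate 8859-1 from
            // 8859-7: every byte sequence is legal in both, so a
            // wrong label is simply undetectable here.
            return "unknown; trust the label and hope";
        }
    }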
Received on Monday, 7 December 1998 13:59:12 UTC