Re: "error" reporting and recoverability from David Brownell on 1998-12-08 (xml-editor@w3.org from October to December 1998)

From: David Brownell <db@Eng.Sun.COM>
Date: Tue, 08 Dec 1998 14:36:48 -0800
To: C M Sperberg-McQueen <cmsmcq@uic.edu>
CC: xml-editor@w3.org
Message-ID: <366DAA00.D891227D@Eng.Sun.COM>

C M Sperberg-McQueen wrote:
> 
> >       Error:  a violation of the rules of this specification;
> >       results are undefined.  Conforming software may detect
> >       and report an error and may recover from it.
> >
> >The problem is that this definition promotes wide variations in
> >handling errors:  it permits parsers either to ignore the "error"
> >entirely, or to treat it as a fatal error.
> 
> Correct; we expect parser developers to compete with each other
> in part by providing the error behavior best suited to a particular
> field of application.  In some contexts that will mean dogged
> perseverance, and in others it may mean crash-and-burn-quick to
> avoid burning cycles unnecessarily.

I guess I still see inconsistency in such behaviors as
undesirable ... at least with respect to reporting, though
less so for recovery.  I think of "perseverance" as relating
to recovery, but that might just be me!


> >Minimally, I'd suggest that this be tightened to require error
> >reporting "at user option" (as for validity errors).
> 
> See below.
> 
> >It'd also be advantageous to preclude treating errors as "fatal"
> >unless such treatment is specifically allowed in the spec.  For
> >example, in 4.3.3 it might be appropriate to permit processors
> >to optionally report fatal errors when the encoding declaration
> >is sufficiently broken.
> 
> I do not believe that errors in the encoding declaration (or in
> external encoding information) are always detectable, or
> distinguishable from other errors.  If data arrives in ISO 8859-7 but
> it is tagged as ISO 8859-1, it will be very difficult for any software
> or hardware system to detect the error.

I take your point (some errors can't be consistently detected)
but I don't quite see how it relates to my suggestion to reduce the number
of errors that "might" be treated as fatal.  (In this case I think
what'd be hard is correctly attributing the error to the encoding
declaration, vs for example an illegal Name.)
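
To illustrate the detection problem, here is a minimal Python sketch.
The sample string and variable names are assumptions chosen for
illustration.  Bytes written as ISO 8859-7 decode without complaint
when labelled ISO 8859-1, because every byte value is a valid
ISO 8859-1 character; the only symptom is wrong text.

```python
# Illustrative sample: three Greek letters (alpha, beta, gamma).
original = "\u03b1\u03b2\u03b3"

# The data really is ISO 8859-7 ...
raw = original.encode("iso8859_7")

# ... but the (hypothetical) encoding declaration says ISO 8859-1.
# Decoding raises no exception: every byte maps to some Latin-1 character.
mislabelled = raw.decode("iso8859_1")

# The decoder reported nothing, yet the text is not what was written.
silently_wrong = mislabelled != original
print(repr(original), repr(mislabelled), silently_wrong)
```

Nothing at the decoding layer can flag this; any complaint would come
later, for example if one of the resulting characters is not legal in
a Name, and would then be hard to attribute to the encoding label.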


>	  If it arrives tagged as some
> variant of ISO 8859, but in fact it's EBCDIC, how will the system
> distinguish the actual error (in the encoding declaration) from other
> errors (corrupt data, error in use of delimiters, inclusion of illegal
> characters) which might produce similar results?

That would be "sufficiently broken", and I'd say that's a reasonable
place to allow reporting a fatal error.

Contrariwise, what's the point of permitting a fatal error in the case
of an incorrect redefinition of the "amp" entity?  It must be predeclared,
and for all (other) entities the first definition holds and others are
ignored.  That should be at most a warning.
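
A rough sketch of that "first definition holds" rule, in Python.
The EntityTable class, its method names, and the replacement text used
for "amp" are hypothetical, chosen only for illustration; this is not
any particular parser's API.  A later declaration of an existing name
is ignored and reported as a warning rather than a fatal error.

```python
import warnings


class EntityTable:
    """Hypothetical entity table: the first declaration of a name is binding."""

    def __init__(self):
        self._entities = {}

    def declare(self, name, value):
        """Record a declaration; return False if the name was already declared."""
        if name in self._entities:
            # Later declarations are ignored; report them as a warning only.
            warnings.warn(f"entity {name!r} redeclared; keeping first declaration")
            return False
        self._entities[name] = value
        return True

    def lookup(self, name):
        return self._entities[name]


table = EntityTable()
table.declare("amp", "&")  # stands in for the predeclared entity (assumption)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    accepted = table.declare("amp", "and")  # incorrect redefinition

print(accepted, table.lookup("amp"), len(caught))
```

Processing continues after the redefinition, and the predeclared value
is the one that is used.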


> >Why was this definition made so weak and fuzzy?  Was it just that
> >there wasn't much implementation experience on which to draw?  If
> >so, I think that there's plenty of experience now!
> 
> There was some diversity of opinion in the WG, and I believe that
> diversity is reflected among the editors, but at least some members of
> the WG believed that at least some things which are (or should be)
> defined as errors in the spec are not necessarily detectable, or are
> detectable only at the cost of unacceptably limiting the possible
> implementation strategies.

That's almost what Tim said -- except for the second clause (removing
implementation options, such as not checking for deterministic FSMs
in the content models, can sometimes be bad).


>	So we distinguished between errors which
> an implementation is required to detect from errors which an
> implementation is not required to detect, but which will generally
> result in unexpected and probably undesired (i.e. incorrect) results.

That addresses the "optional detection" aspect, but not the
"optional fatality" aspect I am also bothered by ...


> Removing that distinction would have the drawback, in some cases,
> of requiring software to do things which software is logically
> incapable of doing -- or else of defining undetectable errors as
> non-errors (e.g. saying that when an ASCII file contains an
> EBCDIC encoding declaration, it is "not an error").

Well, that's one of those perceptual issues in my book:  if it
looks like EBCDIC and quacks like EBCDIC you can't tell it isn't
really EBCDIC ... if it's really undetectable, how can you really
argue that there was an error except in a user's expectations?

- Dave
Received on Tuesday, 8 December 1998 17:36:56 UTC