Re: non-reportable errors from lee@sq.com on 1996-10-23 (w3c-sgml-wg@w3.org from October 1996)

From: <lee@sq.com>
Date: Wed, 23 Oct 96 19:31:36 EDT
To: w3c-sgml-wg@w3.org, U35395@UICVM.UIC.EDU
Message-Id: <9610232331.AA04895@sqrex.sq.com>
> >It seems to me that from a CS point of view, a non-reportable error
> >sounds like an error in the specification!  If you mean `it is optional
> >that an XML application detect this as an error', say that.
> 
> Does this mean that a condition which (a) is certainly unintended
> except in cases of deliberate sabotage, (b) will likely or possibly
> produce erroneous results, or (c) however else you wish to say
> 'mistaken, wrong, bad, glitch, snafu, ...' but which cannot reliably
> be detected by software, or which can be detected only at great cost,
> cannot be described as an error?

No.

> 
> In C, is it an 'error' or only a situation with 'undefined results'
> when I allocate memory, free it, and then try to write to it with a
> pointer that wasn't updated? 

It is a situation with undefined behaviour, not an error.
An implementation may flag it at runtime; we have at least three such
implementations here, maybe more, and they are very useful for detecting
and fixing such errors.  Don't leave home without them.
It is possible to prove mathematically that there are cases that no
C compiler can detect, that many people would consider errors -- for example,
a program that under some inputs goes into an infinite loop.  This, of
course, is called the Halting Problem.



> I don't think this is reliably
> detectable, and the C standard certainly doesn't require compilers or
> runtime systems to detect it.  But it's clearly an error in my book.
> It certainly isn't correct!

The program that does this is probably not correct.
It is a bug to rely on undefined behaviour, and the program is in error.
If that is what you mean by a non-reportable error, fine.  The phrase
to me means `an error that must not be reported to the user', and I
don't like that.  All I was complaining about was the wording.


> My proficiency in English has limits, like anyone's, but to me
> 'error' means 'something wrong'.

In compiler practice, an error is something that prevents successful
compilation.  If there was an error, processing did not complete.
If there was a warning, you should fix it, but you will still get
an output file that you can try to run.

In English, Entity and element have meanings, but in SGML they're jargon.
Error and Warning are jargon to computer programmers.

>   Conforming processors
> should be required to report any error they can reasonably be
> expected to detect

Assuming that they are running in an environment in which there is a
back-channel to get messages to a human who can reasonably be expected to
deal with the error.  For example, a web crawler might write to a log file,
but it might well be run with parsing warnings/errors turned off, so that
when it examines 1,000,000 web pages, the person running it (who doesn't
care about parse errors, and can't fix them as he doesn't own them) doesn't
see the messages.
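
A minimal sketch of that arrangement, assuming a hypothetical reporter
structure (the names reporter, sink and report are illustrative, not from any
spec): the parser always counts the problem, but only writes a message when a
back-channel has been configured.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical reporter: sink is NULL when nobody is listening,
 * e.g. a crawler run with parse messages turned off. */
struct reporter {
    FILE *sink;          /* back-channel to a human or a log, or NULL */
    unsigned long seen;  /* problems detected, whether or not reported */
};

/* Record a detected problem; emit a message only if a sink exists. */
void report(struct reporter *r, const char *doc, const char *msg)
{
    assert(r != NULL);
    r->seen++;
    if (r->sink != NULL) {
        fprintf(r->sink, "%s: %s\n", doc, msg);
    }
}
```

Detection and reporting are kept separate here, so turning messages off does
not change what the processor considers wrong.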

> (unless of course they die immediately -- does
> sudden death count as an error message?)
No.

> So I'd like to call it
> an error when (for example) a document declares List as containing
> (head?, item+) and the instance has a list with two Head elements.
> This is an error, but I do *not* want all XML parsers to be required
> to read the DTD and validate the document against its content models.

The error here is that the DTD does not correctly describe the instance.
Whether the DTD or the instance is at fault, it is only an error if you
have both the DTD and the document.

> It is (or, I think, should be) an error if a metadata header says
> that a data stream is ISO 8859-1 when in fact it's ISO 8859-2.
> A MIME processor cannot reasonably be expected to detect such an
> error; does that mean it's not going to lead to erroneous results?

It will lead to undefined results.
They may happen to be correct (e.g. there are only ASCII/646 character
codes in the file).

When the XML spec is written, it should define error cases where applicable.
The simpler the language, the fewer kinds of error there should be.

Every XML implementation will need to implement code to check for, report,
and handle, every error case that is described in the spec.  And no others.
A good implementation might handle many more cases, of course.

Never call something undetectable an error.  Instead, design the system so
that it is not an error.  For example, your DTD example is made tractable
by saying that a DTD, when supplied, must match the instance.  This is
much better than saying that every instance must match a DTD even if
one was not supplied.
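
As a rough sketch of that rule, assuming the (head?, item+) model from the
example above and hypothetical function names, the check below can only
report an error when a DTD was actually supplied:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical check for the content model (head?, item+):
 * an optional leading "head", then one or more "item" children. */
bool matches_list_model(const char *const *children, size_t n)
{
    size_t i = 0;

    assert(children != NULL || n == 0);
    if (i < n && strcmp(children[i], "head") == 0) {
        i++;                    /* consume the optional head */
    }
    if (i == n) {
        return false;           /* item+ needs at least one item */
    }
    for (; i < n; i++) {
        if (strcmp(children[i], "item") != 0) {
            return false;       /* second head, or some other element */
        }
    }
    return true;
}

/* With no DTD there is no model to violate, so it is never an error. */
bool list_is_error(bool have_dtd, const char *const *children, size_t n)
{
    return have_dtd && !matches_list_model(children, n);
}
```

A list with two head children is flagged only when have_dtd is true; a
processor that never reads the DTD has nothing to detect and nothing to
report.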

If something need not be reported, ask why.  If it is because fixing the
"problem" does not affect the document's meaning, portability or longevity,
maybe it is a pointless error.  If there is something that may reduce
document portability, but is still within the spec, the spec might suggest
a warning.  For example, in C,
	unsigned long ul = 12;
	.
	.
	.
	if (ul < 0) {
	    printf("not reached\n");
	}
some compilers will warn that the comparison (ul < 0) can never be true, by
definition, so that the printf is never executed.  It's not illegal C, but
it is odd C, and the warning often catches a bug.  Similarly,
	char c;
	
	while ((c = getchar()) != EOF) {
	}
is a common bug (getchar() returns an int, and a char can't represent every
character plus EOF; where char is unsigned, c never compares equal to EOF and
the loop never ends; where char is signed, c is sign extended, and character
255 (y dieresis in Latin 1) will be treated as EOF.  Oops).
This is a case for a warning, not an error.  It is not the compiler's
job to second-guess the programmer (except in PL/1 maybe!).
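
For comparison, a minimal sketch of the usual fix, using a hypothetical
count_bytes helper: declaring c as an int lets it hold every unsigned char
value as well as EOF, so byte 255 is no longer mistaken for end of file.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: counts the bytes remaining on a stream.
 * c is an int, so it can hold every unsigned char value and EOF. */
long count_bytes(FILE *in)
{
    long n = 0;
    int c;

    assert(in != NULL);
    while ((c = getc(in)) != EOF) {
        n++;
    }
    return n;
}
```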

Sorry for the extended programming language example to non-programmers!


Lee
Received on Wednesday, 23 October 1996 19:31:51 UTC