Re: Error Handling from Paul Prescod on 1997-05-07 (w3c-sgml-wg@w3.org from May 1997)

From: Paul Prescod <papresco@calum.csclub.uwaterloo.ca>
Date: Tue, 06 May 1997 20:13:38 -0400
To: w3c-sgml-wg@w3.org
Message-ID: <336FC932.443B6DA1@calum.csclub.uwaterloo.ca>
Tim Bray wrote:
> >Anyhow, if they are all agreed that they want to be tough on errors,
> >why don't they just DO SO and let the editors and other tools do the
> >Right Thing for their customers?
> 
> Because the technical people are very aware of the dangers of some
> marketing droid seeing a possible competitive advantage in incorrect
> behavior and launching us down the HTML poop-chute again.  

It is not marketing droids who caused the HTML problem or will cause a
problem with XML. That's passing the buck. Lazy programmers didn't do
their error checking and authors didn't know they were making a mistake.
The solution is careful programmers and good validation facilities in
browsers.

> We are
> trying to achieve a big cultural change, we need all the ammo
> we can get.  Or were you proposing doing this for HTML?  It's too
> late for that (and I would argue that the forgiving behavior for
> HTML is just fine, and a major reason for the Web's success).

*** NO IT IS NOT TOO LATE FOR HTML ***

If Netscape and IE 4.0 both provided an accurate validation mode and an
optional "bad HTML flag," then HTML would be "fixed" within a couple of
years. But they don't and most authors don't even know about validation.
I have demonstrated validators to authors and watched their eyes grow
wide when they realized that they were doing so many illegal things.

This is why I am VERY cynical about browser vendors asking for our
"help" in stopping XML from becoming a cesspool of non-compliance. They
choose not to expend *any* effort on correcting HTML but want everyone
else in the world to conform to this draconian XML model in order to
stop them from being lazy in the future: "stop me before I kill again!"
 
> >There is a big gap between specifying mandatory error recovery and
> >specifying mandatory error non-recovery. In the middle is:
> >"specifying/requiring error reporting and leaving error recovery
> >up to the best judgement of the vendor."
> 
> Right; we understand this very clearly.  There is a strong feeling
> among vendors of commercial authoring and display software, and serious
> information providers, that we want mandatory error non-recovery.

As long as that is clear to everyone, draconian-supporters can stop
describing our two options as "draconian error recovery" and "pretend
nothing happened."
 
> Error reporting has been tried in HTML-land, and failed.  

*** NO IT HASN'T ***

No version of the HTML standard has ever specified that an application
must report errors. I argued for that a year or more ago but was shouted
down.

> Nothing less
> than a draconian policy has a chance of making it clear that we're
> serious.  The theory is that error reporting will be taken seriously
> only if it is accompanied by an absence of data.

And this theory is based upon ... what? In the SGML world many tools
have validation built in but will also do their best with the data they
are given. I don't know anybody who goes around ignoring these error
messages. The same is true of LaTeX. I think that it is an insult to
the intelligence of prospective XML users to suggest that they would.
Even if Joe/Jane HomePage is that stupid (and I DO NOT believe it), since
when did "the HomePages" become our target audience? Do we really expect
people who write DTDs and stylesheets to ignore error messages?

> >And SGML editors. Doesn't every SGML editor you have ever used do its best
> >with incorrect data? How about Jade? Right this very minute I am processing
> >invalid and non-well-formed documents with it.
> 
> There's no reason that should stop.  We have to make it clear that
> there is a place, in authoring systems, for all sorts of weird noncompliant
> documents, in transition to being XML.  The conformance of an authoring
> system is only measurable in terms of its output - which HAD BETTER
> be well-formed.

If I load a document with a mistake in the middle into some XML Editor I
expect it to do the best it can. When that document is corrected and
saved, it should contain that data. Therefore its parser cannot both
meet my expectations AND conform to your draconian policy. Therefore it
is "non-compliant".

And when did Jade or SP become "authoring systems"? I may use them while
I am authoring a document, but I would expect to use a generic XML
browser in the same way. What loophole are you going to invent that will
exclude Jade but include Netscape? And why shouldn't Netscape be as
useful during the initial authoring stages as Jade is?

> In fact, one of the reasons why Jade/SP are so big, and why it takes
> a superhuman like James Clark to write them, is because they have
> had this kind of wizardry built in.  One of the goals of XML is to
> enable the creation of lightweight tools that can flit about the
> web; while SP is wonderful and appropriate in support of editing
> environments, we'd like something a lot smaller and simpler to
> run in our JVMs.

Nobody in this discussion needs to be convinced of the value of
validity. We don't need to rehearse those arguments over and over. The
question is: "Can we achieve validity without being overly restrictive
and draconian?" I think that the success of the SGML world in doing so
proves that we can.
 
> If, on the other hand, you are saying that you like Jade/SP because
> you want to produce and publish non-WF documents, then you are correct;
> you will not be able to use XML because no popular browsing tool will
> support such a practice.

I want to produce documents that will be non-WF during their development
process. I do not want to distribute them while they are non-WFed.
 
> There's one last key point:  If Netscape and Microsoft jump on board,
> as they say they will, then no major browser will display non-WF
> docs.  So the publishing model can be about the same as it is now;
> create the doc, see if it looks OK in Navigator or Explorer, and if
> so, ship.  With the knowledge that it's well-formed.  Which means:
> 
> - No information provider who does even the most cursory checking
>   will publish non-WF docs
> - No user will ever be in the position that he can't see an "interesting"
>   doc just because it's non-WF, because there won't be any
> - And if there are, they will be evidence of either serious breakage
>   in the delivery system, or a provider who is so contemptuous of
>   quality that
>   (a) they can't master balanced-tag + quoted-attribute syntax, and
>   (b) don't bother with even a single basic usability check before
>       publishing.
>   In other words, a bozo, whose output can safely be ignored anyhow.

Big deal! Links can point to non-existent documents and non-existent
elements. Stylesheets can go into infinite loops. Elements can be nested
in random orders that make no sense and break the stylesheet.
Stylesheets can be totally missing. Documents can be embedded within
themselves recursively. Included data can be in a bad format. Documents
can fail to conform to their declared DTD.

You aren't even close to solving the real Web quality problem. This can
only be solved with *programming* not regulations. Browsers must have
full validators in them that can check all of these things. They must
*report* these errors. What they do after reporting them is up to them.
Users want to fix their pages. They aren't stupid or lazy. They just
don't know that there is a problem.
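To make the "report the errors, then recover as best you can" policy
concrete, here is a minimal sketch in Python contrasting it with a
draconian mode. The names check_tags and WellFormednessError are
hypothetical, and the regex tokenizer is a toy assumption: it only looks
at start, end, and empty-element tags (no comments, PIs, CDATA, or
attribute checks), so it is nowhere near a conforming XML parser. By
default it records every problem and recovers by closing elements left
open and skipping stray end tags; with draconian=True it stops at the
first error.

```python
import re

# Toy tokenizer (an assumption, not real XML parsing): matches start tags,
# end tags, and empty-element tags; ignores comments, PIs, and CDATA.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.-]*)[^>]*?(/?)>")


class WellFormednessError(Exception):
    """Raised only in draconian mode, on the first error found."""


def check_tags(text, draconian=False):
    """Check tag balance and return a list of error messages.

    draconian=False: report every error, recover, and keep going.
    draconian=True:  raise WellFormednessError at the first error.
    """
    errors = []
    stack = []  # names of currently open elements

    def fail(msg):
        if draconian:
            raise WellFormednessError(msg)
        errors.append(msg)

    for m in TAG.finditer(text):
        closing, name, empty = m.groups()
        if empty:
            continue  # <br/> opens and closes itself
        if not closing:
            stack.append(name)
        elif name in stack:
            # Recovery: close any elements still open inside this one.
            while stack[-1] != name:
                fail(f"unclosed <{stack.pop()}> before </{name}>")
            stack.pop()
        else:
            # Recovery: ignore an end tag with no matching start tag.
            fail(f"stray </{name}>")

    # Anything still open at the end is reported innermost first.
    for name in reversed(stack):
        fail(f"unclosed <{name}> at end of document")
    return errors
```

For example, check_tags("<p><b>bold<i>both</b> plain</p></div>") reports
both "unclosed <i> before </b>" and "stray </div>" and still walks the
whole document, whereas the same call with draconian=True raises on the
unclosed <i> and tells the author nothing about the rest.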

 Paul Prescod
Received on Tuesday, 6 May 1997 20:18:37 UTC