Re Jon on Error from Terry Allen on 1997-05-06 (w3c-sgml-wg@w3.org from May 1997)

From: Terry Allen <tallen@sonic.net>
Date: Tue, 6 May 1997 16:42:46 -0700
To: w3c-sgml-wg@w3.org
Message-Id: <199705062342.QAA15241@bolt.sonic.net>

Jon writes:
| 2. The ERB is thrashing out a policy regarding error handling.  We
| seem to be converging on a position that doesn't go quite as far as
| "halt and catch fire" but comes pretty close to it.  Since I haven't
| said much about this, allow me to express a couple of personal
| opinions; please put replies, if any, under a new subject line.
| 
| <EXCURSUS>
| 
| (a) I've been a bit annoyed at times over the past couple of weeks
| with the repeated assertions that the major implementors will ignore
| standards for error handling, so please indulge this brief outburst:
| IT'S THE MAJOR IMPLEMENTORS WHO ARE ASKING US TO DO THIS.  Got that?
| Good.

That's what those of their representatives who are cognizant of the
issue say today.  It is reasonable (though perhaps in error) to
predict that the corporate interests of the companies these people
work for will in time pull the other way.  

| (b) You can't specify standard error recovery without ipso facto
| making the recovery behavior an implicit extension to the language.
| If an application can recover from an omitted end tag, for example,
| then you have just made omitted end tags part of the language spec.
| The HTML experience over the last few years has proven this point
| beyond any doubt (and is one of the key reasons that we are being
| asked by the Big Guys to take a hard line on error handling).

I don't want to specify standard error recovery, I want only that
any sort of error recovery not be forbidden.  Please consider my
(so far unanswered) question about recursive parsing after errors.
If the app sends the XML to the parser, gets something back and
an error message, mends the remaining unprocessed XML and sends
it back for a new parse (under a fictitious name if need be),
cats the result of the first parse and the second together, etc.,
you have, legally, just the sort of error recovery you want to
forbid the parser to provide.  If you really want to satisfy the
market needs perceived today by the representatives of the big
guys, you'll have to get into specifying apps, won't you?  I mean,
that's what they sell.
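
To make the scenario concrete, here is a minimal sketch in Python of
recovery layered on top of a strict parser.  It uses the standard
xml.parsers.expat module, which stops at the first error.  The names
strict_parse, recover and escape_bare_ampersands are hypothetical,
the mending rule is only one example of what an app might do, and
the sketch assumes UTF-8 input with no literal '>' inside attribute
values.

```python
import re
import xml.parsers.expat as expat


def strict_parse(data: bytes):
    """Parse strictly, stopping at the first error.

    Returns (events, rest, root).  `rest` is None on success; otherwise it
    holds the bytes after the last point where the root had just opened or
    one of its children had fully closed.
    """
    events = []
    depth = committed = resume = 0
    root = None
    parser = expat.ParserCreate()

    def checkpoint():
        nonlocal committed, resume
        committed = len(events)
        # Assumption: no literal '>' inside the current tag's attributes.
        resume = data.index(b">", parser.CurrentByteIndex) + 1

    def start(name, attrs):
        nonlocal depth, root
        depth += 1
        events.append(("start", name))
        if depth == 1:
            root = name
            checkpoint()

    def end(name):
        nonlocal depth
        depth -= 1
        events.append(("end", name))
        if depth == 1:
            checkpoint()

    def chars(text):
        # Merge adjacent character data into a single event.
        if events and events[-1][0] == "text":
            events[-1] = ("text", events[-1][1] + text)
        else:
            events.append(("text", text))

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    try:
        parser.Parse(data, True)
    except expat.ExpatError:
        # Keep only what was seen up to the last checkpoint.
        return events[:committed], data[resume:], root
    return events, None, root


def recover(data: bytes, mend, max_rounds: int = 10):
    """App-level recovery: mend the unparsed remainder and parse it again."""
    out, skip = [], 0
    for _ in range(max_rounds):
        events, rest, root = strict_parse(data)
        out.extend(events[skip:])  # drop the fictitious root start, if any
        if rest is None:
            return out
        if root is None:
            raise ValueError("error before the root element")
        # Re-submit the mended remainder under a fictitious root start tag.
        data = b"<" + root.encode() + b">" + mend(rest)
        skip = 1
    raise ValueError("still not well-formed after mending")


def escape_bare_ampersands(rest: bytes) -> bytes:
    # Example mending rule (an assumption): escape '&' that starts no entity.
    return re.sub(rb"&(?!#?\w+;)", b"&amp;", rest)


broken = b"<doc><p>one</p><p>AT&T</p><p>three</p></doc>"
events = recover(broken, escape_bare_ampersands)
texts = [value for kind, value in events if kind == "text"]
print(texts)
```

The parser itself never recovers from anything here; every parse
either succeeds or stops at the first error.  The recovery lives
entirely in the calling application.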

| (c) Some people who understand the necessity for a compiler to refuse
| to produce an executable from broken code seem to think that it's
| perfectly OK for a document processor to pass over bad spots in a
| document and carry on.  Maybe you have to be part of a group that
| produces support documentation for hardware and software that really,
| truly does run nuclear power stations and air traffic control systems
| to understand this, but take it from me that it is *not* acceptable
| for pieces of language to silently disappear from documents or appear
| in ways that could be misinterpreted by the user.

I agree, but it is not necessary to forbid all error recovery.  Far
better to say that error recovery behavior is not specified than
to say it is illegal.

| People who insist that there is a customer requirement for "forgiving"
| document applications overlook the fact that we already have those:
| they are called HTML browsers.  The text/html media type or .htm(l)
| file extension says "I don't know (or don't care) exactly what I'm
| doing, this could be anything, do the best you can with it."  Fine.
| We all know that this is the behavior to be expected from HTML.  The

No, from HTML applications.  

| text/xml media type or .xml file extension (or embedded XML tag in an
| HTML document, if that ever happens) says "I really do know what I'm
| doing here, I'm serious about this, I'm not kidding, I mean exactly
| what I say, please treat this the way that you would treat a program
| and don't do me any favors by pretending that my errors aren't there."

You are predicting user behavior here, and if our recent history is
any guide, XML will be under exactly the pressures of HTML, plus
some.  You can't legislate morality.  Profitable XML apps will be as useful
as they well can be, and there will be occasions to examine portions
of the instance beyond the first error.  Let M & N sign a treaty
if they want about what their *applications* won't do. 

| XML is a power tool.  It's not for beginners.  It is perfectly

It's not for those C.S. graduates with two weeks of time at their 
disposal?  It's really only for M & N?  Authors can just lump it?
If you take the Draconian approach you only encourage people to
work around it.

| consistent with its market positioning to require a different set of
| error-handling behaviors from XML processors.

Let me introduce another point.  Much of the handling of XML 
instances by user agents will involve non-XML-compliant parsing
("search for the last foobar using fb.pl and write it to stdout").  
Indeed, the aptness of XML for this kind of parsing is (to me, at least)
a major selling point, perhaps the prime selling point of XML.
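
As an illustration of that kind of partial parsing, here is a small
Python sketch in the spirit of the fb.pl example.  The function name
last_element and the sample text are made up for the illustration.
It does pure pattern matching: no well-formedness check, no entity
handling, and no support for nested elements of the same name or for
empty-element tags.

```python
import re


def last_element(text: str, name: str):
    """Return the last <name>...</name> span found in text, or None."""
    pattern = re.compile(
        r"<{0}(?:\s[^>]*)?>.*?</{0}\s*>".format(re.escape(name)),
        re.DOTALL,
    )
    matches = pattern.findall(text)
    return matches[-1] if matches else None


# The sample is deliberately not well-formed: <oops> is never closed.
sample = "<doc><foobar>first</foobar><oops><foobar id='2'>last</foobar></doc>"
found = last_element(sample, "foobar")
print(found)
```

A compliant parser would have to reject the sample outright, while
the pattern match still finds the element it was asked for.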

So if we look out a couple of years, we can envision a landscape in
which there are some XML-compliant parsers and an awful lot of
partial-parsers.  Value will be placed on XML compliance only for
some purposes; some applications will use partial parsing to get
the most they can out of instances, and if they make that known,
users and user agents can be attuned to that behavior.  

So why is anyone going to balk at using a non-XML-compliant
parser that does error recovery?  What do you achieve in the real
world by the Draconian approach beyond the immediate goals of 
M & N?

Tim Bray writes:
==========
There's one last key point:  If Netscape and Microsoft jump on board,
as they say they will, then no major browser will display non-WF
docs.  So the publishing model can be about the same as it is now;
create the doc, see if it looks OK in Navigator or Explorer, and if
so, ship.  With the knowledge that it's well-formed.  Which means:

- No information provider who does even the most cursory checking
  will publish non-WF docs
- No user will ever be in the position that he can't see an "interesting"
  doc just because it's non-WF, because there won't be any
- And if there are, they will be evidence of either serious breakage
  in the delivery system, or a provider who is so contemptuous of
  quality that
  (a) they can't master balanced-tag + quoted-attribute syntax, and
  (b) don't bother with even a single basic usability check before
      publishing.
  In other words, a bozo, whose output can safely be ignored anyhow.
==========

Anyone who has a single error in his document is a bozo?  Ahem.
I don't buy any of this.  But I've said my piece, I don't think
it matters what you specify, and as this is a vendor-sponsored
forum, perhaps it doesn't matter what the WG says, either.


Regards,

  Terry Allen    Electronic Publishing Consultant    tallen[at]sonic.net
                   http://www.sonic.net/~tallen/
    Davenport and DocBook:  http://www.ora.com/davenport/index.html
          T.A. at Passage Systems:  terry.allen[at]passage.com
Received on Tuesday, 6 May 1997 19:39:36 UTC