Re: Error handling: yes, I did mean it from James Clark on 1997-04-21 (w3c-sgml-wg@w3.org from April 1997)

From: James Clark <jjc@jclark.com>
Date: Mon, 21 Apr 1997 16:17:12 +0700
To: w3c-sgml-wg@w3.org
Message-Id: <2.2.32.19970421091712.00f00ef8@jclark.com>
At 10:44 20/04/97 -0700, Tim Bray wrote:

>>Consider
>>unquoted attribute values: an XML processor can easily recover from these.
>
>I disagree.  The reliable presence of quotes makes XML attribute processing
>dramatically easier than SGML's.  Furthermore, 
> <foo attr="val>some more text &amp; markup... 20k before the
> next quotation-mark
>will send most parsers (including at least SP and Lark) into gibber-mode.
>I think an unquoted attribute value is *precisely* the kind of thing
>that should remove any expectation that a document can be processed
>reliably by a receiving application, and which vendors should *not*
>be asked to heuristic (v?) their way around.

Missing closing quotes are not easy to recover from, but an unquoted
attribute value that is legal HTML and SGML such as this

  <foo attr=bar>

certainly is.
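
To make the distinction concrete, here is a minimal sketch of an attribute
scanner that accepts a bare value, records the error, and carries on, but
gives up on a missing closing quote.  It is purely illustrative: the names
ATTR_RE and parse_attributes are hypothetical, the grammar is simplified,
and none of it is taken from SP or any other real parser.

```python
import re

# Hypothetical, simplified grammar: name=value where the value is
# double-quoted, single-quoted, or a bare (unquoted) token.
ATTR_RE = re.compile(
    r'''\s+([A-Za-z_:][\w:.-]*)\s*=\s*(?:"([^"<]*)"|'([^'<]*)'|([^\s"'<>=]+))'''
)

def parse_attributes(text):
    """Scan the attribute part of a start-tag; return (attributes, errors)."""
    attrs, errors = {}, []
    pos = 0
    while pos < len(text):
        m = ATTR_RE.match(text, pos)
        if m is None:
            # Nothing sensible matches here (e.g. a missing closing quote):
            # report the problem and stop rather than guess.
            if text[pos:].strip():
                errors.append(f"cannot recover at offset {pos}: {text[pos:]!r}")
            break
        name, dq, sq, bare = m.groups()
        if bare is not None:
            # Unquoted value: the extent is unambiguous, so record the
            # error and keep going with the value as written.
            errors.append(f"attribute {name!r}: value {bare!r} is not quoted")
            value = bare
        else:
            value = dq if dq is not None else sq
        attrs[name] = value
        pos = m.end()
    return attrs, errors
```

Called on ' attr=bar' this yields the attribute plus one reported error;
called on ' attr="val' it yields no attributes and a "cannot recover" error.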

>The processor is not forbidden to proceed, looking for more errors (one
>of the things that SP is really good at) - it is just forbidden to
>pretend that everything is OK and go on passing bogus data to the
>application.

I'm not suggesting that the parser pretend everything is OK.  The parser
should tell the application that it has encountered an error, and it should
do so before it passes any further data.

>You would be within your rights as an implementor to
>ignore this, but I, as an application builder, would not assign any
>significant client processing role to a component that insisted on
>trying to guess the meaning of egregiously broken data.

If the parser tells you about the error, then you, as an application
builder, can choose to ignore any data sent by the parser after the error.
The parser may even provide you with a way to do that automatically (nsgmls
-E1 will stop after the first error).  I think users and application
builders should have a choice about what they do with invalid data.  I cannot
see how a user or application builder can be disadvantaged by being provided
with this choice, and I therefore plan to continue to provide it even if the
spec says that this is non-conforming.
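
A rough sketch of what that choice could look like from the application
side, assuming the parser delivers a stream of events in which each error is
reported before any later data.  The Event type and the consume function are
hypothetical names invented for this illustration; max_errors=1 merely
mimics the stop-after-first-error behaviour mentioned above and is not an
nsgmls interface.

```python
from dataclasses import dataclass
from typing import Iterable, List

@dataclass
class Event:
    kind: str      # "data" or "error" (hypothetical event kinds)
    payload: str

def consume(events: Iterable[Event], max_errors: int = 0) -> List[str]:
    """Collect data payloads from the stream.

    max_errors == 0 means keep going after errors; otherwise stop as soon
    as that many errors have been reported, ignoring everything after.
    """
    collected: List[str] = []
    seen = 0
    for ev in events:
        if ev.kind == "error":
            seen += 1
            if max_errors and seen >= max_errors:
                break          # the application chooses to stop here
            continue           # the application chooses to carry on
        collected.append(ev.payload)
    return collected
```

With the stream [data "a", error, data "b"], consume(stream) returns
["a", "b"], while consume(stream, max_errors=1) returns ["a"].  The parser is
the same in both cases; only the application's policy differs.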

>>I was simply making the point that the likes of
>>    nsgmls foo.sgm | grep -c "^(BAR$"
>>can be a useful thing to do even if foo.sgm markup contains errors.
>
>Sure; but if you replace "grep -c" with some fancy java applet that
>does a business-critical application, this is no longer useful but
>highly dangerous.

Such an application would probably need to ignore not just the portion of
the document after the first error but the entire document.  So long as
applications are told about the errors, they can choose to do that.  But I
don't see how this advances your argument at all.  All you've shown is that
*some* applications must not process data that is not well-formed.  What
you're proposing is that *all* applications must not process data that is
not well-formed.
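
For the strict case, a sketch of an application that buffers everything and
throws the whole document away if any error was reported.  Again this is
only an illustration under assumptions: strict_result is a hypothetical
name, and events are modelled as plain (kind, payload) tuples.

```python
def strict_result(events):
    """Return the list of data payloads, or None if any error was reported.

    events: iterable of (kind, payload) tuples, kind being "data" or "error".
    Nothing is released to the caller until the whole stream has been seen.
    """
    buffered = []
    for kind, payload in events:
        if kind == "error":
            return None        # reject the entire document, not just the tail
        buffered.append(payload)
    return buffered
```

This policy lives entirely in the application; it needs the parser to report
errors, not to refuse to continue.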

James
Received on Monday, 21 April 1997 05:31:07 UTC