Re: Error handling: yes, I did mean it from Sarah Slocombe on 1997-04-24 (w3c-sgml-wg@w3.org from April 1997)

From: Sarah Slocombe <sarah@attd.com>
Date: Wed, 23 Apr 1997 23:00:52 -0400
To: w3c-sgml-wg@w3.org
Message-Id: <3.0.32.19970423225929.0077807c@mail.lglobal.com>
Greetings. I will cease my usual lurking and post as, like Lauren, 
I feel particularly strongly about this topic.

At 10:44 AM 20/04/97 -0700, Tim Bray wrote:
>To summarize: I proposed that XML processors be required to stop
>passing data (other than error notifications) to applications after the
>first violation of well-formedness. 

The concern has been that document information after the first error 
may be of value to the user.

At 04:08 PM 21/04/97 GMT, Peter Murray-Rust wrote:
>In message <1.5.4.32.19970421065409.0069c20c@mail.u-net.com> Martin Bryan
writes:
...
>> There is a difference between loss of markup and loss of data. Whilst both
>> consititute information, no data should be discarded just because there is
>> an error in a piece of markup. XML should at very least retain the
following
>> data as part of the last validly opened element.
>
>Although I'm not an SGML expert, I take a different view, in that markup and
>data are both essential parts of the document.  I am prepared to write the 
>following:

Like Peter, I must disagree. How much of the really important information 
in a document is held as content and how much is carried by attribute values 
or nesting of elements would depend on the DTD (or less formally, the use 
at hand). The one thing of which I think we may be certain is that XML 
users will deal in STRUCTURED information or they would use raw 
Unicode. Passing only content in the case of a document which purports 
to be XML seems to me a particularly poor idea. 

At 03:08 PM 19/04/97 +0100, Sean Mc Grath wrote:
...
>One of the cool things about XML as a
>document
>format is that some of the content can be recovered even in the face of
>error. Compare this
>to our binary document friends where a blown byte can render the entire
content
>inaccessible.

So true. And there are certainly numerous opportunities to exploit this 
advantage. But we should not call recovered content "XML." 

At 04:17 PM 21/04/97 +0700, James Clark wrote:
...
>I think users and application
>builders should have a choice with what they do with invalid data.  I cannot
>see how a user or application builder can be disadvantaged by being provided
>with this choice, and I therefore plan to continue to provide it even if the
>spec says that this is non-conforming.

Quite reasonable. But could such an application not be considered a 
tolerant XML-processing application consisting of a strict XML processor 
plus smarts to handle weird cases? 

I believe the distinction between "XML processor" and "XML application" 
is key. An XML application need not be restricted to an XML processor 
plus some display bits and an XML processor refusing to accept what it 
deems a "broken" document does not preclude some other subsystem having 
a crack at it. 

I think, too, we should stop talking about "broken" XML documents. A 
document must either be or not be XML and "A textual object is an XML 
document if it is either valid or well-formed." If we broaden this to say 
"valid or well-formed, except when it isn't quite" then all the documents 
in the world are XML and that isn't terribly useful.

So the issue of what an XML processor does with a "broken" document 
becomes the issue of what an XML processor does with a non-XML 
document. And Tim's proposal becomes "do nothing at all save report 
the earliest (and possibly subsequent) evidence that it IS a non-XML 
document." 

At 09:41 AM 23/04/97 -0400, Dave Petersen wrote:
...
>Are you really still prohibiting the *parser* from attempting to make
>what sense it can from the "remaining text"?  Sounds like that means
>each application that wanted to do what it could would have to have
>a built-in error-correcting parser, and the "real" parser and the
>application would have to pass the material up to the first error
>(which has been already parsed and is not part of the "remaining text"
>as defined).

Yes, exactly. I very much doubt that anyone cares whether, behind the 
scenes, these two functions are one big jumble of code. But this XML 
application should not be represented as simply an XML processor. 
Some people DO want software that guesses. Should the fact that an 
application's guessing at a relationship to XML make it an XML 
processor? 

Regardless of what an XML application does with non-XML documents, 
it MUST correctly process well-formed documents. If an individual 
application developer wishes to go further, super. But the handling of 
non-XML documents must not be a REQUIREMENT of a conforming XML 
system. I cannot see how we can therefore INSIST on passing more than 
error messages, in the case of an invalid document. 

If we do not restrict conforming XML processors to passing valid data, 
we're adding one whopping great optional feature. This strikes at the heart 
of the XML design goals. Recall, "The number of optional features in XML 
is to be kept to the absolute minimum, ideally zero" (5). Then too,
insistance 
on well-formedness is one of the key factors ensuring "It shall be easy to 
write programs which process XML documents" (4). 

As I understand it, Tim's proposal restricts only PARSER behaviour, not all 
XML applications and in that light, I support it whole-heartedly. Indeed, 
James' assertion gives me confidence that level-headed tools developers 
such as he will take the spec as a starting point rather than a final goal.
How 
difficult it would be to create an interface between an XML processor and 
clever, error-correcting bits would depend on just what error information 
the processor hands off and I suggest we move on to developing a 
recommendation.


Sarah Slocombe
sarah@attd.com
Received on Wednesday, 23 April 1997 22:59:00 UTC