Re: Error handling: yes, I did mean it
Greetings. I will cease my usual lurking and post as, like Lauren,
I feel particularly strongly about this topic.
At 10:44 AM 20/04/97 -0700, Tim Bray wrote:
>To summarize: I proposed that XML processors be required to stop
>passing data (other than error notifications) to applications after the
>first violation of well-formedness.
The concern has been that document information after the first error
may be of value to the user.
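
To make the proposal concrete, here is a minimal sketch of the behaviour in question, using Python's bundled expat bindings as a stand-in for "an XML processor". The function name run_processor and the sample document are illustrative assumptions, not anything from the draft spec. Events seen before the first well-formedness violation reach the application; after it, only the error notification does.

```python
from xml.parsers import expat


def run_processor(text):
    """Feed text to a strict parser; return (events, error).

    events holds whatever was passed to the application before the
    first well-formedness violation; error is the notification (or
    None if the document is well-formed).
    """
    events = []
    parser = expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: events.append(("start", name))
    parser.EndElementHandler = lambda name: events.append(("end", name))
    parser.CharacterDataHandler = lambda data: events.append(("text", data))

    error = None
    try:
        parser.Parse(text, True)  # True marks this as the final chunk
    except expat.ExpatError as err:
        error = err  # carries err.code, err.lineno, err.offset
    return events, error


# Hypothetical input: <c> is never closed, so </a> is a mismatched tag.
events, error = run_processor("<a><b>ok</b><c></a><d>later</d>")
if error is not None:
    print("stopped:", expat.ErrorString(error.code),
          "at line", error.lineno, "offset", error.offset)
```

The content of <d>, which follows the violation, never reaches the application. That is exactly the information the concern above is about.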
At 04:08 PM 21/04/97 GMT, Peter Murray-Rust wrote:
>In message <firstname.lastname@example.org> Martin Bryan
>> There is a difference between loss of markup and loss of data. Whilst both
>> constitute information, no data should be discarded just because there is
>> an error in a piece of markup. XML should at very least retain the
>> data as part of the last validly opened element.
>Although I'm not an SGML expert, I take a different view, in that markup and
>data are both essential parts of the document. I am prepared to write the
Like Peter, I must disagree. How much of the really important information
in a document is held as content and how much is carried by attribute values
or nesting of elements would depend on the DTD (or less formally, the use
at hand). The one thing of which I think we may be certain is that XML
users will deal in STRUCTURED information or they would use raw
Unicode. Passing only content in the case of a document which purports
to be XML seems to me a particularly poor idea.
At 03:08 PM 19/04/97 +0100, Sean Mc Grath wrote:
>One of the cool things about XML as a
>format is that some of the content can be recovered even in the face of
>error. Compare this
>to our binary document friends where a blown byte can render the entire
So true. And there are certainly numerous opportunities to exploit this
advantage. But we should not call recovered content "XML."
At 04:17 PM 21/04/97 +0700, James Clark wrote:
>I think users and application
>builders should have a choice with what they do with invalid data. I cannot
>see how a user or application builder can be disadvantaged by being provided
>with this choice, and I therefore plan to continue to provide it even if the
>spec says that this is non-conforming.
Quite reasonable. But could such an application not be considered a
tolerant XML-processing application consisting of a strict XML processor
plus smarts to handle weird cases?
I believe the distinction between "XML processor" and "XML application"
is key. An XML application need not be restricted to an XML processor
plus some display bits, and an XML processor refusing to accept what it
deems a "broken" document does not preclude some other subsystem having
a crack at it.
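
As a rough sketch of that layering (an assumption about how one might arrange it, not a description of any existing tool): the strict processor either returns a tree or reports an error, and a separate subsystem owned by the application decides what to do with text the processor rejected. The names strict_processor, salvage_text and tolerant_application are hypothetical, and the tag-stripping "recovery" is deliberately crude.

```python
import re
import xml.etree.ElementTree as ET


def strict_processor(text):
    """Strict XML processor: return (root, None) or (None, error)."""
    try:
        return ET.fromstring(text), None
    except ET.ParseError as err:
        return None, err


def salvage_text(text):
    """Hypothetical recovery subsystem: drop anything tag-shaped.

    This is application smarts, not part of the XML processor, and its
    output is recovered content rather than XML.
    """
    return re.sub(r"<[^>]*>", "", text).strip()


def tolerant_application(text):
    """Application = strict processor + separate handling of non-XML."""
    root, err = strict_processor(text)
    if err is None:
        return {"kind": "xml", "root": root.tag}
    line, column = err.position  # where the processor gave up
    return {
        "kind": "not-xml",
        "error_at": (line, column),
        "salvaged": salvage_text(text),
    }


print(tolerant_application("<a><b>ok</b></a>"))
print(tolerant_application("<a><b>ok</b><c></a>"))
```

The processor's contract stays simple, and whatever guessing goes on happens in code that makes no claim to be an XML processor.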
I think, too, we should stop talking about "broken" XML documents. A
document must either be or not be XML and "A textual object is an XML
document if it is either valid or well-formed." If we broaden this to say
"valid or well-formed, except when it isn't quite" then all the documents
in the world are XML and that isn't terribly useful.
So the issue of what an XML processor does with a "broken" document
becomes the issue of what an XML processor does with a non-XML
document. And Tim's proposal becomes "do nothing at all save report
the earliest (and possibly subsequent) evidence that it IS a non-XML
document."
At 09:41 AM 23/04/97 -0400, Dave Petersen wrote:
>Are you really still prohibiting the *parser* from attempting to make
>what sense it can from the "remaining text"? Sounds like that means
>each application that wanted to do what it could would have to have
>a built-in error-correcting parser, and the "real" parser and the
>application would have to pass the material up to the first error
>(which has been already parsed and is not part of the "remaining text"
Yes, exactly. I very much doubt that anyone cares whether, behind the
scenes, these two functions are one big jumble of code. But this XML
application should not be represented as simply an XML processor.
Some people DO want software that guesses. Should the fact that an
application's guessing at a relationship to XML make it an XML
Regardless of what an XML application does with non-XML documents,
it MUST correctly process well-formed documents. If an individual
application developer wishes to go further, super. But the handling of
non-XML documents must not be a REQUIREMENT of a conforming XML
system. I cannot see how we can therefore INSIST on passing more than
error messages in the case of an invalid document.
If we do not restrict conforming XML processors to passing valid data,
we're adding one whopping great optional feature. This strikes at the heart
of the XML design goals. Recall, "The number of optional features in XML
is to be kept to the absolute minimum, ideally zero" (5). Then too, insistence
on well-formedness is one of the key factors ensuring "It shall be easy to
write programs which process XML documents" (4).
As I understand it, Tim's proposal restricts only PARSER behaviour, not all
XML applications, and in that light I support it whole-heartedly. Indeed,
James' assertion gives me confidence that level-headed tools developers
such as he will take the spec as a starting point rather than a final goal.
How difficult it would be to create an interface between an XML processor and
clever, error-correcting bits would depend on just what error information
the processor hands off and I suggest we move on to developing a