- From: Sarah Slocombe <sarah@attd.com>
- Date: Wed, 23 Apr 1997 23:00:52 -0400
- To: w3c-sgml-wg@w3.org
Greetings. I will cease my usual lurking and post as, like Lauren, I feel particularly strongly about this topic. At 10:44 AM 20/04/97 -0700, Tim Bray wrote: >To summarize: I proposed that XML processors be required to stop >passing data (other than error notifications) to applications after the >first violation of well-formedness. The concern has been that document information after the first error may be of value to the user. At 04:08 PM 21/04/97 GMT, Peter Murray-Rust wrote: >In message <1.5.4.32.19970421065409.0069c20c@mail.u-net.com> Martin Bryan writes: ... >> There is a difference between loss of markup and loss of data. Whilst both >> consititute information, no data should be discarded just because there is >> an error in a piece of markup. XML should at very least retain the following >> data as part of the last validly opened element. > >Although I'm not an SGML expert, I take a different view, in that markup and >data are both essential parts of the document. I am prepared to write the >following: Like Peter, I must disagree. How much of the really important information in a document is held as content and how much is carried by attribute values or nesting of elements would depend on the DTD (or less formally, the use at hand). The one thing of which I think we may be certain is that XML users will deal in STRUCTURED information or they would use raw Unicode. Passing only content in the case of a document which purports to be XML seems to me a particularly poor idea. At 03:08 PM 19/04/97 +0100, Sean Mc Grath wrote: ... >One of the cool things about XML as a >document >format is that some of the content can be recovered even in the face of >error. Compare this >to our binary document friends where a blown byte can render the entire content >inaccessible. So true. And there are certainly numerous opportunities to exploit this advantage. But we should not call recovered content "XML." At 04:17 PM 21/04/97 +0700, James Clark wrote: ... >I think users and application >builders should have a choice with what they do with invalid data. I cannot >see how a user or application builder can be disadvantaged by being provided >with this choice, and I therefore plan to continue to provide it even if the >spec says that this is non-conforming. Quite reasonable. But could such an application not be considered a tolerant XML-processing application consisting of a strict XML processor plus smarts to handle weird cases? I believe the distinction between "XML processor" and "XML application" is key. An XML application need not be restricted to an XML processor plus some display bits and an XML processor refusing to accept what it deems a "broken" document does not preclude some other subsystem having a crack at it. I think, too, we should stop talking about "broken" XML documents. A document must either be or not be XML and "A textual object is an XML document if it is either valid or well-formed." If we broaden this to say "valid or well-formed, except when it isn't quite" then all the documents in the world are XML and that isn't terribly useful. So the issue of what an XML processor does with a "broken" document becomes the issue of what an XML processor does with a non-XML document. And Tim's proposal becomes "do nothing at all save report the earliest (and possibly subsequent) evidence that it IS a non-XML document." At 09:41 AM 23/04/97 -0400, Dave Petersen wrote: ... >Are you really still prohibiting the *parser* from attempting to make >what sense it can from the "remaining text"? Sounds like that means >each application that wanted to do what it could would have to have >a built-in error-correcting parser, and the "real" parser and the >application would have to pass the material up to the first error >(which has been already parsed and is not part of the "remaining text" >as defined). Yes, exactly. I very much doubt that anyone cares whether, behind the scenes, these two functions are one big jumble of code. But this XML application should not be represented as simply an XML processor. Some people DO want software that guesses. Should the fact that an application's guessing at a relationship to XML make it an XML processor? Regardless of what an XML application does with non-XML documents, it MUST correctly process well-formed documents. If an individual application developer wishes to go further, super. But the handling of non-XML documents must not be a REQUIREMENT of a conforming XML system. I cannot see how we can therefore INSIST on passing more than error messages, in the case of an invalid document. If we do not restrict conforming XML processors to passing valid data, we're adding one whopping great optional feature. This strikes at the heart of the XML design goals. Recall, "The number of optional features in XML is to be kept to the absolute minimum, ideally zero" (5). Then too, insistance on well-formedness is one of the key factors ensuring "It shall be easy to write programs which process XML documents" (4). As I understand it, Tim's proposal restricts only PARSER behaviour, not all XML applications and in that light, I support it whole-heartedly. Indeed, James' assertion gives me confidence that level-headed tools developers such as he will take the spec as a starting point rather than a final goal. How difficult it would be to create an interface between an XML processor and clever, error-correcting bits would depend on just what error information the processor hands off and I suggest we move on to developing a recommendation. Sarah Slocombe sarah@attd.com
Received on Wednesday, 23 April 1997 22:59:00 UTC