- From: Jean Paoli <jeanpa@microsoft.com>
- Date: Tue, 29 Apr 1997 20:29:09 -0700
- To: W3C SGML Working Group <w3c-sgml-wg@w3.org>
In short, we at Microsoft (I discussed this with a lot of people) think that we must make a fresh start with this great new format and define very precisely the error handling strategy for XML. XML is a new open format, and writing an XML parser is easy and should stay easy. This is why my position for parsing XML would be: "Error handling in XML should obey public rules and not some random heuristics". (I know it seems unbelievable to have to state such things, but hey!, we know what happens on the WWW.)

The spec already says that all reportable errors should be reported (under control of a user-settable option). I think vendors and implementors should compete on the quality of error reporting. This is great. But I think it is *essential* to define what happens after the error(s) are reported. I see 2 possibilities:

First possibility: The processor stops doing anything and does not build any internal data structure corresponding to the fragment which contains errors. The net result is that the erroneous fragment of an XML document is not usable. [Tim Bray proposal]

Second possibility: If there is strong agreement that the XML syntax is too rigid, let us change the XML syntax. This is what I understand when I hear people complaining about things like "<a><b>xxxx </a>" by saying "it is obvious that this means <a><b>xxx</b></a>". So technically, this is not error recovery. For example, we can state that if:

1/ A tag is not closed
   <a> <b> xxxx </a>
   <b> is automatically closed before <a> is closed
2/ A tag is not closed and we hit EOF
   <a> <b> xxxx </b>EOF
   <a> is automatically closed
3/ Extra end tags
   <a><b> xxxx </b> </c> </a>
   </c> is skipped
4/ Etc etc ..... you see the problem here.

It seems to me that it is very difficult to propose easy rules. But I am open to any suggestion. To be short, I think that it is *essential* to state, from the beginning, what happens after the error(s) are reported.

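The two possibilities above can be sketched roughly as follows. This is only an illustration, not anything taken from the XML spec or from a real parser: the function names `parse_strict` and `parse_with_rules`, the toy tokenizer (plain `<name>` and `</name>` tags only, no attributes), and the error strings are all assumptions made for this example.

```python
import re

# Toy tokenizer (assumption): only <name>, </name> and character data.
# Anything else, such as a stray "<", is silently ignored by finditer.
TOKEN = re.compile(r"</?([A-Za-z][\w.-]*)>|[^<]+")


class WellFormednessError(Exception):
    """Raised by the strict parser at the first detected error."""


def tokens(src):
    """Yield ("open", name), ("close", name) or ("text", data) pairs."""
    for m in TOKEN.finditer(src):
        tok = m.group(0)
        if tok.startswith("</"):
            yield "close", m.group(1)
        elif tok.startswith("<"):
            yield "open", m.group(1)
        else:
            yield "text", tok


def parse_strict(src):
    """First possibility: stop at the first error, hand nothing downstream."""
    stack = []
    for kind, value in tokens(src):
        if kind == "open":
            stack.append(value)
        elif kind == "close":
            if not stack or stack[-1] != value:
                raise WellFormednessError(f"unexpected </{value}>")
            stack.pop()
    if stack:
        raise WellFormednessError(f"<{stack[-1]}> not closed at EOF")
    return src


def parse_with_rules(src):
    """Second possibility: apply rules 1/ to 3/ and report what was done."""
    stack, out, errors = [], [], []
    for kind, value in tokens(src):
        if kind == "open":
            stack.append(value)
            out.append(f"<{value}>")
        elif kind == "text":
            out.append(value)
        elif value in stack:
            # Rule 1/: close any still-open inner elements first.
            while stack[-1] != value:
                inner = stack.pop()
                errors.append(f"<{inner}> auto-closed before </{value}>")
                out.append(f"</{inner}>")
            stack.pop()
            out.append(f"</{value}>")
        else:
            # Rule 3/: an end tag matching no open element is skipped.
            errors.append(f"extra </{value}> skipped")
    # Rule 2/: at EOF, close whatever is still open, innermost first.
    while stack:
        inner = stack.pop()
        errors.append(f"<{inner}> auto-closed at EOF")
        out.append(f"</{inner}>")
    return "".join(out), errors
```

With these rules, `parse_with_rules("<a><b>xxxx </a>")` yields `<a><b>xxxx </b></a>` plus one reported error, while `parse_strict` rejects the same input outright. Even this toy shows how quickly the rules pile up: for `<a><b></a></b>` it closes `<b>` early and then skips the final `</b>`, which may or may not be what the author meant.
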
I do not buy Sperberg-McQueen saying that the problem in the HTML browsers was not error recovery but the lack of error messages. Error messages can always be turned off. Error messages mean UI, but there are tons of applications of XML which do NOT have a UI (CDF, database applications). Error messages could happen in the middle of complex scripts, and sometimes you could skip them in the code. When you describe mission-critical information, as in financial applications, you do not expect any error recovery. When you write Java or C code, you do not expect error recovery.

I think also that we must give a sign, a direction to the web community, by being hardcore from the beginning. If we do not do that, incorrect documents are going to be published, and they will stay on the web because some tools are going to be able to display or process them without requiring that their authors fix them. Heuristics will then come into use, because users will discover them as tool providers invent them. Sounds familiar. I hope we will not go this way for XML.

-Jean

> ----------
> From: Michael Sperberg-McQueen[SMTP:U35395@UICVM.UIC.EDU]
> Sent: Saturday, April 26, 1997 4:15 PM
> To: W3C SGML Working Group
> Subject: Re: Error handling: yes, I did mean it
>
> Summary: we cannot in practice require that XML processors ignore or
> discard data following the first detected error; as a result we should
> not try to do so, even if doing so were a good idea (which it is not).
>
> Tim has suggested, and a number of people have supported the idea, that
> after detecting a violation of a well-formedness constraint an XML
> processor be required to stop sending information to the downstream
> application. A number of people have already argued against this idea,
> using arguments I agree with and won't repeat.
> Here I just want to point out that (with a single exception) neither
> Tim nor anyone else has made any argument that, even if taken at face
> value, would lead to the conclusion that this is a good idea.
>
> The sole exception is Tim's rhetorical question "can any application
> hope to do anything useful with ill-formed data?" to which the only
> realistic answer is 'Yes, of course, many applications do hope to do
> useful things with ill-formed data, and some of them are right.' James
> gave this answer, and no one has attempted any refutation, so I won't
> say any more about it.
>
> Otherwise, the arguments of the Draconian camp are all centered around
> the unquestioned observations that
> - there are applications where ill-formed data is useless or worse
>   than useless, and where ill-formedness must be detected
> - by their unwillingness to issue error messages, and their
>   determination to provide attractive displays even of badly ill-formed
>   documents, HTML browser makers have made their own lives very
>   difficult
>
> Neither of these observations supports a blanket ban on error recovery
> by XML processors.
>
> Tim and others have, in the meantime, conceded that some applications
> can usefully attempt error recovery, and hope to salvage the Draconian
> Rule by suggesting that such applications should use programs which
> aren't 'XML processors' in the strict sense. This amounts to saying
> "implementors can pick and choose which parts of XML to implement, and
> can keep themselves blameless even when flouting basic requirements of
> the spec, if only they call themselves XML Handlers or some other name
> instead of 'XML processors'". I cannot think of a worse approach to
> the problem of ensuring uniform error reporting by XML software.
>
> Whether it's possible to prevent vendors from attempting to compete on
> the basis of the quality of their error recovery, I don't know. I doubt
> it. I also don't see why it's necessary to prevent it.
> It's not the error recovery in HTML browsers that has led HTML to its
> current state, but the *silence* of that error recovery. We complain
> that most authors validate by looking at the document onscreen -- what
> else do we want? I do that myself, in SGML. Yes, I do check the return
> code from the parser, but I also check to see that everything looks all
> right -- if it doesn't, the validity of the document is deceptively
> hiding errors in the tagging or in the style sheet. The only thing
> wrong with checking by visual inspection is that in most HTML browsers
> it's not a sufficient check. An author who does want to find errors
> can't do so with the software at hand, because the browser won't report
> them.
>
> So I agree with whoever it was who said that the real problem is the
> absence of an error-reporting mode in HTML browsers.
>
> If this is true, then what we need to do is to ensure that XML
> processors *always* allow the user to request error reports, even if
> the software recovers from the errors in question. That way, the user
> who says "program X displays my data all right, why don't you?" can be
> told "look, even program X says your document is ill-formed: look at it
> with error-checking turned on!"
>
> As it happens, the xml-lang spec already requires this. I don't think
> it can realistically or usefully require more, except perhaps that it
> should also explicitly require that the processor notify any
> down-stream app, as well as the 'end' user if any. I don't think it
> should require less.
>
> If we want the culture of XML usage to differ from that of HTML, we
> need to ensure that implementors pay attention to the requirement that
> they report reportable errors unless the user says not to. We can do
> that by complaining unmercifully about any implementation that doesn't
> provide error reporting, and by pointing out -- correctly -- that it's
> not a conforming implementation of XML.
>
> -C. M. Sperberg-McQueen
Received on Tuesday, 29 April 1997 23:29:11 UTC