- From: <lee@sq.com>
- Date: Thu, 8 May 97 23:03:11 EDT
- To: w3c-sgml-wg@w3.org
[search for "Proposal" if you are in a hurry]

> But are you willing to bet me your life that the average parser writer
> will correctly guess which well-formed string from among those given

No. But the application may be able to, given enough information.

I have in the past (a long time ago!) used versions of YACC that did
optional error recovery by inserting or deleting symbols. A C compiler
that used this technology did not generate code if there were errors,
but it _did_ give much better second and subsequent error messages.
Most modern C compilers do this sort of error recovery, now forbidden
in XML.

And if the parser writer is not in fact "average" but "excellent" or
"experienced", do you still want to forbid that person from using XML
in environments where fatal errors are not the correct approach?

If I write an XML parser in C that says

    if (foundAnError) {
        /* sneer at the user */
        Error("You dildo! You gave me a bad file!\n");
        exit(1);
    }

now
(1) I have written (as I understand it) a conformant XML parser,
    and am correctly passing to the user the first error (which is
    that the user is stupid) and then exiting;
(2) the application using the parser also exits at this point -- if
    it's an editor, no chance to save work;
(3) the user probably doesn't like this program. I know _I_ wouldn't.

Even SGML does not forbid error recovery. In fact, SGML's behaviour on
incorrect input isn't defined in the standard. This is why
Author/Editor can do error recovery and still be a conforming SGML
System (it says it is, on the splash screen, Eve!) -- and so can
NSGMLS, SPAM, Panorama, Omnimark and HoTMetaL (OK, HoTMetaL is an SGML
Application) (swap Application and System if I have them the wrong way
round, I can never remember the obscure terminology, and reading the
definitions in the standard didn't help me!).

SGML defines the concept of conforming (valid) documents. A system
that works with conforming documents and says so is a conforming SGML
application [4.50, 4.51].
The standard says that it has to _require_ documents to be conforming.
Hence, a system that works with non-SGML documents (however close those
may happen to be to actual conforming SGML documents) is not an SGML
application, and can do what it likes.

But the standard in no way precludes a pre-processing phase that takes
a document and turns it into a conforming SGML document. Hence, just
because HoTMetaL can read Microsoft Word files doesn't mean it isn't an
SGML application (or system or whatever), even though most Word files
are not conforming SGML documents.

So I don't accept arguments based on ``this is what SGML does, we need
to make the web as robust as SGML'' because this simply isn't true.
Existing SGML software often does error correction, and proceeds past
the first error. It generally doesn't do it silently, though.

I'd hate to see James have to take out the error handling in NSGMLS
which is so useful, for example.

On the other foot, neither does it make sense to _require_ any kind of
error correction. You'd make it too hard to write XML parsers for
small applications.

Proposal:

So it seems clear to me that
(1) implementers should be encouraged to report errors wherever it
    makes sense to do so.
(2) Validating Parsers must indicate whether a document is conforming
    or not both at the point of the first detected error that
    precludeth conformance, and also at the end of processing, should
    that be at some other juncture.
(3) No file or collection of files can be said to constitute an XML
    document if they are not in fact conformant. They must be well
    formed, and, if a complete DTD is supplied, entirely valid.
(4) The XML specification should go no further than this.

Lee

> (and from the infinite number of other well-formed strings that could be
> transformed into the original string by interruptions in the
> transmission) was 'intended' by whatever created the original ill-formed
> example?
>
>
> -C. M. Sperberg-McQueen
>  ACH / ACL / ALLC Text Encoding Initiative
>  University of Illinois at Chicago
>  tei@uic.edu
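
The reporting behaviour asked for in points (1) and (2) of the Proposal
above can be sketched in C roughly as follows. This is only an
illustration under stated assumptions: ParseReport, report_error(),
report_finish() and toy_check() are hypothetical names, not part of any
XML specification or existing parser, and toy_check() merely checks that
'<' and '>' alternate, which is nowhere near a real well-formedness
check. The point is the shape of the interface: flag non-conformance at
the first error, keep going, and state the verdict again at the end.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical report structure: not part of any XML specification. */
typedef struct {
    int error_count;      /* errors reported so far */
    int first_error_pos;  /* offset of the first error, or -1 if none */
} ParseReport;

void report_init(ParseReport *r)
{
    assert(r != NULL);
    r->error_count = 0;
    r->first_error_pos = -1;
}

/* Report one error and keep going instead of calling exit(). */
void report_error(ParseReport *r, int pos, const char *msg)
{
    if (r->error_count == 0) {
        /* Point (2): say so at the first error that precludes conformance. */
        r->first_error_pos = pos;
        fprintf(stderr, "offset %d: document is NOT conforming\n", pos);
    }
    r->error_count++;
    fprintf(stderr, "offset %d: %s\n", pos, msg);
}

/* Point (2) again: state the verdict at the end of processing.
   Returns 1 if no errors were reported, 0 otherwise. */
int report_finish(const ParseReport *r)
{
    if (r->error_count == 0) {
        fprintf(stderr, "document is conforming\n");
        return 1;
    }
    fprintf(stderr, "document is NOT conforming (%d error(s))\n",
            r->error_count);
    return 0;
}

/* Toy stand-in for a parser (an assumption for illustration only):
   it checks nothing except that '<' and '>' alternate. */
int toy_check(const char *text, ParseReport *r)
{
    int in_tag = 0;
    int i;

    report_init(r);
    for (i = 0; text[i] != '\0'; i++) {
        if (text[i] == '<') {
            if (in_tag)
                report_error(r, i, "'<' inside a tag");
            in_tag = 1;
        } else if (text[i] == '>') {
            if (!in_tag)
                report_error(r, i, "'>' outside a tag");
            in_tag = 0;
        }
    }
    if (in_tag)
        report_error(r, i, "unterminated tag at end of input");
    return report_finish(r);
}
```

An editor calling toy_check() on bad input gets 0 back along with every
error found, and is still running, so it can save the user's work. A
small application that wants nothing to do with recovery remains free to
stop at the first call to report_error() instead.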
Received on Thursday, 8 May 1997 23:03:15 UTC