- From: Tim Bray <tbray@textuality.com>
- Date: Fri, 18 Apr 1997 22:35:26 -0700
- To: w3c-sgml-wg@w3.org
In recent discussions, some but not all at the recent WWW6 conference, it has become apparent that we have an opportunity, if we act now, to avoid one of the big problems that has caused HTML a lot of grief. This is the area of error-handling. HTML doesn't have any. As a result, the browser and tool vendors are stuck on an endless treadmill of trying to enhance the system while at the same time handling any and all collections of bytes that Netscape 1.X did. Get a couple of beers into anyone from the big N or the big M and you'll see some real tears over this. In my former life as a Web indexer, I cried some of those tears myself. So let's not let it happen again. The subject is violations of well-formedness. Well-formedness should be easy for a document to attain. In XML, documents will carry a heavy load of semantics and formatting, attached to elements and attributes, probably with significant amounts of indirection. Can any application hope to accomplish meaningful work in this mode if the document does not even manage to be well-formed!?!? I suggest that we add language to section 5, "conformance", which says: "An XML processor which encounters a violation of the constraints of well-formedness must not thereafter pass any information about text or markup to the application. It must pass to the application a notification of the first such violation encountered. It MAY thereafter, at user option, pass to the application information about well-formedness violations encountered after the first." [or in English: you gotta tell the app about the first syntax botch you hit; you're allowed to send the app more error messages, but you're not allowed to send anything but error messages after you've detected an error] If we wanted to avoid phrasing this in terms of the actions of a processor (which might be a good idea in general for the spec) we could redefine "markup" and "character data" in such a way that they are considered not to exist in a document which is not well-formed. Some might argue that this violates the Internet creed: "Be conservative in what you supply, and liberal in what you accept." I can live with that: the consequences of the second half of that creed have led to intolerable results in the quality and usability of the data on the Net. Furthermore, if you want to serve up ill-formed dogshit, this will presumably remain possible, because: "text/html means never having to say you're sorry." Cheers, Tim Bray tbray@textuality.com http://www.textuality.com/ +1-604-708-9592
Received on Saturday, 19 April 1997 01:37:01 UTC