- From: Peter Murray-Rust <Peter@ursus.demon.co.uk>
- Date: Sat, 19 Apr 1997 13:46:43 GMT
- To: w3c-sgml-wg@w3.org
<?XML VERSION="1.0"?>
<FLAME/>

I hope this represents a WF and appropriate message :-) [I certainly didn't
want to be confrontational, and I deliberately used epsilon to take any
value :-)] So here are some (hopefully constructive) points to provide a
meeting point.

In message <199704191012.LAA14198@mail.iol.ie> digitome@iol.ie (Digitome Ltd.) writes:
> [Peter Murray Rust]
> ><AXIOM NEGOTIABILITY="epsilon">
> >My basic tenet is that an XML document is either WF or it isn't. If there is
> >one error, then the result is a null document. If that isn't true then I think
> >we lose a large number of people who see XML as a robust and reliable way of
> >passing information.
> ></AXIOM>
>
> I do not think this is so. I think it is hugely important to say "this is
> not well formed" but overly spartan to then say "so fix it and try again!".

OK. But what happens to the result? I think there are (at least) two possible
ways in which XML will be used:

    human -> editor/validator -> XML(WWW) -> parser/browser + stylesheet [-> printer]
    program -> XML(WWW) -> parser (-> robot -> program)+

I have no problem with the first. Indeed I will probably be as mad as anyone
if I download 100 Kbyte of XML and the browser says 'sorry :-), goof-up on
line 1. Request better document from author'. So it's clear there needs to be
a mechanism for this.

But people are already starting to think of robotic applications of XML. Here
we can see distributed servers with different components. If a user agent
(= robot) gets a document (= data to be processed) and this is not WF, then
it cannot reliably do anything further with that information. Guessing what
the author really meant is a recipe for disaster - we've all seen that with
HTML. Remember that we are already designing into XML the ability to carry
large collections of links which have to be precise. These will (I assume) be
routinely processed by machines, not humans.

I can see the value of trying to recover document content *for human
consumption*, but not for machine consumption. The danger is that laxness
with the first completely invalidates the value of the second. I'm
particularly concerned that if non-WF documents are output in apparently WF
form and then *automatically* passed to a second application, the original
error warning will be lost. A document with 'recoverable errors' (if such a
thing exists) must be indelibly stamped as such, and any containing document
must be aware of it. Rather like banknotes with 'specimen' stamped over them.

I appreciate that it is very difficult to get XML fully specified, that
different parsers will produce different output, and that they may converge
rather slowly. And there are undefined constructs like '++a + ++a' in C. But
it will be necessary to educate authors about the most difficult areas
(whitespace, parameter entities, etc.) rather than leaving these fuzzy and
hoping. When the ERB comes up with the final spec for (say) whitespace, we
will have to implement it as faithfully as we can, compare notes where we
differ, and attempt to resolve the issues.

	P.

[I hope it didn't come across that I have anything other than total
admiration for James Clark's software, as I have stated before. Indeed I
couldn't be using SGML if it didn't exist.]

--
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/
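By way of illustration only, the two policies argued for above can be
sketched in a few lines of Python (the parser module, the function names,
and the attribute used as the stamp are all assumptions for the sake of the
sketch, not anything a spec defines): a machine consumer either receives a
well-formed tree or nothing at all, and anything salvaged by a human-facing
recovery path carries an indelible mark before it can be passed on.

    import xml.etree.ElementTree as ET

    # The 'specimen' stamp; the attribute name is an assumption.
    RECOVERED_MARK = "recovered-from-non-wf"

    def parse_for_machine(xml_text):
        """Draconian policy: one well-formedness error and the robot
        gets a null document, never a guess."""
        try:
            return ET.fromstring(xml_text)
        except ET.ParseError:
            return None

    def mark_recovered(root):
        """Human-facing recovery path: whatever is salvaged is stamped
        so any containing document can see it was never the author's
        well-formed original."""
        root.set(RECOVERED_MARK, "true")
        return root

    assert parse_for_machine("<a><b></a>") is None        # mismatched tags: not WF
    assert parse_for_machine("<a><b/></a>") is not None   # WF: the robot may proceed

The point of the first assert is that the robot never sees a half-parsed
guess; if recovery is wanted at all, it happens in a separately stamped
channel.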
Received on Saturday, 19 April 1997 09:19:00 UTC