- From: Peter Murray-Rust <Peter@ursus.demon.co.uk>
- Date: Mon, 28 Apr 1997 13:57:37 GMT
- To: w3c-sgml-wg@w3.org
In message <199704262345.TAA19868@www10.w3.org> Michael Sperberg-McQueen writes:
[...]
>
> Otherwise, the arguments of the Draconian camp are all centered around

From the Encyclopedia Britannica [about Draco's code]

<Q>for nearly all crimes there was the same penalty of death</Q> (Plut. <I>Solon</I>)

- documents? authors? both?

Constructively:

[...]
>
> Tim and others have, in the meantime, conceded that some applications
> can usefully attempt error recovery, and hope to salvage the Draconian
> Rule by suggesting that such applications should use programs which
> aren't 'XML processors' in the strict sense. This amounts to saying
> "implementors can pick and choose which parts of XML to implement, and
> can keep themselves blameless even when flouting basic requirements of
> the spec, if only they call themselves XML Handlers or some other name
> instead of 'XML processors'". I cannot think of a worse approach to
> the problem of ensuring uniform error reporting by XML software.

My main problem is that I see 'applications', 'parsers' or whatever being complex aggregates of components. If these components take different views on error handling (or anything else for that matter) information can become corrupted.

As a proto-implementor I think any graduated policy of error-handling is going to be complex. Assume that a user-agent consists of a parser+browser+transformation module. Assume that these have been written by different groups, and that some run without human intervention. Then all have to have the same view of error processing. If not, we can assume that the quality of the result is at least as bad as, and probably worse than, that of the least Draconian component. If that component does not interact with humans it may even be unclear that an error is being 'laundered'.

At present we might have

    document -> NXP -> JUMBO -> human -> output

For a WF document, the display represents the document faithfully and there is a 'Save as XML' button if required.
At present a non-WF document causes NXP to terminate, so JUMBO could only process what it's got up to then. It chooses not to (since elements are likely to have semantics dependent on their context in the whole document).

If the parser sends JUMBO a non-WF document, JUMBO has to *at least* remember that the document was not WF. Every action during that session, and any persistent record of that session, must *at least* have the capability of remembering that the document or its products came from a non-WF source. Therefore JUMBO has to have:

- a flag for *humans* which (IMO) requires action from the humans before it proceeds. (Not just a 1-second banner at the bottom saying the input was not-WF.)
- a different flag for automatic processing (JUMBO can be run in 'batch' if required).
- a stamp on saved documents stating that they came from non-WF documents.

It gets worse when the document is used to collect linked documents. For example:

'This document contains links to the current rates of exchange, ...'

In this case incorrect WF documents could be retrieved automatically as part of this process. The collection might have to have a stamp saying 'the documents in here are WF but we aren't sure if they are the right ones'.

The point I'm making is that *as an implementor* this looks like a lot of work and a lot of opportunities to go wrong. I'm not saying it's not possible, but it's some way from the CS grad student (who never makes mistakes anyway). So IF the policy is to be non-Draconian, then it is critical to enumerate the possibilities where it may go wrong.

[...]
>
> If this is true, then what we need to do is to ensure that XML
> processors *always* allow the user to request error reports, even if the
                      ^^^^^^^^^^^^^^^

As always, we have to expect that the 'user' may be a robot, agent or other piece of software and it doesn't understand 'looks all right'. So it needs a very clear indication not to try anything clever.
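The flag for humans, the flag for batch runs and the stamp on saved documents listed above might be sketched roughly as follows. This is only an illustration under stated assumptions, not how JUMBO or NXP actually work: `Document`, `derive`, `save_as_xml`, `NotWellFormedError` and the wording of `STAMP` are all hypothetical names invented for the sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical wording for the stamp written into saved output.
STAMP = "<!-- derived from a non-well-formed source -->"


class NotWellFormedError(Exception):
    """Raised when a tainted document would otherwise pass silently."""


@dataclass
class Document:
    text: str
    well_formed: bool = True
    errors: List[str] = field(default_factory=list)


def derive(source: Document, new_text: str) -> Document:
    """Build a product of `source`; the product inherits the non-WF flag."""
    return Document(new_text, source.well_formed, list(source.errors))


def save_as_xml(doc: Document,
                confirm: Optional[Callable[[List[str]], bool]] = None) -> str:
    """Return the text to write out.

    confirm=None models batch mode: a tainted document is refused outright.
    Otherwise confirm(errors) models a human who must act before saving.
    """
    if doc.well_formed:
        return doc.text
    if confirm is None:
        raise NotWellFormedError("batch run refused a non-WF source")
    if not confirm(doc.errors):
        raise NotWellFormedError("user declined to save a non-WF source")
    # A trailing comment keeps any XML declaration first in the file.
    return doc.text + "\n" + STAMP


# Usage: the flag survives a transformation and ends up in the saved text.
parsed = Document("<P>rates</P>", well_formed=False, errors=["garbled tag"])
product = derive(parsed, "<P>RATES</P>")
saved = save_as_xml(product, confirm=lambda errors: True)
```

The point of routing every derived product through something like `derive` is that no component can drop the flag by accident; a component that builds a fresh `Document` by hand is exactly where the error would be 'laundered'.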
My problem is that the same pieces of code in applications will be used by programs as well as by humans. Application programmers may not always realise this.

> software recovers from the errors in question. That way, the user who
> says "program X displays my data all right, why don't you?" can be told
> "look, even program X says your document is ill-formed: look at it with
> error-checking turned on!"
>
> As it happens, the xml-lang spec already requires this. I don't think
> it can realistically or usefully require more, except perhaps that it
> should also explicitly require that the processor notify any down-stream
> app, as well as the 'end' user if any. I don't think it should require
> less.

Agreed. I'm just worried that the error flag drops off somewhere along the way, especially if we haven't given it a vehicle. For example, every 'Save as XML' button should tell the 'user' "This came from a corrupt document, do you *really* want to save it?"

A related concern is that some software might assume that missing elements were relatively unimportant, e.g.

    <CHAPTER>
    [10 bytes garbled here..., do you want to continue?...]
    <P>It doesn't matter that ... [important announcement]...</P>
    [ 10 more bytes of garbled data, continue to ignore errors?...]

This could look harmless to a human - 10 bytes is not much - but if they were (say) <REVOKED and </REVOKED (the parser isn't clever enough to insert the '>' and omits the malformed tags) it could be disastrous.

>
> If we want the culture of XML usage to differ from that of HTML, we need
> to ensure that implementors pay attention to the requirement that they
> report reportable errors unless the user says not to. We can do that by
> complaining unmercifully about any implementation that doesn't provide
> error reporting, and by pointing out -- correctly -- that it's not a
> conforming implementation of XML.

I agree completely with this.
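The <REVOKED example above can be made concrete with a small sketch. It uses Python's standard `xml.etree.ElementTree` as the Draconian parser; the `lenient` function is a made-up recovery rule (silently drop any tag that is missing its '>'), written only to show the failure mode, and the sample text is invented.

```python
import re
import xml.etree.ElementTree as ET

# Invented sample: both REVOKED tags have lost their closing '>'.
damaged = "<CHAPTER><REVOKED<P>This licence is valid.</P></REVOKED</CHAPTER>"


def draconian(text: str) -> ET.Element:
    """Strict parse: raises ET.ParseError on any well-formedness error."""
    return ET.fromstring(text)


def lenient(text: str) -> ET.Element:
    """Hypothetical recovery: drop every '<...' run that is cut off by
    another '<' before reaching its own '>', then parse what is left."""
    repaired = re.sub(r"<[^<>]*(?=<)", "", text)
    return ET.fromstring(repaired)


try:
    draconian(damaged)
    strict_result = "accepted"
except ET.ParseError:
    strict_result = "rejected"

recovered = lenient(damaged)
revoked_seen = recovered.find("REVOKED") is not None
```

Here the strict parse rejects the document, while the lenient one returns a clean-looking CHAPTER whose P is no longer inside any REVOKED element, so a downstream component has no way of telling that the statement was revoked.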
On a practical level I doubt if the early prototype tools (at least from amateurs like myself) will manage thorough error-handling in the way that is being required. In my case I have only two options: Draconian, or fuzziness. For the sake of the cause I will have to stick with the former :-)

	P.

--
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/
Received on Monday, 28 April 1997 11:41:10 UTC