- From: Henry S. Thompson <ht@cogsci.ed.ac.uk>
- Date: 06 Feb 2003 17:07:47 +0000
- To: Richard Tobin <richard@cogsci.ed.ac.uk>
- Cc: www-tag@w3.org
Richard Tobin <richard@cogsci.ed.ac.uk> writes:

> > I am unconvinced of the necessity for subsetting the language, as
> > opposed to identifying a new conformance class alongside the two
> > already provided ('validating' and 'non-validating').
> >
> > Call such a conformance class 'minimal' -- it can be trivially defined
> > as a further restriction of 'non-validating', as follows (this is an
> > edited copy of text from section 5.1 of XML 1.0 2e [1], changes in
> > bold):
>
> I think the minimal conformance Henry describes is either too minimal
> or not minimal enough. It ignores all entity declarations, and
> doesn't provide attribute defaults, but it still requires parsing
> ATTLIST declarations to do attribute normalization. If we allow
> processors to ignore part of the internal subset, we might as well
> allow them to ignore the whole thing so that they don't even have to
> parse it. Attribute normalization seems the least valuable of the
> three since it can usually be handled in the application.
>
> (Skipping the internal subset is just a few lines of code: you have to
> watch out for comments and quoted literals.)

I'm happy with this too -- I made the proposal in the form I did to
catch all and only what I understood the SOAP requirements to be, but
I suspect you're right that just skipping the whole thing is better.

> The choice between defining a subset and defining a conformance level
> is a choice between two reductions in interoperability: we break
> either the rule that all parsers can handle all documents, or the rule
> that all parsers produce the same infoset for a given document. Of
> course, both these rules are already broken. The first is broken by
> unsupported encodings and URI schemes, the second by parsers that
> don't read the external subset or external general entities.
I take the former breakage to be quite different in kind from the
second -- unsupported encodings and URI schemes do not in practice
compromise my ability to get my XML through every processor there is,
because the REC provides a minimal bar which all processors must step
up to.

The second is a very different kind of variation -- it's pervasive,
widely experienced, and mostly well-understood. By exploiting this,
my proposal amounts to suggesting that we can meet the requirements
which the proponents of a subset are advancing, while still preserving
the coherence of XML as a type of document.

Meta point: I'm concerned that the TAG asked the Core WG to explore a
solution, instead of asking that they explore satisfying a
requirement. As put to Core, the request would on the face of it not
even allow for exploration of the kind of approach I've suggested,
since that approach doesn't define a subset at all. Nonetheless I
believe it does satisfy the requirement as I understand it, with fewer
negative side-effects than any subset approach would have.

ht
--
Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
Half-time member of W3C Team
2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
Fax: (44) 131 650-4587, e-mail: ht@cogsci.ed.ac.uk
URL: http://www.ltg.ed.ac.uk/~ht/
[mail really from me _always_ has this .sig -- mail without it is forged spam]
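
Illustration of Richard's remark that skipping the internal subset takes only a few lines of code: a minimal Python sketch, not anything taken from the thread. The function name `skip_internal_subset` is hypothetical. It assumes the caller has already found the '[' that opens the internal subset inside the DOCTYPE declaration. Besides the comments and quoted literals Richard mentions, the sketch also steps over processing instructions, since those can likewise contain a stray ']'; that is an addition here, not part of his description.

```python
def skip_internal_subset(text: str, pos: int) -> int:
    """Return the index just past the ']' that closes the internal subset.

    `pos` must point at the first character after the opening '['.
    Nothing inside the subset is interpreted; it is only stepped over.
    """
    n = len(text)
    while pos < n:
        if text.startswith("<!--", pos):
            # Comments may contain ']' and quote characters: skip them whole.
            end = text.find("-->", pos + 4)
            if end < 0:
                raise ValueError("unterminated comment in internal subset")
            pos = end + 3
        elif text.startswith("<?", pos):
            # Assumption beyond the original remark: treat PIs like comments.
            end = text.find("?>", pos + 2)
            if end < 0:
                raise ValueError("unterminated processing instruction")
            pos = end + 2
        elif text[pos] in "\"'":
            # Quoted literal: skip to the matching quote character.
            end = text.find(text[pos], pos + 1)
            if end < 0:
                raise ValueError("unterminated quoted literal")
            pos = end + 1
        elif text[pos] == "]":
            # First ']' outside comments, PIs and literals ends the subset.
            return pos + 1
        else:
            pos += 1
    raise ValueError("unterminated internal subset")
```

Comments are checked before quotes so that an apostrophe inside a comment is not mistaken for the start of a literal. A processor using this would resume parsing at the returned index, which should point at optional whitespace followed by the '>' that ends the DOCTYPE declaration.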
Received on Thursday, 6 February 2003 12:07:37 UTC