- From: Tim Bray <tbray@textuality.com>
- Date: Sat, 16 Nov 1996 19:44:40 -0800
- To: w3c-sgml-wg@w3.org
Lots of good comments on the spec; for which, thanks. I am still unconvinced by Gavin's jeremiads on the terrible evil of our <?XML encoding=?> technique, but keep trying, Gavin; the ERB is demonstrably responsive to WG misgivings.

However, even though I voted for the -XML-SPACE trick, and actually drafted that part of the spec myself, upon re-reading it, and listening to David Durand's arguments, it now feels intolerably inelegant.

Look, HTML processors are going to do what they do with spaces. Full-text indexers and authoring tools and database engines are all going to do radically different sets of things. I now fail to see why an XML processor should get in the way of what an application wants to do with white space, particularly since both WG8 and now (in my view) the ERB have failed to come up with a simple, clean way to specify anything more useful. And not for lack of effort.

Also - a very material problem - with the current language, it is simply impossible to base a full-text indexer on an XML parser; indexers often need to know the byte offsets of words in entities. OK, there are other problems: the processor needs to provide more data, e.g. lengths of excised comments and entity references, but these can be added without breaking the spec. The application of -xml-space="COLLAPSE" to any element, by contrast, fatally cripples a full-text indexer. For this reason, if we must retain COLLAPSE, the spec should say that the application can cause the processor to ignore this behavior.

So let's lose the Space Handling bit. I would retain the provision that *if* there's a DTD *and* you know you're in Element Content, the processor must inform the application of this, since the application could thus know that any white space cannot be character data.

Cheers, Tim Bray
tbray@textuality.com http://www.textuality.com/ +1-604-488-1167
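
To make the byte-offset objection concrete, here is a minimal Python sketch. It assumes that COLLAPSE replaces each run of white space with a single space and trims the ends; the draft's exact rule may differ. The helper names `word_byte_offsets` and `collapse` are hypothetical and do not come from the spec or any parser.

```python
import re


def word_byte_offsets(text: str) -> list[tuple[str, int]]:
    """Return (word, byte offset) pairs, measured in the UTF-8 encoding of text."""
    data = text.encode("utf-8")
    return [(m.group().decode("utf-8"), m.start())
            for m in re.finditer(rb"\S+", data)]


def collapse(text: str) -> str:
    """Assumed COLLAPSE behavior: each white-space run becomes one space, ends trimmed."""
    return " ".join(text.split())


# Character data as it sits in the entity on disk.
entity_text = "full   text\n\n   indexing"

# Offsets an indexer needs: positions in the stored entity.
original = word_byte_offsets(entity_text)

# Offsets it can compute if the processor only hands over collapsed data.
collapsed = word_byte_offsets(collapse(entity_text))

for (word, true_offset), (_, seen_offset) in zip(original, collapsed):
    print(f"{word}: offset in entity {true_offset}, offset after collapse {seen_offset}")
```

Every word after the first white-space run is reported at the wrong position, and the application has no way to recover the true offset unless the processor also reports how much white space it removed.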
Received on Saturday, 16 November 1996 22:50:12 UTC