Feedback on the spec from Tim Bray on 1996-11-17 (w3c-sgml-wg@w3.org from November 1996)

From: Tim Bray <tbray@textuality.com>
Date: Sat, 16 Nov 1996 19:44:40 -0800
To: w3c-sgml-wg@w3.org
Message-Id: <3.0b33.32.19961116131938.009e8804@pop.intergate.bc.ca>

Lots of good comments on the spec; for which, thanks.  I am still unconvinced 
by Gavin's jeremiads on the terrible evil of our <?XML encoding=?> technique,
but keep trying, Gavin; the ERB is demonstrably responsive to WG misgivings.

However, even though I voted for the -XML-SPACE trick, and actually drafted
that part of the spec myself, upon re-reading it, and listening to David
Durand's arguments, it now feels intolerably inelegant.  Look, HTML processors
are going to do what they do with spaces.  Full-text indexers and authoring
tools and database engines are all going to do radically different sets of 
things.  I now fail to see why an XML processor should get in the way of what 
an application wants to do with white space, particularly since both WG8 and 
now (in my view) the ERB have failed to come up with a simple, clean way to 
specify anything more useful.  And not for lack of effort.

Also - a very material problem - with the current language, it is simply
impossible to base a full-text indexer on an XML parser; indexers often
need to know the byte offsets of words in entities.  OK, there are other
problems: the processor needs to provide more data, e.g. lengths of excised 
comments and entity references, but these can be added without breaking the 
spec.  By contrast, the application of -xml-space="COLLAPSE" to any element
fatally cripples a full-text indexer.  For this reason, if we must retain
COLLAPSE, the spec should say that the application can cause the processor
to ignore this behavior.
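
To make the offset problem concrete, here is a rough sketch in Python.  It
is not spec language: the collapse() function is only my assumption of what
COLLAPSE would do (runs of white space become one space), and word_offsets()
is a stand-in for the first pass of an indexer.

```python
import re

def word_offsets(data: bytes):
    """Return (word, byte offset) pairs for whitespace-delimited words."""
    return [(m.group().decode("ascii"), m.start())
            for m in re.finditer(rb"\S+", data)]

def collapse(data: bytes) -> bytes:
    """Hypothetical COLLAPSE: each run of white space becomes one space."""
    return re.sub(rb"[ \t\r\n]+", b" ", data)

# Text as stored in the entity, and as delivered after collapsing.
original = b"white   space\n\n   matters"
collapsed = collapse(original)

print(word_offsets(original))   # offsets 0, 8, 18 in the stored entity
print(word_offsets(collapsed))  # offsets 0, 6, 12 after collapsing
```

Every word after the first collapsed run is reported at the wrong place, and
the application has no way to recover the true offsets from what it is given.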

So let's lose the Space Handling bit.  I would retain the provision
that *if* there's a DTD *and* you know you're in Element Content, the
processor must inform the application of this since the application could 
thus know that any white space cannot be character data.
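
A sketch of what that provision might look like from the application's
side, again in Python and using its standard expat binding purely for
illustration.  The ELEMENT_CONTENT set is an assumption standing in for what
a processor would learn from the DTD, and report_text() is a hypothetical
name; the point is only that white space is passed through untouched, with a
flag saying whether it fell in Element Content.

```python
import xml.parsers.expat

# Assumption: elements the DTD declares as having element content.
ELEMENT_CONTENT = {"list"}

def report_text(document: str):
    """Return (text, in_element_content) pairs, with text passed unchanged."""
    events = []
    stack = []  # names of the currently open elements
    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True  # ask expat to coalesce adjacent text chunks

    def start(name, attrs):
        stack.append(name)

    def end(name):
        stack.pop()

    def chars(data):
        # Flag the text if its parent element has element content.
        flag = bool(stack) and stack[-1] in ELEMENT_CONTENT
        events.append((data, flag))

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(document, True)
    return events

events = report_text("<list>\n  <item>a  b</item>\n</list>")
for text, in_element_content in events:
    print(repr(text), in_element_content)
```

The application, not the processor, then decides what to do with the flagged
white space: an indexer keeps it for offsets, a formatter throws it away.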

Cheers, Tim Bray
tbray@textuality.com http://www.textuality.com/ +1-604-488-1167

Received on Saturday, 16 November 1996 22:50:12 UTC