- From: Tim Bray <tbray@textuality.com>
- Date: Wed, 11 Dec 1996 12:01:33 -0800
- To: w3c-sgml-wg@w3.org
On September 11th, the ERB spent considerable time revisiting the RS/RE question. We're just a fun-loving bunch, and it became apparent that the WG is going to have to put some more thought into this.

At the moment, we have the -XML-SPACE mechanism, which toggles two behaviors: collapsing of leading and multiple spaces (essentially the HTML semantic), and passing all the bytes through to the application. Some feel the mechanism is unaesthetic and prone to misinterpretation as it stands, and should simply be discarded, with all non-markup bytes being passed to the application to do with as it will. This can be made compliant with ISO 8879 in the short term via a mechanism proposed by Charles Goldfarb, and in the medium/long term with a TC via WG8.

There are some problems with both the current and revised approaches:

o -XML-SPACE, although this is not documented, really only deals with mixed content. Many feel it's important to ignore white space in element content, e.g. the white space between the tags here:

      <list>
        <item>..</item>
        <item>..</item>
      </list>

  But XML, when there's no DTD, doesn't know where element content is and *cannot* be made to do this.

o SGML's world-view partitions the set of characters three ways: those that are text, those that are markup, and those that are insignificant white space. Can XML really afford to discard this distinction?

o Many real-world editors, largely to deal with the fact that text (whether or not we like it) is stored in files as what amounts to a series of records, freely insert line breaks and other white space because they know SGML processors will ignore it. Can we afford to make that white space significant?

o Some applications, e.g. full-text indexers, really need to know where everything is by byte offset, whether or not the bytes are significant; thus the -XML-SPACE="COLLAPSE" behavior means they can't read the text with an XML processor (unless they can turn off -XML-SPACE processing through the API).

So there are a few things we could do, which are not entirely mutually exclusive.

1. Go to the RE Delenda Est model. This has the advantage that it's trivially easy to explain, document, and implement. It has some of the disadvantages listed above; there is some very strong sentiment on the ERB against this - look for a follow-up from other ERB folk.

2. Expand the -XML-SPACE attribute from two values to three. The third would be named REMOVE or DISCARD or something, and would be designed to signal element content, i.e. "all this white space can be safely ignored."

3. Add language to the spec allowing the application to force the processor to pass through all the bytes regardless of the -XML-SPACE setting.

Your input is requested.

Cheers,
Tim Bray
tbray@textuality.com
http://www.textuality.com/
+1-604-488-1167
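The white-space behaviors discussed above (pass-through, COLLAPSE, the proposed REMOVE/DISCARD value, and the application override from option 3) can be sketched in Python. This is a minimal illustration under stated assumptions, not the draft spec's semantics: the value names PRESERVE and REMOVE, the `Element` class, the function `apply_xml_space` and its `pass_all_bytes` flag are all hypothetical. COLLAPSE is modelled loosely on the HTML rule the message cites (drop leading white space, squeeze runs to one space), and the sketch handles one element's content without recursing into children.

```python
import re
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Union


class XmlSpace(Enum):
    """Values for -XML-SPACE; PRESERVE and REMOVE are illustrative names."""
    PRESERVE = "PRESERVE"  # hypothetical name: pass every byte through
    COLLAPSE = "COLLAPSE"  # HTML-like: drop leading white space, squeeze runs
    REMOVE = "REMOVE"      # proposed third value: element content


@dataclass
class Element:
    """Hypothetical minimal element: a name plus mixed str/Element content."""
    name: str
    content: List[Union[str, "Element"]] = field(default_factory=list)


_WS = " \t\r\n"
_WS_RUN = re.compile(r"[ \t\r\n]+")


def apply_xml_space(content, mode, pass_all_bytes=False):
    """Return the content an application would see for one element.

    pass_all_bytes is a hypothetical flag modelling option 3: the
    application forces pass-through regardless of -XML-SPACE.
    """
    if pass_all_bytes or mode is XmlSpace.PRESERVE:
        return list(content)
    out = []
    if mode is XmlSpace.REMOVE:
        for item in content:
            if isinstance(item, str):
                if item.strip(_WS):
                    raise ValueError(f"text {item!r} found in element content")
                continue  # white space between child elements: discard
            out.append(item)
        return out
    # COLLAPSE: squeeze each white-space run to one space; strip leading only
    for i, item in enumerate(content):
        if isinstance(item, str):
            item = _WS_RUN.sub(" ", item)
            if i == 0:
                item = item.lstrip(" ")
            if item:
                out.append(item)
        else:
            out.append(item)
    return out


# The <list> example: white space between the <item> children.
doc = Element("list", ["\n  ", Element("item", [".."]),
                       "\n  ", Element("item", [".."]), "\n"])
for mode in XmlSpace:
    print(mode.value, apply_xml_space(doc.content, mode))
```

Note that COLLAPSE still hands the application a single space between the two items; only REMOVE drops it, and choosing REMOVE requires knowing that <list> has element content, which is exactly what a processor without a DTD cannot tell.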
Received on Wednesday, 11 December 1996 15:01:58 UTC