- From: Tim Bray <tbray@textuality.com>
- Date: Tue, 15 Oct 1996 21:16:50 -0700
- To: w3c-sgml-wg@w3.org
I can't believe I'm addressing this again. But in a lengthy discussion of the Vancouver SGML ERB/WG caucus this afternoon, Peter Sharpe dreamed up the following and won't have time to post it, so I agreed to. So it's all his fault. It smells to me like it might work.

1. If you have a DTD and you know where element content is, you lose all white space in element content.
2. DTD or none, if a line contains some markup, and aside from that only white space, you lose said white space and the trailing record boundary character(s). "Markup" meaning tags, comments, and PIs.
3. Every byte in mixed or PCDATA content that is not lost in this fashion and is not markup is passed to the application.

"Lose" means "don't pass to the application".

This has the virtues that
- it can be explained *very* briefly
- the behavior is an awful lot like what ordinary people think that 8879 is trying to do
- it's easy to build
- it allows users to put all sorts of gratuitous white space in their data without getting in the way
- it doesn't use the [inaccurate and counter-intuitive to most programmers] terms "RS" and "RE"

On the downside, I suspect that this will eat a few white spaces between tags and RE's, and around comments and PIs, and maybe a few REs after comments and PIs, that a real SGML parser would pass on. But (a) few will ever notice, and (b) those that do will be surprised at the SGML behavior.

It may not be perfect. But it does provide an example of the maximum level of complexity in this area that I for one am willing to tolerate in XML.

Cheers, Tim Bray
tbray@textuality.com http://www.textuality.com/ +1-604-488-1167
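For illustration only, rules 2 and 3 might be sketched roughly as below. This is not a parser and not anything proposed in the message itself: the names `passed_text`, `MARKUP` and `LINE` are hypothetical, markup is assumed never to span a line boundary, and rule 1 is left out because it needs a DTD.

```python
import re

# Assumption: markup (comments, PIs, tags) never spans a line boundary.
# Comments and PIs are tried before plain tags so a ">" inside them is safe.
MARKUP = re.compile(r"<!--.*?-->|<\?.*?\?>|<[^<>]*>")

# A line is its content plus the trailing record boundary (CRLF, CR or LF);
# the final line may have no boundary at all.
LINE = re.compile(r"[^\r\n]*(?:\r\n|\r|\n)|[^\r\n]+")


def passed_text(doc: str) -> str:
    """Return the character data that would be passed to the application."""
    out = []
    for line in LINE.findall(doc):
        body = line.rstrip("\r\n")
        boundary = line[len(body):]
        has_markup = MARKUP.search(body) is not None
        rest = MARKUP.sub("", body)  # markup itself is never passed as data
        if has_markup and rest.strip() == "":
            # Rule 2: markup plus only white space, so lose the white space
            # and the trailing record boundary.
            continue
        # Rule 3: everything else that is not markup goes to the application.
        out.append(rest + boundary)
    return "".join(out)


if __name__ == "__main__":
    sample = "<p>\n  Hello <b>world</b>\n  <!-- note -->  \n</p>\n"
    print(repr(passed_text(sample)))
```

Under these assumptions, the lines holding only `<p>`, the comment, and `</p>` vanish along with their line ends, while the line mixing text and tags keeps its white space and its record boundary. A blank line with no markup on it is passed through untouched, since rule 2 only applies when the line contains some markup.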
Received on Wednesday, 16 October 1996 00:17:14 UTC