- From: Joe English <jenglish@crl.com>
- Date: Mon, 30 Sep 1996 17:21:33 -0700
- To: w3c-sgml-wg@w3.org
Paul Prescod <papresco@calum.csclub.uwaterloo.ca> wrote:
>
> What about a third rule:
>
> 1. All white space, including RS and RE, immediately following start tags and
> immediately preceding end tags is not significant.
>
> 2. All other RS/REs are collapsed to a single space.
>
> 3. All quasi-elements containing only whitespace characters are not
> significant.

That should be, erm, "quasi-pseudoelement"... or maybe not :-)

You may be onto something here. How about the following as a heuristic
to distinguish element content from mixed content:

    3. If the only data appearing between two tags is a sequence of
       lexical SEPCHARs (including RS and RE), then it is deemed
       insignificant.

where "lexical" means SEPCHARs that appear as SEPCHARs in the input
(as opposed to, e.g., <P>&#RE;&space;&space;&#RE;</P>), and "data" is
as per ISO 8879.

This heuristic will incorrectly strip out any "true" pseudoelements
that contain nothing but lexical whitespace -- these would have to be
escaped or entered as references, as you point out -- but I think it
will do the right thing in all other cases.

I forget... what was the rationale behind rules (1) and (2)? (I know
it's a common application convention, but what was the reason for
making it mandatory for all XML document types?)

--Joe English

  jenglish@crl.com
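
For illustration only, heuristic (3) might be sketched in Python roughly as follows. This is not a real SGML or XML parser: the function name strip_insignificant is hypothetical, tags are recognized by a naive regular expression, and the set of "lexical" separators is assumed to be literal space, tab, line feed and carriage return. Character and entity references are left as plain text, so they never count as lexical whitespace.

```python
import re

# Naive tag recognizer (an assumption of this sketch): anything between
# "<" and the next ">". Comments, marked sections and ">" inside
# attribute values are not handled.
TAG = re.compile(r"<[^>]*>")

# Literal separators as they appear in the input: space, tab, LF, CR.
# References such as "&#RE;" are ordinary text here and do not match.
LEXICAL_WS = re.compile(r"[ \t\r\n]+")


def strip_insignificant(text):
    """Drop data between two tags if it is only literal whitespace."""
    out = []
    pos = 0
    seen_tag = False  # True once some tag precedes the current data
    for m in TAG.finditer(text):
        data = text[pos:m.start()]
        # Data is dropped only when it sits between two tags and
        # consists entirely of literal separator characters.
        if data and not (seen_tag and LEXICAL_WS.fullmatch(data)):
            out.append(data)
        out.append(m.group())
        seen_tag = True
        pos = m.end()
    out.append(text[pos:])  # data after the last tag is kept as-is
    return "".join(out)
```

Under these assumptions, the whitespace in "<UL>\n  <LI>one</LI>\n</UL>" is removed, while "<P>&#RE;&space;&space;&#RE;</P>" is left untouched because its whitespace is entered as references. The known failure case shows up too: in "<P>a <B>b</B> <I>c</I></P>" the single space between </B> and <I> is stripped even though it is real data.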
Received on Monday, 30 September 1996 20:21:20 UTC