Paul Prescod <papresco@calum.csclub.uwaterloo.ca> wrote:
>
> What about a third rule:
>
> 1. All white space, including RS and RE, immediately following start tags and
> immediately preceding end tags is not significant.
>
> 2. All other RS/REs are collapsed to a single space.
>
> 3. All quasi-elements containing only whitespace characters are not significant.

That should be, erm, "quasi-pseudoelement"... or maybe not :-)

You may be onto something here. How about the following as a heuristic
to distinguish element content from mixed content:

3. If the only data appearing between two tags is a sequence of lexical
SEPCHARs (including RS and RE), then it is deemed insignificant.

where "lexical" means SEPCHARs that appear as SEPCHARs in the input
(as opposed to, e.g., <P>&#RE;&space;&space;&#RE;</P>), and "data" is
as per ISO 8879.

This heuristic will incorrectly strip out any "true" pseudoelements
that contain nothing but lexical whitespace -- these would have to be
escaped or entered as references, as you point out -- but I think it
will do the right thing in all other cases.

I forget... what was the rationale behind rules (1) and (2)? (I know
it's a common application convention, but what was the reason for
making it mandatory for all XML document types?)

--Joe English
  jenglish@crl.com

Received on Monday, 30 September 1996 20:21:20 EDT
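For illustration only, the proposed rule 3 can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not an implementation of ISO 8879 parsing: the tokenizer is a hypothetical simplification that ignores comments, marked sections, and attribute values containing ">", literal RS/RE are approximated by "\r" and "\n", and the function name strip_insignificant is made up for this example. Character and entity references are left as opaque text, which is exactly what makes them survive as non-lexical whitespace.

```python
import re

# Hypothetical, simplified tokenizer: a tag is "<...>", and everything
# between tags is one run of character data. Comments, marked sections,
# and attribute values containing ">" are NOT handled.
TOKEN = re.compile(r"<[^>]*>|[^<]+")

# Lexical separator characters: space, tab, and literal RS/RE
# (approximated here as "\r" and "\n").
SEPCHARS = set(" \t\r\n")


def strip_insignificant(text: str) -> list[str]:
    """Drop any data run that sits between two tags and consists only of
    literal separator characters (the rule-3 heuristic)."""
    tokens = TOKEN.findall(text)
    kept = []
    for i, tok in enumerate(tokens):
        is_tag = tok.startswith("<")
        between_tags = (
            0 < i < len(tokens) - 1
            and tokens[i - 1].startswith("<")
            and tokens[i + 1].startswith("<")
        )
        # References such as "&#RE;" contain non-whitespace characters,
        # so they never match this test and are always kept.
        if not is_tag and between_tags and set(tok) <= SEPCHARS:
            continue  # whitespace-only data between two tags: insignificant
        kept.append(tok)
    return kept


if __name__ == "__main__":
    doc = "<LIST>\n  <ITEM>one</ITEM>\n  <ITEM>two</ITEM>\n</LIST>"
    print("".join(strip_insignificant(doc)))
```

Run on the indented list above, the heuristic removes the newlines and indentation between tags while keeping "one" and "two". It also shows the false positive mentioned in the message: in "<P><B>bold</B> <I>it</I></P>" the single space between "</B>" and "<I>" is stripped, whereas writing it as a reference would preserve it.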