Re: Element content the real issue?...
Paul Prescod <firstname.lastname@example.org> wrote:
> What about a third rule:
> 1. All white space, including RS and RE, immediately following start tags and
> immediately preceding end tags is not significant.
> 2. All other RS/REs are collapsed to a single space.
> 3. All quasi-elements containing only whitespace characters are not significant.
That should be, erm, "quasi-pseudoelement"... or maybe not :-)
You may be onto something here. How about the following
as a heuristic to distinguish element content from mixed content:
3. If the only data appearing between two tags is a sequence of
lexical SEPCHARs (including RS and RE), then it is deemed
where "lexical" means SEPCHARs that appear as SEPCHARs
in the input (as opposed to e.g., <P>&#RE;&space;&space;&#RE;</P>),
and "data" is as per ISO 8879.
This heuristic will incorrectly strip out any "true" pseudoelements
that contain nothing but lexical whitespace -- these would have to be
escaped or entered as references as you point out -- but I think it
will do the right thing in all other cases.
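
To make the heuristic concrete, here is a rough sketch in Python. It is only an illustration under simplifying assumptions, not a real SGML parser: a "tag" is taken to be anything of the form <...>, lexical SEPCHARs (with RS and RE) are approximated by space, tab, CR and LF, and "deemed" is read as "dropped", in line with the stripping described above. The names TAG, LEXICAL_SEPCHARS and strip_lexical_whitespace are made up for this example.

```python
import re

# Assumption: a tag is anything of the form <...>; real SGML markup
# recognition (comments, marked sections, minimization) is ignored.
TAG = re.compile(r"<[^>]*>")

# Assumption: lexical SEPCHARs plus RS/RE are approximated by these characters.
LEXICAL_SEPCHARS = " \t\r\n"


def strip_lexical_whitespace(text):
    """Drop data between two tags when it is only literal whitespace."""
    out = []
    pos = 0
    prev_was_tag = False
    for m in TAG.finditer(text):
        data = text[pos:m.start()]
        # Data counts as "between two tags" only if a tag precedes it too.
        only_sepchars = data != "" and data.strip(LEXICAL_SEPCHARS) == ""
        if not (prev_was_tag and only_sepchars):
            out.append(data)
        out.append(m.group())
        pos = m.end()
        prev_was_tag = True
    out.append(text[pos:])  # trailing data after the last tag is kept as-is
    return "".join(out)
```

With this sketch, "<L>\n  <I>a b</I>\n</L>" comes out as "<L><I>a b</I></L>", while whitespace entered as references (for instance <P>&#32;&#32;</P>) is left alone because it is not lexical. A "true" pseudoelement such as <P> </P> is stripped to <P></P>, which is exactly the failure case mentioned above.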
I forget... what was the rationale behind rules (1) and (2)?
(I know it's a common application convention, but what was the
reason for making it mandatory for all XML document types?)