Re: Element content the real issue?... from Joe English on 1996-10-01 (w3c-sgml-wg@w3.org from September 1996)

From: Joe English <jenglish@crl.com>
Date: Mon, 30 Sep 1996 17:21:33 -0700
To: w3c-sgml-wg@w3.org
Message-Id: <199610010021.AA22165@mail.crl.com>

Paul Prescod <papresco@calum.csclub.uwaterloo.ca> wrote:
>
> What about a third rule:
>
> 1. All white space, including RS and RE, immediately following start tags and
>    immediately preceding end tags is not significant.
>
> 2. All other RS/REs are collapsed to a single space.
>
> 3. All quasi-elements containing only whitespace characters are not
>    significant.

That should be, erm, "quasi-pseudoelement"...  or maybe not :-)

You may be onto something here.  How about the following
as a heuristic to distinguish element content from mixed content:

    3. If the only data appearing between two tags is a sequence of
       lexical SEPCHARs (including RS and RE), then it is deemed
       insignificant.

where "lexical" means SEPCHARs that appear as SEPCHARs
in the input (as opposed to e.g., <P>&#RE;&space;&space;&#RE;</P>),
and "data" is as per ISO 8879.

This heuristic will incorrectly strip out any "true" pseudoelements
that contain nothing but lexical whitespace -- these would have to be
escaped or entered as references as you point out -- but I think it
will do the right thing in all other cases.
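
To make the heuristic concrete, here is a rough sketch in Python. It is not a real SGML parser: the regex tokenizer, the name strip_insignificant, and the choice of space, tab, line feed and carriage return as the literal separator characters are all assumptions made for illustration. References such as &#RE; are left unexpanded, so they never count as lexical whitespace.

```python
import re

# Hypothetical tokenizer: a tag is anything from '<' to the next '>';
# every other run of characters is treated as data. Real SGML markup
# recognition is considerably more involved than this.
TOKEN = re.compile(r"<[^>]*>|[^<]+")

# Assumed literal separator characters: space, tab, RS (LF), RE (CR).
SEPCHARS = " \t\n\r"


def strip_insignificant(text):
    """Drop data that sits between two tags and is only literal whitespace."""
    tokens = TOKEN.findall(text)
    kept = []
    for i, tok in enumerate(tokens):
        is_data = not tok.startswith("<")
        # Data tokens are maximal runs, so any neighbour is a tag; the
        # token is "between two tags" when it has a neighbour on each side.
        between_tags = 0 < i < len(tokens) - 1
        if is_data and between_tags and tok.strip(SEPCHARS) == "":
            continue  # deemed insignificant under the heuristic
        kept.append(tok)
    return "".join(kept)


print(strip_insignificant("<UL>\n  <LI>one</LI>\n</UL>"))
print(strip_insignificant("<P>&#RE;&space;&space;&#RE;</P>"))
print(strip_insignificant("<P>a <B>b</B> <I>c</I></P>"))
```

The first call removes the line breaks and indentation around the LI element, and the second leaves the references alone. The third shows the failure case mentioned above: the single space between </B> and <I> is real data, but it is stripped because it is nothing but lexical whitespace.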

I forget... what was the rationale behind rules (1) and (2)?
(I know it's a common application convention, but what was the
reason for making it mandatory for all XML document types?)

--Joe English

  jenglish@crl.com

Received on Monday, 30 September 1996 20:21:20 UTC