Re: Element content the real issue?... from lee@sq.com on 1996-10-01 (w3c-sgml-wg@w3.org from September 1996)

From: <lee@sq.com>
Date: Mon, 30 Sep 96 23:12:34 EDT
To: jenglish@crl.com, w3c-sgml-wg@w3.org
Message-Id: <9610010312.AA14806@sqrex.sq.com>

Joe English <jenglish@crl.com>

> [...] How about the following
> as a heuristic to distinguish element content from mixed content:
> 
>     3. If the only data appearing between two tags is a sequence of
>        lexical SEPCHARs (including RS and RE), then it is deemed
>        insignificant.

<P><emph>That</emph> <strong>doesn't</strong> work.</P>
                    ^-- you lose this space.

If you want to inspect the entire element to see if it contains
anything except spaces and sub-elements, you're in for a lot of
lookahead (consider <HTML> in a well-formed RFC 1822 document!).

And in any case, just because my paragraph only contains individual
emphasised words does not mean that the spaces (or record ends)
are insignificant.

<P><emph>That</emph>
<strong>doesn't</strong>
work.</P>

should be the same, right?

I don't think any white-space should be discarded by the parser.

Lee

Received on Monday, 30 September 1996 23:13:38 UTC