
Re: Element content the real issue?...



Joe English <jenglish@crl.com>

> [...] How about the following
> as a heuristic to distinguish element content from mixed content:
> 
>     3. If the only data appearing between two tags is a sequence of
>        lexical SEPCHARs (including RS and RE), then it is deemed
>        insignificant.

<P><emph>That</emph> <strong>doesn't</strong> work.</P>
                    ^-- you lose this space.
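
To make the loss concrete, here is a rough Python sketch of rule 3 applied
to that line. The regex tokenizer and the names tokenize, apply_rule_3 and
text_of are invented for illustration and are nowhere near a real SGML
parser; SEPCHAR is approximated as space, tab, CR and LF.

```python
import re

# Hypothetical toy tokenizer: a token is either a tag or a run of data.
TOKEN = re.compile(r"<[^>]+>|[^<]+")

def tokenize(markup):
    return TOKEN.findall(markup)

def is_tag(token):
    return token.startswith("<")

def apply_rule_3(tokens):
    """Drop data that lies between two tags and is only separator characters."""
    kept = []
    for i, tok in enumerate(tokens):
        between_tags = (
            0 < i < len(tokens) - 1
            and is_tag(tokens[i - 1])
            and is_tag(tokens[i + 1])
        )
        # Assumption: SEPCHAR, RS and RE are approximated by " \t\r\n".
        if not is_tag(tok) and between_tags and tok.strip(" \t\r\n") == "":
            continue
        kept.append(tok)
    return kept

def text_of(tokens):
    """Concatenate the data tokens, ignoring all tags."""
    return "".join(t for t in tokens if not is_tag(t))

line = "<P><emph>That</emph> <strong>doesn't</strong> work.</P>"
print(text_of(tokenize(line)))                # That doesn't work.
print(text_of(apply_rule_3(tokenize(line))))  # Thatdoesn't work.
```

The space before "work." survives only because it shares a data run with
real text; the one between the two inline elements is gone.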

If you want to inspect the entire element to see if it contains
anything except spaces and sub-elements, you're in for a lot of
lookahead (consider <HTML> in a well-formed RFC 1866 document!).
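
A rough Python sketch of how far such a check has to read before it can
answer. It assumes every element has an explicit start and end tag (no
omitted tags, empty elements, comments or declarations), and the function
name scan_first_element is invented for illustration.

```python
import re

# Hypothetical toy tokenizer: a token is either a tag or a run of data.
TOKEN = re.compile(r"<[^>]+>|[^<]+")

def scan_first_element(markup):
    """Return (only_whitespace_and_subelements, tokens_read) for the
    outermost element. Assumes explicit, properly nested tags."""
    tokens = TOKEN.findall(markup)
    depth = 0
    for count, tok in enumerate(tokens, start=1):
        if tok.startswith("</"):
            depth -= 1
            if depth == 0:
                # Reached the matching end tag without seeing direct text.
                return True, count
        elif tok.startswith("<"):
            depth += 1
        elif depth == 1 and tok.strip(" \t\r\n"):
            # Non-blank data directly inside the element: mixed content.
            return False, count
    return True, len(tokens)

doc = ("<HTML>\n<HEAD><TITLE>t</TITLE></HEAD>\n"
       "<BODY><P>one</P>\n<P>two</P></BODY>\n</HTML>")
only_elements, read = scan_first_element(doc)
print(only_elements, read, len(TOKEN.findall(doc)))
```

For <HTML> the answer is only known once the end tag has been read, so
tokens_read equals the token count of the whole document; the newline
right after <HTML> cannot be classified until then.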

And in any case, just because my paragraph only contains individual
emphasised words does not mean that the spaces (or record ends)
are insignificant.

<P><emph>That</emph>
<strong>doesn't</strong>
work.</P>

should mean the same as the one-line version, right?

I don't think any white-space should be discarded by the parser.

Lee

