Re: Element content the real issue?...
At 11:57 AM 9/30/96 -0700, Joe English wrote:
>> According to the proposal:
>What about (where '@' denotes an RE and '.' denotes a space):
>The RE and space characters preceding the CCC element
>are not deemed insignificant by rule 1, whereas an
>SGML parser would treat them as such if AAA had
Good point! I was reading the first rule wrong.
What about a third rule:
1. All white space, including RS and RE, immediately following start tags and
immediately preceding end tags is not significant.
2. All other RS/REs are collapsed to a single space.
3. All quasi-elements containing only whitespace characters are not significant.
I know that the wording is very shaky, but the point is to get rid of the
space in this:
<P>Some <EM>emph text</EM> </P>
but not in this:
<P>Some <EM>emph</EM> text</P>
The former has a "quasi-element" containing only whitespace characters (the
space between </EM> and </P>). Just as with the first two rules, you can apply
it a) directly in an XML parser, b) in a preprocessor for SGML (a quick Perl
hack) or c) after receiving the results of an SGML parser. In a sense, I've
reintroduced the concept of mixed content, but my new mixed content means:
"content where whitespace characters are mixed with other data characters".
You don't have to look at the DTD to figure that out.
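To make option b) concrete, here is a rough sketch of such a preprocessor,
written in Python rather than Perl. It is only an illustration under some
loud assumptions: a "quasi-element" is read as any run of text between two
tags, tags are matched naively as <...> (no comments, marked sections, or
'>' inside attribute values), and the names TOKEN, is_start_tag and
normalize are made up for this example.

```python
import re

# Naive tokenizer (assumption): a token is either a tag "<...>" or a run of text.
TOKEN = re.compile(r"<[^>]*>|[^<]+")


def is_start_tag(token):
    """True for tokens like <P> or <EM ...>, false for end tags and text."""
    return re.match(r"<[A-Za-z]", token) is not None


def normalize(markup):
    """Apply the three proposed whitespace rules to a string of markup."""
    tokens = TOKEN.findall(markup)
    out = []
    for i, tok in enumerate(tokens):
        if tok.startswith("<"):
            out.append(tok)          # tags pass through untouched
            continue
        if not tok.strip():
            continue                 # rule 3: whitespace-only quasi-element
        prev_tok = tokens[i - 1] if i > 0 else ""
        next_tok = tokens[i + 1] if i + 1 < len(tokens) else ""
        if is_start_tag(prev_tok):
            tok = tok.lstrip()       # rule 1: whitespace right after a start tag
        if next_tok.startswith("</"):
            tok = tok.rstrip()       # rule 1: whitespace right before an end tag
        tok = re.sub(r"[\r\n]+", " ", tok)  # rule 2: remaining RS/RE runs
        out.append(tok)
    return "".join(out)


if __name__ == "__main__":
    print(normalize("<P>Some <EM>emph text</EM> </P>"))
    print(normalize("<P>Some <EM>emph</EM> text</P>"))
```

Run on the two samples above, the first loses the space before </P> and the
second comes back unchanged. Nothing in the sketch consults a DTD, which is
the point of the proposal.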
If you wanted a quasi-element containing only whitespace characters to stay
significant, you could use the escaping mechanisms, or the verbatim delimiter
that I suggested a few messages back.