- From: Paul Prescod <papresco@calum.csclub.uwaterloo.ca>
- Date: Mon, 30 Sep 1996 16:00:51 -0400
- To: w3c-sgml-wg@w3.org
At 11:57 AM 9/30/96 -0700, Joe English wrote:
>> According to the proposal:
>What about (where '@' denotes an RE and '.' denotes a space):
>
>    <aaa>@
>    ...<bbb>XXX</bbb>@
>    ...<ccc>XXX</ccc>@
>    </aaa>@
>
>The RE and space characters preceding the CCC element
>are not deemed insignificant by rule 1, whereas an
>SGML parser would treat them as such if AAA had
>element content.

Good point! I was reading the first rule wrong. What about a third rule:

1. All whitespace, including RS and RE, immediately following start tags and immediately preceding end tags is not significant.
2. All other RS/REs are collapsed to a single space.
3. All quasi-elements containing only whitespace characters are not significant.

I know that the wording is very shaky, but the point is to get rid of the space in this:

    <P>Some <EM>emph text</EM> </P>

but not in this:

    <P>Some <EM>emph</EM> text</P>

The former has a "quasi-element" containing only whitespace characters (the space between </EM> and </P>); in the latter, the space belongs to a quasi-element that also contains data characters.

Just as with the first two rules, you can apply it either a) directly in an XML parser, b) in a preprocessor for SGML (a quick Perl hack) or c) after receiving the results of an SGML parser.

In a sense, I've reintroduced the concept of mixed content, but my new mixed content means: "content where whitespace characters are mixed with other data characters". You don't have to look at the DTD to figure that out.

If you wanted a quasi-element containing only whitespace characters, you could use the escaping mechanisms, or the verbatim delimiter that I suggested a few messages back.

Paul Prescod
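
As an illustration only, the three rules above could be sketched as a small preprocessor along the lines of option b). This is a hedged sketch in Python rather than Perl, and it is not part of the proposal: the function name `normalize`, the tag-matching regular expression, and the reading of a "quasi-element" as any run of text between two adjacent tags are all assumptions made for the example. It also assumes RS and RE appear as line feed and carriage return characters, and it ignores comments, marked sections, declarations and attribute values containing '>'.

```python
import re

# Assumption: a tag is anything of the form <...> with no '>' inside it.
TAG = re.compile(r"(<[^>]+>)")
WHITESPACE = " \t\r\n"


def is_tag(token):
    return token.startswith("<") and token.endswith(">")


def is_end_tag(token):
    return is_tag(token) and token.startswith("</")


def is_start_tag(token):
    return is_tag(token) and not token.startswith("</")


def normalize(markup):
    """Apply the three proposed whitespace rules to a markup string."""
    # Split into alternating tags and text runs ("quasi-elements").
    tokens = [t for t in TAG.split(markup) if t]
    out = []
    for i, token in enumerate(tokens):
        if is_tag(token):
            out.append(token)
            continue
        # Rule 3: a quasi-element containing only whitespace is dropped.
        if token.strip(WHITESPACE) == "":
            continue
        prev_token = tokens[i - 1] if i > 0 else ""
        next_token = tokens[i + 1] if i + 1 < len(tokens) else ""
        # Rule 1: drop whitespace right after a start tag
        # and right before an end tag.
        if is_start_tag(prev_token):
            token = token.lstrip(WHITESPACE)
        if is_end_tag(next_token):
            token = token.rstrip(WHITESPACE)
        # Rule 2: each remaining run of RS/RE becomes a single space.
        token = re.sub(r"[\r\n]+", " ", token)
        out.append(token)
    return "".join(out)


print(normalize("<P>Some <EM>emph text</EM> </P>"))
print(normalize("<P>Some <EM>emph</EM> text</P>"))
```

Under these assumptions the first call drops the space before </P>, while the second leaves the space before "text" alone, because that space shares its quasi-element with data characters. No DTD is consulted at any point, which is the property the proposal is after.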
Received on Monday, 30 September 1996 16:06:02 UTC