- From: Paul Grosso <paul@arbortext.com>
- Date: Thu, 26 Sep 96 18:02:31 CDT
- To: w3c-sgml-wg@w3.org
> From: "W. Eliot Kimber" <kimber@passage.com> > > The rules we came up with are: > > An XML parser shall interpret white space and record ends in XML documents > as follows: > > 1. All white space, including RS and RE, immediately following start tags > and immediately preceding end tags is not significant. > > 2. All other RS/REs are collapsed to a single space. Do you mean this as stated, or do you mean all sequences of white space consisting of RSs, REs, and spaces are collapsed to a single space? > > This approach has the effect that the white space and RS/RE collapsing can > be done before or after SGML RE rules are applied without affecting the > result. The only place this is not true is record ends followed by one or > more PIs followed by data. In SGML, the RE will be considered to have > occurred *after* the PIs, whereas in XML it will be considered to have > occurred *before* the PIs (there are many who consider this behavior of > SGML to be a bug that should be fixed, or at least made optional, in the > SGML revision). I'm not so worried about the RE's that migrate around PIs at the moment. What I'm hoping for is that--discounting the case where the relative order of the PI and the RE affects the resulting display--when the user likes what they see presented by an "XML browser" and they then bring up an "SGML browser/viewer" on the document, they'll see the same thing. I don't know how much to weigh cases such as: <p>He was <em>over- </em>sensitive.</p> which would come out in XML as: He was over-sensitive. and in SGML as: He was over- sensitive. At least if the XML were "XML-normalized" to strip ignored spaces, then the result would be handled the same by both XML and SGML parsers.
Received on Thursday, 26 September 1996 19:24:39 UTC