Re: Current Status of Discussion on RE/RS Handling
> From: "W. Eliot Kimber" <email@example.com>
> The rules we came up with are:
> An XML parser shall interpret white space and record ends in XML documents
> as follows:
> 1. All white space, including RS and RE, immediately following start tags
> and immediately preceding end tags is not significant.
> 2. All other RS/REs are collapsed to a single space.
Do you mean this as stated, or do you mean all sequences of white space
consisting of RSs, REs, and spaces are collapsed to a single space?
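
To make the two readings concrete, here is a minimal Python sketch. It is only an illustration, not proposed spec text: the function names are made up, and it assumes RS and RE are the line feed and carriage return characters, as in the reference concrete syntax.

```python
import re

# Assumption: RS = line feed, RE = carriage return (reference concrete syntax).

def collapse_as_stated(text: str) -> str:
    """Reading A: only runs of RS/RE become one space; ordinary spaces stay."""
    return re.sub(r"[\r\n]+", " ", text)

def collapse_all_whitespace(text: str) -> str:
    """Reading B: any run of RSs, REs, and spaces becomes one space."""
    return re.sub(r"[ \r\n]+", " ", text)

sample = "first line \r\n second line"
print(repr(collapse_as_stated(sample)))       # spaces around the line break survive
print(repr(collapse_all_whitespace(sample)))  # the whole run becomes one space
```

Under reading A the spaces on either side of the line break survive next to the space that replaces it, so the two readings give different results whenever a line ends or begins with spaces.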
> This approach has the effect that the white space and RS/RE collapsing can
> be done before or after SGML RE rules are applied without affecting the
> result. The only place this is not true is record ends followed by one or
> more PIs followed by data. In SGML, the RE will be considered to have
> occurred *after* the PIs, whereas in XML it will be considered to have
> occurred *before* the PIs (there are many who consider this behavior of
> SGML to be a bug that should be fixed, or at least made optional, in the
> SGML revision).
I'm not so worried about the REs that migrate around PIs at the moment.
What I'm hoping for is that--discounting the case where the relative
order of the PI and the RE affects the resulting display--when the user
likes what they see presented by an "XML browser" and they then bring
up an "SGML browser/viewer" on the document, they'll see the same thing.
I don't know how much weight to give cases such as:
    <p>He was <em>over- </em>sensitive.</p>
which would come out in XML as:
    He was over-sensitive.
and in SGML as:
    He was over- sensitive.
At least if the XML were "XML-normalized" to strip ignored spaces, then
the result would be handled the same by both XML and SGML parsers.
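
As a rough sketch of what such an "XML-normalization" pass might look like under rule 1 (white space right after a start tag or right before an end tag is dropped), here is some Python. The helper names `xml_normalize` and `text_of` are hypothetical, the regular expressions are a simplification that ignores comments, PIs, CDATA, and attribute values containing `/` or `>`, and `text_of` simply deletes tags to stand in for a parser that keeps all remaining spaces.

```python
import re

WS = r"[ \t\r\n]+"  # assumed definition of white space for this sketch

def xml_normalize(markup: str) -> str:
    """Strip white space that rule 1 treats as not significant."""
    # White space immediately following a start tag.
    markup = re.sub(r"(<[A-Za-z][^<>/]*>)" + WS, r"\1", markup)
    # White space immediately preceding an end tag.
    markup = re.sub(WS + r"(</[^<>]+>)", r"\1", markup)
    return markup

def text_of(markup: str) -> str:
    """Delete tags, keeping every remaining character as data."""
    return re.sub(r"<[^<>]+>", "", markup)

doc = "<p>He was <em>over- </em>sensitive.</p>"
print(text_of(doc))                 # tags removed, space before </em> kept
print(text_of(xml_normalize(doc)))  # space before </em> stripped first
```

Once the document has been normalized this way, there is no space left before `</em>` for the two kinds of parser to disagree about.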