[Prev][Next][Index][Thread]

Re: Current Status of Discussion on RE/RS Handling



> From: "W. Eliot Kimber" <kimber@passage.com>

> 
> The rules we came up with are:
> 
> An XML parser shall interpret white space and record ends in XML documents
> as follows:
> 
> 1. All white space, including RS and RE, immediately following start tags
>    and immediately preceding end tags is not significant.
> 
> 2. All other RS/REs are collapsed to a single space.

Do you mean this as stated, or do you mean all sequences of white space
consisting of RSs, REs, and spaces are collapsed to a single space?

> 
> This approach has the effect that the white space and RS/RE collapsing can
> be done before or after SGML RE rules are applied without affecting the
> result.  The only place this is not true is record ends followed by one or
> more PIs followed by data. In SGML, the RE will be considered to have
> occurred *after* the PIs, whereas in XML it will be considered to have
> occurred *before* the PIs (there are many who consider this behavior of
> SGML to be a bug that should be fixed, or at least made optional, in the
> SGML revision).

I'm not so worried about the RE's that migrate around PIs at the moment.

What I'm hoping for is that--discounting the case where the relative
order of the PI and the RE affects the resulting display--when the user
likes what they see presented by an "XML browser" and they then bring
up an "SGML browser/viewer" on the document, they'll see the same thing.

I don't know how much to weigh cases such as:

	<p>He was <em>over- </em>sensitive.</p>
which would come out in XML as:
	He was over-sensitive.
and in SGML as: 
	He was over- sensitive.

At least if the XML were "XML-normalized" to strip ignored spaces, then
the result would be handled the same by both XML and SGML parsers.