Current Status of Discussion on RE/RS Handling

The SGML-ERB met yesterday by phone and discussed the RS/RE problem at some
length. While we haven't necessarily arrived at consensus, our discussion
did result in the following proposal, which appears to satisfy the
requirement of SGML compatibility while being simple and more or less
consistent with existing HTML practice.

The rules we came up with are:

An XML parser shall interpret white space and record ends in XML documents
as follows:

1. All white space, including RS and RE, immediately following start tags and
   immediately preceding end tags is not significant.

2. All other RS/REs are collapsed to a single space.
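
As a rough illustration of the two rules, here is a minimal Python sketch.
It is not part of the proposal and makes several assumptions: RS and RE are
taken to be the line feed and carriage return characters, "white space" is
taken to be space, tab, RS, and RE, a run of adjacent RS/RE characters is
collapsed to one space, and markup is recognized with a naive regular
expression rather than a real parser. The function name and helpers are
hypothetical.

```python
import re

# Naive markup recognizer (assumption): anything between "<" and ">".
MARKUP = re.compile(r"(<[^>]+>)")
# Assumed white space set: space, tab, RE (carriage return), RS (line feed).
WHITE_SPACE = " \t\r\n"


def is_start_tag(token):
    # A start tag begins with "<" followed by a name character.
    # PIs ("<?"), declarations ("<!"), and end tags ("</") do not match.
    return re.match(r"<[A-Za-z]", token) is not None


def is_end_tag(token):
    return token.startswith("</")


def normalize_record_ends(text):
    """Apply rule 1, then rule 2, to each run of data between markup."""
    parts = MARKUP.split(text)  # odd indexes hold markup, even hold data
    out = []
    for i, part in enumerate(parts):
        if i % 2 == 1:
            out.append(part)  # markup is passed through untouched
            continue
        before = parts[i - 1] if i > 0 else ""
        after = parts[i + 1] if i + 1 < len(parts) else ""
        # Rule 1: drop white space right after a start tag
        # and right before an end tag.
        if is_start_tag(before):
            part = part.lstrip(WHITE_SPACE)
        if is_end_tag(after):
            part = part.rstrip(WHITE_SPACE)
        # Rule 2: every remaining RS/RE run becomes a single space.
        part = re.sub(r"[\r\n]+", " ", part)
        out.append(part)
    return "".join(out)


print(normalize_record_ends("<p>\n  Hello\nworld\n</p>"))
```

Under these assumptions, a record end that sits between data and a PI is
turned into a space in place, that is, before the PI.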

This approach has the effect that the white space and RS/RE collapsing can
be done before or after SGML RE rules are applied without affecting the
result.  The one exception is a record end followed by one or more PIs
followed by data. In SGML, the RE will be considered to have
occurred *after* the PIs, whereas in XML it will be considered to have
occurred *before* the PIs (there are many who consider this behavior of
SGML to be a bug that should be fixed, or at least made optional, in the
SGML revision).

This approach also requires that truly significant record ends in data must
be escaped in some way.

Cheers,

Eliot

--
W. Eliot Kimber (kimber@passage.com) 
Senior SGML Consultant and HyTime Specialist
Passage Systems, Inc., (512)339-1400
10596 N. Tantau Ave., Cupertino, CA 95014-3535
(408) 366-0300, (408) 366-0320 (fax)
2608 Pinewood Terrace, Austin, TX 78757 (512) 339-1400 (fone/fax)
http://www.passage.com (work) http://www.drmacro.com (home)
"If I never had existed, would you still remember me?..."
                                   --Austin Lounge Lizards, "1984 Blues"

Received on Thursday, 26 September 1996 13:42:32 UTC