RE/RS Options: Trying to Focus
I'm trying to make some sense of the RE/RS issue and not having much luck.
What follows is an attempt to state, as tersely and precisely as possible,
what the issues are and what the alternatives are.
As James put it so well, the problem is mixed content: if you allow mixed
content, then you must provide some mechanism for distinguishing data
record ends from non-data record ends. If you don't allow mixed content,
that problem goes away, but you then have to delimit character data
content explicitly, which neither SGML nor HTML currently requires.
Note too that it's not just an RE/RS problem, but an SSEP problem.
The proposals, as I understand them, are:
A. Disallow mixed content. Furthermore, disallow PIs and markup within
character data (with the possible exception of CDATA and RCDATA marked
sections). This requires an element type whose only semantics
is to contain character data. Short references can be used so that
a single character can quote character data.
This solves the problem by making it clear when record ends are in
character data context. Record end handling rules are not changed
in any way.
This assumes that inclusions are not allowed (at least within the element
that contains the character data), thus avoiding the "record ends following
included subelements are not taken as data" rule.
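To make option A concrete, here is a minimal, purely illustrative sketch of why quoting settles the question: a scanner only has to track whether it is inside a quoted run, and record ends outside quotes are never data. The quote character, the name split_quoted, and the simplified tag syntax (no '>' inside attribute values, no comments or PIs) are assumptions for illustration, not part of any proposal.

```python
# Sketch of option A: only quoted runs are character data.
# QUOTE stands in for whatever character a short reference would map
# to the start and end of the character-data element (an assumption).
QUOTE = '"'


def split_quoted(text: str) -> list:
    """Return the character-data runs found in `text`.

    Simplifying assumptions: tags never contain '>' inside attribute
    values, and there are no comments, PIs, or markup inside quotes.
    """
    runs = []
    current = []
    in_tag = False
    in_data = False
    for ch in text:
        if in_tag:
            # Quotes inside tags (attribute values) are not data delimiters.
            if ch == ">":
                in_tag = False
        elif in_data:
            if ch == QUOTE:
                runs.append("".join(current))
                current = []
                in_data = False
            else:
                # Record ends inside quotes are kept: they are data.
                current.append(ch)
        elif ch == "<":
            in_tag = True
        elif ch == QUOTE:
            in_data = True
        # Anything else between tags, record ends included, is not data.
    if in_data:
        raise ValueError("unterminated quoted character data")
    return runs


if __name__ == "__main__":
    doc = '<para id="p1">\n"First line\nsecond line"\n</para>\n'
    print(split_quoted(doc))  # ['First line\nsecond line']
```

The record ends after the start tag and before the end tag are simply formatting; only the one inside the quotes survives as data.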
B. Treat XML documents as a single record by mapping RS and RE to character
codes that cannot occur in documents. There are *no* record ends. This
has the disadvantage that some other mechanism must be used to indicate
data record ends, one that must be understood and processed by
rendering systems, making it likely that different
tools will produce different results for the same input data. It also
has the problem that many SGML tools do not support this kind of
remapping, making it difficult or impossible to process XML documents
as SGML. It would also require transformation of SGML documents
before they could be processed accurately as XML documents.
This also doesn't solve the SSEP problem generally.
C. Treat all record ends as data. This requires authors to do things
like put record ends inside tags, before the tag close, in order to format
their markup on multiple lines. It also means that SGML documents can't be
made into XML documents simply by quoting character data; all SSEP must
also be moved inside markup. Talk about making 5-line Perl hacking harder.
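A rough sketch of what an application would see under option C: every character outside a tag, record ends included, is data, so the only way to break markup across lines without adding data is to put the record end inside the tag. The function character_data is hypothetical, and it ignores comments, PIs, and '>' inside attribute values.

```python
# Sketch of option C: every record end outside a tag is data.
# Simplifying assumptions: no comments or PIs, and no '>' inside
# attribute values.
def character_data(text: str) -> str:
    """Return all character data in `text`, record ends included."""
    data = []
    in_tag = False
    for ch in text:
        if in_tag:
            if ch == ">":
                in_tag = False
        elif ch == "<":
            in_tag = True
        else:
            data.append(ch)
    return "".join(data)


if __name__ == "__main__":
    # Conventionally formatted markup: the record ends become data.
    print(repr(character_data("<para>\nSome text.\n</para>")))
    # Record ends moved before the tag closes: no stray data.
    print(repr(character_data("<para\n>Some text.</para\n>")))
```

The first call yields '\nSome text.\n' and the second 'Some text.', which is exactly the authoring contortion described above.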
Note that if we want to allow DTD-less parsing, we can't use the SGML rules
as-is and keep mixed content, because without the DTD you have no way of
knowing when you're in element content and when you're in mixed content
(for a dramatic example of this problem, create an SGML document with lots
of SSEP in element context, then view it with Panorama with and without the
DTD).
This also means that there's no way to define a "simpler" set of RE/RS
rules and keep mixed content, because you'll have the same problem.
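A deliberately simplified sketch of the DTD-less problem: whether a whitespace-only run inside an element is data depends on that element's declared content, and only the DTD supplies that. The table format and keep_whitespace are hypothetical, and the sketch ignores SGML's finer record-end rules (such as those governing the first and last record ends in an element).

```python
from typing import Dict, Optional

# Hypothetical content-model table standing in for the DTD's element
# declarations: element name -> "element" or "mixed".
DTD: Dict[str, str] = {"list": "element", "item": "mixed"}


def keep_whitespace(parent: str, dtd: Optional[Dict[str, str]]) -> Optional[bool]:
    """Is a whitespace-only run (e.g. a record end) inside `parent` data?

    Returns None when no content-model information is available: without
    a DTD the question has no answer.
    """
    if dtd is None:
        return None
    # In element content, separators are ignorable; in mixed content they
    # are (in this simplified model) data.
    return dtd.get(parent) == "mixed"


if __name__ == "__main__":
    print(keep_whitespace("list", DTD))   # False: element content
    print(keep_whitespace("item", DTD))   # True: mixed content
    print(keep_whitespace("item", None))  # None: no DTD, no answer
```

Any "simpler" RE/RS rule for mixed content still has to call something like this, so it runs into the same missing-DTD problem.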
My conclusion is that eliminating mixed content by quoting character data
is the simplest solution overall and retains the most compatibility with
SGML as is. While quoting may seem unnatural to those of us who grew up
typing SGML markup (it was to me when I put together some examples), I
don't think it will be hard for newcomers to learn and it should be easy
for SGML editors to add the quotes as an export option.
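The export option might look something like the following sketch. It assumes a hypothetical quote character and a toy Element tree (neither comes from any actual editor), and it assumes the text itself contains no quote characters, since escaping is out of scope here. Every character-data run is wrapped in quotes, after which the exporter is free to indent and break lines between tags.

```python
from dataclasses import dataclass, field
from typing import List, Union

QUOTE = '"'  # hypothetical quote delimiter, as in option A


@dataclass
class Element:
    """Toy document tree node (hypothetical, for illustration only)."""
    name: str
    children: List[Union[str, "Element"]] = field(default_factory=list)


def export_quoted(node: Element, indent: int = 0) -> str:
    """Serialize `node`, wrapping every character-data run in quotes.

    Because only quoted text is data, the record ends and indentation
    added between tags here carry no meaning.
    """
    pad = "  " * indent
    lines = [f"{pad}<{node.name}>"]
    for child in node.children:
        if isinstance(child, Element):
            lines.append(export_quoted(child, indent + 1))
        else:
            lines.append(f"{pad}  {QUOTE}{child}{QUOTE}")
    lines.append(f"{pad}</{node.name}>")
    return "\n".join(lines)


if __name__ == "__main__":
    doc = Element("para", ["Some ", Element("em", ["emphasized"]), " text."])
    print(export_quoted(doc))
```

Note that the spaces inside "Some " and " text." are preserved because they are quoted, while the layout whitespace is free.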
W. Eliot Kimber (email@example.com)
Senior SGML Consultant and HyTime Specialist
Passage Systems, Inc., (512)339-1400
10596 N. Tantau Ave., Cupertino, CA 95014-3535 (408) 366-0300, (408)
2608 Pinewood Terrace, Austin, TX 78757 (512) 339-1400 (fone/fax)
http://www.passage.com (work) http://www.drmacro.com (home)
"If I never had existed, would you still remember me?..."
--Austin Lounge Lizards, "1984 Blues"