New RS/RE Proposal

Well, everyone is probably tired of RS/RE proposals. Let me highlight the 
costs and benefits of mine (presented below), so you can decide whether to 
read on:

 * handles RS/RE in tables and other elements that would normally be considered 
"element content" in the usual way (RS/RE is not significant)

 * handles other whitespace in a predictable way (always significant)

 * handles verbatim elements
 
 * only three simple rules (and one implied simple rule)

 * believed to be compatible with all SGML parsers and the great majority 
   of SGML tool output.

 * not too weird looking to SGML and HTML users

 * most documents will require NO extra delimiters, nor will authors have to 
   remove insignificant RS/REs inserted by traditional SGML tools (other 
   inserted insignificant whitespace will still have to be removed)

 * only a few habits/macros have to change

Here are my proposed rules:

#1. All REs are insignificant unless they occur within verbatim
sections or between non-whitespace data characters 
(i.e. between words).

#2. Verbatim sections begin with <" and end with "> and may contain only
data characters. REs within verbatim sections are significant.

#3. REs between non-whitespace data characters (i.e. between words) are 
collapsed to a single space.

(implied rule #4.) All other (non-RS/RE) whitespace (outside of markup) is 
significant.
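
To make the rules concrete, here is a rough sketch in Python of how a processor
might apply them. It is only an illustration under several assumptions of mine
that are not part of the proposal: a record boundary (RS/RE) is modelled as a
newline, tags contain no ">" inside attribute values, and the verbatim
delimiters themselves are dropped from the output. The names `normalize` and
`_normalize_text` are made up for the example.

```python
import re

# Assumption: an RS/RE pair is modelled as "\n" (optionally preceded by "\r").
# Assumption: markup is limited to simple tags with no ">" inside them.
TOKEN = re.compile(r'<"(.*?)">|(<[^>]*>)|([^<]+)', re.DOTALL)
NEWLINES = re.compile(r'(?:\r?\n)+')


def _normalize_text(text: str) -> str:
    """Apply rules 1, 3 and 4 to one run of data characters."""
    def replace(match: re.Match) -> str:
        before = text[match.start() - 1] if match.start() > 0 else ""
        after = text[match.end()] if match.end() < len(text) else ""
        # Rule 3: REs between two non-whitespace data characters
        # collapse to a single space.
        if before and after and not before.isspace() and not after.isspace():
            return " "
        # Rule 1: every other RE (next to markup or other whitespace)
        # is insignificant and is dropped.
        return ""
    # Rule 4: spaces and tabs are never touched.
    return NEWLINES.sub(replace, text)


def normalize(doc: str) -> str:
    """Return doc with insignificant REs removed under the proposed rules."""
    out = []
    for match in TOKEN.finditer(doc):
        verbatim, tag, text = match.groups()
        if verbatim is not None:
            # Rule 2: content between <" and "> is kept exactly as written.
            out.append(verbatim)
        elif tag is not None:
            out.append(tag)
        else:
            out.append(_normalize_text(text))
    return "".join(out)
```

For instance, `normalize("<P>\nIsn't the sky blue?\n</P>")` drops both REs
because each one sits next to a tag, while an RE between two words comes back
as a single space.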

Examples:

I believe that under these rules these fragments will behave as 
a typical author would intend them to.

RS/RE ignored by rule 1
=======================

<TABLE>
<TR>
<TD>abcde</TD><TD>fghijk</TD><TD>...</TD>
</TR>
</TABLE>

<P>
Isn't the sky blue?
</P>

RS/RE becomes space by rule 3
=============================

<P>This is a long sentence and my text editor is going to put a newline in for
word wrap. Good thing XML knows what I mean!</P>

Verbatim Content:
=================

<PRE>
<"
Column 1       Column 2       Column 3      Column 4
=======        ========       ========      ========
12345          12345          12345         12345            
12345          12345          12345         12345            
12345          12345          12345         12345            
">
</PRE>

Significant Whitespace:
=======================
<P>This    is     a    <EM> paragraph </EM>   <STRONG> with </STRONG>    a    
bunch   of    space     in      it.</P>

All spaces are retained.

Significant Whitespace workarounds:
===================================
<TABLE>
<TR><TD>Workaround Number</TD><TD>Workaround Description</TD></TR>
<TR><TD>1</TD><TD                >Put space after GI for indentation.</TD></TR>
<TR><TD>2</TD                ><TD>Space after previous element's GI.</TD></TR>
<TR><TD>3</TD><!--         --><TD>Use traditional comments          </TD></TR>
<TR><TD>4</TD>~              ~<TD>Use some form of SGMLDECL comment.</TD></TR>
</TABLE>

Needing workarounds like these to lay out markup is the only major downside...
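
The first three workarounds rely on whitespace inside a tag or a comment being
markup rather than data. A small Python check of that idea follows. It is only
a sketch: the `data_characters` helper and the sample rows are invented for
illustration, and it assumes tags and comments contain no stray ">".

```python
import re

# Assumption: comments look like <!-- ... --> and tags contain no ">" inside.
MARKUP = re.compile(r'<!--.*?-->|<[^>]*>', re.DOTALL)


def data_characters(fragment: str) -> str:
    """Return only the data characters, dropping tags and comments."""
    return MARKUP.sub("", fragment)


plain        = "<TR><TD>1</TD><TD>Put space after GI.</TD></TR>"
in_start_tag = "<TR><TD>1</TD><TD                >Put space after GI.</TD></TR>"
in_end_tag   = "<TR><TD>1</TD                ><TD>Put space after GI.</TD></TR>"
in_comment   = "<TR><TD>1</TD><!--         --><TD>Put space after GI.</TD></TR>"
# Padding placed directly between the elements is data under implied rule 4.
naive        = "<TR><TD>1</TD>                <TD>Put space after GI.</TD></TR>"
```

The first four rows yield the same data characters, while the `naive` row
carries the padding spaces into the data, which is why the workarounds are
needed.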

Rationale:
==========

This proposal gets rid of the SGML RS/RE compatibility problem by making all
RS/REs insignificant except those explicitly asked for (as per Charles'
proposal). It does not, however, require ALL mixed content to be delimited.
Only the mixed content that requires significant REs must be delimited.
In other words, since insignificant REs are the norm in SGML documents, 
REs will be insignificant by default.

Other whitespace cannot be handled so easily, because whitespace is used
in significant contexts so often. I've taken the opposite stand to Charles on
this issue. Since meaningful whitespace is the NORM, make meaningful whitespace
the default. There are various tricks to put in insignificant whitespace. 
We can play SGML declaration games to make them easier if we want.

Verbatim text is allowed through the verbatim delimiter as per Charles'
proposal. Since it would be used less frequently than in Charles' proposal,
a different shortref would be preferred.

 Paul Prescod

Received on Tuesday, 1 October 1996 20:49:15 UTC