- From: Paul Prescod <papresco@calum.csclub.uwaterloo.ca>
- Date: Tue, 1 Oct 1996 19:37:14 -0400 (EDT)
- To: w3c-sgml-wg@w3.org
Well, everyone is probably tired of RS/RE proposals. Let me highlight the
costs and benefits of mine (presented below), so you can decide whether to
read on:

* handles RS/RE in tables and other elements that would usually be
  considered "element content" in the usual way (RS/RE is not significant)
* handles other whitespace in a predictable way (always significant)
* handles verbatim elements
* only three simple rules (and one implied simple rule)
* believed to be compatible with all SGML parsers and the great majority
  of SGML tool output
* not too weird looking to SGML and HTML users
* most documents will require NO extra delimiters, nor will they have to
  remove insignificant RS/REs inserted by traditional SGML tools (other
  insignificant whitespace that is inserted will have to be removed)
* only a few habits/macros have to change

Here are my proposed rules:

#1. All REs are insignificant unless they occur within verbatim sections
    or between non-whitespace data characters (i.e. between words).

#2. Verbatim sections begin with <" and end with "> and may contain only
    data characters. REs within verbatim sections are significant.

#3. REs between non-whitespace data characters (i.e. between words) are
    collapsed to a single space.

(implied rule #4.) All other (non RS/RE) whitespace (outside of markup) is
    significant.

Examples:

I believe that under these rules these fragments will behave as a typical
author would intend them to.

RS/RE ignored by rule 1
=======================
<TABLE>
<TR>
<TD>abcde</TD><TD>fghijk</TD><TD>...</TD>
</TR>
</TABLE>

<P>
Isn't the sky blue?
</P>

RS/RE becomes space by rule 3
=============================
<P>This is a long sentence and my text editor is going to put a newline in for word wrap.
Good thing XML knows what I mean!</P>

Verbatim Content:
================
<PRE>
<"
Column 1   Column 2   Column 3   Column 4
=======    ========   ========   ========
12345      12345      12345      12345
12345      12345      12345      12345
12345      12345      12345      12345
">

Significant Whitespace:
=======================
<P>This is a <EM> paragraph </EM> <STRONG> with </STRONG> a bunch of space in it.</P>

All spaces are retained.

Significant Whitespace workarounds:
==================================
<TABLE>
<TR><TD>Workaround Number</TD><TD>Workaround Description</TD>
<TR><TD>1</TD><TD >Put space after GI for indentation.</TD></TR>
<TR><TD>2</TD ><TD>Space after previous element's GI.</TD></TR>
<TR><TD>3</TD><!-- --><TD>Use traditional comments </TD></TR>
<TR><TD>4</TD>~ ~<TD>Use some form of SGMLDECL comment.</TD></TR>

This is the only major downside...

Rationale:
===========

This proposal gets rid of the SGML RS/RE compatibility problem by making
all RS/REs insignificant except those explicitly asked for (as per Charles'
proposal). It does not, however, require ALL mixed content to be delimited.
Only the mixed content that requires significant REs must be delimited. In
other words, since insignificant REs are the norm in SGML documents, REs
will be insignificant by default.

Other whitespace cannot be handled so easily, because whitespace is used in
significant contexts so often. I've taken the opposite stand to Charles on
this issue. Since meaningful whitespace is the NORM, make meaningful
whitespace the default. There are various tricks to put in insignificant
whitespace. We can play SGML declaration games to make them easier if we
want.

Verbatim text is allowed through the verbatim delimiter as per Charles'
proposal. Since it would be used less frequently than in Charles' proposal,
a different shortref would be preferred.

 Paul Prescod
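
For readers who want to try rules 1 through 4 on their own fragments, here
is a minimal Python sketch. It is an illustration only and not part of the
proposal. The TOKEN pattern and the normalize function are hypothetical
names. The sketch assumes that anything of the form <...> is markup, that a
run of consecutive RS/RE characters counts as one RE, and that an RE
touching a verbatim section is not "between words".

```python
import re

# Hypothetical tokenizer for this sketch: each token is a verbatim section,
# a piece of markup, a run of RS/RE characters, or ordinary data.
TOKEN = re.compile(
    r'<"(?P<verbatim>.*?)">'      # rule 2: verbatim section <" ... ">
    r'|(?P<markup><[^>]*>)'       # tags and comments, passed through untouched
    r'|(?P<re>[\r\n]+)'           # a run of record start/end characters
    r'|(?P<data>[^<\r\n]+|<)',    # everything else (a stray "<" counts as data)
    re.DOTALL,
)

def normalize(text):
    """Apply rules 1-4 to text and return the significant content."""
    tokens = [(m.lastgroup, m.group(m.lastgroup)) for m in TOKEN.finditer(text)]
    out = []
    for i, (kind, value) in enumerate(tokens):
        if kind != "re":
            # Verbatim content keeps its REs (rule 2); markup is untouched;
            # other whitespace in data is always significant (rule 4).
            out.append(value)
            continue
        prev = tokens[i - 1] if i > 0 else (None, "")
        nxt = tokens[i + 1] if i + 1 < len(tokens) else (None, "")
        between_words = (
            prev[0] == "data" and not prev[1][-1].isspace()
            and nxt[0] == "data" and not nxt[1][0].isspace()
        )
        if between_words:
            out.append(" ")       # rule 3: collapse to a single space
        # otherwise the RE is insignificant and is dropped (rule 1)
    return "".join(out)
```

For example, normalize("<P>word\nwrap</P>") returns "<P>word wrap</P>",
while the REs between the tags of the TABLE example are simply dropped.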
Received on Tuesday, 1 October 1996 20:49:15 UTC