- From: W. Eliot Kimber <kimber@passage.com>
- Date: Tue, 24 Sep 1996 08:35:27 -0900
- To: w3c-sgml-wg@w3.org

I'm trying to make some sense of the RE/RS issue and not having much luck. What follows is an attempt to state as tersely and precisely as possible what the issues are and what the alternatives are.

As James put it so well, the problem is mixed content: if you allow mixed content, then you must provide some mechanism for distinguishing data record ends from non-data record ends. If you don't allow mixed content, then there is no problem, but then you have the problem of explicitly delimiting character data content, which currently neither SGML nor HTML requires. Note too that it's not just an RE/RS problem, but an SSEP problem.

The proposals, as I understand them, are:

A. Disallow mixed content. Furthermore, disallow PIs and markup declarations in character data (with the possible exception of CDATA and RCDATA marked sections). This requires an element type whose only semantic is to contain character data. Short references can be used to enable using a single character to quote character data. This solves the problem by making it clear when record ends are in character data context. Record end handling rules are not changed in any way. This assumes that inclusions are not allowed (at least within the character-data-containing element), thus avoiding the "record ends following included subelements are not taken as data" rule.

B. Treat XML documents as a single record by mapping RS and RE to character codes that cannot occur in documents. There are *no* record ends. This has the disadvantage that some other mechanism must be used to indicate data record ends, one that must be understood and processed by rendering systems, making it likely that different tools will produce different results for the same input data. It also has the problem that many SGML tools do not support this kind of remapping, making it difficult or impossible to process XML documents as SGML. It would also require transformation of SGML documents before they could be processed accurately as XML documents. This also doesn't solve the SSEP problem generally.

C. Treat all record ends as data. This requires that authors do things like put record ends before tag closes in order to format their markup on multiple lines. It also means that SGML documents can't be made into XML documents simply by quoting character data; all SSEP must also be moved inside of markup. Talk about making 5-line Perl hacking harder.

Note that if we want to allow DTD-less parsing, we can't use the SGML rules as-is and keep mixed content, because without the DTD you have no way of knowing when you're in element content and when you're in mixed content (for a dramatic example of this problem, create an SGML document with lots of SSEP in element content, then view it with Panorama with and without the DTD). This also means that there's no way to define a "simpler" set of RE/RS rules and keep mixed content, because you'll have the same problem.

My conclusion is that eliminating mixed content by quoting character data is the simplest solution overall and retains the most compatibility with SGML as is. While quoting may seem unnatural to those of us who grew up typing SGML markup (it was to me when I put together some examples), I don't think it will be hard for newcomers to learn, and it should be easy for SGML editors to add the quotes as an export option.

Cheers,

E.
--
W. Eliot Kimber (kimber@passage.com)
Senior SGML Consultant and HyTime Specialist
Passage Systems, Inc., (512)339-1400
10596 N. Tantau Ave., Cupertino, CA 95014-3535 (408) 366-0300, (408) 366-0320 (fax)
2608 Pinewood Terrace, Austin, TX 78757 (512) 339-1400 (fone/fax)
http://www.passage.com (work) http://www.drmacro.com (home)
"If I never had existed, would you still remember me?..."
--Austin Lounge Lizards, "1984 Blues"
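
The DTD-less parsing point in the message above can be illustrated with a rough sketch. It is not an SGML parser and does not implement the real RE/RS rules. The tokenizer, the `data_chunks` function, and the `element_content` parameter are all hypothetical names invented for this example. It only shows that whitespace between tags can be discarded when a DTD says the parent has element content, and must otherwise be reported as data.

```python
import re

# Hypothetical toy tokenizer: splits a string into tags and text runs.
TOKEN = re.compile(r"<[^>]+>|[^<]+")

def data_chunks(doc, element_content=None):
    """Return the text runs a simplified parser would report as data.

    element_content: set of element names that a DTD declares to have
    element content, or None to simulate DTD-less parsing.
    """
    stack = []   # names of currently open elements
    out = []     # text runs reported as data
    for tok in TOKEN.findall(doc):
        if tok.startswith("</"):
            stack.pop()
        elif tok.startswith("<"):
            stack.append(tok[1:-1])
        else:
            parent = stack[-1] if stack else None
            whitespace_only = tok.strip() == ""
            if (element_content is not None
                    and parent in element_content
                    and whitespace_only):
                continue  # treated as a separator, not data
            out.append(tok)
    return out

doc = "<list>\n<item>one</item>\n<item>two</item>\n</list>"

with_dtd = data_chunks(doc, element_content={"list"})
without_dtd = data_chunks(doc)

print(with_dtd)
print(without_dtd)
```

With the declaration available, only the two item strings come back. Without it, the three newlines inside `list` are reported as data too, which is the difference a DTD-less viewer would render.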
Received on Tuesday, 24 September 1996 10:36:10 UTC