Re: RS/RE considered confusing?

>The Durand/Nicol proposal on RS/RE is, if I've understood it
>correctly, that when an XML document is parsed by an SGML parser, then
>it will use an SGML declaration that includes something like:
>  RE 65531  -- not a legal ISO 10646 character --
>  RS 65532  -- not a legal ISO 10646 character --
>  SPACE 32       


>1. Most SGML parsers won't handle such an SGML declaration.  The
>practical benefit for XML of SGML compatibility is to enable SGML
>tools to be used on XML documents.  If XML uses features of SGML that
>aren't implemented in most SGML parsers, this practical benefit is

Does SP allow this?

>2. It's not clear to me that this in general will work with a
>conforming SGML system.  The entity manager is supposed to transform
>whatever mechanism the OS uses for representing lines into RS/RE. 

I have never heard that "lines" are equivalent to "records". Many OSs
do not have "records" at all. 

>4. It is a fact of life that OSs delimit lines using different
>methods. One of the ideas underlying the RS/RE concept in SGML is
>that these different methods should be canonicalized into a single form,
>so that applications are isolated from these differences.  

Again, who says that lines and records are equivalent?
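
To make the disputed step concrete, here is a minimal sketch of the
canonicalization that point 4 describes. It is not taken from any
parser. It assumes the reference concrete syntax values (RS = 10,
RE = 13) rather than the 65531/65532 values quoted above, and it
assumes exactly the thing questioned here, namely that every line is
one record. The name to_records is hypothetical.

```python
RS = "\n"  # character 10: record start in the reference concrete syntax
RE = "\r"  # character 13: record end in the reference concrete syntax


def to_records(text: str) -> str:
    """Wrap each line of `text` as RS + content + RE.

    Assumption: one OS line equals one SGML record. CRLF, bare LF and
    bare CR are all accepted as line terminators.
    """
    # Collapse the three common newline conventions into a single one.
    lines = text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    # A trailing newline ends the last line; it does not open an empty one.
    if lines and lines[-1] == "":
        lines.pop()
    return "".join(RS + line + RE for line in lines)
```

With this, the same two lines come out identical whether they were
stored with CRLF, LF or CR, which is the isolation point 4 is after.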

>5. In mixed content there are some newlines that must get ignored by
>some part of the system (whether the parser or the application).  For
>example, if I have
><p>
>This is a paragraph
>of text.
>I don't want the newline after the <p> to result in a space at the
>beginning of the paragraph.  

What do you do if you *do* want a linefeed to occur there? It's just
as easy to tell people "if you don't want a space, you must start up
hard against the tag":

  <P>This is a paragraph.
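
For comparison, the one case point 5 asks for (dropping a newline that
directly follows a start-tag) is small enough to live outside the
parser. The sketch below covers only that case, not the full SGML
rules for ignoring record ends, and it uses a deliberately naive
pattern for start-tags. The function name is hypothetical.

```python
import re

# A start-tag (naively: "<", a letter, then anything up to ">")
# followed directly by one newline in CRLF, LF or CR form.
_START_TAG_NEWLINE = re.compile(r"(<[A-Za-z][^<>]*>)(?:\r\n|\n|\r)")


def drop_newline_after_start_tag(text: str) -> str:
    """Remove a single newline that immediately follows a start-tag."""
    # Keep the tag (group 1) and discard the newline after it.
    return _START_TAG_NEWLINE.sub(r"\1", text)
```

Text that already starts hard against the tag passes through
unchanged, and newlines elsewhere in the paragraph are left alone.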

>6. Consider a paragraph like this:
>The SGML rules specify that the newline following the comment is

Does SGML actually say *newline* or record end?

>7. Moving the responsibility of ignoring newlines to a part of the
>system other than the XML parser is going to mean that XML documents
>and SGML documents will need different processing.  For example, when
>I write a style sheet I would have to know whether I've got an SGML
>document or an XML document so that I can specify that newlines are
>ignored in the XML document in the way I want.

I do not think this is true if you have a *conformant* SGML system
that can accept and correctly use a declaration like the one proposed
for XML. This is probably the fundamental problem: many SGML systems
are not conformant in this respect.
