Re: Current Status of Discussion on RE/RS Handling
[Eliot Kimber for the ERB]
> An XML parser shall interpret white space and record ends in XML
Really? If an XML application is implemented strictly, as an SGML
application, then there's separation between the parser and the
application, and this step should happen in the application. If the
XML application is not implemented in terms of SGML, then this still
seems to me to be an application convention. It's only a semantic
point, really, but I think it makes a difference:
> This approach also requires that truly significant record ends in
> data must be escaped in some way.
If the whitespace interpretation rules are phrased as an application
convention, and not a change in parsing rules, it buys this:
o An SGML parser can be an XML parser without any changes.
o A DSSSL-based XML application can do the whitespace rules as a grove
operation, after an SGML parse.
o Record ends need not be escaped. The stylesheet can affect
whitespace normalization behavior; if the element is to be styled
"verbatim" or "preformatted", then whitespace is not normalized.
o Even an SGML application can parse XML with no (or an ANY) DTD; all
elements are mixed content, but the extra whitespace between
elements is irrelevant, since it'll be normalized away by the
application. True, this won't be ESIS-identical to a parse with the
DTD, but after application-level normalization, the "true content"
will come out to the same thing.
The rules adopted by the ERB are good ones, but I think they should be
stated as application conventions, not parsing rules, for the benefit
of those implementing XML as an SGML application.
<!NOTATION SGML.Geek PUBLIC "-//GCA//NOTATION SGML Geek//EN">
<!ENTITY crism PUBLIC "-//EBT//NONSGML Christopher R. Maden//EN" SYSTEM
"<URL>http://www.ebt.com <TEL>+1.401.421.9550 <FAX>+1.401.521.2030
<USMAIL>One Richmond Square, Providence, RI 02906 USA" NDATA SGML.Geek>