More on RE/RS

Several people have proposed defining the RE/RS problem out of
existence by defining the RE and RS function characters as codes that
won't occur in entities, via the SGML application for XML.

James Clark takes the stand that the definitions of RE and RS are what
codes the parser should communicate to the application when they are
encountered, so redefining them won't change whether or not they
occur.  He also posits that lines in an input file from DOS or UNIX
should be interpreted as records.
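
A minimal sketch of that reading, in Python, may make it concrete.  It
is not taken from any parser: the name lines_to_records is
hypothetical, and emitting RS as code 10 and RE as code 13 (the
reference concrete syntax values) is an assumption for illustration.
The point is that record boundaries come from the storage layer's idea
of a line, not from which codes happen to appear in the file.

```python
import re

RS = "\n"  # record start: code 10 in the reference concrete syntax
RE = "\r"  # record end: code 13 in the reference concrete syntax

def lines_to_records(raw):
    """Treat each line of a DOS or UNIX text file as one record.

    Hypothetical entity-manager step: drop the platform's line
    terminator, then wrap each line as RS + data + RE.
    """
    # Split on the DOS terminator (CR LF) or the UNIX terminator (LF).
    lines = re.split(r"\r\n|\n", raw)
    # A final terminator ends the last line; it does not start a new one.
    if lines and lines[-1] == "":
        lines.pop()
    return "".join(RS + line + RE for line in lines)
```

Under this assumption a DOS file and a UNIX file with the same lines
yield the same sequence of records, whatever terminators they were
stored with.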

That certainly seems a reasonable interpretation, but I can't find
anything to that effect in 8879.  Clause 7.6.1, "Record Boundaries",
defines the rules for RE ignorance or preservation, but doesn't say
anything about when the parser generates an RS or RE signal.  Charles
Goldfarb's commentary thereto (pp. 321-322 of the SGML Handbook)
discusses translating lines into records, but that's not normative.

The best normative thing I can find is 4.140, "function character
identification parameter: A parameter of an SGML declaration that
identifies the characters assigned to the RE, RS, and SPACE functions,
and allows additional functions to be defined."

This suggests that, since characters are assigned to functions, the
characters in the document should assume the roles of these
functions; ergo, if non-occurring characters are the ones assigned to
those roles, the function characters never occur.  Is that not the
intended meaning?  If not, what is?
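
To make the question concrete, here is a small Python sketch of the
literal reading, in which whichever characters the declaration assigns
to RE and RS simply are those functions wherever they appear in an
entity.  The name count_function_chars and the code values 254 and 255
are assumptions chosen only for illustration; nothing in 8879 names
them.

```python
def count_function_chars(text, re_code, rs_code):
    """Count the characters in `text` that would act as RE and RS
    under the literal reading, given the code assigned to each."""
    re_count = sum(1 for ch in text if ord(ch) == re_code)
    rs_count = sum(1 for ch in text if ord(ch) == rs_code)
    return re_count, rs_count

# A two-line entity stored with DOS line terminators (CR LF).
entity = "<p>one\r\ntwo\r\n</p>"

# Reference concrete syntax assignment: RE = 13, RS = 10.
reference = count_function_chars(entity, re_code=13, rs_code=10)

# Hypothetical assignment to codes that never occur in the entity.
redefined = count_function_chars(entity, re_code=255, rs_code=254)
```

On this reading the second assignment leaves the entity with no
function characters at all.  On the other reading, where the parser
signals RS and RE per record, the assignment would change nothing.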

I think that, if the RE/RS problem can be redefined out of existence,
it can be very easily handled at the application level.  Some
have suggested this already; I outlined a proposal in conversation
with Gavin Nicol, and he seemed to think it worthwhile.  I'll send
that in another message if others agree that it is possible for an
application of ISO 8879:1986 (not :2001) to define every entity to
have a single record.

-Chris
-- 
<!NOTATION SGML.Geek PUBLIC "-//GCA//NOTATION SGML Geek//EN">
<!ENTITY crism PUBLIC "-//EBT//NONSGML Christopher R. Maden//EN" SYSTEM
"<URL>http://www.ebt.com <TEL>+1.401.421.9550 <FAX>+1.401.521.2030
<USMAIL>One Richmond Square, Providence, RI 02906 USA" NDATA SGML.Geek>

Received on Wednesday, 25 September 1996 17:06:53 UTC