Re: RS/RE from Gavin Nicol on 1996-09-12 (w3c-sgml-wg@w3.org from September 1996)

From: Gavin Nicol <gtn@ebt.com>
Date: Thu, 12 Sep 1996 17:41:59 GMT
To: tbray@textuality.com
CC: w3c-sgml-wg@w3.org
Message-Id: <199609121741.RAA14051@wiley.EBT.COM>

>>David Durand and I independently came up with a good way of dealing
>>with such things 
>
>I've heard about this but never seen it.  Could you or David or
>someone please post it to the group?  In our informal discussions
>before the advent of the WG, figuring out what to do about RS/RE, without
>busting our 8879 compliance, was one of the most worrying things.

Well, from my reading of the SGML Handbook, it seems that RE and RS
are not *required* at all. If they occur, they are put there by the
entity manager. In fact, RE and RS are not really even characters per
se, they are kind of psuedo-characters (they have a code, and a name,
but they aren't real characters).

Anyway, if we assigned some character codes to them that are
guaranteed to never occur in input, then the parser will never even
see them (another syntax trick). This will, of course, mean that \n
and \r will be seen in content, but they could be mapped such that
they get converted to a space on input.

I don't claim to be intimately familiar with all the effects that this
will have in terms of markup regognition etc. but it seems to me that
this would simplify parsers a great deal, and also get around problems
with MIME text type requirements (canonical form) etc.

Received on Thursday, 12 September 1996 13:43:21 UTC