>>David Durand and I independently came up with a good way of dealing
>>with such things
>I've heard about this but never seen it. Could you or David or
>someone please post it to the group? In our informal discussions
>before the advent of the WG, figuring out what to do about RS/RE, without
>busting our 8879 compliance, was one of the most worrying things.
Well, from my reading of the SGML Handbook, it seems that RE and RS
are not *required* at all. If they occur, they are put there by the
entity manager. In fact, RE and RS are not really even characters per
se; they are a kind of pseudo-character (they have a code and a name,
but they aren't real characters).
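To make the entity manager's role a bit more concrete, here is a rough Python sketch. It is not taken from 8879 or the Handbook. It assumes that "records" are simply the lines of the stored text, and it represents RS and RE as marker objects rather than as real characters. The names `Pseudo`, `RS`, `RE` and `records_to_stream` are made up for illustration.

```python
# Hypothetical sketch: an entity manager that turns stored text into a
# stream of data characters plus RS/RE pseudo-characters at record
# boundaries. Record boundaries here are an assumption (one line = one record).
from dataclasses import dataclass
from typing import Iterator, Union


@dataclass(frozen=True)
class Pseudo:
    """A record-boundary signal; it never appears in the stored text."""
    name: str  # "RS" (record start) or "RE" (record end)


RS = Pseudo("RS")
RE = Pseudo("RE")

Token = Union[str, Pseudo]


def records_to_stream(text: str) -> Iterator[Token]:
    # str.splitlines() treats \n, \r\n and \r (among others) as line
    # breaks and drops them, so the break characters themselves never
    # reach the parser; only the RS/RE signals mark where records were.
    for record in text.splitlines():
        yield RS           # signal: start of record
        yield from record  # ordinary data characters, one at a time
        yield RE           # signal: end of record
```

For "ab\ncd" the parser would see RS, a, b, RE, RS, c, d, RE: the RS/RE markers come from the entity manager, not from anything literally in the input.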
Anyway, if we assigned them character codes that are guaranteed
never to occur in input, then the parser would never even see them
(another syntax trick). This would, of course, mean that \n and \r
are seen in content, but they could be mapped so that each is
converted to a space on input.
I don't claim to be intimately familiar with all the effects this
would have on markup recognition etc., but it seems to me that
this would simplify parsers a great deal, and also get around problems
with MIME text type requirements (canonical form), etc.
- From: Tim Bray <firstname.lastname@example.org>