
RS/RE considered confusing?



   I've been trying to catch up, but at least on the RS/RE issue I think I
have my ideas in order. I think that the rules for RS/RE processing in SGML
are confusing, and I'd like to make sure that they are _never_ invoked. My
thought for SGML is that we could assign RS and RE codes (in the declaration)
that the entity manager would _never_ see or produce. For instance, an 8-bit
entity manager would assign RS to code 256 and RE to code 257. The
codes for CR and LF would be declared to be in the same class as TAB and
SPACE. For SGML this fits the letter of the standard, but current parsers
will choke, or assume that a wider character set is in use.
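
   As a rough sketch of the idea (in Python rather than an actual SGML
declaration, and with names made up purely for illustration), the point is
that an 8-bit entity manager only ever yields codes 0 through 255, so
function characters parked at 256 and 257 can never match anything:

```python
# Illustrative only: these names are hypothetical, not from any real parser.
RS_CODE = 256  # record start, placed outside the 8-bit range
RE_CODE = 257  # record end, placed outside the 8-bit range
SEPARATORS = {9, 10, 13, 32}  # TAB, LF, CR, SPACE all share one class


def classify(code):
    """Return the class a character code would fall into."""
    if code == RS_CODE:
        return "RS"
    if code == RE_CODE:
        return "RE"
    if code in SEPARATORS:
        return "separator"
    return "data"


def classes_seen(data: bytes):
    """Collect every class that an 8-bit input can actually produce."""
    return {classify(b) for b in data}
```

Feeding every possible byte value through classes_seen() never yields "RS"
or "RE", which is the whole trick.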

   For XML we can simplify things by just treating LF and CR (well,
actually their 10646 equivalents) as whitespace. That lets us skip the
notion of significant and non-significant RS/RE entirely.
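
   To make that concrete, here is a minimal sketch in Python (the function
name and the token-splitting behaviour are my own assumptions, not anything
proposed for the spec) in which CR and LF get no special record handling and
are simply whitespace alongside TAB and SPACE:

```python
# Hypothetical sketch: CR (U+000D) and LF (U+000A) are ordinary whitespace.
WHITESPACE = {"\t", "\n", "\r", " "}


def split_tokens(text):
    """Split text on runs of whitespace, with no RS/RE special cases."""
    tokens = []
    current = []
    for ch in text:
        if ch in WHITESPACE:
            # Any whitespace character just ends the current token.
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens
```

With this, the same content gives the same tokens whether its lines end in
LF, CR, or CR LF.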

   -- David


