- From: Arjun Ray <aray@nmds.com>
- Date: Sat, 14 Sep 1996 00:33:53 -0400
- To: w3c-sgml-wg@w3.org
At 12:24 PM 9/13/96 -0400, David Durand wrote: >I think that the rules for RS/RE processing in SGML are confusing. I prefer Dan Connolly's "phase-of-the-moon" characterisation, myself:-) >I'd like to make sure that they are _never_ invoked. Aside the (religious) issue of 8879-compliance, can anybody explain why they are necessary? For instance, exactly what are RS and RE in a byte stream over TCP? Sorry, RS/RE has "fuddy-duddy" written all over it (can you say "punch card is all I grok"?) The basic problem has always been to allow relatively "free-form" markup, mainly for neatness and readability. If RS/RE had really been an issue (as opposed to grandfathering ante-diluvian kludgery) the solution was to allow only *one tag per record*, at the beginning (modulo leading WS?): this would have given relevant if not functional meaning to the term "record". But all we have now is a bunch of rules just getting in the way. May I propose a focus on relevant concepts rather than rules. In the context of essentially free-form text and markup, exactly what does a record-end/end-of-line/whatchamacallit mean? 1. Is it part of the instance text? 2. Is it (processable) markup? 3. Is it an artifact of the storage strategy the environment was too brain-dead not to have encapsulated? In some ways, #3 is a special case of (in the sense of imposition on) #2. And #1 is clearly unworkable. So, if we want simple and strong rules (the kind that establish a two-way relation between ease of programming and ease of understanding) the rational approach IMHO is to treat these animals as markup always. When needed as instance text, an inline escape mechanism should suffice (how about '\' as MSSCHAR?). The problem is reduced to one of lexical tokenization, which is where I believe it always belonged. Regards, Arjun My thought >for SGML is that we could assign them codes (in the delcaration) that the >entity manager would _never_ see or produce. For instance an 8-bit >entity manager would assing RS to code 256 and RE to code 257. The >codes for CR and LF would be declared to be the same class as TAB and >SPACE. For SGML this fits the letter of the standard, but parsers will >currently choke, or assume that a wider character set is in use. > > For XML we can simplify things by just treating LF and CR (well, >actually their 10646 equivalents) as whitespace. We can simply skip >the entire notion of non-significant and significant RS/RE entirely. > > -- David > > >
Received on Saturday, 14 September 1996 00:32:34 UTC