Re: RS/RE

At 05:41 PM 09/12/96 GMT, Gavin Nicol wrote:

>Well, from my reading of the SGML Handbook, it seems that RE and RS
>are not *required* at all. If they occur, they are put there by the
>entity manager. In fact, RE and RS are not really even characters per
>se, they are kind of psuedo-characters (they have a code, and a name,
>but they aren't real characters).

Right. For example, consider a file from VM (IBM mainframe OS): VM does not
represent record boundaries by characters: instead, it either makes all
lines the same length (RECFM F to us ancients), or puts a length prefix in
front of each record (RECFM V). So RS and RE clearly cannot be characters in
the source document; if they show up at all, it is because the entity
manager inserted them. Same is true on other systems, even though RS looks a
lot like linefeed, and RS looks a lot like carriage return, they're not the
same thing. 

So all you need do is tell your entity manager that all your files are
recordless, and that CR and LF are merely whitespace characters (spacechar)
on par with TAB. To avoid numeric conflict, you'd want to move either them
or RS and RE somewhere else.

Having done this, you're safe. It would be most helpful for the SGML
revision to add the ability to turn OFF any delimiter of function character
(presumably by setting it to #OFF, or the null string, or some such). Then
we wouldn't have to go find two unoccupied character codes to assign RS and
RE to.

Such a change would also have the benefit that any other delimiters XML
doesn't use (DSO? PIO? whatever) could be turned off. Why is that important?
Let's say we ended up not having PIs (just an example!!!): Then our grammar
would have no mention of the string "<?", so I suppose it would be treated
as literal data if it showed up in an XML file. Oooooooops, so much for SGML
being able to parse XML data. But if we could turn off the PIO delimiter,
we'd be home free. Same argument for various other delimiters we might moot.

Steve

Received on Thursday, 12 September 1996 18:08:29 UTC