- From: <lee@sq.com>
- Date: Fri, 27 Sep 96 12:52:45 EDT
- To: dgd@cs.bu.edu, w3c-sgml-wg@w3.org
> This is just one example, but the uniformity of file handling in Unix
> suggests that '\n' is an application convention, and not a system feature
> (so we are even within our rights to ignore it).

Yes. It was an early goal of Unix (it's in one of the Bell Tech. Journal Special Issue papers, I think) to avoid record-based files altogether.

I think a lot of the RS/RE trouble comes from confusion between the representation and that which is represented.

Some people (many people) want to be able to say

    this is a monospaced example
    and the columns line up.

and have spaces, tabs and newlines be significant in the markup in order to represent that in a ``WYSIWYG'' way.

Then people with no background in design can try and lay out their source code or DTDs and align unrelated things at the expense of clarity :-), and neither the receiving application nor the screen or paper layout designer can correct the errors without extensive hand work. It's awful.

Rants aside :-), this is an insidious form of minimisation. Insidious because it does not in fact have a fully normalised form!

It is one thing to use SHORTREF or DATATAG to map newline to <BR> or &linebreak; or something. It is quite another to use spaces, tabs and newlines to draw ASCII art. If people want to do this, they should use a NOTATION.

Perhaps we need a kind of Unix vgrind that takes an SGML instance or DTD fragment and pretty-prints it into an XML representation. If the same were done for C (I have a C --> SMLG (sic) program I wrote that lets you view C in Panorama, for example) and plain text, how important would this issue still be?

You can't include C or C++ in SGML without pre-processing anyway, because of

    int *ip = &i;

being not only legal but common, unless you change ERO to be @ or something else saner than &. Same with perl.

Unix shell scripts are OK with & because it's not usually followed by a name, although 1>&2; occurs fairly often in both shell and awk. But you do get

    prog <input>output

which isn't OK in #PCDATA, and even

    prog </tmp/xxx

which can't go in CDATA. You can edit the files to put spaces in or quote the < and & signs, but that's pre-processing.

So I suggest that in the Twenty Page XML Book there be a note about how to include program listings, and exactly which characters need to be escaped, and how to escape them, and have done with it. And if multiple spaces and newlines need to be escaped, as I expect, say how.

Lee
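
A minimal sketch of the pre-processing step described above, assuming a program listing only needs & and < quoted before it can be included as character data. The function name escape_listing is hypothetical, and the entity names amp and lt are an assumption (the conventional ISO names), not something this message specifies. Escaping of multiple spaces and newlines is left out, since the message leaves that question open.

```python
def escape_listing(source: str) -> str:
    """Quote the markup-significant characters in a program listing.

    Assumes '&' and '<' are the only characters that need escaping,
    and that the entities 'amp' and 'lt' are available (an assumption).
    """
    # Replace '&' first, so the ampersand introduced by '&lt;' is not
    # escaped a second time.
    return source.replace("&", "&amp;").replace("<", "&lt;")


if __name__ == "__main__":
    # The examples from the message, passed through the sketch.
    for line in ["int *ip = &i;", "prog <input>output", "prog </tmp/xxx", "1>&2;"]:
        print(escape_listing(line))
```

The order of the two replacements matters: doing < first would turn it into &lt; and then mangle that into &amp;lt;.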
Received on Friday, 27 September 1996 12:53:01 UTC