
Re: More on RE/RS



>   This is just one example, but the uniformity of file handling in Unix
> suggests that '\n' is an application convention, and not a system feature
> (so we are even within our rights to ignore it).

Yes.  It was an early goal of Unix (it's in one of the Bell Tech. Journal
Special Issue papers, I think) to avoid record-based files altogether.

I think a lot of the RS/RE trouble comes from confusion between the
representation and that which is represented.  Some people (many people)
want to be able to say
    this is
    a monospaced    example
    and the columns line up.

and have spaces, tabs and newlines be significant in the markup in order
to represent that in a ``WYSIWYG'' way.  Then people with no background
in design can try and lay out their source code or DTDs and align unrelated
things at the expense of clarity :-), and neither the receiving application
nor the screen or paper layout designer can correct the errors without
extensive hand work.  It's awful.

Rants aside :-), this is an insidious form of minimisation.  Insidious
because it does not in fact have a fully normalised form!
It is one thing to use SHORTREF or DATATAG to map newline to <BR> or
&linebreak; or something.  It is quite another to use spaces, tabs and
newlines to draw ASCII art.  If people want to do this, they should use
a NOTATION.

Perhaps we need a kind of Unix vgrind that takes an SGML instance or
DTD fragment and pretty-prints it into an XML representation.
If the same were done for C (I have a C --> SMLG (sic) program I wrote
that lets you view C in Panorama, for example) and plain text, how important
would this issue still be?

You can't include C or C++ in SGML without pre-processing anyway,
because of int *ip = &i; being not only legal but common, unless you change
ERO to be @ or something else saner than &.  Same with perl.
Unix shell scripts are OK with & because it's not usually followed by
a name, although 1>&2; occurs fairly often in both shell and awk.  But you
do get prog <input>output, which isn't OK in #PCDATA, and even
prog </tmp/xxx which can't go in CDATA.

You can edit the files to put spaces
in or quote the < and & signs, but that's pre-processing.
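
For what it's worth, the quoting step is tiny.  Here is a rough sketch
in Python, assuming that entities named amp and lt are declared for the
document (that is my assumption, not something settled above) and using
a made-up function name, escape_listing.  It only handles & and <; it
says nothing about multiple spaces or newlines.

```python
def escape_listing(source: str) -> str:
    """Quote the characters that would be read as markup in a listing.

    Assumes entities named "amp" and "lt" are available in the target
    document; the function name is hypothetical, for illustration only.
    """
    # Replace & first, so the ampersands introduced by "&lt;" are not
    # themselves escaped a second time.
    return source.replace("&", "&amp;").replace("<", "&lt;")


if __name__ == "__main__":
    # The two troublesome examples from the discussion above.
    print(escape_listing("int *ip = &i;"))   # int *ip = &amp;i;
    print(escape_listing("prog </tmp/xxx"))  # prog &lt;/tmp/xxx
```

The order of the two replacements matters: doing < first and & second
would turn every &lt; into &amp;lt;.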

So I suggest that in the Twenty Page XML Book there be a note about how
to include program listings, and exactly which characters need to be escaped,
and how to escape them, and have done with it.  And if multiple spaces and
newlines need to be escaped, as I expect, say how.

Lee