
Re: RS/RE: basic questions



At 12:10 AM 10/3/96 EDT, lee@sq.com wrote:
>Paul Prescod <papresco@calum.csclub.uwaterloo.ca>
>>> You also need the newline to be significant between <emph>these</emph>
>>> <number>two</number> lines.
>> 
>> Most text editors seem to put spaces
>> at the end of lines before they word-wrap.
>
>They do?  None of the ones I use do.

Well, try Notepad. Type until the end of a line and watch it wrap. Then go
back to the end of the line before. There's a space there.

>> Touch-typists will put them in without thinking about it.
>Not the touch typists I know!  Why on earth would they do that?

If you're typing away, you are putting spaces between every word, right? So
why would the ones at the ends of lines be magically different, unless you
are very wedded to the concept that your computer is really a typewriter and
that you should manually hit Return at the end of each line?

>Note also that mail software often strips trailing spaces, as users
>of older versions of uudecode may sadly attest.

That's true. That would be a problem according to my scheme.

>> If they are illegal, then your markup will be very "terse" (i.e. no
>> formatting newlines). If they are passed on, then XML applications will
>> sometimes be working with different data than SGML applications are.
>
>Of course, an XML-aware SGML program won't have a problem, and an
>SGML-aware XML editor could easily give a choice of Save options.

Wouldn't the SGML specification _require_ a validating parser to report that
a document had data characters between the elements in element content? Does
this mean that your application should do a preprocess before feeding the
data to a validating parser? If A/E does this okay, it will be because it
does NOT support the SGML DECL hacks that make this proposal possible.

Since newlines are demoted to space characters, and space characters are
essentially ordinary data characters, my obsessive table would be basically
equivalent to this:

<TABLE>b<TR>b<TD>b...</TD>b</TR>b</TABLE>

I use the character b in place of each demoted newline to show that it is
not fair to expect a validating parser to just ignore arbitrary data
characters (whether they be spaces or 'b') between cells in element content.
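
To make the point concrete, here is a minimal sketch in Python, assuming
the standard-library xml.dom.minidom parser (non-validating, and not one
of the tools discussed in this thread). The sample table and the helper
name children() are made up for illustration. It only shows that a
newline between tags in element content reaches the application as data:

```python
from xml.dom import minidom

# Hypothetical table, written one tag per line for readability.
SOURCE = "<TABLE>\n<TR>\n<TD>cell</TD>\n</TR>\n</TABLE>"

def children(node):
    """Describe each child as a tag name, or as repr() of its text data."""
    out = []
    for child in node.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            out.append(child.tagName)
        elif child.nodeType == child.TEXT_NODE:
            out.append(repr(child.data))
    return out

table = minidom.parseString(SOURCE).documentElement
table_children = children(table)

# The formatting newlines show up as text nodes on either side of TR.
print(table_children)
```

The application sees a text node on each side of the TR element, which is
the same position the 'b' characters occupy in the table above.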

In the same way, an application written to the XML spec has no good reason
to expect that an author is going to throw random newlines at it in places
where there should be no data (element content).

If we are intent on allowing users to mark up documents with newlines between
elements in element content, then we MUST make those newlines insignificant
(i.e. eaten by the parser). I suggest that we just make all newlines outside
of verbatim elements insignificant and be done with it. But now I'm sounding
like a broken record.
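
A rough sketch of that rule, again assuming Python's xml.dom.minidom and
applied after parsing rather than inside the parser. The function name
eat_newlines(), the element names, and the choice of PRE as the only
verbatim element are assumptions for illustration, not part of any spec:

```python
from xml.dom import minidom

# Assumption: PRE is the only verbatim element in this made-up document type.
VERBATIM = {"PRE"}

def eat_newlines(node, verbatim=False):
    """Delete newlines from every text node that is not inside a verbatim element."""
    for child in list(node.childNodes):
        if child.nodeType == child.TEXT_NODE and not verbatim:
            child.data = child.data.replace("\n", "")
            if child.data == "":
                # Nothing but newlines was here, so drop the node entirely.
                node.removeChild(child)
        elif child.nodeType == child.ELEMENT_NODE:
            eat_newlines(child, verbatim or child.tagName in VERBATIM)

doc = minidom.parseString("<DOC>\n<P>one\ntwo</P>\n<PRE>a\nb</PRE>\n</DOC>")
eat_newlines(doc.documentElement)
result = doc.documentElement.toxml()
print(result)
```

Note that under this rule the newline inside P is eaten as well, so the two
words run together. That is the case raised at the top of this message, where
a newline separates two inline elements, and it is the cost of the rule.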

 Paul Prescod

