- From: Paul Prescod <papresco@calum.csclub.uwaterloo.ca>
- Date: Thu, 03 Oct 1996 00:31:26 -0400
- To: lee@sq.com, papresco@calum.csclub.uwaterloo.ca, w3c-sgml-wg@w3.org
At 12:10 AM 10/3/96 EDT, lee@sq.com wrote:
>Paul Prescod <papresco@calum.csclub.uwaterloo.ca>
>>> You also need the newline to be significant between <emph>these</emph>
>>> <number>two</number> lines.
>>
>> Most text editors seem to put spaces
>> at the end of lines before they word-wrap.
>
>They do? None of the ones I use do.

Well, try Notepad. Type until the end of a line and watch it wrap. Then go back to the line before. There's a space there.

>> Touch-typists will put them in without thinking about it.
>Not the touch typists I know! Why on earth would they do that?

If you're typing away, you are putting spaces between every word, right? So why would the ones at the ends of lines be magically different, unless you are very wedded to the concept that your computer is really a typewriter and that you should manually hit Return at the end of each line?

>Note also that mail software often strips trailing spaces, as users
>of older versions of uudecode may sadly attest.

That's true. That would be a problem according to my scheme.

>> If they are illegal, then your markup will be very "terse" (i.e. no
>> formatting newlines). If they are passed on, then XML applications will
>> sometimes be working with different data than SGML applications are.
>
>Of course, an XML-aware SGML program won't have a problem, and an
>SGML-aware XML editor could easily give a choice of Save options.

Wouldn't the SGML specification _require_ a validating parser to report that a document had data characters between the elements in element content? Does this mean that your application should do a preprocess before feeding the data to a validating parser? If A/E does this okay, it will be because it does NOT support the SGML DECL hacks that make this proposal possible.
Since newlines are demoted to space characters, and space characters are essentially ordinary data characters, my obsessive table would be basically equivalent to this:

<TABLE>b<TR>b<TD>b...</TD>b</TR>b</TABLE>

I use the character b to indicate that it is not fair to expect a validating parser to just ignore arbitrary data characters (whether they be space or 'b') between cells in element content. In the same way, an application written to the XML spec has no good reason to expect that an author is going to throw random newlines at it in places where there should be no data (element content).

If we are intent on allowing users to mark up documents with newlines between elements in element content, then we MUST make those newlines insignificant (i.e. eaten by the parser). I suggest that we just make all newlines outside of verbatim elements insignificant and be done with it. But now I'm sounding like a broken record.

Paul Prescod
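
The suggestion above (newlines are eaten everywhere except inside verbatim elements) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of any XML or SGML draft: the function name `drop_newlines` and the choice of `PRE` as the one verbatim element are hypothetical, and the standard-library `xml.etree.ElementTree` parser stands in for a real SGML/XML parser. Only newline characters are removed; spaces, including any an author types before a line break, remain ordinary data.

```python
import xml.etree.ElementTree as ET

# Hypothetical set of "verbatim" element names (an assumption for this sketch).
VERBATIM = {"PRE"}


def drop_newlines(elem, verbatim=VERBATIM):
    """Remove newline characters from character data outside verbatim elements."""
    if elem.tag in verbatim:
        # Everything inside a verbatim element is left untouched.
        return
    if elem.text:
        elem.text = elem.text.replace("\r", "").replace("\n", "") or None
    for child in elem:
        drop_newlines(child, verbatim)
        # A child's tail is data of the enclosing element, so it is cleaned
        # here even when the child itself is verbatim.
        if child.tail:
            child.tail = child.tail.replace("\r", "").replace("\n", "") or None


doc = "<TABLE>\n<TR>\n<TD>a </TD>\n<TD><PRE>x\ny</PRE></TD>\n</TR>\n</TABLE>"
root = ET.fromstring(doc)
drop_newlines(root)
result = ET.tostring(root, encoding="unicode")
print(result)
```

After the call, the formatting newlines between the table elements are gone, the trailing space in the first cell is still there, and the newline inside `PRE` is preserved.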
Received on Thursday, 3 October 1996 00:36:27 UTC