- From: Paul Prescod <papresco@calum.csclub.uwaterloo.ca>
- Date: Wed, 02 Oct 1996 20:12:08 -0400
- To: w3c-sgml-wg@w3.org
At 10:23 PM 10/2/96 GMT, Christopher R. Maden wrote:
> - Potentially damaging whitespace (like the table-row problem in
>   Navigator) is eliminated when formatting; whitespace that doesn't
>   make sense in the context of a certain flow object is ignored.

We're back to the True Information problem. Let's ignore tables for a second and look at something else. I don't want to give the impression that I'm obsessed with tables.

<BIBLIO>
<BOOK> ... </BOOK>
<BOOK>... </BOOK>
<ARTICLE>... </ARTICLE>
</BIBLIO>

Okay, an SGML application knows that the newlines are insignificant because of the DTD. Now what about a formatter? It will, by default, leave the newlines in (because the central tenet of this proposal is that newlines are significant). So how do you turn them off? Okay, maybe with a stylesheet. Now your document looks okay in an SGML editor and in Netscape (presuming that SGML editors handle the RS/RE remapping hack).

Now you want to convert it to RTF. Okay, maybe your conversion program has a stylesheet language that allows you to strip out newlines.

Now you want to put it in an "XML database". Each element will be stored individually in the database, for later retrieval alone. How does the database determine which newlines go in the database, and which are "formatting"? I guess you need some other "style sheet thing" (or a DTD).

The problem is not immediate/practical. It is long-term/theoretical. I think that long-term degradation of your data will occur if you depend on "application conventions" (like "table smarts") to determine what is the real information and what is formatting.

Therefore, the only safe way to encode this bibliography in the proposed markup language is with no insignificant newlines. In this case, the author can have either convenient editing or unambiguous true content, but not both. If we go with this proposal, we should be clear on that and encourage users to sacrifice convenience in favour of rigour.
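To make the point about the DTD concrete, here is a minimal sketch in Python of how a tool could drop the newlines between the bibliography entries once it knows which elements have element-only content. The ELEMENT_ONLY set and the strip_insignificant function are hypothetical names invented for this illustration; the set is a hand-written stand-in for what a DTD would declare, and the sample entries are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for DTD knowledge: elements whose content model is
# element-only, so whitespace between their children is not data.
ELEMENT_ONLY = {"BIBLIO"}

def strip_insignificant(elem, element_only=ELEMENT_ONLY):
    """Remove whitespace-only text between children of element-only elements."""
    if elem.tag in element_only:
        # Whitespace before the first child.
        if elem.text is not None and elem.text.strip() == "":
            elem.text = None
        # Whitespace after each child.
        for child in elem:
            if child.tail is not None and child.tail.strip() == "":
                child.tail = None
    for child in elem:
        strip_insignificant(child, element_only)
    return elem

# Made-up sample; the newline inside BOOK is real content and must survive.
SAMPLE = "<BIBLIO>\n<BOOK>A\nB</BOOK>\n<ARTICLE>C</ARTICLE>\n</BIBLIO>"

root = strip_insignificant(ET.fromstring(SAMPLE))
result = ET.tostring(root, encoding="unicode")
```

Without the ELEMENT_ONLY table the parser has no basis for the decision and keeps every newline, which is the situation the formatter, the RTF converter and the database would each have to resolve on their own.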
I could accept this, but would rather go the opposite way, like most markup languages (I think), and make newlines and tabs insignificant unless you declare them to be significant (in some form of verbatim section). In most SGML documents, this is what authors intend most of the time. Unlike most markup languages, however, I would proclaim that space characters are significant outside of markup (as Liam Quin said to me: "I kinda need the spaces between words." =) ). Certain kinds of formatting would have to be done with tabs and comments instead of spaces, and authors would have to be careful to put a space at the end of each line if they don't want their words concatenated. Most editors do this for you automatically. On both Windows and the Mac, the standard text editor widgets Do the Right Thing.

Paul Prescod
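The rule suggested above (newlines and tabs insignificant, spaces significant) can be sketched in a few lines of Python. The function name normalize is hypothetical, and the sketch ignores markup and verbatim sections entirely; it only shows why a line needs a trailing space to keep its last word separate from the next line.

```python
def normalize(text):
    """Drop newlines, carriage returns and tabs; keep every space character."""
    return text.translate({ord("\n"): None, ord("\r"): None, ord("\t"): None})

# No space before the line break: the words run together.
joined = normalize("two words\non lines")

# A space at the end of the line keeps the words apart.
separated = normalize("two words \non lines")
```

Here joined is "two wordson lines" and separated is "two words on lines".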
Received on Wednesday, 2 October 1996 20:17:16 UTC