
Re: RS/RE: Yet Another Proposal



At 10:23 PM 10/2/96 GMT, Christopher R. Maden wrote:
>  - Potentially damaging whitespace (like the table-row problem in
>    Navigator) is eliminated when formatting; whitespace that doesn't
>    make sense in the context of a certain flow object is ignored.

We're back to the True Information problem.

Let's ignore tables, for a second, and look at something else. I don't want
to give the impression that I'm obsessed with tables.

<BIBLIO>
<BOOK> ...
</BOOK>
<BOOK>...
</BOOK>
<ARTICLE>...
</ARTICLE>
</BIBLIO>

Okay, an SGML application knows that the newlines are insignificant because
of the DTD. Now what about a formatter? It will, by default, leave the
newlines in (because the central tenet of this proposal is that newlines are
significant). So how do you turn them off? 
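
To make the contrast concrete, here is a minimal sketch in Python of what a DTD-aware application can do and a DTD-less formatter cannot. It assumes a hand-written set, ELEMENT_ONLY, standing in for the content-model knowledge a real DTD would supply; that set, the function name, and the placeholder contents A, B and C are illustrative and not part of any proposal.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for DTD knowledge: elements whose content model
# allows child elements only, so whitespace between children is not data.
ELEMENT_ONLY = {"BIBLIO"}

def drop_insignificant_whitespace(elem):
    """Remove whitespace-only text between children of element-only elements."""
    if elem.tag in ELEMENT_ONLY:
        if elem.text is not None and elem.text.strip() == "":
            elem.text = None
        for child in elem:
            if child.tail is not None and child.tail.strip() == "":
                child.tail = None
    for child in elem:
        drop_insignificant_whitespace(child)

# Same shape as the bibliography above, with placeholder contents.
SAMPLE = "<BIBLIO>\n<BOOK>A\n</BOOK>\n<BOOK>B\n</BOOK>\n<ARTICLE>C\n</ARTICLE>\n</BIBLIO>"

root = ET.fromstring(SAMPLE)
drop_insignificant_whitespace(root)
result = ET.tostring(root, encoding="unicode")
print(repr(result))
```

Without something like ELEMENT_ONLY, the function has no basis for choosing which newlines to drop. Note that the newline before each </BOOK> survives here, because this sketch has no rule saying whether it is data.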

Okay, maybe with a stylesheet. Now your document looks okay in an SGML
editor and in Netscape (presuming that SGML editors handle the RS/RE
remapping hack). Now you want to convert it to RTF. Okay, maybe your
conversion program has a stylesheet language that allows you to strip out
newlines.

Now you want to put it in an "XML database". Each element will be stored
individually in the database, for later retrieval alone. How does the
database determine which newlines go in the database, and which are
"formatting"? I guess you need some other "style sheet thing" (or a DTD).

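The database question can be sketched the same way. This is a toy illustration in Python, assuming a hypothetical store_elements function that keeps each child element's text exactly as parsed; the element content "Title" is a placeholder.

```python
import xml.etree.ElementTree as ET

def store_elements(source):
    """Return one (tag, text) record per child, keeping text exactly as parsed."""
    root = ET.fromstring(source)
    return [(child.tag, child.text) for child in root]

# The same entry, typed with and without line breaks for editing convenience.
pretty = "<BIBLIO>\n<BOOK>Title\n</BOOK>\n</BIBLIO>"
compact = "<BIBLIO><BOOK>Title</BOOK></BIBLIO>"

pretty_records = store_elements(pretty)
compact_records = store_elements(compact)

print(pretty_records)
print(compact_records)
```

If newlines are significant, the two records differ, and nothing in the documents themselves says which one the author meant.
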
The problem is not immediate/practical. It is long-term/theoretical. I think
that long-term degradation of your data will occur if you depend on
"application conventions" (like "table smarts") to determine what is the
real information and what is formatting. Therefore, the only safe way to
encode this bibliography in the proposed markup language is with no
insignificant newlines.

In this case, the author can either have convenient editing or unambiguous
true content. If we go with this proposal, we should be clear on that and
encourage users to sacrifice convenience in favour of rigour.

I could accept this, but would rather go the opposite way, like most markup
languages (I think), and make newlines and tabs insignificant unless you
declare them to be significant (in some form of verbatim section). In most
SGML documents, this is what authors intend most of the time.

Unlike most markup languages, however, I would proclaim that space characters
are significant outside of markup (as Liam Quin said to me: "I kinda need the
spaces between words." =) ). Certain kinds of formatting would have to be
done with tabs and comments instead of spaces, and authors would have to be
careful to put a space at the end of each line if they don't want their
words concatenated. Most editors do this for you automatically. On both
Windows and the Mac, the standard text editor widgets Do the Right Thing.
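
A minimal sketch of that rule in Python, assuming it is applied to character data outside of markup and ignoring the verbatim-section escape hatch; the function name normalize is hypothetical.

```python
def normalize(text):
    """Drop newlines and tabs; keep every space character as significant."""
    return "".join(ch for ch in text if ch not in "\r\n\t")

# A space before the line break keeps the words apart.
with_trailing_space = normalize("spaces are \nsignificant")

# No space before the line break: the words run together.
without_trailing_space = normalize("spaces are\nsignificant")

# Tabs used for layout vanish without touching the content.
tab_indented = normalize("\t\tindented with tabs")

print(with_trailing_space)
print(without_trailing_space)
print(tab_indented)
```

The second case is the cost of the rule: a line that ends without a space gets joined to the next word.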

 Paul Prescod

