Re: RS/RE: basic questions from Charles F. Goldfarb on 1996-10-01 (w3c-sgml-wg@w3.org from September 1996)

From: Charles F. Goldfarb <Charles@SGMLsource.com>
Date: Tue, 01 Oct 1996 03:19:59 GMT
To: "David G. Durand" <dgd@cs.bu.edu> (David G. Durand)
Cc: w3c-sgml-wg@w3.org
Message-ID: <325067d5.8619086@mail.alink.net>

On Mon, 30 Sep 1996 12:10:33 -0400, "David G. Durand" <dgd@cs.bu.edu> (David G.
Durand) wrote:

>So you beg the question in another way. The issue is _why_ we should treat
>the RE as markup rather than data. Since there is no _requirement_ to
>include an RE after a tag, why _must_ it not be significant?

Because the tag isn't part of the data. Therefore, any RE whose presence depends
on where a tag is positioned can't possibly be part of the data either. 

>I've shown how
>you can trivially add line RE and whitespace as needed to pretty-print your
>text without the RE ignoring rule.
><p
>>This is a paragraph.</p>

It's not quite as pretty, or as natural, as:

<p>
This is a paragraph.
</p>

or even:

<p>
"This is a paragraph."
</p>

>
>   Other than pretty printing, why should that information not be
>significant? I would appreciate a user-level justification, not another
>"true information" comment that pre-supposes that whitespace after markup
>must be insignificant. The requirement that it be insignificant is the
>issue at question. I don't see that the rule gains us anything of
>importance, but I could be convinced, given an argument.

O.K. I'll try again.

A user wants to capture a particular piece of data. He knows what it is. He
knows whether there are REs in it. He takes the following steps (which you can
follow on your own text editor):

1. He opens a text editor and types in the data, putting the REs where he wants
them.

Now is the time for all good men
to come to the aid of the party.

2. He now wants to identify that data as a "p" element, so he enters tags, like
this:

<p>Now is the time for all good men
to come to the aid of the party.</p>

3. He decides that the structure would look clearer during editing if he put the
tags on their own input lines, like this:

<p>
Now is the time for all good men
to come to the aid of the party.
</p>

In all of these steps, he hasn't changed one character of the data. So why
should the application see a single RE in step 2 and three REs in step 3? The
data has only ever had a single RE.
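
The three steps can be checked mechanically. What follows is only a minimal
sketch in Python, not a real SGML parser: it assumes a single element with no
nested markup, lets "\n" stand in for the RE, and uses a hypothetical helper
called element_data that drops one RE directly after the start-tag and one
directly before the end-tag.

```python
import re


def element_data(markup: str, name: str = "p") -> str:
    """Return the data of one <name> element.

    Simplified rule (an assumption of this sketch): an RE directly after
    the start-tag or directly before the end-tag is not data.
    """
    pattern = rf"<{re.escape(name)}>(.*)</{re.escape(name)}>"
    match = re.fullmatch(pattern, markup, flags=re.DOTALL)
    if match is None:
        raise ValueError("expected exactly one element")
    data = match.group(1)
    if data.startswith("\n"):  # RE caused by the start-tag's position
        data = data[1:]
    if data.endswith("\n"):    # RE caused by the end-tag's position
        data = data[:-1]
    return data


# Step 2: tags typed on the same lines as the data.
step2 = ("<p>Now is the time for all good men\n"
         "to come to the aid of the party.</p>")

# Step 3: tags moved onto their own input lines.
step3 = ("<p>\nNow is the time for all good men\n"
         "to come to the aid of the party.\n</p>")

same_data = element_data(step2) == element_data(step3)
re_count = element_data(step3).count("\n")
```

Under those assumptions, same_data is true and re_count is 1: moving the tags
to their own lines changes nothing the application sees.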

>   I keep thinking about a bug I was told about where a database was driven
>by the ESIS, and later spat out the markup without a CR after every tag.
>For some elements with several REs at the beginning they would thus lose
>one _significant_ RE on every check-in/check-out cycle. They had to put an
>RE after every tag to make things work right (or they would have had to
>hack their parser to inform the application of the "non-significant"
>whitespace).

The operative word is "bug". If you are regenerating SGML from a database and the
data of an element starts with a significant RE, you obviously need to insert an
insignificant RE after the start-tag. In other cases you don't have to, but it
is always safe to do so. Both algorithms are pretty simple, so a good system
will allow the user to choose between them.
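
Both algorithms fit in a few lines. This is a rough sketch in Python rather
than anything from a real system: serialize and parse are hypothetical names,
"\n" stands in for the RE, the element has no nested markup, and the matching
treatment of an RE before the end-tag is my assumption by symmetry.

```python
RE = "\n"  # stand-in for the record end character in this sketch


def serialize(data: str, name: str = "p", always: bool = False) -> str:
    """Regenerate markup for one element from its stored data.

    always=False: add an insignificant RE only where the data itself
                  begins (or ends) with a significant RE.
    always=True:  add an insignificant RE after every start-tag and
                  before every end-tag, which is always safe.
    """
    lead = RE if always or data.startswith(RE) else ""
    trail = RE if always or data.endswith(RE) else ""
    return f"<{name}>{lead}{data}{trail}</{name}>"


def parse(markup: str, name: str = "p") -> str:
    """Recover the data, ignoring one RE next to each tag."""
    start, end = f"<{name}>", f"</{name}>"
    if not (markup.startswith(start) and markup.endswith(end)):
        raise ValueError("expected exactly one element")
    data = markup[len(start):len(markup) - len(end)]
    if data.startswith(RE):
        data = data[1:]
    if data.endswith(RE):
        data = data[:-1]
    return data


# Data that begins with two significant REs, as in the reported bug.
stored = RE + RE + "several blank lines precede this text"

# The buggy regeneration: tags written with no protective RE.
buggy = f"<p>{stored}</p>"
lost_one = parse(buggy) != stored

# Either algorithm survives a check-in/check-out cycle.
minimal_ok = parse(serialize(stored)) == stored
always_ok = parse(serialize(stored, always=True)) == stored
```

With the unprotected output one leading RE disappears on each cycle, while
both serializers round-trip the data unchanged.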

--
Charles F. Goldfarb * Information Management Consulting * +1(408)867-5553
           13075 Paramount Drive * Saratoga CA 95070 * USA
  International Standards Editor * ISO 8879 SGML * ISO/IEC 10744 HyTime
 Prentice-Hall Series Editor * CFG Series on Open Information Management
--

Received on Monday, 30 September 1996 23:17:39 UTC