Re: RS/RE: basic questions

At 9:31 AM 9/30/96, Charles F. Goldfarb wrote:
>On Sat, 28 Sep 1996 21:37:35 -0400, "David G. Durand" <dgd@cs.bu.edu> (David G.
>Durand) wrote:
>>
>>The definition of "true content" presupposes that RE after markup should
>
>^^^^
>>not be significant (the core of the confusion around the SGML RE rules).
>>With that defintion in hand we can quickly come to the conclusion that
>>SGML's RE/RS rules are necessary to return the true content.
>
>No, the rule is that an RE *caused by* markup is not significant. By
>definition,
>markup is that which is not data. Data is the true information. Therefore,
>an RE
>that wouldn't be there if some markup weren't there isn't data and isn't true
>information.
>
>To put it another way, you should be able to remove all the markup and get
>exactly the same data as when the markup is present.

So you beg the question in another way. The issue is _why_ we should treat
the RE as markup rather than data. Since there is no _requirement_ to
include an RE after a tag, why _must_ it not be significant. I've shown how
you can trivially add line RE and whitespace as needed to pretty-print your
text without the RE ignoring rule.
<p
>This is a paragraph.</p>

   Other than pretty printing, why should that information not be
significant? I would appreciate a user-level justification, not a another
"true information" comment that pre-supposes that whitespace after markup
must be insignificant. The requirement that it be insignificant is the
issue at question. I don't see that the rule gains us anything of
importance, but I could be convinced, given an argument.

  One consistent proposal that would make more sense to me would be to
_require_ tags to occupy a line by themselves. The RE would always be
there, and would always be ignored. This guarantees that it's easy to
explain as well. (I don't think people would like it, though).

   I keep thinking about a bug I was told about where a database was driven
by the ESIS, and later spat out the markup without a CR after every tag.
For some elements with several REs at the beginning they would thus lose
one _significant_ RE on every check-in/check-out cycle. They had to put an
RE after every tag to make things work right (or they would have had to
hack their parser to inform the application of the "non-significant"
whitespace).

   -- David

RE delenda est.

--------------------------------------------+--------------------------
David Durand                  dgd@cs.bu.edu | david@dynamicDiagrams.com
Boston University Computer Science          | Dynamic Diagrams
http://www.cs.bu.edu/students/grads/dgd/    | http://dynamicDiagrams.com/

Received on Monday, 30 September 1996 12:06:47 UTC