Re: A17: keep or drop entities?

On Thu, 10 Oct 1996, Charles F. Goldfarb wrote:

>On Thu, 10 Oct 1996 10:12:13 -0400, "Eve L. Maler" <elm@arbortext.com> wrote:

>>At 04:49 AM 10/10/96 GMT, Charles F. Goldfarb wrote:

>>>                                                The keyword SDATA in the ISO
>>>character entity set is unnecessary because the replacement text is a symbolic
>>>string. (My original intention was that a system would use an equivalent entity
>>>set in which the replacement text was real system data.)

>>The [xxxxxx] replacement text "templates" have been widely implemented 
>>to produce the desired glyphs.  But this doesn't mean they're not system
>>data, does it?  It's still essentially a "processing instruction that
>>returns data" (clause 8).  Regular internal text entities aren't 
>>supposed to have this property.

>Eve has made a very sensible observation, so let me explain my reasoning.

>There are two principal purposes for labeling SDATA and PI:
>1. To make it easy to locate and revise or remove system-specific information.
>This, of course, enhances document portability and reuse by containing system
>2. To prevent generated text from being parsed in context with the SGML
>document. This enhances portability and reuse by assuring that all applications
>will "see" the same data.

>The symbolic replacement text in the ISO 8879 character entity sets doesn't
>present a problem on either of those counts. They are not system-dependent and
>they parse identically in all environments. That is because the generation of
>system-specific data takes place in the *result* document; it is never seen by
>the parser. In pernicious SDATA, the entity text is system-specific and
>therefore needs to be labeled.

There may have been just two principal purposes, but there is certainly
a third in this day and age:  When one is inserting an ordinary text entity,
the replacement text is specifically intended to become characters in the
document.  When "&amp;" is inserted in a document, one does not normally
intend it to be the same as though "[amp   ]" were directly in the document.
Many processors rely on the SDATA designation to trigger the special
handling.  Leave it out and we're in trouble.
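
To make the dependency concrete, here is a minimal sketch of a processor
that keys its special handling on the SDATA designation.  Everything in it
is an assumption made for illustration (the expand function, the GLYPHS
table, and the "SDATA"/"TEXT" labels are hypothetical names, not any real
parser's interface), and it ignores the rest of SGML parsing entirely.

```python
import re

# Matches a simple entity reference such as "&amp;" (illustrative only).
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9.-]*);")

# Hypothetical table a formatter might use to turn symbolic templates into glyphs.
GLYPHS = {"[amp   ]": "&", "[lt    ]": "<"}


def expand(text, entities):
    """Expand entity references; only SDATA entities get the glyph lookup."""
    def substitute(match):
        name = match.group(1)
        if name not in entities:
            return match.group(0)  # leave unknown references untouched
        kind, replacement = entities[name]
        if kind == "SDATA":
            # Special handling: the template is system data, mapped to a glyph.
            return GLYPHS.get(replacement, replacement)
        # Ordinary text entity: the replacement text becomes document characters.
        return replacement
    return ENTITY_REF.sub(substitute, text)


# The same replacement text, declared with and without the SDATA keyword.
with_sdata = {"amp": ("SDATA", "[amp   ]")}
without_sdata = {"amp": ("TEXT", "[amp   ]")}

print(expand("AT&amp;T", with_sdata))     # AT&T
print(expand("AT&amp;T", without_sdata))  # AT[amp   ]T
```

With the keyword dropped, the template string lands in the document as
literal characters, which is the trouble described above.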

Dave Peterson