Re: A17: keep or drop entities?
On Thu, 10 Oct 1996 22:35:31 -0500 (CDT), DAVEP@acm.org wrote:
>On Thu, 10 Oct 1996, Charles F. Goldfarb wrote:
>>On Thu, 10 Oct 1996 10:12:13 -0400, "Eve L. Maler" <firstname.lastname@example.org> wrote:
>>>At 04:49 AM 10/10/96 GMT, Charles F. Goldfarb wrote:
>>>> The keyword SDATA in the ISO
>>>>character entity set is unnecessary because the replacement text is a symbolic
>>>>string. (My original intention was that a system would use an equivalent entity
>>>>set in which the replacement text was real system data.)
>>>The [xxxxxx] replacement text "templates" have been widely implemented
>>>to produce the desired glyphs. But this doesn't mean they're not system
>>>data, does it? It's still essentially a "processing instruction that
>>>returns data" (clause 8). Regular internal text entities aren't
>>>supposed to have this property.
>>Eve has made a very sensible observation, so let me explain my reasoning.
>>There are two principal purposes for labeling SDATA and PI:
>>1. To make it easy to locate and revise or remove system-specific information.
>>This, of course, enhances document portability and reuse by containing system
>>dependencies.
>>2. To prevent generated text from being parsed in context with the SGML
>>document. This enhances portability and reuse by assuring that all applications
>>will "see" the same data.
>>The symbolic replacement texts in the ISO 8879 character entity sets don't
>>present a problem on either of those counts. They are not system-dependent and
>>they parse identically in all environments. That is because the generation of
>>system-specific data takes place in the *result* document; it is never seen by
>>the parser. In pernicious SDATA, the entity text is system-specific and
>>therefore needs to be labeled.
>There may have been just two principal purposes, but there is certainly
>a third in this day and age: When one is inserting an ordinary text entity,
>the replacement text is specifically intended to become characters in the
>document. When "&amp;" is inserted in a document, one does not normally
>intend it to be the same as though "[amp   ]" were directly in the document.
>Many processors rely on the SDATA designation to trigger the special
>handling. Leave it out and we're in trouble.
We're only in trouble for non-portable SDATA.
Let's approach this from another direction.
All data has the potential of being application-specific, in the sense that the
inherent semantics of the element type causes the data to be interpreted a
particular way. For example, a "time" element in some document type may treat
data content in the form "hh:mm.ss" as a time of day and always convert it to
GMT in the formatted output. As the data recognized by the parser is always the
same, this data is portable and non-pernicious, even though it is an
"instruction that returns data" or, if you prefer, a data content notation.
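To make the "time" example concrete, here is a minimal sketch in Python of what such an application might do. The function name format_time_element and the utc_offset_hours parameter are assumptions made for illustration; nothing in ISO 8879 defines them. The point is only that the parser hands over the same character data everywhere, and the conversion happens afterwards in the application.

```python
def format_time_element(content: str, utc_offset_hours: int) -> str:
    """Interpret "hh:mm.ss" element content as a local time of day and
    return it converted to GMT, in the same "hh:mm.ss" form.

    utc_offset_hours is a hypothetical application setting, e.g. -5 for CDT.
    """
    # The parser delivers plain character data; the split is application logic.
    hours_minutes, seconds = content.split(".")
    hours, minutes = hours_minutes.split(":")

    local_seconds = int(hours) * 3600 + int(minutes) * 60 + int(seconds)
    # Subtract the offset to reach GMT, wrapping around midnight.
    gmt_seconds = (local_seconds - utc_offset_hours * 3600) % 86400

    h, remainder = divmod(gmt_seconds, 3600)
    m, s = divmod(remainder, 60)
    return f"{h:02d}:{m:02d}.{s:02d} GMT"
```

With this sketch, format_time_element("22:35.31", -5) yields "03:35.31 GMT", while the data the parser saw is still just "22:35.31".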
In the specific case of the ISO character entity sets and XML, the SGML system
does not need to know that the symbolic character names are SDATA because they
are portable (i.e., always produce the same ESIS). The XML application can rely
on the convention that "[tttttttt]" is always treated as a symbolic character
name for which a specific glyph is produced in the formatted output.
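A rough sketch of that convention in Python, under stated assumptions: the GLYPHS table, the pattern, and the function render_symbolic_names are all hypothetical names chosen for illustration, and the table holds only a few sample entries rather than a full ISO entity set. Unknown names are deliberately passed through unchanged.

```python
import re

# Hypothetical table from symbolic character names to output glyphs.
# A real formatter would cover the complete entity sets it supports.
GLYPHS = {
    "amp": "&",
    "lt": "<",
    "gt": ">",
    "copy": "\u00a9",
}

# Matches a bracketed symbolic name such as "[amp   ]", allowing the
# trailing space padding used in the replacement text.
SYMBOLIC_NAME = re.compile(r"\[([A-Za-z][A-Za-z0-9.]*) *\]")


def render_symbolic_names(data: str) -> str:
    """Replace each known "[name]" string with its glyph in formatted output."""

    def substitute(match: re.Match) -> str:
        # Fall back to the original text when the name is not in the table.
        return GLYPHS.get(match.group(1), match.group(0))

    return SYMBOLIC_NAME.sub(substitute, data)
```

The substitution runs on the parsed data at formatting time, so the parser itself never needs to know the strings are special.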
We could formalize this convention by defining an "xmlpcdata" notation that is
the fixed value of a notation attribute declared for every element type that has
#pcdata in its content model.
Charles F. Goldfarb * Information Management Consulting * +1(408)867-5553
13075 Paramount Drive * Saratoga CA 95070 * USA
International Standards Editor * ISO 8879 SGML * ISO/IEC 10744 HyTime
Prentice-Hall Series Editor * CFG Series on Open Information Management