Questions about external entities and entity declarations

I'm going to hijack something Lee said to raise a question:

At 09:11 PM 9/25/96 EDT, lee@sq.com wrote:
>For example, Pinnacles-style reflections combined with TEI Writing Sets
>would do more than supplant SDATA entities, I think, as well as shaming
>those HTML weenies into humble submission :-)  In fact, the Reflection
>idea is generally very useful.
>
>Long Note For Those Not Familiar With Pinnacles:
>    You use Pinnacles reflections as follows:
>    (1) create a source element with arbitrary content, and give it an ID
>    (2) have a reflection element that uses IDREF to point to the source ID.
>    (3) the application makes the source appear in place of the corresponding
>	reflection element wherever it appears.
>    One could use an architectural form to specify this behaviour.

--

After some offline discussions with Charles Goldfarb and James Clark, I
have revised my attempt at a restatement of the RE rules of clause
7.6.1, and will re-post it, for the edification of the participants, in
a moment.  Mostly the changes make it clearer, I hope, but there was one
outright error, namely the claim that everything in an SGML document is
either markup or content -- since the start- and end-tags of subelements
are both markup and part of the content of the parent element, the
correct opposition is not markup and content, but markup and non-markup,
or markup, data, and separators.

In sum, the rules prescribed by 8879 are these.  RE is insignificant
(i.e. not passed to any downstream application, not part of the
XML grove plan) when it occurs in any of the following patterns:

  (a) start-tag nondata* RE
  (b) RE nondata* end-tag
  (c) RS nondata+ RE

where nondata is defined this way:

  nondata ::= comment declaration
             | processing instruction
             | character reference
             | entity reference
             | entity-end
             | marked section declaration
             | included subelement
             | short reference
             | shortref use declaration
             | link set use declaration

The element Q contains no REs in any of the following cases:

  <q>
  Listen to my heart beat.
  </q>

This is the simple case:  RE adjacent to a start-tag or end-tag.

  <q>
  <!-- sound track is silent -->
  Listen to my heart beat <!-- --
  ><?DIRECTOR begin: audio>
  and beat and beat and beat.
  </q>

Here rule (a) takes care of the RE on line 1 and rule (c) of the RE on
line 2; the RE on line 3 falls inside the comment and so is markup;
rule (c) applies again on line 4, and rule (b) on line 5.

  <q><!-- sound track is silent -->
  Listen to my heart beat.
  </q>

This is the one case I can think of where the first RE is not
actually adjacent to the start-tag.
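
The three patterns can be sketched in Python roughly as follows.  This
is only an illustration under stated assumptions, not a parser: the
input is a list of already-classified token kinds, and the kind names
("start-tag", "end-tag", "nondata", "RS", "RE", "data") and the function
name insignificant_res are hypothetical.  It also assumes that the RS
opening the next record may be crossed when looking ahead for an
end-tag, and it does not model a comment that spans a record boundary
(as in the second example above).

```python
from typing import List, Set

NONDATA = "nondata"  # stands for any construct in the nondata production


def insignificant_res(kinds: List[str]) -> Set[int]:
    """Return indices of RE tokens matched by any of the three patterns."""
    ignored: Set[int] = set()
    for i, kind in enumerate(kinds):
        if kind != "RE":
            continue

        # Scan left across nondata, counting how many tokens were crossed.
        j, crossed = i - 1, 0
        while j >= 0 and kinds[j] == NONDATA:
            j -= 1
            crossed += 1
        left = kinds[j] if j >= 0 else None
        if left == "start-tag":              # start-tag nondata* RE
            ignored.add(i)
        elif left == "RS" and crossed >= 1:  # RS nondata+ RE
            ignored.add(i)

        # Scan right across nondata.  Assumption: the RS that opens the
        # next record is crossed too, since it always follows the RE.
        k = i + 1
        while k < len(kinds) and kinds[k] in (NONDATA, "RS"):
            k += 1
        if k < len(kinds) and kinds[k] == "end-tag":  # RE nondata* end-tag
            ignored.add(i)
    return ignored


# First example: <q> RE / RS data RE / RS </q>
simple = ["start-tag", "RE", "RS", "data", "RE", "RS", "end-tag"]
print(sorted(insignificant_res(simple)))     # [1, 4]

# Third example: <q> comment RE / RS data RE / RS </q>
commented = ["start-tag", "nondata", "RE", "RS", "data", "RE", "RS", "end-tag"]
print(sorted(insignificant_res(commented)))  # [2, 5]
```

An RE between two records of data matches none of the patterns and so
is left out of the returned set, which is to say it stays significant.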

-C. M. Sperberg-McQueen


--

James Clark and I have prepared this concise definitive specification
of the rule for determining insignificant REs in data, with both XML
and SGML variants. It is based on Michael Sperberg-McQueen's clever
"nondata" formalism, which replaces a great deal of confusing text in
8879. I intend to propose this to WG8 at the November meeting.
(Note that for XML, the rule is 14 lines long, 9 of them formal.)


For XML and SGML:

An RE in data is insignificant (i.e. not passed to an application,
which is to say, not part of the grove) when it occurs in any of the
following patterns:

  start-tag  nondata*  RE
  RE         nondata*  end-tag
  RS         nondata+  RE

In applying this rule, a reference is transparent; only its
replacement is considered.

For XML only:

  nondata ::=
               comment declaration
             | processing instruction

  reference ::=
               character reference
             | entity reference


For SGML only:

  nondata ::=
               comment declaration
             | processing instruction
             | marked section declaration start
             | marked section end
             | included subelement
             | shortref use declaration
             | link set use declaration

  reference ::=
               character reference
             | entity reference
             | short reference

  marked section declaration start ::=
               marked section start
             , status keyword specification
             , dso

The rule is applied recursively to the data of included subelements.
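
As a rough illustration of how the two variants differ, the sketch
below keeps the nondata sets as data and makes references transparent
by splicing in their replacement before any pattern is tested.  It is a
sketch under assumptions, not part of the proposal: the token
representation, the names XML_NONDATA, SGML_NONDATA, flatten and
classify are all hypothetical, a reference is modelled as a pair of its
construct name and its replacement tokens, and an included subelement
is treated as one opaque token.

```python
from typing import FrozenSet, List, Tuple, Union

XML_NONDATA: FrozenSet[str] = frozenset({
    "comment declaration",
    "processing instruction",
})

SGML_NONDATA: FrozenSet[str] = XML_NONDATA | frozenset({
    "marked section declaration start",
    "marked section end",
    "included subelement",
    "shortref use declaration",
    "link set use declaration",
})

# A token is a construct name, or a reference given as
# (construct name, replacement tokens).
Token = Union[str, Tuple[str, list]]


def flatten(tokens: List[Token]) -> List[str]:
    """Make references transparent: keep only their replacement."""
    out: List[str] = []
    for tok in tokens:
        if isinstance(tok, tuple):
            _name, replacement = tok
            out.extend(flatten(replacement))  # replacements may nest
        else:
            out.append(tok)
    return out


def classify(tokens: List[Token], nondata: FrozenSet[str]) -> List[str]:
    """Rename every construct in the chosen nondata set to 'nondata'."""
    return ["nondata" if t in nondata else t for t in flatten(tokens)]


tokens: List[Token] = [
    "start-tag",
    ("entity reference", ["processing instruction"]),
    "RE", "RS", "data", "marked section end", "RE", "RS", "end-tag",
]
print(classify(tokens, SGML_NONDATA))
# ['start-tag', 'nondata', 'RE', 'RS', 'data', 'nondata', 'RE', 'RS', 'end-tag']
```

With XML_NONDATA the same input leaves "marked section end"
unrenamed, since only comment declarations and processing instructions
count as nondata there.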

--
Charles F. Goldfarb * Information Management Consulting * +1(408)867-5553
           13075 Paramount Drive * Saratoga CA 95070 * USA
  International Standards Editor * ISO 8879 SGML * ISO/IEC 10744 HyTime
 Prentice-Hall Series Editor * CFG Series on Open Information Management
--

Received on Saturday, 28 September 1996 22:22:59 UTC