Re: Parameter entities vs. GI name groups from Steven J. DeRose on 1997-06-24 (w3c-sgml-wg@w3.org from June 1997)

From: Steven J. DeRose <sjd@eps.inso.com>
Date: Tue, 24 Jun 1997 15:03:50 -0400
To: w3c-sgml-wg@w3.org
Message-Id: <2.2.32.19970624190350.00b610e8@pop>
Finally caught up on all this.

If PEs were pure macro substitution, then grammar, implementation, and
pedagogy would be relatively simple; the language, like cpp, is simply not
responsible for protecting the user from idiocy (this has pluses and minuses).

At the other extreme, if PEs were pure syntax objects (more like Scheme
macros), there would be relatively great safety/protection, while still
being quite easy to implement.
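Here is a minimal Python sketch of the two extremes. All names in it are hypothetical. It assumes PE references are written %name; and that replacement texts are flat lists of names. expand_as_macro splices raw text in, cpp-style, so it silently accepts a PE whose text opens a group it never closes. expand_as_syntax keeps each reference as a named GroupRef node and rejects any replacement text containing parentheses. That rule is far stricter than a real syntax-object design would need, but it shows the parser still knowing what %date; meant after expansion.

```python
import re
from dataclasses import dataclass

# Hypothetical parameter entities: name -> replacement text.
PES = {"date": "month, day, year", "open": "(a | b"}
REF = re.compile(r"%([\w.-]+);")  # assumed reference syntax: %name;

def expand_as_macro(text, pes):
    """cpp-style: splice raw replacement text in; nothing is checked or remembered."""
    return REF.sub(lambda m: pes[m.group(1)], text)

@dataclass(frozen=True)
class GroupRef:
    """Scheme-macro-style: the reference survives expansion as a named syntax node."""
    name: str
    members: tuple[str, ...]

def expand_as_syntax(text, pes):
    """Return literal text pieces interleaved with GroupRef nodes.

    A replacement text is accepted only if it contains no parentheses, so it
    can never open or close a group belonging to the surrounding declaration.
    """
    out, pos = [], 0
    for m in REF.finditer(text):
        out.append(text[pos:m.start()])
        body = pes[m.group(1)]
        if "(" in body or ")" in body:
            raise ValueError(f"%{m.group(1)}; is not a self-contained group body")
        # Split on any SGML connector (",", "|", "&") and keep the member names.
        members = tuple(t.strip() for t in re.split(r"[,|&]", body))
        out.append(GroupRef(m.group(1), members))
        pos = m.end()
    out.append(text[pos:])
    return out
```

For example, expand_as_macro("(%open; | c)", PES) yields the unbalanced string "((a | b | c)". expand_as_syntax on the same input raises ValueError.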

But as it is, we have something in the middle. It's still immensely useful,
but a serious pain to implement *because it doesn't pick either clear
choice*: it's neither macros nor subroutines (to oversimplify a bit),
neither fish nor fowl: it has aspects of both. Thus our pain.

I see two main paths open: Kill or Simplify. Status quo doesn't work.
Featurecide is easier; simplification is more effective. The painful loss
with Featurecide is that PEs are all we have in SGML for accomplishing
subclassing (like many name group examples) and structs (like many attribute
list PEs). 

I think those who find these losses substantial have been pretty convincing.
The problem facing us, then, is that a useful conceptual notion is mixed up
with piddly details of syntax, such that parsers and applications really
don't "know" what's going on.

For example, one table model is documented as allowing you to declare your
own element types for table, row, etc., by setting a parameter entity. This
is sensible, but unless the application is very tightly integrated with the
parser it can't know which element in this shell game is now the "real"
table. Oops.

For another example, the old AAP DTD had something like:
   <!ENTITY % date "month, day, year">
that got used in a bunch of content models -- but no date element (well, it
had several date-ish elements but they didn't use the macro!). Oops: the
lexical stuff and the conceptual stuff got out of sync.

What we really want, I think, is a way to explicitly define groups of other
constructs, name the groups, and refer to them later. This would be very
easy if we were back in 1985; it would let the parser really *know*:

   <!ELEM-GRP LISTS (UL | OL | DL)>
   <!ATTR-GRP LOC   (IDREF  IDREF   #IMPLIED
                     ENT    ENTITY  #IMPLIED)>
   <!MS-KEY   mac    INCLUDE>
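Here is a minimal Python sketch of how a parser might record the three hypothetical declarations above as typed objects instead of text. Kind, Group, ms_key and resolve are illustrative names, not part of any standard. Each group carries its kind, so using LISTS where a marked-section keyword is expected fails at the point of reference. An MS-KEY whose value is not one of the ISO 8879 status keywords (CDATA, RCDATA, IGNORE, INCLUDE, TEMP) is rejected when it is declared, so a keyword group can never hold a string that is invalid where it is used.

```python
from dataclasses import dataclass
from enum import Enum

class Kind(Enum):
    ELEM_GRP = "element group"
    ATTR_GRP = "attribute group"
    MS_KEY = "marked-section keyword"

@dataclass(frozen=True)
class Group:
    kind: Kind
    value: object

# Status keywords that ISO 8879 allows in a marked section declaration.
MS_KEYWORDS = {"CDATA", "RCDATA", "IGNORE", "INCLUDE", "TEMP"}

def ms_key(keyword: str) -> Group:
    """Check the keyword once, when the group is declared."""
    if keyword not in MS_KEYWORDS:
        raise ValueError(f"{keyword!r} is not a marked-section keyword")
    return Group(Kind.MS_KEY, keyword)

# The three hypothetical declarations, as a parser might record them.
GROUPS = {
    "LISTS": Group(Kind.ELEM_GRP, ("UL", "OL", "DL")),
    "LOC": Group(Kind.ATTR_GRP, (("IDREF", "IDREF", "#IMPLIED"),
                                 ("ENT", "ENTITY", "#IMPLIED"))),
    "mac": ms_key("INCLUDE"),
}

def resolve(name: str, expected: Kind) -> object:
    """Look up a group, refusing to use it where a different kind is expected."""
    group = GROUPS[name]
    if group.kind is not expected:
        raise TypeError(f"{name} is declared as {group.kind.value}; "
                        f"expected {expected.value}")
    return group.value
```

Because the parser keeps the groups by name and kind, an application can also ask directly which elements belong to LISTS. It does not have to reverse-engineer that from expanded content models.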

Such constructs would also preclude bizarre cases: in SGML86 you can easily
set a parameter entity that's only used for marked section keywords to a
string that's always invalid there -- and that's not an erroneous
declaration. And we would have gotten even more powerful architectural forms
for free. But we ain't in 1985; yet Featurecide has too high a cost. Thus I
think we should apply our collective wit to finding a simplification. If it
takes WG8 changes, so be it. If it takes a WF/validity weasel, so be it. But
let's make the situation incrementally better instead of incrementally worse.

I'm tempted by the proposal to make PEs be purely a pre-parsing substitution
in WF, with the additional 8879 constraints required only in valid
documents. That seems almost identical to what we do with element structure:
any properly nested tree is happy in WF, and the DTD invokes additional
constraints. Nice and symmetrical. 
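Here is one way to picture that split, as a minimal Python sketch with several assumptions. PE references are written %name;. A hypothetical cap of 100 rounds stops nested expansion. Parenthesis balance stands in for the full set of 8879 constraints. In WF mode, process_model only substitutes text. With validate=True it first requires each directly referenced PE to nest properly on its own, much as a DTD adds constraints on top of a merely well-nested element tree.

```python
import re

REF = re.compile(r"%([\w.-]+);")  # assumed reference syntax: %name;

def expand(text, pes):
    """WF layer: pure textual substitution, repeated until no references remain."""
    for _ in range(100):  # hypothetical cap against runaway nested expansion
        new = REF.sub(lambda m: pes[m.group(1)], text)
        if new == text:
            return new
        text = new
    raise ValueError("entity expansion did not terminate")

def balanced(s):
    """True if every '(' in s is closed by a later ')' within s."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0

def process_model(text, pes, validate=False):
    """Expand a content model; when validating, also check PE nesting first.

    Only references that appear directly in `text` are checked in this sketch.
    """
    if validate:
        for name in REF.findall(text):
            if not balanced(pes[name]):
                raise ValueError(f"validity error: %{name}; does not nest properly")
    return expand(text, pes)
```

With PES = {"open": "(a | b", "close": ")"}, the model "%open; | c%close;" expands to the well-formed "(a | b | c)" without validation. With validate=True it is reported as a validity error.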

Steve

Steven J. DeRose, Ph.D., Chief Scientist
Inso Electronic Publishing Solutions
   (formerly EBT)
Received on Tuesday, 24 June 1997 15:07:42 UTC