- From: Steven J. DeRose <sjd@eps.inso.com>
- Date: Tue, 24 Jun 1997 15:03:50 -0400
- To: w3c-sgml-wg@w3.org
Finally caught up on all this.

If PEs were pure macro substitution, then grammar, implementation, and pedagogy would be relatively simple; the language, like cpp, is simply not responsible for protecting the user from idiocy (this has pluses and minuses). At the other extreme, if PEs were pure syntax objects (more like Scheme macros), there would be relatively great safety/protection, while still being quite easy to implement. But as it is, we have something in the middle. It's still immensely useful, but a serious pain to implement *because it doesn't pick either clear choice*: it's neither macros nor subroutines (to oversimplify a bit), neither fish nor fowl: it has aspects of both. Thus our pain.

I see two main paths open: Kill or Simplify. The status quo doesn't work. Featurecide is easier; simplification is more effective. The painful loss with Featurecide is that PEs are all we have in SGML for accomplishing subclassing (as in many name group examples) and structs (as in many attribute list PEs). I think those who find these losses substantial have been pretty convincing.

The problem facing us, then, is that a useful conceptual notion is mixed up with piddly details of syntax, such that parsers and applications really don't "know" what's going on. For example, one table model is documented as allowing you to declare your own element types for table, row, etc., by setting a parameter entity. This is sensible, but unless the application is very tightly integrated with the parser, it can't know which element in this shell game is now the "real" table. Oops.

For another example, the old AAP DTD had something like:

    <!ENTITY % date "month, day, year">

that got used in a bunch of content models -- but there was no date element (well, it had several date-ish elements, but they didn't use the macro!). Oops: the lexical stuff and the conceptual stuff got out of sync.
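To make the "pure macro" end of the spectrum concrete, here is a minimal sketch of cpp-style parameter-entity expansion in Python, applied to the AAP date example. The function `expand_pes`, the `%name;`-only reference syntax, and the element names `letter`, `sender`, and `body` are hypothetical illustrations; a real SGML parser also handles references without the trailing `;`, external entities, and context rules that are omitted here. After expansion, nothing in the declaration records that `month, day, year` was ever a "date".

```python
import re

# Hypothetical simplification: references always look like %name;
# (SGML also allows the ";" to be omitted in some contexts; ignored here).
PE_REF = re.compile(r"%([A-Za-z][A-Za-z0-9.-]*);")


def expand_pes(text, entities, depth=0):
    """Purely textual, cpp-style expansion: the replacement text is spliced
    in with no knowledge of what it means or where it may legally appear."""
    if depth > 32:
        raise RecursionError("parameter entity nesting too deep")

    def repl(match):
        name = match.group(1)
        if name not in entities:
            raise KeyError(f"undeclared parameter entity %{name};")
        # Replacement text may itself contain references; expand those too.
        return expand_pes(entities[name], entities, depth + 1)

    return PE_REF.sub(repl, text)


entities = {"date": "month, day, year"}
model = "<!ELEMENT letter - - (sender, %date;, body)>"
print(expand_pes(model, entities))
# <!ELEMENT letter - - (sender, month, day, year, body)>
```

The design choice is the point: because substitution happens before any parsing, the parser downstream only ever sees `month, day, year`, so neither it nor the application can tell that those three elements were meant as one concept.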
What we really want, I think, is a way to explicitly define groups of other constructs, name the groups, and refer to them later. This would be very easy if we were back in 1985, and it would let the parser really *know*:

    <!ELEM-GRP LISTS (UL | OL | DL)>
    <!ATTR-GRP LOC (IDREF IDREF #IMPLIED ENT ENTITY #IMPLIED)>
    <!MS-KEY mac INCLUDE>

Such constructs would also preclude bizarre cases: in SGML86 you can easily set a parameter entity that's only used for marked section keywords to a string that's always invalid there -- and that's not an erroneous declaration. And we would have gotten even more powerful architectural forms for free.

But we ain't in 1985; and yet Featurecide has too high a cost. Thus I think we should apply our collective wit to finding a simplification. If it takes WG8 changes, so be it. If it takes a WF/validity weasel, so be it. But let's make the situation incrementally better instead of incrementally worse.

I'm tempted by the proposal to make PEs be purely a pre-parsing substitution in WF, with the additional 8879 constraints required only in valid documents. That seems almost identical to what we do with element structure: any properly nested tree is happy in WF, and the DTD invokes additional constraints. Nice and symmetrical.

Steve

Steven J. DeRose, Ph.D., Chief Scientist
Inso Electronic Publishing Solutions (formerly EBT)
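By contrast, here is a minimal Python sketch of how an explicit `ELEM-GRP`-style declaration might be held inside a parser. `ElementGroup`, `GroupRegistry`, `groups_used`, and the `TITLE` element are hypothetical names, and the name pattern is a simplification of SGML's concrete-syntax name rules. The sketch only illustrates the idea above: the group is validated once, when declared, and remains an object the application can query instead of dissolving into text.

```python
import re
from dataclasses import dataclass

# Simplified, hypothetical name rule; real SGML names depend on the
# concrete syntax given in the SGML declaration.
NAME = re.compile(r"[A-Za-z][A-Za-z0-9.-]*\Z")


@dataclass(frozen=True)
class ElementGroup:
    """A named group of element types kept as a parser-level object,
    rather than text spliced into later declarations."""
    name: str
    members: tuple


class GroupRegistry:
    def __init__(self):
        self._groups = {}

    def declare(self, name, members):
        # Checked once, at declaration time: a group may only hold element
        # type names, so it cannot be bound to text that is invalid
        # wherever it is later referenced.
        bad = [m for m in members if not NAME.match(m)]
        if bad:
            raise ValueError(f"group {name}: not element type names: {bad}")
        self._groups[name] = ElementGroup(name, tuple(members))
        return self._groups[name]

    def resolve(self, name):
        return self._groups[name]


def groups_used(content_model):
    """Names of the groups a content model refers to. The application can
    ask this because the reference survives parsing."""
    return [p.name for p in content_model if isinstance(p, ElementGroup)]


registry = GroupRegistry()
lists = registry.declare("LISTS", ["UL", "OL", "DL"])

# Hypothetical content model: a TITLE followed by anything in LISTS.
section_model = ("TITLE", registry.resolve("LISTS"))
print(groups_used(section_model))  # ['LISTS']
```

Trying to declare a group whose member is the string `"month, day, year"` raises `ValueError` here, which is the kind of early rejection a text-only PE cannot provide.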
Received on Tuesday, 24 June 1997 15:07:42 UTC