Parameter Entities: 'To PE or not to PE' (long)

This posting is based on development experience, but it's directed
specifically at the question of just how Simple we can Keep It,
Stupid, so I've posted it here.

There are two levels of complexity to implementing PEs as they
currently stand:  stacking sources and restricting contexts.

1) By 'stacking sources', I simply mean that your input scanner has to be
able to push and pop input sources in response to entity references
and entity text/stream ends.  This is vanilla computer science stuff,
is necessary for general entities as well, and therefore in my view
doesn't bear on the PE question as such.  Only if we got rid of
everything except character references would this one go away.
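To make the 'stacking sources' point concrete, here is a minimal sketch
in Python.  It is an illustration only, not code from any existing
parser: the names SourceStack and expand are hypothetical, entities are
assumed to be a plain name-to-text mapping, and only '&name;' style
references are recognised.

```python
class SourceStack:
    """Scanner input that pushes a source per entity reference and
    pops it when that entity's text runs out (hypothetical sketch)."""

    def __init__(self, text, entities):
        self.entities = entities                  # name -> replacement text
        self.stack = [["(document)", text, 0]]    # frames: [name, text, position]

    def push(self, name):
        # An entity that is already open may not be opened again.
        if any(frame[0] == name for frame in self.stack):
            raise ValueError("recursive entity reference: " + name)
        self.stack.append([name, self.entities[name], 0])

    def next_char(self):
        # Pop exhausted sources until one still has text, or none remain.
        while self.stack:
            frame = self.stack[-1]
            if frame[2] < len(frame[1]):
                frame[2] += 1
                return frame[1][frame[2] - 1]
            self.stack.pop()
        return None


def expand(text, entities):
    """Return text with every '&name;' reference replaced, via the stack."""
    src = SourceStack(text, entities)
    out = []
    while True:
        ch = src.next_char()
        if ch is None:
            return "".join(out)
        if ch != "&":
            out.append(ch)
            continue
        # Simplification: the name is read without checking that it
        # stays inside a single source.
        name = []
        while True:
            c = src.next_char()
            if c is None:
                raise ValueError("unterminated entity reference")
            if c == ";":
                break
            name.append(c)
        src.push("".join(name))


print(expand("a &x; d", {"x": "b &y; c", "y": "Y"}))   # prints: a b Y c d
```

Nothing here is specific to parameter entities, which is the point:
the same push/pop machinery serves general entities too.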

2) By 'restricting contexts' I mean two things.  One is that the
editors, under considerable pressure from a number of people including
me, came up with a presentationally very elegant way of expressing the
requirement that parameter entity substitution had to respect a class
of coherence constraints which 8879 expresses in terms of the EE
pseudo-character.  The second is the set of rules governing exactly
when entity 'expansion' happens.  This latter is in my view subtle,
but not difficult.  Once understood, its implementation is simple.
It's the first which is problematic.  Although cleanly stated, and
having low visual impact, the %<non-terminal> approach does NOT lend
itself to simple implementation in a straightforward scanner+parser
design.  I'm not sure I understand what Tim and Norbert have
done about this from their messages:  what we have done in LT XML is
simply allow and expand PE references virtually anywhere in the DTD.
The scanner implements this, so it basically means that a PE will be
recognised and replaced at any token boundary in the XML syntax.  If
the clause "its replacement text must match 'S? a S?'" is taken as a
well-formedness constraint, then our parser is not conformant, and
probably cannot be made to be.
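As a rough illustration of what 'replaced at any token boundary' means,
here is a small Python sketch.  It is NOT the LT XML scanner: the token
grammar (TOKEN) and the function name dtd_tokens are assumptions made
for the example, and quoted literals are kept whole rather than scanned
for references.

```python
import re

# Hypothetical, much-simplified token grammar for DTD text.
TOKEN = re.compile(
    r"%[A-Za-z_][\w.-]*;"        # parameter entity reference
    r"|%"                        # bare percent sign (as in <!ENTITY % ...>)
    r"|<!|>"                     # declaration open and close
    r"|[()|,*+?]"                # content model punctuation
    r"|\"[^\"]*\"|'[^']*'"       # quoted literal, kept as one token
    r"|[^\s()|,*+?<>%\"']+"      # names and keywords
)


def dtd_tokens(text, pes, active=()):
    """Yield tokens of text, splicing in each PE's tokens where it is referenced."""
    for match in TOKEN.finditer(text):
        tok = match.group()
        if len(tok) > 1 and tok.startswith("%") and tok.endswith(";"):
            name = tok[1:-1]
            if name in active:
                raise ValueError("recursive parameter entity: " + name)
            # The parser downstream never sees the reference itself,
            # only the tokens of its replacement text.
            yield from dtd_tokens(pes[name], pes, active + (name,))
        else:
            yield tok


pes = {"inline": "em | strong", "mix": "#PCDATA | %inline;"}
print(list(dtd_tokens("<!ELEMENT p (%mix;)*>", pes)))
```

Because the expansion happens below the parser, the parser has no way
of knowing which non-terminal the replacement text was supposed to
match, which is why a "must match 'S? a S?'" constraint is hard to
enforce in this kind of design.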

My reluctant conclusion from all this is that we should go back to the
November draft on this one, and allow PE references in the same
context as we allow conditional sections, namely at the level of
production 29, with an explicit well-formedness constraint which says
that the expansion must match 'S? markupdecl S?'.
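By contrast, a constraint of the form 'S? markupdecl S?' can be checked
on the replacement text alone.  The Python sketch below shows one way
that might look.  It assumes that markupdecl covers element, attribute
list, entity and notation declarations plus PIs and comments, and the
names end_of_decl and is_single_markupdecl are hypothetical; it only
finds where a declaration ends and does not validate its insides.

```python
DECL_KEYWORDS = ("<!ELEMENT", "<!ATTLIST", "<!ENTITY", "<!NOTATION")


def end_of_decl(text, i):
    """Return the index just past the declaration starting at i, or -1."""
    if text.startswith("<!--", i):               # comment
        j = text.find("-->", i + 4)
        return j + 3 if j != -1 else -1
    if text.startswith("<?", i):                 # processing instruction
        j = text.find("?>", i + 2)
        return j + 2 if j != -1 else -1
    if not text.startswith(DECL_KEYWORDS, i):
        return -1
    quote = None
    for j in range(i, len(text)):
        c = text[j]
        if quote:                                # inside a quoted literal
            if c == quote:
                quote = None
        elif c in "\"'":
            quote = c
        elif c == ">":                           # first '>' outside quotes
            return j + 1
    return -1


def is_single_markupdecl(replacement):
    """True if replacement is exactly one declaration plus optional whitespace."""
    # str.strip() is slightly more generous than the S production.
    body = replacement.strip()
    return bool(body) and end_of_decl(body, 0) == len(body)


print(is_single_markupdecl(" <!ELEMENT p (#PCDATA)> "))   # prints: True
print(is_single_markupdecl("em | strong"))                # prints: False
```

The check needs no knowledge of where the reference occurred, so it
fits a plain scanner+parser split without difficulty.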

I say this despite my recognition of the validity of the points Terry,
Eve and others have made about the utility of sophisticated use of PEs
for DTD customisation, modularisation and maintenance.  I'm a paid-up
charter member of the PE Hackers club myself.  But I'm becoming
increasingly convinced that the right thing to do is take XML back
DOWN towards greater simplicity, and build new and principled
solutions to the customisation and modularisation problems USING and ON
TOP of XML, rather than building them INTO it.

As far as I can see, a simple but powerful namespace scoping mechanism
(see last week's proposal :-) and a simple top-level external PE
mechanism are all I need to have the building blocks in place to start
experimenting with that approach.

ht

Received on Sunday, 1 June 1997 08:42:10 UTC