Some progress on PE's from Tim Bray on 1997-06-26 (w3c-sgml-wg@w3.org from June 1997)

From: Tim Bray <tbray@textuality.com>
Date: Thu, 26 Jun 1997 09:30:11 -0700
To: w3c-sgml-wg@w3.org
Message-Id: <3.0.32.19970626093004.00a6c710@pop.intergate.bc.ca>
The ERB met on June 25; everyone but Dave Hollander was present.  There
was considerable discussion of Parameter Entities; an unofficial summary
follows:

1. The discussion in the WG makes it clear that XML's utility
   as an authoring environment would be severely compromised by omission
   of PE's, or even by constraining them much more than they are now.
2. It is generally agreed that implementation of the full suite of
   constraints on the placement of and replacement text for PE's is 
   beyond what should be expected of a lightweight nonvalidating parser.
3. It would be dangerous to relax the constraints, i.e. say that 
   PE references can go anywhere in the DTD, as this would tend to
   create a large legacy class of instances that would be well-formed
   but not 8879-conformant, and hard to make conformant.

Thus it was unanimously agreed to have two sets of PE rules, one for the
external and one for the internal subset.  In the external subset, the
rules will stand as they are now, although we'll try to improve the
explanation in the spec.  (Those who have said it could be explained more
clearly are hereby invited to submit specific suggestions.)

In the internal subset, PEs must expand to match "markupdecl" (prod. 28
in the current draft), and references can only be placed where a 
markupdecl can be recognized.  The feeling is that this level of 
recognition is well within the capabilities of the most modest parsers.
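
As an illustration only, the internal-subset rule above can be sketched as a
small check in Python. Everything here is an assumption rather than text from
the draft: the function name check_internal_subset is hypothetical, markupdecl
is approximated as an ELEMENT, ATTLIST, ENTITY or NOTATION declaration, a
comment, or a PI, a PE is allowed to expand to one or more such declarations,
and quoted literals are not inspected for references.

```python
import re

NAME = r"[A-Za-z_:][\w.:-]*"

# Approximation of "markupdecl": a comment, a PI, or one complete declaration.
# Outside quoted literals, "%" followed by a name character is rejected, so a
# PE reference placed inside a declaration makes the declaration fail to match.
MARKUPDECL = (
    r"<!--.*?-->"
    r"|<\?.*?\?>"
    r"|<!(?:ELEMENT|ATTLIST|ENTITY|NOTATION)\b"
    r"(?:\"[^\"]*\"|'[^']*'|(?!%[A-Za-z_:])[^>'\"])*>"
)

# Top-level tokens of the internal subset: whitespace, a PE reference, or a
# complete markup declaration.
TOKEN = re.compile(
    r"\s+|%(?P<name>" + NAME + r");|(?:" + MARKUPDECL + r")", re.S
)
# Replacement text made up only of complete declarations (assumed: one or more).
WHOLE_DECLS = re.compile(r"\s*(?:(?:" + MARKUPDECL + r")\s*)+", re.S)
# Parameter entity declarations with a literal value.
PE_DECL = re.compile(
    r"<!ENTITY\s+%\s+(" + NAME + r")\s+(\"[^\"]*\"|'[^']*')\s*>"
)


def check_internal_subset(subset):
    """Return a list of problems found under the sketched rule (empty if none)."""
    problems = []
    # Collect declared PEs; declaration-before-use order is not checked here.
    pes = {m.group(1): m.group(2)[1:-1] for m in PE_DECL.finditer(subset)}
    pos = 0
    while pos < len(subset):
        m = TOKEN.match(subset, pos)
        if m is None:
            problems.append(
                "unrecognized text or misplaced PE reference at offset %d" % pos
            )
            break
        name = m.group("name")
        if name is not None:
            value = pes.get(name)
            if value is None:
                problems.append("%" + name + "; is not declared in the internal subset")
            elif not WHOLE_DECLS.fullmatch(value):
                problems.append(
                    "%" + name + "; does not expand to complete markup declarations"
                )
        pos = m.end()
    return problems


GOOD = """
<!ENTITY % status '<!ATTLIST memo status CDATA "draft">'>
<!ELEMENT memo (#PCDATA)>
%status;
"""
INSIDE_DECL = '<!ENTITY % content "(#PCDATA)">\n<!ELEMENT memo %content;>'
FRAGMENT = '<!ENTITY % frag "CDATA #IMPLIED">\n%frag;'

for sample in (GOOD, INSIDE_DECL, FRAGMENT):
    print(check_internal_subset(sample))
```

The first sample passes because the reference sits between declarations and
expands to a whole ATTLIST declaration. The second and third show the two cases
the rule is meant to exclude: a reference inside a declaration, and replacement
text that is only a fragment of one.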

In follow-on discussion, we realized that this highlights a weakness in 
the spec.  Currently, it only discusses validating and non-validating
processors; the former are required to read the DTD.  In fact, there
are already non-validating parsers which *do* read and use the DTD, if
only to extract default attributes and entity declarations.  This seems
like an obviously good thing to do.  Yet, such processors are unlikely
to want to retrieve the whole external subset, since if they're
not validating they don't care about content models; it seems reasonable
for there to be a common class of XML documents which group the 
markup declarations not required for validation, but useful to a 
processor, in the internal subset for efficient transfer down the wire.
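
To make the point above concrete, here is a minimal sketch using Python's
xml.parsers.expat module, a non-validating parser that postdates this message.
The sample document and the helper name read_with_internal_subset are
illustrative assumptions. The internal subset carries only an attribute
default and an entity declaration, with no content models, and the parser
uses both without validating.

```python
import xml.parsers.expat

# Hypothetical sample: the internal subset holds only what a non-validating
# processor can use, namely a default attribute and an entity declaration.
DOC = """<!DOCTYPE memo [
  <!ATTLIST memo status CDATA "draft">
  <!ENTITY wg "w3c-sgml-wg">
]>
<memo>Sent to &wg;</memo>"""


def read_with_internal_subset(text):
    """Parse without validating; return (root attributes, character data)."""
    result = {"attrs": None, "chars": []}
    parser = xml.parsers.expat.ParserCreate()

    def start_element(name, attrs):
        # Record the attributes of the first (root) element only.
        if result["attrs"] is None:
            result["attrs"] = dict(attrs)

    def character_data(data):
        # Text may arrive in several chunks, so collect and join later.
        result["chars"].append(data)

    parser.StartElementHandler = start_element
    parser.CharacterDataHandler = character_data
    parser.Parse(text, True)
    return result["attrs"], "".join(result["chars"])


attrs, text = read_with_internal_subset(DOC)
print(attrs)
print(text)
```

The root element has no status attribute in the instance, yet the parser
reports the declared default, and the entity reference is replaced by its
declared text. Nothing from an external subset is fetched.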

This is closely related to the question of the RMD; a conformant processor
cannot refuse, at the moment, to read the external subset; should this be
allowed in some class of nonvalidating parsers?  And yet the phrase "some
class" suggests the use of an option, something we have to date vigorously
resisted introducing into XML.

Solving this problem *may* be made easier by removing discussion of the
processor entirely from the spec, as suggested by both Henry Thompson
and Dan Connolly.

We judged that this lack of clarity is not fatal to the progress toward
a July 1st version of XML-lang (processors are de facto apparently 
doing what seems like the right thing) - but there will be a work item
on our agenda later this year to address and clean up this area.  
Furthermore, the next release of the draft will contain an editorial
note acknowledging the existence of this set of issues.

Cheers, Tim Bray
tbray@textuality.com http://www.textuality.com/ +1-604-708-9592
Received on Thursday, 26 June 1997 12:32:11 UTC