ERB decisions, 23 October 1996

The ERB met today, 23 October 1996, and decided a number of
questions.  All members of the ERB were present (Bosak, Bray,
Clark, DeRose, Hollander, Kimber, Magliery, Maler, Paoli,
Sharpe, Sperberg-McQueen); decisions were taken by consensus
except as noted.

As usual, summaries of the rationale for the decisions made
have not been reviewed by the ERB and are thus subject to
correction and further explanation.



A.17 Should XML have entities, or not?

The ERB had already agreed that XML should have internal text entities
and external NDATA entities.  Today, after discussion, we agreed that
support for external text entities would be an optional feature of XML
1.0 (dissenting: Clark, Paoli, Sharpe).

The rationale for the decision was that support for external entities
is (a) essential if XML is to be useful as an authoring language, but
(b) a heavy burden for network-based client software.  A proposal to
define XML in such a way that external text entities were legal only
if in local files (and thus not legal in network use of XML) attracted
some support, but not enough.

The dissenting view on this decision was that allowing an optional
feature and losing the monolithic definition of XML was too high a
cost; the dissenters all also felt that external text entities should
be disallowed unconditionally.

External text entities will be placed on the list of topics to be
reviewed in preparing future versions of XML.

This topic may also be revisited in the near future (i.e. before
version 1.0), depending on reports on the progress and status of
W3C-based work on this and related topics.

The question of SDATA entities will be taken up again before XML 1.0
is published.


C.1 Should XML require all entities to be synchronous with the
document's logical structure?

Agreed unanimously that XML will require all entities to be
synchronous with the document's element structure.

The rationale is that this simplifies parsing somewhat, allows entity
expansion to be delayed if the implementation desires to do so, and
makes possible simple checks for the well-formedness of external (and
internal) entities.
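
As a rough illustration of the kind of simple check this makes
possible, the following Python sketch tests whether an entity's
replacement text is synchronous, i.e. whether every element opened
inside the entity is also closed inside it.  The function name and the
simplified tag syntax (start-tags and end-tags only, no '<' or '>'
inside attribute values) are assumptions made for the example, not
anything decided by the ERB.

```python
import re

# Loose pattern for a start-tag or end-tag; attributes are skipped over.
TAG = re.compile(r"<(/?)([A-Za-z][\w.-]*)[^<>]*>")

def is_synchronous(replacement_text):
    """Return True if all elements opened in the text are closed in it."""
    stack = []
    for match in TAG.finditer(replacement_text):
        closing, name = match.groups()
        if closing:
            # An end-tag must close the most recently opened element.
            if not stack or stack.pop() != name:
                return False
        else:
            stack.append(name)
    # Any start-tag left open would cross the entity boundary.
    return not stack
```

Because the check looks only at the entity's own text, it can be run
without expanding the entity in context.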


C.2 Should XML prescribe the use of an ENTITY-END character as the
canonical method of handling entity boundaries, as a way of
simplifying exposition and implementation (6.2.2)?

Agreed unanimously not to prescribe any particular method of handling
entity ends; rationale: the proposal would tend to confuse, not
simplify, the issue.


C.3 Should XML retain or relax SGML's prohibition on ENTITY attributes
referring to SGML text entities (7.9.4.3)?

Agreed unanimously to retain the prohibition.  Rationale:
compatibility.


C.4 If XML makes DTDs optional and allows partial DTDs, what must or
may a parser do when it encounters references to undeclared entities
(9.4)?  Should XML declare any set of entities automatically?

Agreed unanimously that reference to an entity not declared and not
included in the list of 'automatic' declarations is a reportable
error.  No particular error recovery strategy will be prescribed.
Rationale: defining this as a non-error would weaken validation too
much; error recovery should be left to the implementation, as
different strategies are appropriate for different purposes.

Agreed unanimously to define automatically the entities lt, gt, amp,
and two entities for double and single quotation (for use in attribute
value literals), names to be determined in separate discussion.

Proposals to declare other sets of entities automatically (e.g.  all
of ISO Latin 1 or all entities declared in HTML 3.2) remain open
questions.
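
A minimal Python sketch of this behavior follows.  It is only an
illustration: the function name, the reference syntax it accepts, and
the recovery strategy (leave the unresolved reference in place) are
assumptions of the example, since no recovery strategy is prescribed.
Only lt, gt, and amp appear in the automatic set, because the names of
the two quotation entities have not yet been determined.

```python
import re

# The automatically declared entities whose names are already fixed.
AUTOMATIC = {"lt": "<", "gt": ">", "amp": "&"}

# Assumed syntax for a general entity reference: &name;
ENTITY_REF = re.compile(r"&([A-Za-z][\w.-]*);")

def expand(text, declared=None):
    """Expand entity references in one pass.

    Returns the expanded text and a list of undeclared entity names,
    each of which is a reportable error.
    """
    table = dict(AUTOMATIC)
    table.update(declared or {})
    errors = []

    def replace(match):
        name = match.group(1)
        if name in table:
            return table[name]
        errors.append(name)
        # Recovery choice of this sketch only: keep the reference as is.
        return match.group(0)

    return ENTITY_REF.sub(replace, text), errors
```

A caller that validates would treat a non-empty error list as a
failure; a browser-like caller might display the text anyway.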


C.5 If XML uses ISO 10646, should there be a special form of character
reference using hexadecimal, not decimal, numbers, since most
references to ISO 10646 and Unicode use hex, not decimal (9.5)?

Agreed (Clark dissenting) to specify that XML documents may refer to
characters in ISO 10646 using the form '&u-' or '&U-' followed by four
hexadecimal digits, followed by semicolon.

Rationale: Unicode and ISO 10646 documentation is in hexadecimal, not
decimal, so this constitutes a small but important convenience and aid
to reliability.  The proposal to use '&u' was preferred to the '&#u'
proposal since it is believed to allow SGML systems to handle these
references (which appear to an SGML parser to be general entity
references) using a default entity declaration.  (Consult James Clark
for details.)
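
The form agreed on can be recognized with a single pattern.  The
Python sketch below is an illustration under stated assumptions: the
function name is invented for the example, and accepting hexadecimal
digits in either case is a guess, since the decision says only "four
hexadecimal digits".

```python
import re

# '&u-' or '&U-', exactly four hexadecimal digits, then a semicolon.
HEX_REF = re.compile(r"&[uU]-([0-9A-Fa-f]{4});")

def expand_hex_refs(text):
    """Replace each hexadecimal character reference with its character."""
    return HEX_REF.sub(lambda m: chr(int(m.group(1), 16)), text)
```

A reference with fewer or more than four digits does not match the
pattern and is left untouched by this sketch.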


C.6 Should XML retain SGML's prohibition on multiple declarations for
the same element (11.2.1)?

Agreed unanimously to retain the prohibition.  Rationale:
compatibility.  (Some ERB members may also apply the same
rationale as for the dissent on question C.8.)


C.7 Should XML prohibit the use of inclusion and exclusion exceptions in
element declarations (11.2.4, 11.2.5)?

Agreed unanimously to prohibit their use in XML 1.0, and (with
dissents from Bray, Magliery, and Sperberg-McQueen) to place
them on the list of topics to be considered in preparing future
versions.  Rationale:  simplification of validation and
harmonization of XML parsing model with standard formal-language
theory and practice.


C.8 Should XML prohibit content-model references to undeclared elements
(11.2.4)?

Agreed (Bray, DeRose, and Sharpe dissenting) to allow such
references.  Rationale:  this is a useful technique in the
construction of large public DTDs which may be subsetted
locally or document-by-document.  Rationale for the dissent:
clean grammars are easier to process and parse than dirty
grammars.  (N.B. 'clean' and 'dirty' here have the technical
senses usual in discussions of formal grammars.)
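
For readers who want to see what such references look like to a tool,
the Python sketch below lists the element names that content models
mention but that no declaration defines.  The declaration syntax it
scans for and the function name are assumptions of the example; a
processor following this decision would accept such a DTD, so the
result is informational, not an error list.

```python
import re

# Assumed declaration form: <!ELEMENT name content-model>
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+([A-Za-z][\w.-]*)\s+(.*?)>", re.DOTALL)
NAME = re.compile(r"[A-Za-z][\w.-]*")
# Words in a content model that are not element names.
KEYWORDS = {"EMPTY", "ANY", "PCDATA"}

def undeclared_references(dtd):
    """Return sorted element names that are referenced but not declared."""
    declared = set()
    referenced = set()
    for name, model in ELEMENT_DECL.findall(dtd):
        declared.add(name)
        referenced.update(n for n in NAME.findall(model) if n not in KEYWORDS)
    return sorted(referenced - declared)
```

Applied to a DTD in which 'doc' allows 'table' but no declaration for
'table' is present, the function reports 'table'.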


C.9 Should XML forbid use of the '&' connector in content models
(11.2.4.1)?

Agreed unanimously to forbid use of the '&' connector in XML.
Rationale:  harmonization with conventional regular expressions.
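
To illustrate the harmonization, the Python sketch below translates a
content model built only from ',', '|', '?', '*', and '+' into an
ordinary regular expression over a sequence of child element names,
and rejects '&'.  The function names and the token syntax are
assumptions of the example, and #PCDATA and the EMPTY and ANY keywords
are not handled.

```python
import re

# A token is an element name or one of the punctuation marks.
TOKEN = re.compile(r"\s*([A-Za-z][\w.-]*|[(),|?*+&])")

def model_to_regex(model):
    """Translate a content model into a compiled regular expression."""
    model = model.strip()
    parts = []
    pos = 0
    while pos < len(model):
        match = TOKEN.match(model, pos)
        if match is None:
            raise ValueError("unexpected text at offset %d" % pos)
        token = match.group(1)
        pos = match.end()
        if token == "&":
            raise ValueError("'&' connector is not allowed")
        elif token == "(":
            parts.append("(?:")
        elif token == ",":
            pass  # sequence is plain concatenation in a regex
        elif token in ")|?*+":
            parts.append(token)
        else:
            # Each child name is followed by one space in the input string.
            parts.append("(?:" + re.escape(token) + " )")
    return re.compile("".join(parts))

def children_match(model, children):
    """Check a list of child element names against a content model."""
    text = "".join(name + " " for name in children)
    return model_to_regex(model).fullmatch(text) is not None
```

With '&' excluded, every connector and occurrence indicator maps
directly onto a standard regular-expression operator.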


C.11 Should XML retain SGML's prohibition on multiple attribute-list
declarations for the same element (11.3.1) or on multiple declarations
for the same attribute (11.3.2)?

Agreed unanimously to retain the prohibition.  Rationale:
compatibility.  (Some ERB members may also apply the same
rationale as for the dissent on question C.8.)


C.12 Should XML change the set of types available for attributes?  E.g.
by suppressing NAME(S), NUMBER(S), NMTOKEN(S), NUTOKEN(S) and adding
constraints in the form of regular expressions, ISO dates,
language-code, external-id, type IDREF, ... (7.9.4, 11.3.3)

After discussion, agreed unanimously that XML should have the
following attribute types:  ID, IDREF, IDREFS, ENTITY, ENTITIES,
CDATA, enumerated attribute types, NOTATION attribute type,
NMTOKEN and NMTOKENS.  The types NUMBER(S), NUTOKEN(S), and NAME(S)
are to be dropped.

Rationale: the distinctions among the lexically defined types are not
useful enough to justify retaining all of them, but they do provide
convenient case-folding and white-space normalization.  If just one
is to be kept, it should be NMTOKENS, since it subsumes all the others
and the other lexical types of SGML can be translated into XML by
retyping them as NMTOKENS and adding an application-level check on
the specific type of token required.  Such application-level checks
are in any case common among users of these types.  The type NMTOKEN
was retained in order to preserve the singular/plural symmetry with
IDREF and ENTITY.
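
The translation described above can be sketched in Python as follows.
This is an illustration only: the function names are invented, the
name-character set (ASCII letters, digits, '.', and '-') is an
assumption, and case-folding is shown as an optional flag because
these notes do not say whether XML will fold case.

```python
import re

# Assumed name characters for the example.
NMTOKEN = re.compile(r"[A-Za-z0-9.-]+")

def nmtokens(value, fold_case=False):
    """Normalize white space and return the list of name tokens."""
    tokens = value.split()  # collapses runs of white space
    for token in tokens:
        if not NMTOKEN.fullmatch(token):
            raise ValueError("not a name token: %r" % token)
    if fold_case:
        tokens = [token.upper() for token in tokens]
    return tokens

def as_numbers(value):
    """Application-level check for an attribute formerly typed NUMBERS."""
    tokens = nmtokens(value)
    for token in tokens:
        if not token.isdigit():
            raise ValueError("not a number: %r" % token)
    return [int(token) for token in tokens]
```

Here the parser-level work is done by nmtokens(), and as_numbers()
stands for the check an application adds for the specific token type
it requires.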

Extensions to the set of declared-value types in ISO 8879, though
supported by Sperberg-McQueen, commanded no support for inclusion
in XML 1.0.


Other decisions in batch C are still pending.

-C. M. Sperberg-McQueen

Received on Wednesday, 23 October 1996 18:42:29 UTC