- From: Michael Sperberg-McQueen <U35395@UICVM.UIC.EDU>
- Date: Wed, 23 Oct 96 17:39:24 CDT
- To: W3C SGML Working Group <w3c-sgml-wg@w3.org>
The ERB met today, 23 October 1996, and decided a number of questions. All members of the ERB were present (Bosak, Bray, Clark, DeRose, Hollander, Kimber, Magliery, Maler, Paoli, Sharpe, Sperberg-McQueen); decisions were taken by consensus except as noted. As usual, summaries of the rationale for the decisions made have not been reviewed by the ERB and are thus subject to correction and further explanation.

A.17 Should XML have entities, or not?
The ERB had already agreed that XML should have internal text entities and external NDATA entities. Today, after discussion, we agreed that support for external text entities would be an optional feature of XML 1.0 (dissenting: Clark, Paoli, Sharpe).

The rationale for the decision was that support for external entities is (a) essential if XML is to be useful as an authoring language, but (b) a heavy burden for network-based client software. A proposal to define XML in such a way that external text entities were legal only if in local files (and thus not legal in network use of XML) attracted some support, but not enough.

The dissenting view on this decision was that allowing an optional feature and losing the monolithic definition of XML was too high a cost; the dissenters all also felt that external text entities should be disallowed unconditionally.

External text entities will be placed on the list of topics to be reviewed in preparing future versions of XML. This topic may also be revisited in the near future (i.e. before version 1.0), depending on reports on the progress and status of W3C-based work on this and related topics. The question of SDATA entities will be taken up again before XML 1.0 is published.

C.1 Should XML require all entities to be synchronous with the document's logical structure?
Agreed unanimously that XML will require all entities to be synchronous with the document's element structure.
The rationale is that this simplifies parsing somewhat, allows entity expansion to be delayed if the implementation desires to do so, and makes possible simple checks for the well-formedness of external (and internal) entities.

C.2 Should XML prescribe the use of an ENTITY-END character as the canonical method of handling entity boundaries, as a way of simplifying exposition and implementation (6.2.2)?
Agreed unanimously not to prescribe any particular method of handling entity ends. Rationale: the proposal would tend to confuse, not simplify, the issue.

C.3 Should XML retain or relax SGML's prohibition on ENTITY attributes referring to SGML text entities (7.9.4.3)?
Agreed unanimously to retain the prohibition. Rationale: compatibility.

C.4 If XML makes DTDs optional and allows partial DTDs, what must or may a parser do when it encounters references to undeclared entities (9.4)? Should XML declare any set of entities automatically?
Agreed unanimously that reference to an entity not declared and not included in the list of 'automatic' declarations is a reportable error. No particular error recovery strategy will be prescribed. Rationale: defining this as a non-error would weaken validation too much; error recovery should be left to the implementation, as different strategies are appropriate for different purposes.

Agreed unanimously to define automatically the entities lt, gt, amp, and two entities for double and single quotation (for use in attribute value literals), names to be determined in separate discussion. Proposals to declare other sets of entities automatically (e.g. all of ISO Latin 1 or all entities declared in HTML 3.2) remain open questions.

C.5 If XML uses ISO 10646, should there be a special form of character reference using hexadecimal, not decimal, numbers, since most references to ISO 10646 and Unicode use hex, not decimal (9.5)?
Agreed (Clark dissenting) to specify that XML documents may refer to characters in ISO 10646 using the form '&u-' or '&U-' followed by four hexadecimal digits, followed by a semicolon. Rationale: Unicode and ISO 10646 documentation is in hexadecimal, not decimal, so this constitutes a small but important convenience and aid to reliability. The proposal to use '&u' was preferred to the '&#u' proposal since it is believed to allow SGML systems to handle these references (which appear to an SGML parser to be general entity references) using a default entity declaration. (Consult James Clark for details.)

C.6 Should XML retain SGML's prohibition on multiple declarations for the same element (11.2.1)?
Agreed unanimously to retain the prohibition. Rationale: compatibility. (Some ERB members may also apply the same rationale as for the dissent on question C.8.)

C.7 Should XML prohibit the use of inclusion and exclusion exceptions in element declarations (11.2.4, 11.2.5)?
Agreed unanimously to prohibit their use in XML 1.0, and (with dissents from Bray, Magliery, and Sperberg-McQueen) to place them on the list of topics to be considered in preparing future versions. Rationale: simplification of validation and harmonization of the XML parsing model with standard formal-language theory and practice.

C.8 Should XML prohibit content-model references to undeclared elements (11.2.4)?
Agreed (Bray, DeRose, and Sharpe dissenting) to allow such references. Rationale: this is a useful technique in the construction of large public DTDs which may be subsetted locally or document-by-document. Rationale for the dissent: clean grammars are easier to process and parse than dirty grammars. (N.B. 'clean' and 'dirty' here have the technical senses usual in discussions of formal grammars.)

C.9 Should XML forbid use of the '&' connector in content models (11.2.4.1)?
Agreed unanimously to forbid use of the '&' connector in XML. Rationale: harmonization with conventional regular expressions.
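
The decisions on C.4 and C.5 can be illustrated with a small sketch. It is not part of the ERB's decisions and makes several assumptions: the names expand_references and UndeclaredEntityError are hypothetical, the entity-name syntax is simplified to ASCII, the two quotation entities are left out because their names were not yet decided, and raising an exception is only one possible way to "report" the error, since no recovery strategy is prescribed.

```python
import re

# Entities the ERB agreed to declare automatically (C.4). The two quotation
# entities are omitted here because their names were still to be determined.
AUTOMATIC_ENTITIES = {"lt": "<", "gt": ">", "amp": "&"}

# 'u-' or 'U-' followed by exactly four hexadecimal digits (C.5).
HEX_REFERENCE = re.compile(r"[uU]-([0-9A-Fa-f]{4})\Z")

# Any reference of the form '&name;'. The name syntax is a simplification
# (ASCII letters, digits, '.', '-'), assumed for illustration only.
REFERENCE = re.compile(r"&([A-Za-z][A-Za-z0-9.\-]*);")


class UndeclaredEntityError(ValueError):
    """Hypothetical way of reporting a reference to an undeclared entity."""


def expand_references(text, declared=None):
    """Expand '&u-XXXX;' references and declared or automatic entities."""
    entities = dict(AUTOMATIC_ENTITIES)
    entities.update(declared or {})

    def replace(match):
        name = match.group(1)
        hex_match = HEX_REFERENCE.match(name)
        if hex_match:
            # Interpret the four hex digits as an ISO 10646 code point.
            return chr(int(hex_match.group(1), 16))
        if name in entities:
            return entities[name]
        # Neither declared nor automatic: a reportable error under C.4.
        raise UndeclaredEntityError(f"undeclared entity: {name}")

    return REFERENCE.sub(replace, text)
```

Under these assumptions, expand_references("caf&u-00E9;") yields "café", while expand_references("&nbsp;") raises UndeclaredEntityError unless the caller passes a declaration for nbsp.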

C.11 Should XML retain SGML's prohibition on multiple attribute-list declarations for the same element (11.3.1) or on multiple declarations for the same attribute (11.3.2)?
Agreed unanimously to retain the prohibition. Rationale: compatibility. (Some ERB members may also apply the same rationale as for the dissent on question C.8.)

C.12 Should XML change the set of types available for attributes? E.g. by suppressing NAME(S), NUMBER(S), NMTOKEN(S), NUTOKEN(S) and adding constraints in the form of regular expressions, ISO dates, language-code, external-id, type IDREF, ... (7.9.4, 11.3.3)
After discussion, agreed unanimously that XML should have the following attribute types: ID, IDREF, IDREFS, ENTITY, ENTITIES, CDATA, enumerated attribute types, NOTATION attribute type, NMTOKEN and NMTOKENS. The types NUMBER(S), NUTOKEN(S), and NAME(S) are to be dropped.

Rationale: the distinctions among the lexically defined types are not useful enough to justify retaining all of them, but they do provide convenient case-folding and white-space normalization. If just one is to be kept, it should be NMTOKENS, since it subsumes all the others and the other lexical types of SGML can be translated into XML by retyping them as NMTOKENS and adding an application-level check on the specific type of token required. Such application-level checks are in any case common among users of these types. The type NMTOKEN was retained in order to preserve the singular/plural symmetry with IDREF and ENTITY.

Extensions to the set of declared-value types in ISO 8879, though supported by Sperberg-McQueen, commanded no support for inclusion in XML 1.0.

Other decisions in batch C are still pending.

-C. M. Sperberg-McQueen
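
The C.12 rationale (retype SGML's NUMBERS as NMTOKENS and add an application-level check) might look roughly like the sketch below. It is an illustration only, not something the ERB specified: the function names are hypothetical, name characters are assumed to be ASCII letters, digits, '.' and '-', and folding to upper case is an assumption modelled on SGML's usual name case folding.

```python
import re

# One name token: ASCII letters, digits, '.' and '-' (an assumed character set).
NMTOKEN = re.compile(r"[A-Za-z0-9.\-]+\Z")


def normalize_nmtokens(value):
    """Split an NMTOKENS attribute value into case-folded tokens."""
    # str.split() with no argument trims the ends and collapses runs of
    # white space, which gives the white-space normalization.
    tokens = value.split()
    if not tokens:
        raise ValueError("NMTOKENS value must contain at least one token")
    for token in tokens:
        if not NMTOKEN.match(token):
            raise ValueError(f"not a name token: {token!r}")
    # Case folding to upper case is an assumption of this sketch.
    return [token.upper() for token in tokens]


def as_numbers(value):
    """Application-level check: every token must consist of digits only."""
    tokens = normalize_nmtokens(value)
    for token in tokens:
        if not token.isdigit():
            raise ValueError(f"not a number token: {token!r}")
    return [int(token) for token in tokens]
```

With these definitions, an attribute formerly declared NUMBERS can be declared NMTOKENS, and the application calls as_numbers on its value to recover the stricter check.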
Received on Wednesday, 23 October 1996 18:42:29 UTC