- From: Michael Sperberg-McQueen <U35395@UICVM.UIC.EDU>
- Date: Thu, 14 Nov 96 15:55:55 CST
- To: W3C SGML Working Group <w3c-sgml-wg@w3.org>
The ERB met yesterday, 13 November 1996, to discuss the XML working draft and approve the distribution of the current text at SGML '96 next week. We considered a number of topics arising from the draft, some of which have already been discussed, or are still being discussed, on this list, and others of which have not received much discussion.

Present: Bosak, Bray (intermittently), Clark, DeRose, Kimber, Maler, Magliery, Paoli, Sharpe, and Sperberg-McQueen. Absent: Hollander.

The author's apologies to busy members of the WG who would prefer a shorter account of the decisions; recent claims on the WG list that the ERB does not explain or discuss its decisions with the rest of the WG have led me, perhaps mischievously, to provide as full a discussion and explanation as my fingers can handle. There's an executive summary at the end.

Given the number of major topics on which the WG appears not to have reached consensus, and the volume of comment lately, it seems safe to say that some issues will require ongoing consideration and discussion, and that the text of the working draft which we can distribute next week will be subject to change in non-trivial ways before we can leave this phase of the project's work behind.

We considered dropping the plan to distribute printed copies at SGML '96, in order not to give a false impression of completeness. On the whole, however, the ERB thought that having printed copies available would be worthwhile, and we decided to go ahead with the plan. The cover page will, like the current Web copies, identify the document as a Working Draft, so the fact that it's not completely stable should be visible to any reader. And as the experience of the ERB and WG shows, having something that appears complete is one of the best ways to get people to read a draft and comment on it.

Since Henry Thompson raised the question directly: no, it's not too late for comments on substantive issues.
The document is a Working Draft, and when the ERB stops work on it and moves to the next phase, it will remain a Working Draft until the W3C advances it to Draft Recommendation status through the normal W3C procedures. There is some sentiment for avoiding the kind of violent swings in philosophy and technical direction that characterize some working drafts in some organizations, but in principle and in practice working drafts are subject to change, and discussion of what changes to make is always appropriate unless the rules of the WG make it out of order (e.g. while we focus on some specific issue). It is, however, too late for typographic corrections to be included in the version distributed at SGML '96.

* Jon Bosak suggested we reconsider the decision to include all the Cougar entities as predefined entities in XML; examining the list more carefully while preparing it for inclusion in the spec, he had noticed a number of inconsistencies and infelicities -- especially the fact that the entity names for some Greek characters are taken from the ISOgrk1 set and others from the ISOgrk3 set. In discussion with Jon, Anders Berglund had also pointed out some other problems with the entity set. In view of the negative reaction from some members of the WG, it also appeared to some ERB members that the inclusion of these entities, intended as a convenience for users, would strike some users as the reverse. Agreed (Paoli abstaining) to remove the Cougar entities from the spec.

* Agreed to keep the five entities lt, gt, amp, quot, and apos, for use in escaping markup delimiters, and to retain their current status as non-redeclarable. Rationale: we agreed long ago that entity references are the preferred method of escaping markup characters, and it's clearly better if tools generating XML can rely on standard entity names.
To the lasting disappointment of the humorists among us, the name squot was dropped in favor of apos, which, like the other four, occurs in ISO 8879 Annex D (in the entity set ISOnum).

* Discussed, once more, question C.10 (allow or prohibit non-deterministic content models). At a recent meeting, there had been wide support for reconsidering the decision (reported 6 November) to prohibit such models for the sake of compatibility (thus ensuring that SGML-based processors can handle all XML documents). In particular, it was pointed out that many existing SGML systems have no trouble at all with non-deterministic models, and it was argued that the restriction to determinism is poorly motivated, since it does not in fact provide serious benefits to implementors (this is a disputed point) and (pace Charles Goldfarb) has at best a neutral effect on the legibility of content models for end users. It was generally thought that the spec would be cleaner without the restriction.

In this meeting, the discussion continued. Some ERB members not present at the earlier discussion argued against lifting the prohibition, on the grounds that (a) WG 8 has not agreed to change this rule in the revision of SGML, and there is no reason to think such a change likely; (b) there are some widely used SGML systems (more than one or two) which rely crucially on determinism in the content model; and (c) some nondeterministic content models have no deterministic equivalent, so the idea of providing an algorithm for making all content models deterministic is not feasible. It was also observed that determinism is not particularly important to XML, but in Full SGML the AND connector and the definition of start-tag omission interact with it and make it far more important. A minority asked what AND has to do with it, and suggested that all cases of start-tag omission allowed by the current rules would also be possible with nondeterministic rules, as experimentation with any LALR(1) parser generator should show.
We never did clarify how AND makes determinism more important; on the other hand, we never heard anything like a proof that LR(1) parsing could handle all cases of start-tag omission, let alone -- the really hard part -- an argument showing that LALR(1) parsing can be documented suitably in ISO-type language. A digression into practical linguistics and stylistic criticism loomed, full of anecdotes about standard-speak, but was luckily averted. Decision: retain the prohibition. This was not unanimous, but my notes don't record the vote, so I don't recall who besides Tim Bray was in dissent.

* We also reconsidered the decision on question C.14 (reported 6 Nov) to drop SGML's prohibition on overlapping sets of name tokens in enumerated attribute types. The SGML revision group is on record as favoring this change, it seems to be agreed that there is no technical reason for the prohibition, and dropping it gives users a much improved tool. The discussion in the WG, however, suggests that some portions of the community will be extremely, perhaps excessively, alarmed if XML anticipates the SGML revision in this regard. We considered three options: standing by the decision reported on 6 November (clean this area up); reversing it (follow ISO 8879:1986, even though on this point the rule is uniformly thought to be a design error); or dropping enumerated types for attributes and placing them on the list of constructs to be added in a later revision. There was almost no support for dropping enumerated types: they are extremely useful both for validation and for documenting the expected range of values, and they make it possible for authoring systems to provide much better support for attribute value specification. The other two possibilities were very closely matched, but after long discussion the majority came to the view that the symbolic importance of guaranteeing that all valid XML documents are valid SGML documents outweighed the technical arguments.
The SGML community has over time become accustomed, or at least resigned, to this rule; those members of the HTML community who care at all about standardization and DTDs are at least aware of the restriction already, so its inclusion in XML will not come as a total shock to them. The base political observation that they'll blame WG8, not us, may have occurred to some minds besides my own, but it was not spoken. It may not be true, in any case. Reversed the decision announced 6 Nov: XML 1.0 now prohibits overlap among the name tokens of enumerated attribute types declared in the same attribute-list declaration. Dissenting: Bray, Hollander, Magliery.

* We also reconsidered, for the second time, our decision on the form of EMPTY elements. Initially we had agreed (deciding question B.10 on 30 October) to allow both the form <e/> and the form <e>, restricting the latter to cases where the element was explicitly declared EMPTY but allowing <e/> whether a declaration was present or not. Some members of the WG and ERB felt this was unsatisfactory because it requires all parsers to read and parse the DTD, even if all they want is to detect element boundaries correctly in a single entity. It also forces an unhappy choice between requiring parsers to fetch external DTD entities *before even starting to parse the document* and requiring users to include all declarations of EMPTY elements in the internal subset, which could create maintenance headaches of epic proportions. As a result, we revisited the question in early November (in the course of discussing question D.2, reported 6 November) and sought ways to allow the <e> form without effectively requiring all XML processors to read all of the DTD, all of the time, before parsing. As reported on 6 November, we agreed to restrict the use of the <e> form to a fixed list of known EMPTY elements, so that users could, if need be, create documents which are simultaneously processable HTML and valid XML.
There was some concern about singling out a single DTD for special treatment, but the importance of HTML on the Web (in the context of Documents on the Web, *are* there other DTDs?) and the fact that SGML users already have the same ability we were trying to give HTML users (i.e. the ability to create valid XML documents processable with their existing systems) outweighed those concerns. The ad hoc nature of the decision, however, continued to bother several ERB members (as well as several members of the WG), so we took it up again. Paul Prescod suggested on 8 November that HTML's EMPTY elements could be handled by defining them (in an XML-friendly version of the HTML DTD) as empty but not EMPTY, allowing XML-aware users to write <br></br>, etc., which (a) is legal XML and (b) can be processed by existing HTML browsers. Some ERB members felt misgivings about tagging EMPTY elements this way, but (a) it works, (b) it can't be all that bad, since it has been repeatedly suggested as an enhancement for SGML itself, (c) it works, (d) it's much less bad than the hard-coded list of elements, and (e) it works. Agreed (Bray and Sharpe dissenting) to remove the paragraph requiring XML processors to handle <e>-style tags for HTML 3.2's EMPTY elements when they detect that they are handling HTML, and to replace it with a paragraph explaining the Prescod technique for making valid XML documents which work with HTML browsers not fitted out with <e/> handling.

* Agreed (by general assent) to add a required version declaration, in the form of an XML PI at the beginning of the document. The version information is to be required, the character-set information optional. In the ensuing editorial work, the version, charset, and RMD information were all merged into a single 'XML declaration'.

* Agreed (by general assent) to allow processing instructions in the DTD. (This may have been decided already, but it wasn't in the grammar and we voted rather than look it up.
The deadline was looming pretty large at this point.)

Other items remaining undiscussed and undecided were implicitly declared editorial questions for purposes of getting copy to the printer in time for SGML '96 distribution. The editors resisted the temptation to seize this opportunity to restore the DSD syntax for markup declarations.

The spec has now gone to the printer; the editors would like to thank those members of the WG who sent us corrections and pointed out errors. It'll be a materially more complete, correct, and less confusing document thanks to your efforts.

- C. M. Sperberg-McQueen

Summary:

* Removed (Paoli abstaining) the Cougar entities from XML 1.0.
* Retained lt, gt, amp, quot, apos as non-redeclarable entities.
* Retained ban on nondeterministic content models.
* Prohibited (Bray, Hollander, Magliery dissenting) overlap among enumerated types in XML 1.0.
* Dropped special handling of HTML EMPTY elements; added paragraph explaining Prescod method of making HTML valid XML (Bray, Sharpe dissenting).
* Added version declaration.
* Allowed processing instructions in DTD.
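Two of the rules retained above, the ban on nondeterministic content models (question C.10) and the ban on overlapping name tokens in enumerated attribute types (question C.14), are purely mechanical checks. What follows is a minimal Python sketch of both, not text from the draft. It assumes content models have already been parsed into a small tree; the classes Name, Seq, Or, Rep and the functions is_deterministic and overlapping_tokens are hypothetical names invented for illustration. The determinism test uses the standard Glushkov (first/follow set) construction and ignores the AND (&) connector, exceptions, and #PCDATA, so it approximates rather than reproduces the wording of ISO 8879.

```python
from dataclasses import dataclass

# A tiny content-model tree (hypothetical classes for this sketch only).
@dataclass(frozen=True)
class Name:          # an element type name, e.g. "title"
    name: str

@dataclass(frozen=True)
class Seq:           # SGML "," connector
    items: tuple

@dataclass(frozen=True)
class Or:            # SGML "|" connector (the "&" connector is not modelled)
    items: tuple

@dataclass(frozen=True)
class Rep:           # occurrence indicator: "?", "*" or "+"
    item: object
    op: str


def is_deterministic(model) -> bool:
    """Glushkov-style test: no two candidate positions may share a name."""
    symbols = []   # position number -> element name
    follow = {}    # position number -> positions that may come next

    def walk(node):
        # Returns (nullable, first, last) and fills `follow` as a side effect.
        if isinstance(node, Name):
            pos = len(symbols)
            symbols.append(node.name)
            follow[pos] = set()
            return False, {pos}, {pos}
        if isinstance(node, Seq):
            nullable, first, last = True, set(), set()
            for item in node.items:
                n, f, l = walk(item)
                for p in last:          # whatever can end the prefix
                    follow[p] |= f      # may be followed by this item's start
                if nullable:            # the prefix so far may be empty
                    first |= f
                last = (last | l) if n else set(l)
                nullable = nullable and n
            return nullable, first, last
        if isinstance(node, Or):
            nullable, first, last = False, set(), set()
            for item in node.items:
                n, f, l = walk(item)
                nullable, first, last = nullable or n, first | f, last | l
            return nullable, first, last
        if isinstance(node, Rep):
            n, f, l = walk(node.item)
            if node.op in ("*", "+"):   # repetition loops back to the start
                for p in l:
                    follow[p] |= f
            return n or node.op in ("?", "*"), f, l
        raise TypeError(f"unknown node: {node!r}")

    def clash(positions):
        names = [symbols[p] for p in positions]
        return len(names) != len(set(names))

    _, first, _ = walk(model)
    return not clash(first) and not any(clash(s) for s in follow.values())


def overlapping_tokens(attlist: dict) -> set:
    """Name tokens shared by two enumerated attributes of one element.

    `attlist` maps attribute name -> tuple of allowed tokens; tokens are
    compared exactly as written (case folding is deliberately ignored).
    """
    owner, shared = {}, set()
    for attr, tokens in attlist.items():
        for tok in tokens:
            if owner.setdefault(tok, attr) != attr:
                shared.add(tok)
    return shared


if __name__ == "__main__":
    a, b, c = Name("a"), Name("b"), Name("c")
    print(is_deterministic(Seq((a, Rep(b, "?"), c))))                    # True
    print(is_deterministic(Seq((Rep(Or((a, b)), "*"), a, Or((a, b))))))  # False
    print(overlapping_tokens({"align": ("left", "right", "center"),
                              "float": ("left", "right", "none")}))
```

The second model, ((a|b)*, a, (a|b)), is the usual textbook example of a nondeterministic model whose language has no deterministic equivalent, which is the substance of point (c) in the C.10 discussion. The attribute example would be rejected under the rule now in XML 1.0, since "left" and "right" appear in both groups.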
Received on Thursday, 14 November 1996 19:33:25 UTC