- From: Tim Bray <tbray@textuality.com>
- Date: Thu, 03 Oct 1996 17:52:56 -0700
- To: W3C SGML Working Group <w3c-sgml-wg@w3.org>
A.28 Should XML use the markup-declaration syntax described by ISO 8879 clauses 10-11, or should XML define a specialized document type and let its markup declarations use the document-instance syntax, as proposed by MGML?

This is a long posting in favor of using instance syntax for XML markup declarations.

1. Terminology and Examples
2. Motivation
3. SGML Interoperability Issues
4. HTML Interoperability Issues
5. Appropriateness, Timing, and Jurisdiction Issues

1. Terminology and Examples

If we use instance syntax for XML DTD's, then to avoid misunderstanding we shouldn't call them DTD's. I have been using the term "Document Structure Declaration" (DSD) since XML syntax is hardwired, and what's being declared is really structure.

If we are using DSDs, and DSDs are XML documents, then clearly there must be a DSD for DSD's - I have been calling this the XML Reference DSD.

I have made several documents available to help out:

A draft of the XML reference DSD:
  http://www.textuality.com/dsd/xml-ref.dsd

An SGML DTD for xml-ref.dsd:
  http://www.textuality.com/dsd/xml-ref.dtd

A draft of a simple DSD for papers:
  http://www.textuality.com/dsd/paper.dsd

The initial draft of the paper on this subject that I submitted to SGML'96 - accepted, but superseded by our work in this group. Only here to provide a reference point for paper.dsd:
  http://www.textuality.com/dsd/paper.xml

I should note that xml-ref.dsd includes some significant contributions of both thought and syntax from Michael Sperberg-McQueen.

Whether you want to read this stuff before or after the rest of this posting is up to you. I would advise waiting; the issue is not (supposed to be) the quality of the DSD, but the advisability of having one.

2. Motivation

2.1 Quality of Specification

This is close to my heart as co-editor of the XML spec. If we use DTD syntax, the spec will have to: (a) specify the DTD syntax, (b) specify how the declarations in the DTD syntax impose constraints on instances.
If we use instance syntax, we lose almost all of (a); it turns out that if you explain how recognizers for elements and attributes work, then you can do most of the job just by listing fragments from the Reference DSD and explaining the effect of the various elements and attributes. Michael Sperberg-McQueen has provided a minimalist literate-programming kind of DTD so that the spec can actually include the master copy of the reference DSD.

If we use instance syntax, we will have a shorter, more elegant, more airtight, and more comprehensible spec, and something upon which the prospective XML processor author can easily get bootstrapped.

2.2 Integrity

I for one, once we get XML done, plan to start browbeating the document management and authoring practitioners of the world along the lines of "now that XML exists, you have NO EXCUSE for not using descriptive markup in your structured documents!" If I convince them, and they then start using XML, and the first structured document they encounter is a DTD, with its rather ad-hoc syntax, then I've lied to them. I find this completely unacceptable.

2.3 Ease of Implementation

If there is only one syntax for declaration and instance, then the prospective XML processor author only needs one lexer and one parser. Granted, the DTD language is not the hardest in the world, but two lexer/parsers are always harder to build than one. Once you've built one little perl/VB/rexx/Java thingie that can pull apart an XML instance (which we're planning to make easy), you can then pull apart XML declarations too.

2.4 Familiarity

If we can go to the HTML community and tell them "not only can you now add your own structures to documents and still deliver them on the web, but here's how you do it, and it's just another bunch of tags", this removes another significant barrier to acceptance.
I cannot in good conscience defend to these people the proposition that to do something this straightforward, obvious, and good, they have to learn another language.

2.5 Tools

If DTDs are instances, then you can use your existing SGML editors and document management systems and searchers and other value-adds to manage them - thus bringing an important class of metadata into the domain of an important class of tools. Yes, I think that formatting directives and search-metadata and entity catalogs should also be in SGML!

2.6 Other Improvements

The proposed reference DSD makes it impossible to declare pernicious mixed content. Perhaps there are other things that could be wired in as well.

3. SGML Interoperability Issues

There are some problems here. First of all, (a) how do we deal with the fact that SGML documents need to have a <!DOCTYPE and a subset, and if XML does too, then (b) how do you keep SGML parsers from stumbling over this weird syntax, and (c) how do you deal with the fact that you have to maintain two copies of these things?

(a) XML processors must be prepared to read (and ignore) a <!DOCTYPE.

(b) I propose hiding the XML markup declarations from SGML processors inside a processing instruction as follows: <?XML DSD ... >. Similarly, I propose <?XML SSD for XML declaration subsets. [note: either we change PIC, or we live with a lot of > in markup declarations, or we figure out a better way to hide XML markup declarations]

(c) By virtue of the ERB's resolution of 2 October, XML DSD's must be trivially & mechanically transformable into SGML DTD's. So to avoid maintaining two copies, you have an idiotic little processor (which dozens of people on this group could write by tomorrow) that (1) reads an XML instance, (2) finds the XML DSD & subset and generates an equivalent SGML DTD, and (3) generates a <!DOCTYPE with a pointer to the new DTD, and a subset containing the SGML versions of any declarations in the XML subset.

4. HTML Interoperability Issues

No problem here, due to a sleazy trick. The XML DSD language contains no character data anywhere - all text is in attribute values, and all elements are either EMPTY or have element content. So you can just drop the whole thing anywhere into an XML-masquerading-as-HTML document, and it will all be ignored by an HTML processor. You *may* have to remove the PIO and PIC, unless HTML decides to learn PI's.

5. Appropriateness, Scope, and Timing Issues

The argument has been advanced that "this is a good idea, but we don't have enough time to do it properly, and anyhow it's a job for WG8." In fact this may be correct; I can only say "I disagree," because:

(a) I think that the simplicity and ease of understanding of the DSD option greatly increases XML's chances of acceptance.
(b) I think that we need this done on an Internet timescale, not a WG8 timescale.
(c) We have promised a converter to DTD notation, so if we make design errors, there is an easy escape hatch for SGML people.
(d) We have at the moment a rare confluence of talent, focus, and energy, and thus a chance to make it happen, which will not be repeated.
(e) It has been done before *at least* by Exoterica and Wayne Wohler and Michael Sperberg-McQueen.
(f) WG8 will do a better job if they have a large-scale working experiment on the Web to learn from.
(g) We don't have the time to do a proper document design for DTD's, but we don't have to: Goldfarb et al. did that 10 years ago.

This is the RIGHT THING TO DO. Looking back in a few years, it will be much easier to justify having made some errors in this effort than it will be to justify having let the opportunity slip away.

Cheers, Tim Bray
tbray@textuality.com http://www.textuality.com/ +1-604-488-1167
Received on Thursday, 3 October 1996 20:49:39 UTC