- From: Norman Walsh <Norman.Walsh@Sun.COM>
- Date: Wed, 15 Jan 2003 17:39:10 -0500
- To: www-tag@w3.org
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Several weeks ago, I took an action item to draft up some of my thoughts about what an XML subset should look like. The TAG has discussed these ideas a couple of times and, while it's probably an overstatement to say that we have consensus, we did agree that it was time to distribute these ideas more broadly and solicit more input. Comments, etc., most welcome, as always.

Profiling XML, providing more implementation options, will necessarily increase the possibility of interoperability problems and it would be best to avoid doing so. Profiles are a bad idea on general principles and are in direct conflict with one of the original goals of XML[1]: "the number of optional features in XML is to be kept to the absolute minimum, ideally zero."

Unfortunately, a number of user communities have expressed a need to work with only a subset of XML. The TAG is concerned that if these needs are not addressed quickly (and centrally), a number of slightly different XML subsets will arise and, if this trend continues, the stability of XML as the basis of a whole range of technologies could be jeopardized.

One way to avoid this problem is to produce a new recommendation that identifies a subset of XML for use in those environments where supporting all of XML is not practical.

One obvious place where such a subset has been deployed is in SOAP[2]. SOAP forbids internal and external subsets and strongly discourages processing instructions. When asked, the XML Protocol WG listed these[3] among their reasons for subsetting:

* Performance: processing internal subsets and buffer management for handling entity expansion would slow things down.
* Simplicity: if an external subset is referenced, it has to be available when the parser runs (if it's available to some but not all processors, different results are possible).
* Security: entity expansion introduces the possibility of DoS attacks; other security issues might arise.

Although it was explicitly not a goal of the XML Protocol WG to produce a subset of XML (independent of their own application needs), this seems like a good place to start. However, precisely how the subset is defined requires careful consideration as this is an exercise that should be conducted only once. The subset selected must be small enough so that no further subset will be required but also complete enough to be useful for a wide range of applications.

One clear requirement of the subset is that it must exclude internal and external subsets (no <!DOCTYPE declaration is allowed). This requirement effectively removes DTDs from XML and consequently removes entities and notations.

What remains are elements, attributes, namespace declarations, comments, processing instructions, and character data. While comments and processing instructions might conceivably be removed, they are sufficiently useful that we think they should remain. (Although the SOAP spec forbids senders from including processing instructions, it accepts that receivers might get them, so it's clear that removing processing instructions from the subset is not a requirement of the SOAP subset.)

Some people have proposed that what is really needed is a "subset plus", that is, a subset of XML with a new feature or two. The most often requested feature in this regard is support for xml:id. I feel strongly that it would be a mistake to introduce a single new feature, or a single change of any sort that would not be completely compatible with XML 1.1, in the work that subsets XML. (Support for xml:id or any other feature is an orthogonal issue and must not be conflated with the effort to define a subset, even if the subset makes a particular feature more necessary or desirable.)

Along these lines, a number of people have suggested that the right approach to this problem is to define a new recommendation that combines the current suite of related recommendations (XML 1.1, XML Infoset, Namespaces in XML, and perhaps XML Base) into a single document. Tim Bray has demonstrated[4] one example of how this might appear.

To the extent that this might be viewed as an editorial decision, one that may offer tangible benefits to XML users and implementors, particularly new users and implementors, but which makes no technical changes to the languages defined by (and definable by) XML, this seems not unreasonable.

However, it's clear that performing this "unification" exercise on all of XML 1.1, without introducing any backwards incompatible changes, may be an extraordinarily large editorial task, out of proportion with the effort required simply to define the subset in some less invasive way.

Conversely, defining only the subset in a unified document would be easier but would introduce issues of its own. Doing so would effectively split the XML specifications into two tracks: a unified "subset" track and a non-unified "full XML" track. This could be the source of considerable confusion and accidental divergence.

Further consideration of these technical and editorial issues, and the eventual creation of a new recommendation, would seem to be within the scope of the XML Core WG's charter[5] which reads, in part, "[the] WG will also study the advisability of a version 2.0 of the XML specification and may undertake the preparation of such a specification, if deemed advisable."

In short, it appears that a new recommendation-track document that defines a subset of XML 1.1 should be developed:

* The subset must be backwards compatible with XML 1.1.
* The subset must define a language that excludes DTD declarations.

How the new recommendation is constructed we leave to the editorial discretion of the group that undertakes it.

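As a rough illustration only, and not part of any specification, the "no <!DOCTYPE declaration" requirement could be checked by an ordinary parser that refuses any document type declaration. The sketch below uses Python's standard expat binding. The names conforms_to_subset and DoctypeNotAllowed are hypothetical, and expat implements XML 1.0 rather than XML 1.1, so this approximates the idea rather than implementing the proposed subset.

```python
import xml.parsers.expat


class DoctypeNotAllowed(ValueError):
    """Raised when a document contains a <!DOCTYPE declaration (hypothetical name)."""


def _reject_doctype(name, system_id, public_id, has_internal_subset):
    # Any DOCTYPE, with or without an internal subset, puts the document
    # outside the subset sketched here.
    raise DoctypeNotAllowed("DOCTYPE declaration for %r is not allowed" % name)


def conforms_to_subset(document: bytes) -> bool:
    """Return True if the document is well-formed and has no DOCTYPE.

    Returns False when a DOCTYPE is present. Documents that are not
    well-formed raise xml.parsers.expat.ExpatError as usual.
    """
    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = _reject_doctype
    try:
        parser.Parse(document, True)
    except DoctypeNotAllowed:
        return False
    return True
```

Elements, attributes, comments, and processing instructions pass through untouched; only the presence of a DOCTYPE causes rejection, which also rules out entity declarations and notations.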
[1] http://www.w3.org/TR/REC-xml#sec-origin-goals
[2] http://www.w3.org/2000/xp/Group/2/11/08/soap12-part1.html#soapenv
[3] http://lists.w3.org/Archives/Public/www-tag/2002Dec/0119.html
[4] http://www.textuality.com/xml/xmlSW.html
[5] http://www.w3.org/2001/12/xmlbp/xml-core-wg-charter.html#deliverables

                                        Be seeing you,
                                          norm

- -- 
Norman.Walsh@Sun.COM    | There is a road from the eye to the heart
XML Standards Architect | that does not go through the intellect.--G.
Web Tech. and Standards | K. Chesterton
Sun Microsystems, Inc.  |
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: Processed by Mailcrypt 3.5.7 <http://mailcrypt.sourceforge.net/>

iD8DBQE+JeMOOyltUcwYWjsRAmsuAJ49pGFH6nPSmZvEXQNrVZGr37plygCeOUJD
zHXfJSzY/7YujRkXD5o+yzo=
=3UHP
-----END PGP SIGNATURE-----
Received on Wednesday, 15 January 2003 17:40:51 UTC