- From: Ian B. Jacobs <ij@w3.org>
- Date: Thu, 30 Jan 2003 16:45:53 -0500
- To: Liam Quin <liam@w3.org>
- CC: Paul Grosso <pgrosso@arbortext.com>, www-tag@w3.org, Michael Sperberg-McQueen <cmsmcq@w3.org>
Liam,

This email concerns TAG issue xmlProfiles-29 [0]:

  "When, whither and how to profile W3C specifications in the XML Family"

Profiling XML, providing more implementation options, will necessarily increase the possibility of interoperability problems and it would be best to avoid doing so. Profiles are a bad idea on general principles and are in direct conflict with one of the original goals of XML[1]: "the number of optional features in XML is to be kept to the absolute minimum, ideally zero."

Unfortunately, a number of user communities have expressed a need to work with only a subset of XML. The TAG is concerned that if these needs are not addressed quickly (and centrally), a number of slightly different XML subsets will arise and if this trend continues, the stability of XML as the basis of a whole range of technologies could be jeopardized.

One way to avoid this problem is to produce a new Recommendation that identifies a subset of XML for use in those environments where supporting all of XML is not practical.

One obvious place where such a subset has been deployed is in SOAP[2]. SOAP forbids internal and external subsets and strongly discourages processing instructions. When asked, the XML Protocol WG listed these[3] among their reasons for subsetting:

 * Performance: processing internal subsets and buffer management for handling entity expansion would slow things down.

 * Simplicity: if an external subset is referenced, it has to be available when the parser runs (if it's available to some but not all processors, different results are possible).

 * Security: entity expansion introduces the possibility of denial of service (DoS) attacks; other security issues might arise.

Although it was explicitly not a goal of the XML Protocol WG to produce a subset of XML (independent of their own application needs), this seems like a good place to start.
However, precisely how the subset is defined requires careful consideration as this is an exercise that should be conducted only once. The subset selected must be small enough so that no further subset will be required but also complete enough to be useful for a wide range of applications.

One clear requirement of the subset is that it must exclude internal and external subsets (no <!DOCTYPE declaration is allowed). This requirement effectively removes DTDs from XML and consequently removes entities and notations. What remains are elements, attributes, namespace declarations, comments, processing instructions, and character data. While comments and processing instructions might conceivably be removed, they are sufficiently useful that we think they should remain.

Some people have proposed that what is really needed is a "subset plus," that is a subset of XML with a new feature or two. The most often requested feature in this regard is support for xml:id. Others feel that it would be a mistake to design a "subset plus" with any new feature incompatible with XML 1.1 (at this time a W3C Candidate Recommendation).

The TAG has not yet reached consensus on how an XML subset Recommendation should address the question of ids. The TAG expects to address this issue separately:

  Issue xmlIDSemantics-32: How should the problem of identifying ID semantics in XML languages be addressed in the absence of a DTD? [6]

A number of people have suggested that the right approach to this problem is to define a new Recommendation that combines the current suite of related recommendations (XML 1.1, XML Infoset, Namespaces in XML, and perhaps XML Base) into a single document. Tim Bray has demonstrated[4] one example of how this might appear.
To the extent that this might be viewed as an editorial decision, one that may offer tangible benefits to XML users and implementors, particularly new users and implementors, but which makes no technical changes to the languages defined by (and definable by) XML, this seems not unreasonable.

However, it's clear that performing this "unification" exercise on all of XML 1.1, without introducing any backwards incompatible changes, may be an extraordinarily large editorial task, out of proportion with the effort required simply to define the subset in some less invasive way.

Conversely, defining only the subset in a unified document would be easier but would introduce issues of its own. Doing so would effectively split the XML specifications into two tracks: a unified "subset" track and a non-unified "full XML" track. This could be the source of considerable confusion and accidental divergence.

Further consideration of these technical and editorial issues, and the eventual creation of a new Recommendation, would seem to be within the scope of the XML Core WG's charter[5] which reads, in part, "[the] WG will also study the advisability of a version 2.0 of the XML specification and may undertake the preparation of such a specification, if deemed advisable."

In short, it appears that a new Recommendation-track document that defines a subset of XML 1.1 should be developed:

 * The subset must be backwards compatible with XML 1.1.

 * The subset must define a language that excludes DTD declarations.

How the new Recommendation is constructed we leave to the editorial discretion of the group that undertakes it.
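As an illustration only, and not part of the TAG's proposal, the requirement that the subset exclude DTD declarations could be checked along the following lines. This is a minimal sketch in Python using the standard library's expat bindings. The names in_proposed_subset and DoctypeForbidden are hypothetical, and expat implements XML 1.0 rather than XML 1.1, so the sketch only approximates the subset described here.

```python
import xml.parsers.expat


class DoctypeForbidden(ValueError):
    """Hypothetical error: the document carries a DOCTYPE declaration."""


def in_proposed_subset(document: bytes) -> bool:
    """Return True if the document is well-formed and has no DOCTYPE.

    Assumption: "in the subset" is approximated as "well-formed XML 1.0
    with no <!DOCTYPE declaration", since expat does not parse XML 1.1.
    """
    parser = xml.parsers.expat.ParserCreate()

    def reject_doctype(name, system_id, public_id, has_internal_subset):
        # Called by expat as soon as a DOCTYPE declaration starts, before
        # any internal or external subset is processed.
        raise DoctypeForbidden(name)

    parser.StartDoctypeDeclHandler = reject_doctype
    try:
        parser.Parse(document, True)  # True marks this as the final chunk
    except DoctypeForbidden:
        return False
    except xml.parsers.expat.ExpatError:
        # Not well-formed, so not in the subset either.
        return False
    return True
```

Because the handler fires before any declared entity is expanded, rejecting at that point also sidesteps the entity-expansion concern listed among the XML Protocol WG's reasons. Comments and processing instructions are still accepted, as the TAG suggests they should be.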
Thank you,

 - Ian Jacobs, for Norm Walsh, author of this summary, and Stuart Williams and Tim Berners-Lee, TAG co-Chairs

[0] http://www.w3.org/2001/tag/ilist#xmlProfiles-29
[1] http://www.w3.org/TR/REC-xml#sec-origin-goals
[2] http://www.w3.org/2000/xp/Group/2/11/08/soap12-part1#soapenv
[3] http://lists.w3.org/Archives/Public/www-tag/2002Dec/0119
[4] http://www.textuality.com/xml/xmlSW
[5] http://www.w3.org/2001/12/xmlbp/xml-core-wg-charter#deliverables
[6] http://www.w3.org/2001/tag/ilist#xmlIDSemantics-32

-- 
Ian Jacobs (ij@w3.org) http://www.w3.org/People/Jacobs
Tel: +1 718 260-9447
Received on Thursday, 30 January 2003 16:45:58 UTC