- From: Tim Bray <tbray@textuality.com>
- Date: Wed, 10 Jul 2002 18:08:39 -0400 (EDT)
- To: public-qt-comments@w3.org
This is a set of comments on the suite of XQuery-related Working Drafts. It is based on perusal of the drafts dated April 30, 2002 (Formal Semantics is March 26, Requirements of Feb 15 2001). I paid particular attention to the XML Query and Use-Cases specifications.

I've given my comments numbers in case somebody wants to call any of them out. I apologize in advance for using the contraction "XQuery" throughout. Also I use "XSD" to denote W3C XML Schemas.

TB 1. Maximalism

The family of XML Query specifications makes no visible effort to hit an 80/20 point. It is trying very hard to stake out a COMPLETE solution in the XML query space, which is rather courageous given the profound lack of industry experience. The immense amount of work that has gone into this specification would have a much higher chance of a positive impact on the world if the features and functions provided in XQuery were reduced by a huge factor, cutting back at least to XPath 1.0's level of semantic richness.

Furthermore, this specification's size and complexity make it inevitable that its arrival will be delayed by amounts of time that seem unreasonable to those on the outside looking in. This will cause problems because vendors who need this functionality will release software based on unstable drafts, creating a combination of conversion and interoperability problems down the road.

The size and complexity also ensure that when XQuery 1.0 finally arrives, it will be well-populated with bugs, some of which will be highly injurious to interoperability.

Furthermore, the immense size of the XQuery language as specified here will make implementations difficult and time-consuming. This will lead to consideration of conformance levels. Industry experience with leveled conformance, specifically in the case of SQL, has been very bad; leveled conformance leads inevitably to interoperability problems.

A core mandate of the W3C is to deliver specifications that promote interoperability.
The extreme size and complexity of the current XQuery drafts clearly are harmful to interoperability, for the reasons detailed above.

Radical surgery should be applied to the XQuery feature set. This will lead to a higher-quality, more widely-deployed result with a substantially smaller investment of work.

TB 2. Spec Suite organization

There needs to be an overview somewhere, a starting point, mostly tutorial in nature, that explains the relationships between XQuery, the data model, the use cases, the functions and operators, and XPath 2. Having read all of them at least in part, I remain fairly puzzled as to how they're supposed to fit together.

TB 3. Function of the "Data Model" and "Formal Semantics"

It is not clear that both the Data Model and Formal Semantics specs need to exist, or that they need to have independent lives outside of the XQuery spec. In particular, I'm pretty sure that a conformant XQuery implementation could be built with little or no reference to anything but the XQuery and F&O specs, raising questions as to whether all the work on DM and FS is cost-effective.

The Data Model and Formal Semantics docs are sufficiently complex and hard to understand that they don't seem to serve any tutorial purpose.

At the very least, the spec suite needs to be very clear as to whether implementors need to read them (in whole or in part), and if so why.

TB 4. Overlapping material

There is a large amount of overlapping material in XQuery, the Data Model, the Formal Semantics, and XPath 2. This has the negative effect that it's really hard to read both XQuery and XPath and pay attention, because the attention wanders as you realize you've already read this 15-page sequence.

It would be highly desirable if the material that is *not* common could be called out somehow. I as an implementor would be very interested in which bits of machinery are XQuery-only, XPath-only, or shared.
Since the portions that are shared are sensibly generated from a common source, I assume that such a call-out is achievable.

I note considerable overlap also in the FS and DM specs with each other and with XQuery. The same comment applies.

TB 5. Use Cases for Type-based operations

XQuery defines built-in primitives which operate in terms of data types: "cast", "treat", "assert", and "validate". The volume of design that has gone into building this framework is highly out of proportion to the scenarios presented in the Use Cases document. In particular, there are no use cases for the "cast", "assert", or "validate" built-ins. Almost every other aspect of XQuery has a far richer backing in the use-case document.

It is difficult to understand how the design of such a framework can proceed intelligently without use-cases in mind.

The best solution to this problem would be simply to drop most of these type-based operations in the interests of getting a reasonably interoperable XQuery 1.0 done in a reasonable amount of time.

TB 6. XML Schema Data Types and Duration

The reliance on XML Schema basic types seems well-thought-through, although the comprehensibility and ease of implementation of XQuery would be greatly increased by dropping support for some number of XSD basic types, without, it seems, much serious loss of functionality.

The use of two types derived from XSD's "Duration" type is obviously necessary, but highlights a co-ordination problem. Anybody who wants to do computation with duration-typed data is pretty clearly going to want the XQuery version, not the XSD version. Since it seems that many different activities want to use XSD basic data types, it is highly unsatisfactory that they are going to have to call out to two specifications, XSD and XQuery. As a co-ordination issue, XML Schema should be required to fix this design defect.

TB 7. PIs and Comments

If I read XQuery 2.1.3.2 and 2.3.1.2 correctly, XQuery includes the capability of searching on the presence of comments and on PIs and their targets.

PI search capability is guaranteed to provoke controversy since there is a body of opinion that PIs are architecturally second-class citizens and anything that promotes their use should be deprecated. This should be seriously considered for removal.

XQuery access to comments seems simply incorrect given that there is no assurance that they will be present in the data model even if they are in the source document, and also because it is highly architecturally unsound to encourage the use of comments for holding information of lasting interest. This should be removed without further ado.

The inclusion of Comment and PI in XQuery is further evidence of lack of attention to 80/20 thinking and cost/benefit trade-offs.

For similar reasons, all of section 2.8.4 (constructors for CDATA sections, PIs, and comments) should be considered for removal.

TB 8. Relation to Schema Languages

At the moment, by conscious design choice traceable back to the requirements documents, XQuery is quite strongly linked to W3C XML Schemas in several ways. In retrospect, this choice was unfortunate. Fortunately, the situation can be rectified at moderate cost and with considerable benefit.

Reasons why the linkage to XML Schema is problematic:

- XML Schema is large, complex, and buggy. The linkage greatly increases the difficulty of understanding and implementing XQuery.

- XML Schema is poorly suited to the needs of certain application classes (in particular publishing applications), and there are other schema alternatives available which are much better suited. These application classes are also likely to be heavy potential users of XQuery.

- XML Schema is a radical step forward in declarative constraint technology, full of design choices that are based on speculation rather than experience. It is highly unlikely that XSD will be the last word in schema technology for XML, even in those application areas in which it specializes. In particular, ISO has a serious effort underway to create standards which describe multiple XML schema languages; it would be disadvantageous if the use of these were incompatible with XQuery. Decoupling XQuery from XSD will increase survivability in the face of inevitable (and desirable) evolution in schema languages.

- Every cross-specification dependency introduces potential versioning problems that will increase the complexity and difficulty of maintaining the specification suite as time goes on. To the extent that such dependencies can be reduced, the W3C and the community win.

Note that in the rather old XQuery requirements doc, section 3.5.5, it says that "Schema" can mean either XML Schema or DTD. This is an admirably open viewpoint, and note that since that time, the schema universe has grown.

There is one dependency from XQuery on XSD which should not be severed, the dependency on atomic data types. XQuery clearly needs such a repertory of types, and those provided by XSchema are adequate.

The remainder of this note discusses the ways in which XQuery is currently linked to XSD and how they might be dealt with.

Linkage: The XQuery data model is described (in part) using terms defined in XML Schema, and a specific procedure is given for constructing it using the XSD PSVI as input.

Resolution: This is not a problem; the Data Model is described in enough detail that it could be generated (as the draft notes) by a relational database or a variety of other software modules, and understanding of XSD (aside from the base data types) is not required to understand the data model. The construction procedure is not really normative in terms of the operation of XQuery. No change seems required.

Linkage: XQuery (sect. 3.1) provides for Schema Imports, to establish the in-scope schema environment.
It is assumed that these are W3C XML Schemas.

Resolution: Add a clause to production [80] to identify the schema facility in use, by namespace name or MIME type, for example:

    schema "http://www.w3.org/1999/xhtml"
      of namespace "http://www.w3.org/2001/XMLSchema"
      at "http://www.w3.org/1999/xhtml/xhtml.xsd"

Linkage: XQuery provides type-based querying, where the types are those identified by QNames in the data model. Examples from XQuery 2.1.3.2:

    element person of type Employee
    attribute color of type xs:integer

Resolution 1: The semantics of matching the type identified by the qname depend on the in-scope schema class as identified above. XSD matches the type if it's identical to or is a derivation of the named type; other schema languages might have a more flexible notion of type matching.

Resolution 2: Adjust XQuery to say that the "of type" clause is satisfied if and only if the type given in the query is identical to that found in the data model, requiring only direct qname comparison and bypassing schema semantics.

Resolution 3: Drop type-based querying in the interests of the speedier delivery of a higher-quality recommendation.

Linkage: XQuery provides run-time type processing through the "treat", "assert", and "cast" built-ins.

Resolution 1: The semantics of these functions depend on the class of the in-scope schema as identified above.

Resolution 2: Drop these primitives from XQuery 1.0 - they have weak support in the use cases anyhow.

Linkage: XQuery provides run-time validation and type-checking through the "validate" built-in.

Resolution 1: The semantics of this function depend on the class of the in-scope schema as identified above.

Resolution 2: Drop this primitive from XQuery 1.0 - it has weak support in the use cases anyhow.

Best regards,
Tim Bray
Received on Thursday, 11 July 2002 05:12:48 UTC