- From: Kay, Michael <Michael.Kay@softwareag.com>
- Date: Thu, 17 Oct 2002 04:13:07 +0200
- To: Tim Bray <tbray@textuality.com>, public-qt-comments@w3.org
Tim Bray wrote:
> For brevity I'll refer to this as "Ashok's message" even though I
> understand it's a group production.
>
> The problem: Ashok's message is non-responsive to the point that it
> would not be remotely acceptable as part of a CR-stage "resolution of
> comments".

Paul Cotton suggested that it might be appropriate, given the difficulty of putting together an agreed WG response to all your comments, for individual WG members to give their own perspective on the issues you raised. So here is my attempt to take up the challenge.

TB 1. Maximalism

The family of XML Query specifications makes no visible effort to hit an 80/20 point. It is trying very hard to stake out a COMPLETE solution in the XML query space, which is rather courageous given the profound lack of industry experience. The immense amount of work that has gone into this specification would have a much higher chance of a positive impact on the world if the features and functions provided in XQuery were reduced by a huge factor, cutting back at least to XPath 1.0's level of semantic richness.

MK: This is easy to say, and it's even easy to agree with, but it's much harder to do anything about. Most of us would agree that there are features in XQuery that we would be happy to leave out, but I doubt that there are many features that a majority of the group would want to leave out. Short of some draconian changes to the voting rules, like requiring a 70% majority to put something in and only 30% to take something out, it's hard to see how to get round this problem.

It's not true that there's a lack of industry experience. Database technology is mature and well understood, and the requirements on query languages, both from a user perspective and an implementation perspective, are well established. The user community is sufficiently mature that a minimal solution without the features they expect in database languages would not be well received.
Many of the companies participating in the exercise, including my own, have been in the database business for many years and cannot be accused of not understanding the user requirements or the technology. OK: applying these ideas to XML is relatively new, but my company has had an XML database in the field for 3 years and our users are not slow to tell us about missing features.

TB: Furthermore, this specification's size and complexity make it inevitable that its arrival will be delayed by amounts of time that seem unreasonable to those on the outside looking in. This will cause problems because vendors who need this functionality will release software based on unstable drafts, creating a combination of conversion and interoperability problems down the road.

MK: yes, this is certainly a problem. Software AG is planning to release an implementation this year because we can't afford to wait any longer. Unfortunately, it's difficult to show that a radical change in approach is likely to lead to faster delivery.

TB: The size and complexity also ensure that when XQuery 1.0 finally arrives, it will be well-populated with bugs, some of which will be highly injurious to interoperability.

MK: that's certainly a risk. But part of the reason it is taking so long is that we are being very thorough.

TB: Furthermore, the immense size of the XQuery language as specified here will make implementations difficult and time-consuming. This will lead to consideration of conformance levels. Industry experience with leveled conformance, specifically in the case of SQL, has been very bad; leveled conformance leads inevitably to interoperability problems.

MK: In comparison with database query languages delivered in existing popular database products, XQuery cannot possibly be described as having "immense size". This assertion is without foundation.

TB: A core mandate of the W3C is to deliver specifications that promote interoperability.
The extreme size and complexity of the current XQuery drafts clearly are harmful to interoperability, for the reasons detailed above. Radical surgery should be applied to the XQuery feature set. This will lead to a higher-quality, more widely-deployed result with a substantially smaller investment of work.

MK: Remove a feature from XQuery, and vendors will invent a proprietary replacement for the missing functionality. That can hardly be said to improve interoperability.

TB 2. Spec Suite organization

There needs to be an overview somewhere, a starting point, mostly tutorial in nature, that explains the relationships between XQuery, the data model, the use cases, the functions and operators, and XPath 2. Having read all of them at least in part, I remain fairly puzzled as to how they're supposed to fit together.

MK: I agree with you entirely that the document set is poorly structured. It is designed for the convenience of the authors, not of the readers. This is a difficult problem to fix, because it isn't always possible to allocate work to editors in an optimum way, but I personally think we should try.

TB 3. Function of the "Data Model" and "Formal Semantics"

It is not clear that both the Data Model and Formal Semantics specs need to exist, or that they need to have independent lives outside of the XQuery spec. In particular, I'm pretty sure that a conformant XQuery implementation could be built with little or no reference to anything but the XQuery and F&O specs, raising questions as to whether all the work on DM and FS is cost-effective.

MK: I agree with you on this point too. The formal semantics, and the more formal parts of the data model definition, have been very useful to the working group as vehicles for testing and formalizing our ideas, but I do not personally think they are a good way of publishing normative specification material.
Of course, there are others on the group who have made an immense contribution to these documents, which represent a significant intellectual achievement, and who would understandably disagree with me.

TB: The Data Model and Formal Semantics docs are sufficiently complex and hard to understand that they don't seem to serve any tutorial purpose. At the very least, the spec suite needs to be very clear as to whether implementors need to read them (in whole or in part), and if so why.

MK: yes.

TB 4. Overlapping material

There is a large amount of overlapping material in XQuery, the Data Model, the Formal Semantics, and XPath 2. This has the negative effect that it's really hard to read both XQuery and XPath and pay attention, because the attention wanders as you realize you've already read this 15-page sequence. It would be highly desirable if the material that is *not* common could be called out somehow. I as an implementor would be very interested in which bits of machinery are XQuery-only, XPath-only, or shared. Since the portions that are shared are sensibly generated from a common source, I assume that such a call-out is achievable.

MK: for internal use, we publish a combined spec in which the XPath and XQuery parts are highlighted. I think we should review whether it would be useful to publish this externally.

TB: I note considerable overlap also in the FS and DM specs with each other and with XQuery. The same comment applies.

MK: Here the solution is less easy, but I agree entirely with the aim.

TB 5. Use Cases for Type-based operations

XQuery defines built-in primitives which operate in terms of data types: "cast", "treat", "assert", and "validate". The volume of design that has gone into building this framework is highly out of proportion to the scenarios presented in the Use Cases document. In particular, there are no use cases for the "cast", "assert", or "validate" built-ins.
Almost every other aspect of XQuery has a far richer backing in the use-case document. It is difficult to understand how the design of such a framework can proceed intelligently without use-cases in mind. The best solution to this problem would be simply to drop most of these type-based operations in the interests of getting a reasonably interoperable XQuery 1.0 done in a reasonable amount of time.

MK: I personally have some sympathy with the view that the type machinery in XQuery is over-engineered. We have the benefit of having some excellent type theorists on the working group, and it is very hard for those of us who don't fall into this category to tell when they are solving real problems and when they are building castles in Spain. All I know is that we couldn't do the job without them. I also know that a good solid type system is absolutely crucial to a database query engine. I would love it to stay good and solid but to become far simpler, and if anyone can show how to achieve that I will buy them several pints.

Part of the complexity, of course, derives directly from XML Schema. If XML Schema were much simpler, XQuery could also be much simpler. Some of us have argued that there are features in XML Schema we simply shouldn't support (for example, anonymous types), but this always gets a response from vendors that their users are already making heavy use of these features and we can't ask users to rewrite their schemas.

TB 6. XML Schema Data Types and Duration

The reliance on XML Schema basic types seems well-thought-through, although the comprehensibility and ease of implementation of XQuery would be greatly increased by dropping support for some number of XSD basic types, without, it seems, much serious loss of functionality.

MK: I think we've got the balance roughly right on this.
Our support for the lesser-used of the 19 primitive types is absolutely minimal, and withdrawing this support would remove about six paragraphs from the specs, which hardly seems worth the trouble.

TB: The use of two types derived from XSD's "Duration" type is obviously necessary, but highlights a co-ordination problem. Anybody who wants to do computation with duration-typed data is pretty clearly going to want the XQuery version, not the XSD version. Since it seems that many different activities want to use XSD basic data types, it is highly unsatisfactory that they are going to have to call out to two specifications, XSD and XQuery. As a co-ordination issue, XML Schema should be required to fix this design defect.

MK: We have been working closely with XML Schema on this. I don't think there is anything we could be doing that we aren't doing.

TB 7. PIs and Comments

If I read XQuery 2.1.3.2 and 2.3.1.2 correctly, XQuery includes the capability of searching on the presence of comments and on PIs and their targets.

PI search capability is guaranteed to provoke controversy since there is a body of opinion that PIs are architecturally second-class citizens and anything that promotes their use should be deprecated. This should be seriously considered for removal.

XQuery access to comments seems simply incorrect given that there is no assurance that they will be present in the data model even if they are in the source document, and also because it is highly architecturally unsound to encourage the use of comments for holding information of lasting interest. This should be removed without further ado.

The inclusion of Comment and PI in XQuery is further evidence of lack of attention to 80/20 thinking and cost/benefit trade-offs.

MK: I disagree with you on this.
As an XML database vendor, we know that one of the things our users complain about is that the documents coming out of the database aren't the same as the ones they put in; for example, entities and CDATA sections are lost. Losing comments and PIs as well would certainly be unpopular. It would also create further problems with XPath 1.0 compatibility.

TB: For similar reasons, all of section 2.8.4 (constructors for CDATA sections, PIs, and comments) should be considered for removal.

MK: Again, I disagree. There are target document formats that require these features to be present. We need to revise the CDATA stuff because CDATA sections are not in the model (that's a known issue), but we should otherwise support the full data model, including, in my view, additional XML quirks such as unparsed entities - whenever I assert that no-one uses them, someone proves me wrong.

TB 8. Relation to Schema Languages

At the moment, by conscious design choice traceable back to the requirements documents, XQuery is quite strongly linked to W3C XML Schemas in several ways. In retrospect, this choice was unfortunate. Fortunately, the situation can be rectified at moderate cost and with considerable benefit.

MK: I agree with you that XML Schema is horribly over-complex. I don't agree that we can manage without it. There is no easy solution to this problem.

TB: Reasons why the linkage to XML Schema is problematic:

- XML Schema is large, complex, and buggy. The linkage greatly increases the difficulty of understanding and implementing XQuery.
- XML Schema is poorly suited to the needs of certain application classes (in particular publishing applications), and there are other schema alternatives available which are much better suited. These application classes are also likely to be heavy potential users of XQuery.
- XML Schema is a radical step forward in declarative constraint technology, full of design choices that are based on speculation rather than experience.
  It is highly unlikely that XSD will be the last word in schema technology for XML, even in those application areas in which it specializes. In particular, ISO has a serious effort underway to create standards which describe multiple XML schema languages; it would be disadvantageous if the use of these were incompatible with XQuery. Decoupling XQuery from XSD will increase survivability in the face of inevitable (and desirable) evolution in schema languages.
- Every cross-specification dependency introduces potential versioning problems that will increase the complexity and difficulty of maintaining the specification suite as time goes on. To the extent that such dependencies can be reduced, the W3C and the community win.

MK: You can't design a query language without a definition of the data model that it is designed to support. I don't believe that a typeless data model would be viable either for implementors or users. The only typed model in town is XML Schema. We don't like it, but we're stuck with it.

TB: Note that in the rather old XQuery requirements doc, section 3.5.5, it says that "Schema" can mean either XML Schema or DTD. This is an admirably open viewpoint, and note that since that time, the schema universe has grown.

There is one dependency from XQuery on XSD which should not be severed, the dependency on atomic data types. XQuery clearly needs such a repertory of types, and those provided by XSchema are adequate.

TB: The remainder of this note discusses the ways in which XQuery is currently linked to XSD and how they might be dealt with.

Linkage: The XQuery data model is described (in part) using terms defined in XML Schema, and a specific procedure is given for constructing it using the XSD PSVI as input.
Resolution: This is not a problem; the Data Model is described in enough detail that it could be generated (as the draft notes) by a relational database or a variety of other software modules, and understanding of XSD (aside from the base data types) is not required to understand the data model. The construction procedure is not really normative in terms of the operation of XQuery. No change seems required.

Linkage: XQuery (sect. 3.1) provides for Schema Imports, to establish the in-scope schema environment. It is assumed that these are W3C XML Schemas.

Resolution: Add a clause to production [80] to identify the schema facility in use, by namespace name or mime-type, for example:

    schema "http://www.w3.org/1999/xhtml"
      of namespace "http://www.w3.org/2001/XMLSchema"
      at "http:/www.w3.org/1999/xhtml/xhtml.xsd"

MK: I can't see how supporting multiple schema languages can possibly be seen as a reduction in complexity. I don't know of any query language that has ever been designed with this kind of data-model-independence. We really would be researching new ground. I think this is an absolute non-starter.

Linkage: XQuery provides type-based querying, where the types are those identified by QNames in the data model. Examples from XQuery 2.1.3.2:

    element person of type Employee
    attribute color of type xs:integer

Resolution 1: The semantics of matching the type identified by the qname depend on the in-scope schema class as identified above. XSD matches the type if it's identical to or is a derivation of the named type; other schema languages might have a more flexible notion of type matching.

MK: they might indeed. We could just say that the rules for type matching are defined in the schema language, and say no more. We are actually quite close to that, and moving further in that direction.
Resolution 2: Adjust XQuery to say that the "of type" clause is satisfied if and only if the type given in the query is identical to that found in the data model, requiring only direct qname comparison and bypassing schema semantics.

MK: we are doing that.

Resolution 3: Drop type-based querying in the interests of the speedier delivery of a higher-quality recommendation.

MK: I would personally buy that, but 75% of the WG members would howl at the suggestion. There are some features you can't leave out of v1 in the hope of adding them later.

Linkage: XQuery provides run-time type processing through the "treat", "assert", and "cast" built-ins.

Resolution 1: The semantics of these functions depend on the class of the in-scope schema as identified above.

MK: actually treat and assert are largely compile-time, and assert has since been refactored. Cast works on the simple types, which you want to retain. We are all striving to find simplifications to these constructs, and have made some progress, but there's no magic wand.

Resolution 2: Drop these primitives from XQuery 1.0 - they have weak support in the use cases anyhow.

Linkage: XQuery provides run-time validation and type-checking through the "validate" built-in.

Resolution 1: The semantics of this function depend on the class of the in-scope schema as identified above.

Resolution 2: Drop this primitive from XQuery 1.0 - it has weak support in the use cases anyhow.

MK: I agree with your aims here. My colleagues on the WG know that I have made many attempts to achieve simplification in these areas. But the devil is in the detail: most proposals to take things out end up leaving the language broken.

Michael Kay
Software AG
Received on Wednesday, 16 October 2002 22:13:20 UTC