RE: Comments on April XQuery drafts (long, sorry)

Thanks for this contribution, Tim. It echoes what a number of people are
saying inside the WG. I hope we'll give it the attention it deserves.

It's not going to be easy to cut back: when 20% of a group really want a
feature and the other 80% of the group consider it superfluous but harmless,
the tendency is to put it in (the gHorribleKludge syndrome). And the large
number of documents, I think, is a consequence of the large number of
creative and talented individuals who want to contribute to this effort:
it's hard to do anything about the root cause of this particular problem!

The only part of your comments where I tend to disagree with you is TB 7.
Wherever there is something in XML that can't be retained through an XSLT
transformation (DOCTYPE, CDATA sections, entity references), we get user
complaints. If we left out more (comments, PIs) we would get more
complaints. I don't think it would be right for XQuery to choose a smaller
subset of XML to include in its data model than the subset supported by
XSLT, XPath, and the InfoSet. In fact, I thought the whole aim of the
InfoSet concept was that we should all support the same definition as to
which constructs in an XML document are information-bearing and which are
not.
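
To make the distinction concrete, here is a minimal sketch. It uses Python's ElementTree as a stand-in, which is an assumption of this illustration: it is not an XSLT processor, and the sample document is invented. It does show the same split: comments and PIs can be carried through as nodes, while a CDATA section survives only as ordinary escaped text.

```python
import xml.etree.ElementTree as ET

# Invented sample document: one comment, one PI, one CDATA section.
SOURCE = (
    "<doc>"
    "<!-- reviewed -->"
    "<?render mode=\"draft\"?>"
    "<p><![CDATA[a < b]]></p>"
    "</doc>"
)

def parse_keeping_comments_and_pis(text):
    """Parse XML, keeping comments and PIs as nodes (needs Python 3.8+)."""
    builder = ET.TreeBuilder(insert_comments=True, insert_pis=True)
    parser = ET.XMLParser(target=builder)
    return ET.fromstring(text, parser=parser)

root = parse_keeping_comments_and_pis(SOURCE)

# Comments and PIs are addressable nodes in the resulting tree.
comments = [node.text for node in root.iter() if node.tag is ET.Comment]
pis = [node.text for node in root.iter() if node.tag is ET.ProcessingInstruction]

# The CDATA section is not a node: only its character content remains,
# and it is re-escaped on output.
serialized = ET.tostring(root, encoding="unicode")
print(comments, pis, serialized)
```

A query over `root` can find the comment and the PI, but nothing can tell that the text of `p` was once written as a CDATA section.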

Michael Kay 
(personal response)

> -----Original Message-----
> From: Tim Bray [mailto:tbray@textuality.com] 
> Sent: 10 July 2002 23:09
> To: public-qt-comments@w3.org
> Subject: Comments on April XQuery drafts (long, sorry)
> 
> This is a set of comments on the suite of XQuery-related 
> Working Drafts. It is based on perusal of the drafts dated 
> April 30, 2002 (Formal Semantics is March 26, Requirements of 
> Feb 15 2001).  I paid particular attention to the XML Query 
> and Use-Cases specifications. I've given my comments numbers 
> in case somebody wants to call any of them out.
> 
> I apologize in advance for using the contraction "XQuery" 
> throughout. Also I use "XSD" to denote W3C XML Schemas.
> 
> TB 1. Maximalism
> 
> The family of XML Query specifications makes no visible effort 
> to hit an 80/20 point.  It is trying very hard to stake out a 
> COMPLETE solution in the XML query space, which is rather 
> courageous given the profound lack of industry experience.
> 
> The immense amount of work that has gone into this 
> specification would have a much higher chance of a positive 
> impact on the world if the features and functions provided in 
> XQuery were reduced by a huge factor, cutting back at least 
> to XPath 1.0's level of semantic richness.
> 
> Furthermore, this specification's size and complexity make it 
> inevitable that its arrival will be delayed by amounts of 
> time that seem unreasonable to those on the outside looking 
> in.  This will cause problems because vendors who need this 
> functionality will release software based on unstable drafts, 
> creating a combination of conversion and interoperability 
> problems down the road.
> 
> The size and complexity also ensure that when XQuery 1.0 
> finally arrives, it will be well-populated with bugs, some of 
> which will be highly injurious to interoperability.
> 
> Furthermore, the immense size of the XQuery language as 
> specified here will make implementations difficult and 
> time-consuming.  This will lead to consideration of 
> conformance levels.  Industry experience with leveled 
> conformance, specifically in the case of SQL, has been very 
> bad; leveled conformance leads inevitably to interoperability 
> problems.
> 
> A core mandate of the W3C is to deliver specifications that 
> promote interoperability.  The extreme size and complexity of 
> the current XQuery drafts clearly are harmful to 
> interoperability, for the reasons detailed above.  Radical 
> surgery should be applied to the XQuery feature set. This 
> will lead to a higher-quality, more widely-deployed result 
> with a substantially smaller investment of work.
> 
> TB 2. Spec Suite organization
> 
> There needs to be an overview somewhere, a starting point, 
> mostly tutorial in nature, that explains the relationships 
> between XQuery, the data model, the use cases, the functions 
> and operators, and XPath 2. Having read all of them at least 
> in part, I remain fairly puzzled as to how they're supposed 
> to fit together.
> 
> TB 3. Function of the "Data Model" and "Formal Semantics"
> 
> It is not clear that both the Data Model and Formal Semantics 
> specs need to exist, or that they need to have independent 
> lives outside of the XQuery spec.  In particular, I'm pretty 
> sure that a conformant XQuery implementation could be built 
> with little or no reference to anything but the XQuery and 
> F&O specs, raising questions as to whether all the work on DM 
> and FS is cost-effective.
> 
> The Data Model and Formal Semantics docs are sufficiently 
> complex and hard to understand that they don't seem to serve 
> any tutorial purpose. At the very least, the spec suite needs 
> to be very clear as to whether implementors need to read them 
> (in whole or in part), and if so why.
> 
> TB 4. Overlapping material
> 
> There is a large amount of overlapping material in XQuery, 
> the Data Model, the Formal Semantics, and XPath 2.  This has 
> the negative effect that it's really hard to read both XQuery 
> and XPath and pay attention, because the attention wanders as 
> you realize you've already read this 15-page sequence.  It 
> would be highly desirable if the material that is *not* 
> common could be called out somehow.
> 
> I as an implementor would be very interested in which bits of 
> machinery are XQuery-only, XPath-only, or shared.
> 
> Since the portions that are shared are sensibly generated 
> from a common source, I assume that such a call-out is achievable.
> 
> I note considerable overlap also in the FS and DM specs with 
> each other and with XQuery.  The same comment applies.
> 
> TB 5. Use Cases for Type-based operations
> 
> XQuery defines built-in primitives which operate in terms of 
> data types: "cast", "treat", "assert", and "validate".  The 
> volume of design that has gone into building this framework 
> is highly out of proportion to the scenarios presented in the 
> Use Cases document.
> 
> In particular, there are no use cases for the "cast", 
> "assert", or "validate" built-ins.  Almost every other aspect 
> of XQuery has a far richer backing in the use-case document. 
> It is difficult to understand how the design of such a 
> framework can proceed intelligently without use-cases in mind.
> 
> The best solution to this problem would be simply to drop 
> most of these type-based operations in the interests of 
> getting a reasonably interoperable XQuery 1.0 done in a 
> reasonable amount of time.
> 
> TB 6. XML Schema Data Types and Duration
> 
> The reliance on XML Schema basic types seems 
> well-thought-through, although the comprehensibility and ease 
> of implementation of XQuery would be greatly increased by 
> dropping support for some number of XSD basic types, without, 
> it seems, much serious loss of functionality.
> 
> The use of two types derived from XSD's "Duration" type is 
> obviously necessary, but highlights a co-ordination problem.  
> Anybody who wants to do computation with duration-typed data 
> is pretty clearly going to want the XQuery version, not the 
> XSD version.  Since it seems that many different activities 
> want to use XSD basic data types, it is highly unsatisfactory 
> that they are going to have to call out to two 
> specifications, XSD and XQuery.  As a co-ordination issue, 
> XML Schema should be required to fix this design defect.
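
A short sketch of why computing with the single XSD duration type is awkward. The function names and dates are invented for this illustration. A duration of one month (P1M) and one of thirty days (P30D) cannot be ordered without knowing the starting date, whereas a months-only value and a days-and-time-only value each sit on a single scale.

```python
from datetime import date

def days_in_month_starting(first_of_month: date) -> int:
    """Days covered by a one-month duration (P1M) added to the first of a month."""
    if first_of_month.month == 12:
        following = date(first_of_month.year + 1, 1, 1)
    else:
        following = date(first_of_month.year, first_of_month.month + 1, 1)
    return (following - first_of_month).days

THIRTY_DAYS = 30  # the length of P30D, whatever the starting date

feb = days_in_month_starting(date(2002, 2, 1))
mar = days_in_month_starting(date(2002, 3, 1))

# P1M is shorter than P30D from 1 February but longer from 1 March,
# so the two durations have no fixed order.
p1m_shorter_in_feb = feb < THIRTY_DAYS
p1m_longer_in_mar = mar > THIRTY_DAYS

def year_month_duration(years: int, months: int) -> int:
    """A months-only duration reduced to a single integer number of months."""
    return years * 12 + months

def day_time_duration(days: int, hours: int = 0, minutes: int = 0, seconds: int = 0) -> int:
    """A days-and-time-only duration reduced to a single number of seconds."""
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds

print(feb, mar, p1m_shorter_in_feb, p1m_longer_in_mar)
```

Once every value is reduced to one integer, comparison and arithmetic within each of the two derived types are straightforward, which is the property the mixed type lacks.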
> 
> TB 7. PIs and Comments
> 
> If I read XQuery 2.1.3.2 and 2.3.1.2 correctly, XQuery 
> includes the capability of searching on the presence of 
> comments and on PIs and their targets.
> 
> PI search capability is guaranteed to provoke controversy 
> since there is a body of opinion that PIs are architecturally 
> second-class citizens and anything that promotes their use 
> should be deprecated.  This should be seriously considered 
> for removal.
> 
> XQuery access to comments seems simply incorrect given that 
> there is no assurance that they will be present in the data 
> model even if they are in the source document, and also 
> because it is highly architecturally unsound to encourage the 
> use of comments for holding information of lasting interest.  
> This should be removed without further ado.
> 
> The inclusion of Comment and PI in XQuery is further evidence 
> of lack of attention to 80/20 thinking and cost/benefit trade-offs.
> 
> For similar reasons, all of section 2.8.4 (constructors for 
> CDATA sections, PIs, and comments) should be considered for removal.
> 
> TB 8. Relation to Schema Languages
> 
> At the moment, by conscious design choice traceable back to 
> the requirements documents, XQuery is quite strongly linked 
> to W3C XML Schemas in several ways.
> 
> In retrospect, this choice was unfortunate.  Fortunately, the 
> situation can be rectified at moderate cost and with 
> considerable benefit.
> 
> Reasons why the linkage to XML Schema is problematic:
> 
> - XML Schema is large, complex, and buggy.  The linkage 
> greatly increases the difficulty of understanding and 
> implementing XQuery.
> 
> - XML Schema is poorly suited to the needs of certain 
> application classes (in particular publishing applications), 
> and there are other schema alternatives available which are 
> much better suited.  These application classes are also 
> likely to be heavy potential users of XQuery.
> 
> - XML Schema is a radical step forward in declarative 
> constraint technology, full of design choices that are based 
> on speculation rather than experience.  It is highly unlikely 
> that XSD will be the last word in schema technology for XML, 
> even in those application areas in which it specializes.  In 
> particular, ISO has a serious effort underway to create 
> standards which describe multiple XML schema languages; it 
> would be disadvantageous if the use of these were 
> incompatible with XQuery. Decoupling XQuery from XSD will 
> increase survivability in the face of inevitable (and 
> desirable) evolution in schema languages.
> 
> - Every cross-specification dependency introduces potential 
> versioning problems that will increase the complexity and 
> difficulty of maintaining the specification suite as time 
> goes on.  To the extent that such dependencies can be 
> reduced, the W3C and the community win.
> 
> Note that in the rather old XQuery requirements doc, section 
> 3.5.5, it says that "Schema" can mean either XML Schema or 
> DTD.  This is an admirably open viewpoint, and note that 
> since that time, the schema universe has grown.
> 
> There is one dependency from XQuery on XSD which should not 
> be severed, the dependency on atomic data types.  XQuery 
> clearly needs such a repertory of types, and those provided 
> by XSD are adequate.
> 
> The remainder of this note discusses the ways in which XQuery 
> is currently linked to XSD and how they might be dealt with.
> 
> Linkage: The XQuery data model is described (in part) using 
> terms defined in XML Schema, and a specific procedure is 
> given for constructing it using the XSD PSVI as input.
> 
> Resolution: This is not a problem; the Data Model is 
> described in enough detail that it could be generated (as the 
> draft notes) by a relational database or a variety of other 
> software modules, and understanding of XSD (aside from the 
> base data types) is not required to understand the data 
> model.  The construction procedure is not really normative in 
> terms of the operation of XQuery.  No change seems required.
> 
> Linkage: XQuery (sect. 3.1) provides for Schema Imports, to 
> establish the in-scope schema environment.  It is assumed 
> that these are W3C XML Schemas.
> 
> Resolution: Add a clause to production [80] to identify the 
> schema facility in use, by namespace name or MIME type, 
> for example:
> 
>     schema "http://www.w3.org/1999/xhtml"
>       of namespace "http://www.w3.org/2001/XMLSchema"
>       at "http://www.w3.org/1999/xhtml/xhtml.xsd"
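
One way a processor might act on such a clause is sketched below, purely as an illustration. The `SchemaImport` record, the loader functions and the `LOADERS` registry are all hypothetical, and the loaders are stubs that only report what they would do. The point is that the schema-language namespace selects the machinery, so XSD becomes one entry among several.

```python
from typing import Callable, Dict, NamedTuple

class SchemaImport(NamedTuple):
    """The three parts of the proposed import clause (hypothetical record)."""
    target_namespace: str   # schema "..."
    schema_language: str    # of namespace "..."
    location: str           # at "..."

XSD_NS = "http://www.w3.org/2001/XMLSchema"
RELAXNG_NS = "http://relaxng.org/ns/structure/1.0"

def load_xsd(imp: SchemaImport) -> str:
    # Stub: a real processor would fetch and compile the XSD here.
    return f"XSD types for {imp.target_namespace} from {imp.location}"

def load_relaxng(imp: SchemaImport) -> str:
    # Stub: a real processor would fetch and compile the RELAX NG grammar here.
    return f"RELAX NG patterns for {imp.target_namespace} from {imp.location}"

# Registry keyed by the schema language's namespace name.
LOADERS: Dict[str, Callable[[SchemaImport], str]] = {
    XSD_NS: load_xsd,
    RELAXNG_NS: load_relaxng,
}

def import_schema(imp: SchemaImport) -> str:
    """Dispatch on the schema language; reject languages with no loader."""
    try:
        loader = LOADERS[imp.schema_language]
    except KeyError:
        raise ValueError(f"unsupported schema language: {imp.schema_language}")
    return loader(imp)

xhtml = SchemaImport(
    "http://www.w3.org/1999/xhtml",
    XSD_NS,
    "http://www.w3.org/1999/xhtml/xhtml.xsd",
)
print(import_schema(xhtml))
```

An implementation that supports only XSD would simply have a one-entry registry and raise an error for anything else.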
> 
> Linkage: XQuery provides type-based querying, where the types 
> are those identified by QNames in the data model.  Examples 
> from XQuery 2.1.3.2:
> 
>     element person of type Employee
>     attribute color of type xs:integer
> 
> Resolution 1: The semantics of matching the type identified 
> by the qname depend on the in-scope schema class as 
> identified above.  XSD matches the type if it's identical to 
> or is a derivation of the named type; other schema languages 
> might have a more flexible notion of type matching.
> 
> Resolution 2: Adjust XQuery to say that the "of type" clause 
> is satisfied if and only if the type given in the query is 
> identical to that found in the data model, requiring only 
> direct qname comparison and bypassing schema semantics.
> 
> Resolution 3: Drop type-based querying in the interests of 
> the speedier delivery of a higher-quality recommendation.
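
The difference between Resolutions 1 and 2 can be sketched in a few lines. The type hierarchy below is invented for illustration, with `Manager` assumed to derive from `Employee`. Resolution 1, in its XSD flavour, walks the derivation chain; Resolution 2 compares names and nothing else.

```python
from typing import Optional

# Hypothetical schema: each type name maps to the type it derives from.
BASE_TYPE = {
    "Manager": "Employee",
    "Employee": "Person",
    "Person": "xs:anyType",
}

def matches_by_derivation(actual: str, required: str) -> bool:
    """Resolution 1 (XSD-style): match the named type or any type derived from it."""
    current: Optional[str] = actual
    while current is not None:
        if current == required:
            return True
        current = BASE_TYPE.get(current)  # None once the chain runs out
    return False

def matches_by_qname(actual: str, required: str) -> bool:
    """Resolution 2: match only when the two names are identical."""
    return actual == required

# An element whose type in the data model is Manager,
# tested against "of type Employee":
print(matches_by_derivation("Manager", "Employee"))
print(matches_by_qname("Manager", "Employee"))
```

Under Resolution 2 the query needs no schema at all, only the type name recorded in the data model, which is what decouples it from XSD semantics. The cost is that a query for Employee no longer finds Managers.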
> 
> Linkage: XQuery provides run-time type processing through the 
> "treat", "assert", and "cast" built-ins.
> 
> Resolution 1: The semantics of these functions depend on the 
> class of the in-scope schema as identified above.
> 
> Resolution 2: Drop these primitives from XQuery 1.0 - they 
> have weak support in the use cases anyhow.
> 
> Linkage: XQuery provides run-time validation and 
> type-checking through the "validate" built-in.
> 
> Resolution 1: The semantics of this function depend on the 
> class of the in-scope schema as identified above.
> 
> Resolution 2: Drop this primitive from XQuery 1.0 - it has 
> weak support in the use cases anyhow.
> 
> Best regards, Tim Bray
> 
> 

Received on Thursday, 11 July 2002 08:51:29 UTC