Comments on April XQuery drafts (long, sorry)

This is a set of comments on the suite of XQuery-related Working Drafts.
It is based on perusal of the drafts dated April 30, 2002 (Formal
Semantics is March 26, Requirements of Feb 15 2001).  I paid particular
attention to the XML Query and Use-Cases specifications. I've given my
comments numbers in case somebody wants to call any of them out.

I apologize in advance for using the contraction "XQuery" throughout.
Also I use "XSD" to denote W3C XML Schemas.

TB 1. Maximalism

The family of XML Query specification makes no visible effort to hit an
80/20 point.  It is trying very hard to stake out COMPLETE solution in
the XML query space, which is rather courageous given the profound lack
of industry experience.

The immense amount of work that has gone into this specification would
have a much higher chance of a positive impact on the world if the
features and functions provided in XQuery were reduced by a huge factor,
cutting back at least to XPath 1.0's level of semantic richness.

Furthermore, this specification's size and complexity make it inevitable
that its arrival will be delayed by amounts of time that seem
unreasonable to those on the outside looking in.  This will cause
problems because vendors who need this functionality will release
software based on unstable drafts, creating a combination of conversion
and interoperability problems down the road.

The size and complexity also ensure that when XQuery 1.0 finally
arrives, it will be well-populated with bugs, some of which will be
highly injurious to interoperability.

Furthermore, the immense size of the XQuery language as specified here
will make implementations difficult and time-consuming.  This will lead
to consideration of conformance levels.  Industry experience with
leveled conformance, specifically in the case of SQL, has been very bad;
leveled conformance leads inevitably to interoperability problems.

A core mandate of the W3C is to deliver specifications that promote
interoperability.  The extreme size and complexity of the current XQuery
drafts clearly are harmful to interoperability, for the reasons detailed
above.  Radical surgery should be applied to the XQuery feature
set. This will lead to a higher-quality, more  widely-deployed result
with a substantially smaller investment of work.

TB 2. Spec Suite organization

There needs to be an overview somewhere, a starting point, mostly
tutorial in nature, that explains the relationships between XQuery, the
data model, the use cases, the functions and operators, and XPath 2.
Having read all of them at least in part, I remain fairly puzzled as to
how they're supposed to fit together.

TB 3. Function of the "Data Model" and "Formal Semantics"

It is not clear that both the Data Model and Formal Semantics
specs need to exist, or that they need to have independent lives outside
of the XQuery spec.  In particular, I'm pretty sure that a conformant
XQuery implementation could be built with little or no reference to
anything but the XQuery and F&O specs, raising questions as to whether
all the work on DM and FS are cost-effective.

The Data Model and Formal Semantics docs are sufficiently complex and
hard to understand that they don't seem to serve any tutorial purpose.
At the very least, the spec suite needs to be very clear as to whether
implementors need to read them (in whole or in part), and if so why.

TB 4. Overlapping material

There is a large amount of overlapping material in XQuery, the Data
Model, the Formal Semantics, and XPath 2.  This has the negative effect
that it's really hard to read both XQuery and XPath and pay attention,
because the attention wanders as you realize you've already read this
15-page sequence.  It would be highly desirable if the material that is
*not* common could be called out somehow.

I as an implementor would be very interested in which bits of machinery
are XQuery-only, XPath-only, or shared.

Since the portions that are shared are sensibly generated from a common
source, I assume that such a call-out is achievablle.

I note considerable overlap also in the FS and DM specs with each other
and with XQuery.  The same comment applies.

TB 5. Use Cases for Type-based operations

XQuery defines built-in primitives which operate in terms of data types:
"cast", "treat", "assert", and "validate".  The volume of design that
has gone into building this framework is highly out of proportion to the
scenarios presented in the Use Cases document.

In particular, there are no use cases for the "cast", "assert", or
"validate" built-ins.  Almost every other aspect of XQuery has a far
richer backing in the use-case document. It is difficult to understand
how the design of such a framework can proceed intelligently without
use-cases in mind.

The best solution to this problem would be simply to drop most of these
type-based operations in the interests of getting a reasonably
interoperable XQuery 1.0 done in a reasonable amount of time.

TB 6. XML Schema Data Types and Duration

The reliance on XML Schema basic types seems well-thought-through,
although the comprehensibility and ease of implementation of XQuery
would be greatly increased by dropping support for some number of XSD
basic types, without, it seems, much serious loss of functionality.

The use of two types derived from XSD's "Duration" type is obviously
necessary, but highlights a co-ordination problem.  Anybody who wants
to do computation with duration-typed data is pretty clearly going to
want the XQuery version, not the XSD version.  Since it seems that many
different activities want to use XSD basic data types, it is highly
unsatisfactory that they are going to have to call out to two
specifications, XSD and XQuery.  As a co-ordination issue, XML Schema
should be required to fix this design defect.

TB 7. PIs and Comments

If I read XQuery 2.1.3.2 and 2.3.1.2 correctly, XQuery includes the
capability of searching on the presence of comments and on PIs and their
targets.

PI search capability is guaranteed to provoke controversy since there is
a body of opinion that PIs are architecturally second-class citizens and
anything that promotes their use should be deprecated.   This should be
seriously considered for removal.

XQuery access to comments seems simply incorrect given that there is
no assurance that they will be present in the data model even if they
are in the source document, and also because it is highly
architecturally unsound to encourage the use of comments for holding
information of lasting interest.  This should be removed without further
ado.

The inclusion of Comment and PI in XQuery is further evidence of lack of
attention to 80/20 thinking and cost/benefit trade-offs.

For similar reasons, all of section 2.8.4 (constructors for CDATA
sections, PIs, and comments) should be considered for removal.

TB 8. Relation to Schema Languages

At the moment, by conscious design choice traceable back to the
requirements documents, XQuery is quite strongly linked to W3C XML
Schemas in several ways.

In retrospect, this choice was unfortunate.  Fortunately, the situation
can be rectified at moderate cost and with considerable benefit.

Reasons why the linkage to XML Schema is problematic:

- XML Schema is large, complex, and buggy.  The linkage greatly
increases the difficulty of understanding and implementing XQuery.

- XML Schema is poorly suited to the needs of certain application
classes (in particular publishing applications), and there are other
schema alternatives available which are much better suited.  These
application classes are also likely to be heavy potential users of XQuery.

- XML Schema is a radical step forward in declarative constraint
technology, full of design choices that are based on speculation rather
than experience.  It is highly unlikely that XSD will be the last word
in schema technology for XML, even in those application areas in which
it specializes.  In particular, ISO has a serious effort underway to
create standards which describe multiple XML schema languages; it would
be disadvantageous if the use of these were incompatible with XQuery.
Decoupling XQuery from XSD will increase survivability in the face of
inevitable (and desirable) evolution in schema languages.

- Every cross-specification dependency introduces potential versioning
problems that will increase the complexity and difficulty of maintaining
the specification suite as time goes on.  To the extent that such
dependencies can be reduced, the W3C and the community win.

Note that in the rather old XQuery requirements doc, section 3.5.5, it
says that "Schema" can mean either XML Schema or DTD.  This is an
admirably open viewpoint, and note that since that time, the schema
universe has grown.

There is one dependency from XQuery on XSD which should not be severed,
the dependency on atomic data types.  XQuery clearly needs such a
repertory of types, and those provided by XSchema are adequate.

The remainder of this note discusses the ways in which XQuery is
currently linked to XSD and how they might be dealt with.

Linkage: The XQuery data model is described (in part) using terms
defined in XML Schema, and a specific procedure is given for
constructing it using the XSD PSVI as input.

Resolution: This is not a problem; the Data Model is described in enough
detail that it could be generated (as the draft notes) by a relational
database or a variety of other software modules, and understanding of
XSD (aside from the base data types) is not required to understand the
data model.  The construction procedure is not really normative in terms
of the operation of XQuery.  No change seems required.

Linkage: XQuery (sect. 3.1) provides for Schema Imports, to establish
the in-scope schema environment.  It is assumed that these are W3C
XML Schemas.

Resolution: Add a clause to production [80] to identify the schema
facility in use, by namespace name or or mime-type, for example:

    schema "http://www.w3.org/1999/xhtml"
      of namespace "http://www.w3.org/2001/XMLSchema"
      at "http:/www.w3.org/1999/xhtml/xhtml.xsd"

Linkage: XQuery provides type-based querying, where the types are those
identified by QNames in the data model.  Examples from XQuery 2.1.3.2:

    element person of type Employee
    attribute color of type xs:integer

Resolution 1: The semantics of matching the type identified by the qname
depend on the in-scope schema class as identified above.  XSD matches
the type if it's identical to or is a derivation of the named type;
other schema languages might have a more flexible notion of type matching.

Resolution 2: Adjust XQuery to say that the "of type" clause is
satisfied if and only if the type given in the query is identical to
that found in the data model, requiring only direct qname comparison
and bypassing schema semantics.

Resolution 3: Drop type-based querying in the interests of the speedier
delivery of a higher-quality recommendation.

Linkage: XQuery provides run-time type processing through the "treat",
"assert", and "cast" built-ins.

Resolution 1: The semantics of these functions depend on the class of
the in-scope schema as identified above.

Resolution 2: Drop these primitives from XQuery 1.0 - they have weak
support in the use cases anyhow.

Linkage: XQuery provides run-time validation and type-checking through
the "validate" built-in.

Resolution 1: The semantics of this function depend on the class of the
in-scope schema as identified above.

Resolution 2: Drop this primitive from XQuery 1.0 - it has weak support
in the use cases anyhow.

Best regards, Tim Bray

Received on Thursday, 11 July 2002 05:12:48 UTC