RE: XSchema integration, responsiveness, and a good solution to the problem from Kay, Michael on 2002-10-17 (public-qt-comments@w3.org from October 2002)

From: Kay, Michael <Michael.Kay@softwareag.com>
Date: Thu, 17 Oct 2002 04:13:07 +0200
To: Tim Bray <tbray@textuality.com>, public-qt-comments@w3.org
Message-ID: <DFF2AC9E3583D511A21F0008C7E621060453DC8E@daemsg02.software-ag.de>
Tim Bray wrote:

> For brevity I'll refer to this as "Ashok's message" even though I 
> understand it's a group production.
> 
> The problem: Ashok's message is non-responsive to the point that it 
> would not be remotely acceptable as part of a CR-stage "resolution of 
> comments".

Paul Cotton suggested that it might be appropriate, given the difficulty of
putting together an agreed WG response to all your comments, for individual
WG members to give their own perspective on the issues you raised. So here
is my attempt to take up the challenge.

TB 1. Maximalism

The family of XML Query specifications makes no visible effort to hit an
80/20 point.  It is trying very hard to stake out a COMPLETE solution in the
XML query space, which is rather courageous given the profound lack of
industry experience.

The immense amount of work that has gone into this specification would have
a much higher chance of a positive impact on the world if the features and
functions provided in XQuery were reduced by a huge factor, cutting back at
least to XPath 1.0's level of semantic richness.

MK: This is easy to say, and it's even easy to agree with, but it's much
harder to do anything about. Most of us would agree that there are features
in XQuery that we would be happy to leave out, but I doubt that there are
many features that a majority of the group would want to leave out. Short of
some draconian changes to the voting rules, like requiring a 70% majority to
put something in and only 30% to take something out, it's hard to see how to
get round this problem.

It's not true that there's a lack of industry experience. Database
technology is mature and well understood, and the requirements on query
languages, both from a user perspective and an implementation perspective,
are well established. The user community is sufficiently mature that a
minimal solution without the features they expect in database languages
would not be well received. Many of the companies participating in the
exercise, including my own, have been in the database business for many
years and cannot be accused of not understanding the user requirements or
the technology. OK: applying these ideas to XML is relatively new, but my
company has had an XML database in the field for 3 years and our users are
not slow to tell us about missing features.

TB:
Furthermore, this specification's size and complexity make it inevitable
that its arrival will be delayed by amounts of time that seem unreasonable
to those on the outside looking in.  This will cause problems because
vendors who need this functionality will release software based on unstable
drafts, creating a combination of conversion and interoperability problems
down the road.

MK: yes, this is certainly a problem. Software AG is planning to release an
implementation this year because we can't afford to wait any longer.
Unfortunately, it's difficult to show that a radical change in approach is
likely to lead to faster delivery.

TB:
The size and complexity also ensure that when XQuery 1.0 finally arrives, it
will be well-populated with bugs, some of which will be highly injurious to
interoperability.

MK: that's certainly a risk. But part of the reason it is taking so long is
that we are being very thorough.

TB: Furthermore, the immense size of the XQuery language as specified here
will make implementations difficult and time-consuming.  This will lead to
consideration of conformance levels.  Industry experience with leveled
conformance, specifically in the case of SQL, has been very bad; leveled
conformance leads inevitably to interoperability problems.

MK: In comparison with database query languages delivered in existing
popular database products, XQuery cannot possibly be described as having
"immense size". This assertion is without foundation.

TB:
A core mandate of the W3C is to deliver specifications that promote
interoperability.  The extreme size and complexity of the current XQuery
drafts clearly are harmful to interoperability, for the reasons detailed
above.  Radical surgery should be applied to the XQuery feature set. This
will lead to a higher-quality, more widely-deployed result with a
substantially smaller investment of work.

MK:
Remove a feature from XQuery, and vendors will invent a proprietary
replacement for the missing functionality. That can hardly be said to
improve interoperability.

TB 2. Spec Suite organization

There needs to be an overview somewhere, a starting point, mostly tutorial
in nature, that explains the relationships between XQuery, the data model,
the use cases, the functions and operators, and XPath 2. Having read all of
them at least in part, I remain fairly puzzled as to how they're supposed to
fit together.

MK: I agree with you entirely that the document set is poorly structured. It
is designed for the convenience of the authors, not of the readers. This is
a difficult problem to fix, because it isn't always possible to allocate
work to editors in an optimum way, but I personally think we should try.


TB 3. Function of the "Data Model" and "Formal Semantics"

It is not clear that both the Data Model and Formal Semantics specs need to
exist, or that they need to have independent lives outside of the XQuery
spec.  In particular, I'm pretty sure that a conformant XQuery
implementation could be built with little or no reference to anything but
the XQuery and F&O specs, raising questions as to whether all the work on DM
and FS is cost-effective.

MK: I agree with you on this point too. The formal semantics, and the more
formal parts of the data model definition, have been very useful to the
working group as vehicles for testing and formalizing our ideas, but I do
not personally think they are a good way of publishing normative
specification material. Of course, there are others on the group who have
made an immense contribution to these documents, which represent a
significant intellectual achievement, and who would understandably disagree
with me.

TB: The Data Model and Formal Semantics docs are sufficiently complex and
hard to understand that they don't seem to serve any tutorial purpose. At
the very least, the spec suite needs to be very clear as to whether
implementors need to read them (in whole or in part), and if so why.

MK: yes.

TB 4. Overlapping material

There is a large amount of overlapping material in XQuery, the Data Model,
the Formal Semantics, and XPath 2.  This has the negative effect that it's
really hard to read both XQuery and XPath and pay attention, because the
attention wanders as you realize you've already read this 15-page sequence.
It would be highly desirable if the material that is *not* common could be
called out somehow.

I as an implementor would be very interested in which bits of machinery are
XQuery-only, XPath-only, or shared.

Since the portions that are shared are sensibly generated from a common
source, I assume that such a call-out is achievable.

MK: for internal use, we publish a combined spec in which the XPath and
XQuery parts are highlighted. I think we should review whether it would be
useful to publish this externally.

TB: I note considerable overlap also in the FS and DM specs with each other
and with XQuery.  The same comment applies.

MK: Here the solution is less easy, but I agree entirely with the aim.


TB 5. Use Cases for Type-based operations

XQuery defines built-in primitives which operate in terms of data types:
"cast", "treat", "assert", and "validate".  The volume of design that has
gone into building this framework is highly out of proportion to the
scenarios presented in the Use Cases document.

In particular, there are no use cases for the "cast", "assert", or
"validate" built-ins.  Almost every other aspect of XQuery has a far richer
backing in the use-case document. It is difficult to understand how the
design of such a framework can proceed intelligently without use-cases in
mind.

The best solution to this problem would be simply to drop most of these
type-based operations in the interests of getting a reasonably interoperable
XQuery 1.0 done in a reasonable amount of time.

MK: I personally have some sympathy with the view that the type machinery in
XQuery is over-engineered. We have the benefit of having some excellent type
theorists on the working group, and it is very hard for those of us who
don't fall into this category to tell when they are solving real problems
and when they are building castles in Spain. All I know is that we couldn't
do the job without them. I also know that a good solid type system is
absolutely crucial to a database query engine. I would love it to stay good
and solid but to become far simpler, and if anyone can show how to achieve
that I will buy them several pints.

Part of the complexity, of course, derives directly from XML Schema. If XML
Schema were much simpler, XQuery could also be much simpler. Some of us have
argued that there are features in XML Schema we simply shouldn't support
(for example, anonymous types), but this always gets a response from vendors
that their users are already making heavy use of these features and we can't
ask users to rewrite their schemas.


TB 6. XML Schema Data Types and Duration

The reliance on XML Schema basic types seems well-thought-through, although
the comprehensibility and ease of implementation of XQuery would be greatly
increased by dropping support for some number of XSD basic types, without,
it seems, much serious loss of functionality.

MK: I think we've got the balance roughly right on this. Our support for the
lesser-used of the 19 primitive types is absolutely minimal, and withdrawing
this support would remove about six paragraphs from the specs, which hardly
seems worth the trouble.

TB: The use of two types derived from XSD's "Duration" type is obviously
necessary, but highlights a co-ordination problem.  Anybody who wants to do
computation with duration-typed data is pretty clearly going to want the
XQuery version, not the XSD version.  Since it seems that many different
activities want to use XSD basic data types, it is highly unsatisfactory
that they are going to have to call out to two specifications, XSD and
XQuery.  As a co-ordination issue, XML Schema should be required to fix this
design defect.

MK: We have been working closely with XML Schema on this. I don't think
there is anything we could be doing that we aren't doing.
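
As background to why the split of Duration is called "obviously necessary" above: a duration that mixes months and days cannot always be ordered, because a month has no fixed length, whereas a type restricted to years and months, or to days and smaller units, can. The following is a minimal sketch of that reasoning only. The function name and the crude 28-to-31-day bound on a month are assumptions made for illustration; this is not the comparison algorithm from XML Schema or from the XQuery drafts.

```python
from typing import Optional


def compare_durations(months_a: int, days_a: int,
                      months_b: int, days_b: int) -> Optional[int]:
    """Compare two non-negative durations given as (months, days).

    Returns -1, 0 or 1 when the order is certain, and None when it
    depends on which calendar months are involved.
    Assumption (illustrative): a month lasts between 28 and 31 days.
    """
    if months_a == months_b:
        # Same month component: the day components decide.
        return (days_a > days_b) - (days_a < days_b)

    # Shortest and longest possible length of each duration, in days.
    lo_a, hi_a = months_a * 28 + days_a, months_a * 31 + days_a
    lo_b, hi_b = months_b * 28 + days_b, months_b * 31 + days_b

    if hi_a < lo_b:
        return -1
    if lo_a > hi_b:
        return 1
    return None  # the ranges overlap, so the order is indeterminate


# One month against thirty days: no certain answer.
print(compare_durations(1, 0, 0, 30))
# Days only on both sides: always comparable.
print(compare_durations(0, 5, 0, 3))
```

With only months on both sides, or only days on both sides, the function never returns None, which is the property the two derived types are meant to provide.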


TB 7. PIs and Comments

If I read XQuery 2.1.3.2 and 2.3.1.2 correctly, XQuery includes the
capability of searching on the presence of comments and on PIs and their
targets.

PI search capability is guaranteed to provoke controversy since there is a
body of opinion that PIs are architecturally second-class citizens and
anything that promotes their use should be deprecated.   This should be
seriously considered for removal.

XQuery access to comments seems simply incorrect given that there is no
assurance that they will be present in the data model even if they are in
the source document, and also because it is highly architecturally unsound
to encourage the use of comments for holding information of lasting
interest.  This should be removed without further ado.

The inclusion of Comment and PI in XQuery is further evidence of lack of
attention to 80/20 thinking and cost/benefit trade-offs.

MK: I disagree with you on this. As an XML database vendor, we know that one
of the things our users complain about is that the documents coming out of
the database aren't the same as the ones they put in: for example, entities
and CDATA sections are lost. Losing comments and PIs as well would certainly
be unpopular. It would also create further problems with XPath 1.0
compatibility.

TB: For similar reasons, all of section 2.8.4 (constructors for CDATA
sections, PIs, and comments) should be considered for removal.

MK: Again, I disagree. There are target document formats that require these
features to be present. We need to revise the CDATA stuff because CDATA
sections are not in the model (that's a known issue), but we should
otherwise support the full data model, including, in my view, additional XML
quirks such as unparsed entities - whenever I assert that no-one uses them,
someone proves me wrong.


TB 8. Relation to Schema Languages

At the moment, by conscious design choice traceable back to the requirements
documents, XQuery is quite strongly linked to W3C XML Schemas in several
ways.

In retrospect, this choice was unfortunate.  Fortunately, the situation can
be rectified at moderate cost and with considerable benefit.

MK: I agree with you that XML Schema is horribly over-complex. I don't agree
that we can manage without it. There is no easy solution to this problem.

TB: Reasons why the linkage to XML Schema is problematic:

- XML Schema is large, complex, and buggy.  The linkage greatly increases
the difficulty of understanding and implementing XQuery.

- XML Schema is poorly suited to the needs of certain application classes
(in particular publishing applications), and there are other schema
alternatives available which are much better suited.  These application
classes are also likely to be heavy potential users of XQuery.

- XML Schema is a radical step forward in declarative constraint technology,
full of design choices that are based on speculation rather than experience.
It is highly unlikely that XSD will be the last word in schema technology
for XML, even in those application areas in which it specializes.  In
particular, ISO has a serious effort underway to create standards which
describe multiple XML schema languages; it would be disadvantageous if the
use of these were incompatible with XQuery. Decoupling XQuery from XSD will
increase survivability in the face of inevitable (and desirable) evolution
in schema languages.

- Every cross-specification dependency introduces potential versioning
problems that will increase the complexity and difficulty of maintaining the
specification suite as time goes on.  To the extent that such dependencies
can be reduced, the W3C and the community win.

MK: You can't design a query language without a definition of the data model
that it is designed to support. I don't believe that a typeless data model
would be viable either for implementors or users. The only typed model in
town is XML Schema. We don't like it, but we're stuck with it.

TB: Note that in the rather old XQuery requirements doc, section 3.5.5, it
says that "Schema" can mean either XML Schema or DTD.  This is an admirably
open viewpoint, and note that since that time, the schema universe has
grown.

There is one dependency from XQuery on XSD which should not be severed, the
dependency on atomic data types.  XQuery clearly needs such a repertory of
types, and those provided by XSchema are adequate.

TB: The remainder of this note discusses the ways in which XQuery is
currently linked to XSD and how they might be dealt with.

Linkage: The XQuery data model is described (in part) using terms defined in
XML Schema, and a specific procedure is given for constructing it using the
XSD PSVI as input.

Resolution: This is not a problem; the Data Model is described in enough
detail that it could be generated (as the draft notes) by a relational
database or a variety of other software modules, and understanding of XSD
(aside from the base data types) is not required to understand the data
model.  The construction procedure is not really normative in terms of the
operation of XQuery.  No change seems required.

Linkage: XQuery (sect. 3.1) provides for Schema Imports, to establish the
in-scope schema environment.  It is assumed that these are W3C XML Schemas.

Resolution: Add a clause to production [80] to identify the schema facility
in use, by namespace name or mime-type, for example:

    schema "http://www.w3.org/1999/xhtml"
      of namespace "http://www.w3.org/2001/XMLSchema"
      at "http://www.w3.org/1999/xhtml/xhtml.xsd"

MK: I can't see how supporting multiple schema languages can possibly be
seen as a reduction in complexity. I don't know of any query language that
has ever been designed with this kind of data-model-independence. We really
would be researching new ground. I think this is an absolute non-starter.

Linkage: XQuery provides type-based querying, where the types are those
identified by QNames in the data model.  Examples from XQuery 2.1.3.2:

    element person of type Employee
    attribute color of type xs:integer

Resolution 1: The semantics of matching the type identified by the qname
depend on the in-scope schema class as identified above.  XSD matches the
type if it's identical to or is a derivation of the named type; other schema
languages might have a more flexible notion of type matching.

MK: they might indeed. We could just say that the rules for type matching
are defined in the schema language, and say no more. We are actually quite
close to that, and moving further in that direction.

Resolution 2: Adjust XQuery to say that the "of type" clause is satisfied if
and only if the type given in the query is identical to that found in the
data model, requiring only direct qname comparison and bypassing schema
semantics.

MK: we are doing that.
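
The difference between Resolution 1 and Resolution 2 can be sketched roughly as follows. This is an illustration only: the type hierarchy, the type names other than Employee, and both function names are hypothetical, and the code is not taken from any XQuery draft or implementation.

```python
# Hypothetical type hierarchy: each type QName maps to its base type.
# Only "Employee" appears in the example above; the rest is invented.
BASE_TYPE = {
    "Manager": "Employee",
    "Employee": "Person",
    "Person": "xs:anyType",
}


def matches_by_derivation(actual: str, named: str) -> bool:
    """Resolution 1, XSD flavour: the actual type is identical to the
    named type or is derived from it."""
    current = actual
    while current is not None:
        if current == named:
            return True
        current = BASE_TYPE.get(current)  # walk up to the base type
    return False


def matches_by_qname(actual: str, named: str) -> bool:
    """Resolution 2: direct QName comparison, bypassing schema semantics."""
    return actual == named


# An element typed Manager, tested against "of type Employee":
print(matches_by_derivation("Manager", "Employee"))  # True
print(matches_by_qname("Manager", "Employee"))       # False
```

Under Resolution 2 the query processor never needs to consult the schema's derivation rules, which is what makes it independent of the schema language; the price is that a query written against Employee no longer finds elements typed Manager.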

Resolution 3: Drop type-based querying in the interests of the speedier
delivery of a higher-quality recommendation.

MK: I would personally buy that, but 75% of the WG members would howl at the
suggestion. There are some features you can't leave out of v1 in the hope of
adding them later.

Linkage: XQuery provides run-time type processing through the "treat",
"assert", and "cast" built-ins.

Resolution 1: The semantics of these functions depend on the class of the
in-scope schema as identified above.

MK: actually treat and assert are largely compile-time, and assert has since
been refactored. Cast works on the simple types, which you want to retain.
We are all striving to find simplifications to these constructs, and have
made some progress, but there's no magic wand.

Resolution 2: Drop these primitives from XQuery 1.0 - they have weak support
in the use cases anyhow.

Linkage: XQuery provides run-time validation and type-checking through the
"validate" built-in.

Resolution 1: The semantics of this function depend on the class of the
in-scope schema as identified above.

Resolution 2: Drop this primitive from XQuery 1.0 - it has weak support in
the use cases anyhow.

MK: I agree with your aims here. My colleagues on the WG know that I have
made many attempts to achieve simplification in these areas. But the devil
is in the detail: most proposals to take things out end up leaving the
language broken.

Michael Kay
Software AG
Received on Wednesday, 16 October 2002 22:13:20 UTC