- From: C. M. Sperberg-McQueen <cmsmcq@acm.org>
- Date: Mon, 14 Jul 2003 16:01:58 -0700
- To: public-qt-comments@w3.org
An initial batch of notes from XML Schema on XQuery, the data
model, and functions and operators is on the Web at
http://www.w3.org/XML/Group/2003/07/xmlschema-query-notes.html
An ASCII version follows for those who read email away from the
Web.
-C. M. Sperberg-McQueen
for the XML Schema WG
W3C XML Schema Working Group
Comments on Query Documents
14 July 2003
_________________________________________________________________
* 1. [7]Background: documents reviewed
* 2. [8]Major issues
+ 2.1. [9]Time zones
+ 2.2. [10]The type anyAtomicType
+ 2.3. [11]The type untypedAtomic
+ 2.4. [12]Schema Access and Construction
+ 2.5. [13]Data model lacks normative reference to
anyAtomicType?
+ 2.6. [14]Plans for CR/PR/REC
* 3. [15]Issues of moderate importance
+ 3.1. [16]Duration types
+ 3.2. [17]Attribute lexical forms, values, and types
+ 3.3. [18]On URIs
+ 3.4. [19]More specific types
* 4. [20]Minor comments (typos, etc.)
_________________________________________________________________
This document contains some initial comments by the W3C XML Schema
Working Group on the current set of documents issued by the XSL and
XML Query Working Groups. The XML Schema WG is continuing to study the
relevant documents and may make further comments.
First and foremost, the XML Schema WG congratulates the XML Query and
XSL Working Groups on the high quality and great utility of the work
reflected in your documents.
We are gratified to see the deep integration of the XML Schema type
system into your data model and we are very happy to note that with
the passage of time your drafts have been increasingly well harmonized
with XML Schema. We do have some comments, some of which raise serious
concerns which will require substantial work to resolve. We look
forward to working with you to resolve them.
1. Background: documents reviewed
The comments below arose primarily from a review of [21]XQuery 1.0: An
XML Query Language; the reviewers also noticed and raised a few issues
in the [22]XQuery 1.0 and XPath 2.0 Data Model. For various reasons
including some confusion, our reviewers performed their detailed
review on a version of XQuery dated February 2003, the status and
history of which is a bit murky. It is labeled "Working Draft", but it
has an erroneous "This Version" URI. In the meantime, a [23]May
working draft has been published, which does not list the February
version among previous working drafts. We have hastily checked our
comments against the May draft and believe that they remain relevant,
and we allow ourselves to express the hope that the XML Query and XSL
Working Groups and their editors can find ways of managing their
internal and public drafts in such a way as to reduce the likelihood
of this kind of confusion in the future.
[21]
http://lists.w3.org/Archives/Member/w3c-archive/2003Feb/att-0103/02-xquery.html
[22] http://www.w3.org/TR/xpath-datamodel/
[23] http://www.w3.org/TR/2003/WD-xquery-20030502/
2. Major issues
2.1. Time zones
The [24]query data model construes timezones as significant in the
value as well as the lexical forms of xs:dateTime, xs:date, and
xs:time. The XML Schema specification does not forbid applications to
take timezone information into account; the timezone information is
visible in the lexical forms of the post-schema-validation information
set. That said, however, the data model's value space for this type is
definitely not XML Schema's value space, and the situation is at best
confusing for users.
We believe that the discrepancy between the XML Query and XML Schema
accounts of these types is untenable, because it will place an
unacceptable burden on users of the two languages. As a Working Group,
we oppose the progression of the Query/XSL specifications while this
discrepancy persists.
We believe the three Working Groups must discuss this and related
questions and reach consensus, and changes must be made either in one
of the two type systems or the other, or in both. We are willing to
make appropriate changes in XML Schema 1.1 to achieve this
harmonization if together we reach consensus that that is what is
needed.
It may be noted that some members of the Schema WG favored including
the timezone in a valuespace tuple in the first place. Others believe
that there is a serious problem with any evaluation mechanism which
does not realize that "5 p.m. Eastern Time" and "2 p.m. Pacific Time"
are different ways of denoting the same moment of time.
[24] http://www.w3.org/TR/xpath-datamodel/#timezones
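The two views described above can be contrasted in a minimal Python sketch. It is an illustration only, not a statement of how either specification defines equality: the fixed UTC offsets stand in for "Eastern" and "Pacific" time, and the function names instant_equal and same_tuple_value are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC offsets chosen for illustration (standard time, no DST rules).
EASTERN = timezone(timedelta(hours=-5))
PACIFIC = timezone(timedelta(hours=-8))

five_pm_eastern = datetime(2003, 1, 14, 17, 0, tzinfo=EASTERN)
two_pm_pacific = datetime(2003, 1, 14, 14, 0, tzinfo=PACIFIC)


def instant_equal(a, b):
    """Compare two timezoned values as moments on the UTC timeline."""
    return a.astimezone(timezone.utc) == b.astimezone(timezone.utc)


def same_tuple_value(a, b):
    """Compare two values as (instant, timezone offset) tuples.

    This models a value space in which the timezone is part of the value.
    """
    return instant_equal(a, b) and a.utcoffset() == b.utcoffset()


print(instant_equal(five_pm_eastern, two_pm_pacific))
print(same_tuple_value(five_pm_eastern, two_pm_pacific))
```

Under the first comparison the two values denote the same moment; under the second they are distinct values, which is the kind of difference a user moving between the two languages would have to keep in mind.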
2.2. The type anyAtomicType
A type named anyAtomicType is introduced as a subtype of
xs:anySimpleType; the new type is introduced as an ancestor to
builtins such as xs:integer. We have some concerns here; they include:
(a) this changes the type hierarchy and is incompatible with the
simple types as published in XML Schema (because those types
explicitly name their base types), and (b) the derivation of
anyAtomicType appears to be `magic' and thus outside the scope of
derivations expressible in XML Schema 1.0. Several members of the
Schema WG believe that there does seem to be a real need for this type
in Query, but it appears to us that some coordination is needed among
the responsible Working Groups.
Changes of this kind to the type hierarchy create clear
interoperability problems because different schema-aware processors
will produce different and incompatible results when asked for
information about xs:integer or other primitive types. Because this
type as defined is necessarily magic, the problems cannot be resolved
by providing a conventional declaration for it. The XML Schema Working
Group opposes the progression of any Query/XSL-related specification
until this incompatibility has been resolved.
As with other discrepancies between our type system and your use of
it, we are ready to modify our type hierarchy in XML Schema 1.1 if the
responsible Working Groups can reach consensus.
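As a rough illustration of the interoperability concern, the following Python sketch models the two type hierarchies as simple parent tables and asks each for the ancestors of xs:integer. The tables list only the types needed for the example, and the names SCHEMA_1_0, QUERY_VIEW, and ancestors are hypothetical.

```python
# Base-type links as published in XML Schema 1.0 (only the types needed here).
SCHEMA_1_0 = {
    "xs:integer": "xs:decimal",
    "xs:decimal": "xs:anySimpleType",
}

# The same links with anyAtomicType interposed above the primitive types,
# as described in the comment above.
QUERY_VIEW = {
    "xs:integer": "xs:decimal",
    "xs:decimal": "xdt:anyAtomicType",
    "xdt:anyAtomicType": "xs:anySimpleType",
}


def ancestors(type_name, hierarchy):
    """Return the chain of base types, nearest first."""
    chain = []
    while type_name in hierarchy:
        type_name = hierarchy[type_name]
        chain.append(type_name)
    return chain


print(ancestors("xs:integer", SCHEMA_1_0))
print(ancestors("xs:integer", QUERY_VIEW))
```

Two processors built on these two tables would give different answers to the same question about xs:integer, and no conventional declaration of anyAtomicType in a schema document could reconcile them.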
2.3. The type untypedAtomic
The type xdt:untypedAtomic is introduced for untyped nodes "such as
text in schemaless documents." The query LC draft says "It has no
subtypes". It's not clear whether this type is ever to be used by
schema processing, or is only visible in the query system. We believe
this raises compatibility issues vis-a-vis XML Schema. It might be
argued that versions of XML Schema should assign this type in the PSVI
to information items which currently receive no type information
properties (e.g. because they matched a wildcard with
processContents="skip"), as that would seem to maximize compatibility
with Query.
We believe the three Working Groups should work to achieve consensus
on this topic. We believe you should not progress your documents until
we do so.
2.4. Schema Access and Construction
We have concerns regarding both the mechanisms provided for and the
terminology used to describe access to XML schema documents. The
pertinent XQuery mechanisms are outlined primarily in [25]section 4.4.
For example, this section opens with the statement that:
[25] http://www.w3.org/TR/2003/WD-xquery-20030502/#id-schema-imports
SchemaImport ::= "import" "schema" SchemaPrefix?
StringLiteral "at" StringLiteral?
SchemaPrefix ::= ("namespace" NCName "=")
| ("default" "element" "namespace" "=")
A schema import imports the element, attribute, and type definitions
from a named schema into the in-scope schema definitions. The string
literals in a schema import must be valid URIs. The schema import
specifies the target namespace of the schema to be imported, and
optionally the location of the schema. A schema import may also bind a
namespace prefix to the target namespace of the imported schema, or
may declare that target namespace to be the default element namespace.
The optional location indication can be disregarded by an
implementation if it has another way to locate the given schema.
The following example imports the schema for an XHTML document,
specifying both its target namespace and its location, and binding the
prefix xhtml to this namespace:
import schema namespace xhtml="http://www.w3.org/1999/xhtml"
at "http://example.org/xhtml/xhtml.xsd"
The following example imports a schema by specifying only its target
namespace, and makes it the default element namespace for the query:
import schema default element namespace="http://example.org/abc"
This formulation seems in certain respects at odds with the schema
terminology defined by the XML Schema Recommendation, and in other
respects to be unnecessarily out of synch with the [26]mechanisms of
XML schema composition. For example, where the text above refers to a
"named schema", we conjecture that it may well mean a "named schema
document". If so, we believe it should be reformulated to say "named
schema document"; if not, we believe it would be helpful to say more
explicitly what forms of resource an XQuery processor may or must
accept as sources of schema components. The terms "schema" and "schema
document" are carefully distinguished in the XML Schema
Recommendation; we believe the distinction should be observed in the
specs related to XQuery and XSLT.
Since any schema document asserts the targetNamespace for which it is
providing declarations, we think that XQuery needs to describe what
should happen if the document referenced by the at clause is a schema
document for a different namespace or for no namespace at all.
The Query spec should also indicate the rules for handling
<xsd:include>, <xsd:redefine>, and <xsd:import> in the schema
documents (transitively) referenced by the query import. The rules may
be as simple as "do what the XML Schema Recommendation requires, and
wherever the Schema Recommendation provides latitude query processors
have similar latitude", but an explicit statement should be made.
The XML Schema WG also has some concern about the formulation "A
schema import imports the element, attribute, and type
definitions...." This seems to suggest that other components (e.g.
named model groups, named attribute groups, and the schema component itself)
are not imported. Since the XML Schema Recommendation requires that a
schema be available as a prerequisite to validation, the suggestion
that the schema component is not imported is troubling. We recommend
that to the extent possible XQuery avoid restating the composition
mechanisms of XML schema, but instead refer to them directly. We don't
wish to prescribe a particular formulation, but we believe something
along the following lines would be clearer and less prone to
introducing interoperability problems with existing XML Schema
processors:
* A query processor MUST identify schema documents or other sources
of schema components for each namespace named in a query import.
Where at is specified, the schema document named MAY be used, and
if used, that document MUST declare the targetNamespace specified
(if not, error XXX is thrown).
* A schema is constructed as described in the XML Schema
Recommendation. That schema consists of the components described
by the schema documents (if any) identified in step 1 as well as
any components identified through other means (i.e. conveyed in
forms other than schema documents). It is an error if the
resulting schema does not meet all constraints on schemas as
defined in the XML Schema Recommendation.
* All schema validations performed using this query context are
performed with respect to the schema thus constructed (though
different validations may specify different complexTypes or
element declarations from the schema to be used as the basis for
validation.)
[26] http://www.w3.org/TR/xmlschema-1/#composition
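The check suggested in the first step above might look roughly like the following Python sketch. It assumes the schema document has already been retrieved as text; the function name check_target_namespace, the sample document, and the use of ValueError in place of "error XXX" are all hypothetical.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# A sample schema document for the namespace used in the import example.
ABC_SCHEMA = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://example.org/abc">
  <xsd:element name="item" type="xsd:string"/>
</xsd:schema>"""


def check_target_namespace(schema_text, imported_namespace):
    """Raise ValueError unless the document declares the imported namespace.

    A schema document with no targetNamespace attribute is treated as
    declaring components in no namespace, represented here as "".
    """
    root = ET.fromstring(schema_text)
    if root.tag != "{%s}schema" % XSD_NS:
        raise ValueError("resource is not a schema document")
    declared = root.get("targetNamespace", "")
    if declared != imported_namespace:
        raise ValueError(
            "schema document declares %r, import names %r"
            % (declared, imported_namespace)
        )
    return root


check_target_namespace(ABC_SCHEMA, "http://example.org/abc")  # accepted
```

Calling the function with any other namespace name raises the error, which covers both cases raised earlier: a document for a different namespace and a document for no namespace at all.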
The quoted section goes on to say:
It is a static error to import two schemas that both define the same
name in the same symbol space and in the same scope. For instance, a
query may not import two schemas that include top-level element
declarations for two elements with the same expanded name.
This appears to be a tentative and very incomplete foray into
redefining or at least restating the rules for schema assembly. It
would seem more appropriate to say that when constructing a schema
using the mechanisms of XQuery (such as XQuery import), the resulting
schema must conform to all the Constraints on Schema and other
normative requirements of the XML Schema Recommendation. That would
pick up the constraint quoted above -- and many more.
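For concreteness, the single constraint quoted from the XQuery draft can be sketched in Python as below. This is deliberately partial: it looks only at top-level named components in a few symbol spaces and treats every repeated name as a clash, whereas the Constraints on Schema in the XML Schema Recommendation cover far more. The names SYMBOL_SPACE, top_level_components, and clashes are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Simple and complex type definitions share one symbol space; the other
# component kinds listed here each have their own.
SYMBOL_SPACE = {
    "element": "element",
    "attribute": "attribute",
    "simpleType": "type",
    "complexType": "type",
    "group": "model group",
    "attributeGroup": "attribute group",
}


def top_level_components(schema_text):
    """List (symbol space, target namespace, name) for top-level components."""
    root = ET.fromstring(schema_text)
    target = root.get("targetNamespace", "")
    found = []
    for child in root:
        if not child.tag.startswith("{%s}" % XSD_NS):
            continue
        local = child.tag.split("}", 1)[1]
        name = child.get("name")
        if local in SYMBOL_SPACE and name:
            found.append((SYMBOL_SPACE[local], target, name))
    return found


def clashes(*schema_texts):
    """Return components that appear more than once across the documents."""
    counts = Counter(
        component
        for text in schema_texts
        for component in top_level_components(text)
    )
    return sorted(c for c, n in counts.items() if n > 1)
```

Two schema documents that both declare a top-level element named item in the same target namespace would be reported by clashes; an element and a type sharing a name would not, since they live in different symbol spaces.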
As noted above: it seems to us that greater clarity is needed to make
explicit the expected behavior of XQuery and XSLT processors vis-a-vis
the exploitation of the mechanisms provided by XML schema. For
example, is it the intention of the XQuery group that all
user-supplied schema information necessarily be in the form of schema
documents? That should be your call, but there seems to be no good
reason for such a restriction. Query processors have always seemed to
us a particularly promising area for the deployment of binary
representations of schema components.
The specifications should also comment on the handling of
schemaLocation hints in XML instances, the handling of <xsd:include>
and <xsd:redefine>, and so on. Overall, it appears that XML schema
lays an effective foundation to meet the needs of Query, but a bit
more work is needed to make all the details explicit. (It would be
useful, for example, to mention the various resource resolution
methods outlined in Part 2 of
[27]http://www.w3.org/People/cmsmcq/2001/schema-resolution and to make
clear what constraints, if any, XQuery places on the strategy to be
followed by a query processor.)
To some degree these concerns are covered in section 2.6.2:
[27] http://www.w3.org/People/cmsmcq/2001/schema-resolution
2.6.2 Schema Import Feature The Schema Import Feature removes the
limitations specified by Rules 1 through 6 of Basic XQuery.
During the analysis phase, in-scope schema definitions are derived
from schemas named in Schema Import clauses in the Prolog. If more
than one schema is imported, the definitions contained in these
schemas are collected into a single pool of definitions. This pool of
definitions must satisfy the conditions for schema validity set out in
Sections 3 and 5 of [XML Schema] Part 1. In brief, the definitions
must be valid, they must be complete and they must be unique--that is,
the pool of definitions must not contain two or more schema components
with the same name and target namespace. If any of these conditions is
violated, a static error must be raised.
The term "pool of definitions" is not defined by any normative
specification. We believe it would be helpful to make explicit that it
is an informal way of referring to the set of schema components which
go to make up a schema -- if, that is, that is what it refers to. A
minority of the XML Schema Working Group believed that the term "pool
of definitions" corresponds not to the set of all schema components in
a schema, but specifically to the top-level component describing the
schema as a whole.
2.5. Data model lacks normative reference to anyAtomicType?
The Data Model document uses the anyAtomicType as the return type for
the dm:typedValue accessor. Unless we have overlooked it, the draft
provides no introduction or normative reference for this type (we
believe the reference would be to section 2.4.1 of [28]XQuery, which
gives a definition for the type.) In any case, it seems that a
normative reference is needed in the Data Model document.
[28] http://www.w3.org/TR/2003/WD-xquery-20030502/#d0e1328
2.6. Plans for CR/PR/REC
As far as we can tell, the Last Call drafts of the data model and
functions and operators documents do not indicate whether the XML
Query and XSL Working Groups intend, after Last Call, to advance the
documents to Candidate Recommendation or to Proposed Recommendation.
It is also not explicit whether these two drafts will advance ahead of
the main specification documents which depend on them. We respectfully
suggest to our colleagues that:
* It would be a mistake to advance the Data Model and/or F&O specs
ahead of XQuery and XSLT. These specs are of minimal use in
isolation. As the main documents proceed through the review
process, it is important to maximize freedom of action in
resolving issues. Advancing the Data Model or F&O specs to CR or
REC ahead of the main documents themselves would seem to limit
such freedom insofar as it makes changes to those documents more
difficult. We recommend that all the interconnected specs advance
to CR and PR together.
* It might be helpful if future drafts were more explicit about the
Working Groups' plans for them (i.e. whether they are intended for
CR, PR, or whether there is an intention to make a determination
after gathering feedback).
* Given the complexity of this set of inter-related specifications,
we strongly recommend that all go through a CR phase. For
comparison, SOAP was a much simpler specification, with large
numbers of deployed commercial implementations of earlier
versions, and substantial implementation of version 1.2 features.
Nonetheless, a CR period was required to demonstrate at least two
interoperable implementations of each feature. We feel that many
of the interactions with Schema in particular will become clearer
during implementation testing, and hence we have particular reason
to recommend that there be a CR review period. We leave it to the
Query group (and W3C staff) to determine whether a relatively
short or a longer CR period will be appropriate to gather the
necessary implementation experience.
3. Issues of moderate importance
The XML Schema WG has not discussed the following comments; they were
suggested by our reviewers and we include them in case they are useful
to the Query and XSL Working Groups.
3.1. Duration types
[29]Section 2.4.1 introduces xdt:dayTimeDuration. This section really
should make clear that a schema document explicitly referencing this
type MUST contain an import for the xdt namespace, even if resolution
of the import is built in by schema processors.
(An alternative would be for these types to be in the XML Schema 1.1
type system. The WGs should discuss this possibility.)
[29] http://www.w3.org/TR/2003/WD-xquery-20030502/#d0e1328
3.2. Attribute lexical forms, values, and types
2.3.2 Currently says: "The typed value of an attribute node with any
other type annotation is derived from its string value and type
annotation in a way that is consistent with schema validation." This
seems to cover the case where a parsed document contains characters
for the attribute, from which values can be derived. It does not seem
to cover the case in which the data value is known first (e.g. because
it was computed). Perhaps it would be better to describe the
relationship as more symmetric: "The typed value of an attribute node
is always related to its string value by the mechanisms of XML
Schema."
Also: some members of the Schema WG believe we should encourage you to
give a bit of attention to whitespace handling, in order to avoid
unnecessary inconsistencies with XML Schema and/or user expectations.
In XML Schema, whitespace normalization happens during the process of
identifying a lexical form, not as part of the lexical -> value
mapping; it is handled by Structures, not Datatypes. If a given simple
type has a whitespace facet of "collapse", how does a query processor
deal with that? The appropriate normative references to XML Schema
should be made. The potential issues with "i18n-collapse" make
coordination on whitespace all the more important.
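For reference, the three values of the whiteSpace facet behave as in the following Python sketch, which follows the definitions in XML Schema Part 2. The function name apply_whitespace_facet is hypothetical, and the sketch says nothing about where in a query processor the normalization ought to occur, which is the open question.

```python
import re


def apply_whitespace_facet(text, facet):
    """Normalize text according to a whiteSpace facet value.

    preserve: no change.
    replace:  each tab, line feed, and carriage return becomes a space.
    collapse: as replace, then runs of spaces become one space and
              leading and trailing spaces are removed.
    """
    if facet == "preserve":
        return text
    if facet not in ("replace", "collapse"):
        raise ValueError("unknown whiteSpace facet value: %r" % facet)
    replaced = re.sub(r"[\t\n\r]", " ", text)
    if facet == "replace":
        return replaced
    return re.sub(r" +", " ", replaced).strip(" ")


print(repr(apply_whitespace_facet("  a\t\nb  ", "replace")))
print(repr(apply_whitespace_facet("  a\t\nb  ", "collapse")))
```

Because this step precedes the lexical-to-value mapping, a processor that skips it and one that applies it can disagree on whether the same attribute content is a valid lexical form at all.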
3.3. On URIs
Section 4.2 Currently says: "The string literal used in a namespace
declaration must be a valid URI, and may not be a zero-length string."
What does the term "valid URI" mean? Normative references should be
provided for any such terminology, and any constraints clearly
explained. Possible specific recommendations:
The string literal used in a namespace declaration must be of non-zero
length and must
* be a valid lexical form per the definition of xsd:anyURI (N.B.
this puts very few constraints on the string)
or
* be (lexically identical to) a `namespace name' as specified in
[30]http://www.w3.org/TR/REC-xml-names/#ns-decl
[30] http://www.w3.org/TR/REC-xml-names/#ns-decl
3.4. More specific types
On the general use of the term "more specific (type)": Currently, we
believe that this phrase is used to denote a derived type. It would be
better not to use "more specific" if in fact "derived" is what is
meant. On the other hand, if "more specific" doesn't mean "derived", a
definition of what it does mean would help.
4. Minor comments (typos, etc.)
2 (Basics) Currently says: "...so function names appearing without a
namespace prefix can be assumed to be in this namespace." This would
be clearer as: "...so function names appearing in examples or
definitions can be assumed to be in the namespace of XPath/XQuery
functions."
2.1.2 Currently says: "...these functions always returns..." Should
be: "...these functions always return..."
2.3 Currently says: "...the value of an element is represented by the
children..." The term "represented" doesn't seem right. "Constituted"
seems to work better, but doesn't adequately deal with mixed content.
2.3.1 Currently says: "The relative order among free-floating nodes
(those not in a document) is also implementation-defined but stable."
Does it intend to say: "The relative order among free-floating nodes
(those not in a document) is also implementation-defined but stable
with respect to themselves and all other nodes."? The original left it
a bit unclear whether the order is total, or only among the
free-floating nodes.
2.4.2 (type checking) Currently says: para 3: "The static type of an
expression may be either a named type or a structural description"
para 4: "The dynamic type of a value may be either a structural type
(such as `sequence of integers') or a named type"
The juxtaposition of the terms "structural description" and
"structural type" is a little confusing at first. Perhaps another term
could be found for one of them?
3.5.1 Shows an example:
The following comparison is true because the two constructed nodes
have the same value after atomization, even though they have different
identities:
<a>5</a> eq <a>5</a>
It might be interesting to additionally comment on:
<a>5</a> eq <b>5</b>
Are these two also equal? They would seem to have the same value after
atomization.
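The question can be made concrete with a deliberately simplified Python model. It assumes that both elements are untyped, that atomization of an untyped element yields its string value, and that the comparison then operates on those strings; whether the XQuery draft intends exactly this for constructed elements is the point on which comment is invited. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def atomize_untyped(element_text):
    """Return the string value of an element, ignoring its name."""
    element = ET.fromstring(element_text)
    return "".join(element.itertext())


def eq_untyped(left, right):
    """Compare two elements by their atomized (string) values only."""
    return atomize_untyped(left) == atomize_untyped(right)


print(eq_untyped("<a>5</a>", "<a>5</a>"))
print(eq_untyped("<a>5</a>", "<b>5</b>"))
```

In this model the element name plays no part in the comparison, so both comparisons give the same result.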
Received on Monday, 14 July 2003 19:02:19 UTC