- From: C. M. Sperberg-McQueen <cmsmcq@acm.org>
- Date: Mon, 14 Jul 2003 16:01:58 -0700
- To: public-qt-comments@w3.org
An initial batch of notes from XML Schema on XQuery, the data model, and functions and operators is on the Web at

  http://www.w3.org/XML/Group/2003/07/xmlschema-query-notes.html

An ASCII version follows for those who read email away from the Web.

-C. M. Sperberg-McQueen
 for the XML Schema WG

W3C XML Schema Working Group
Comments on Query Documents
14 July 2003
_________________________________________________________________

* 1. Background: documents reviewed
* 2. Major issues
  + 2.1. Time zones
  + 2.2. The type anyAtomicType
  + 2.3. The type untypedAtomic
  + 2.4. Schema Access and Construction
  + 2.5. Data model lacks normative reference to anyAtomicType?
  + 2.6. Plans for CR/PR/REC
* 3. Issues of moderate importance
  + 3.1. Duration types
  + 3.2. Attribute lexical forms, values, and types
  + 3.3. On URIs
  + 3.4. More specific types
* 4. Minor comments (typos, etc.)
_________________________________________________________________

This document contains some initial comments by the W3C XML Schema Working Group on the current set of documents issued by the XSL and XML Query Working Groups. The XML Schema WG is continuing to study the relevant documents and may make further comments.

First and foremost, the XML Schema WG congratulates the XML Query and XSL Working Groups on the high quality and great utility of the work reflected in your documents. We are gratified to see the deep integration of the XML Schema type system into your data model and we are very happy to note that with the passage of time your drafts have been increasingly well harmonized with XML Schema.

We do have some comments, some of which raise serious concerns which will require substantial work to resolve. We look forward to working with you to resolve them.

1. Background: documents reviewed

The comments below arose primarily from a review of [21]XQuery 1.0: An XML Query Language; the reviewers also noticed and raised a few issues in the [22]XQuery 1.0 and XPath 2.0 Data Model.

For various reasons including some confusion, our reviewers performed their detailed review on a version of XQuery dated February 2003, the status and history of which is a bit murky. It is labeled "Working Draft", but it has an erroneous "This Version" URI. In the meantime, a [23]May working draft has been published, which does not list the February version among previous working drafts. We have hastily checked our comments against the May draft and believe that they remain relevant, and we allow ourselves to express the hope that the XML Query and XSL Working Groups and their editors can find ways of managing their internal and public drafts in such a way as to reduce the likelihood of this kind of confusion in the future.

[21] http://lists.w3.org/Archives/Member/w3c-archive/2003Feb/att-0103/02-xquery.html
[22] http://www.w3.org/TR/xpath-datamodel/
[23] http://www.w3.org/TR/2003/WD-xquery-20030502/

2. Major issues

2.1. Time zones

The [24]query data model construes timezones as significant in the value as well as the lexical forms of xs:dateTime, xs:date, and xs:time. The XML Schema specification does not forbid applications to take timezone information into account; the timezone information is visible in the lexical forms of the post-schema-validation information set. That said, however, the data model's value space for this type is definitely not XML Schema's value space, and the situation is at best confusing for users.
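
The difference between the two value spaces can be illustrated with a short Python sketch. It is only an illustration: the function names instant_value and tuple_value are hypothetical, and Python's datetime type stands in, roughly, for xs:dateTime. The first function treats the value as a timezone-normalized instant; the second keeps the offset as part of the value.

```python
from datetime import datetime, timezone

# Two xs:dateTime lexical forms with different timezone offsets.
eastern = datetime.fromisoformat("2003-07-14T17:00:00-04:00")
pacific = datetime.fromisoformat("2003-07-14T14:00:00-07:00")


def instant_value(dt):
    """Value as a timezone-normalized instant; the original offset is discarded."""
    return dt.astimezone(timezone.utc)


def tuple_value(dt):
    """Value as an (instant, offset) pair; the original offset stays significant."""
    return (dt.astimezone(timezone.utc), dt.utcoffset())


# Same moment under the first reading, distinct values under the second.
same_instant = instant_value(eastern) == instant_value(pacific)
same_tuple = tuple_value(eastern) == tuple_value(pacific)
```

Under the first reading the two forms are the same value; under the second they are not, which is the kind of disagreement a user moving between the two specifications would have to keep in mind.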

We believe that the discrepancy between the XML Query and XML Schema accounts of these types is untenable, because it will place an unacceptable burden on users of the two languages. As a Working Group, we oppose the progression of the Query/XSL specifications while this discrepancy persists. We believe the three Working Groups must discuss this and related questions and reach consensus, and that changes must be made in one of the two type systems, or in both. We are willing to make appropriate changes in XML Schema 1.1 to achieve this harmonization if together we reach consensus that that is what is needed.

It may be noted that some members of the Schema WG favored including the timezone in a valuespace tuple in the first place. Others believe that there is a serious problem with any evaluation mechanism which does not realize that "5 p.m. Eastern Time" and "2 p.m. Pacific Time" are different ways of denoting the same moment of time.

[24] http://www.w3.org/TR/xpath-datamodel/#timezones

2.2. The type anyAtomicType

A type named anyAtomicType is introduced as a subtype of xdt:anySimpleType; the new type is introduced as an ancestor to builtins such as xs:integer. We have some concerns here; they include: (a) this changes the type hierarchy and is incompatible with the simple types as published in XML Schema (because those types explicitly name their base types), and (b) the derivation of anyAtomicType appears to be `magic' and thus outside the scope of derivations expressible in XML Schema 1.0.

Several members of the Schema WG believe that there does seem to be a real need for this type in Query, but it appears to us that some coordination is needed among the responsible Working Groups.

Changes of this kind to the type hierarchy create clear interoperability problems because different schema-aware processors will produce different and incompatible results when asked for information about xs:integer or other primitive types.
Because this type as defined is necessarily magic, the problems cannot be resolved by providing a conventional declaration for it.

The XML Schema Working Group opposes the progression of any Query/XSL-related specification until this incompatibility has been resolved. As with other discrepancies between our type system and your use of it, we are ready to modify our type hierarchy in XML Schema 1.1 if the responsible Working Groups can reach consensus.

2.3. The type untypedAtomic

The type xdt:untypedAtomic is introduced for untyped nodes "such as text in schemaless documents." The query LC draft says "It has no subtypes". It's not clear whether this type is ever to be used by schema processing, or is only visible in the query system. We believe this raises compatibility issues vis-a-vis XML Schema.

It might be argued that versions of XML Schema should assign this type in the PSVI to information items which currently receive no type information properties (e.g. because they matched a wildcard with processContents="skip"), as that would seem to maximize compatibility with Query.

We believe the three Working Groups should work to achieve consensus on this topic. We believe you should not progress your documents until we do so.

2.4. Schema Access and Construction

We have concerns regarding both the mechanisms provided for and the terminology used to describe access to XML schema documents. The pertinent XQuery mechanisms are outlined primarily in [25]section 4.4. For example, this section opens with the statement that:

[25] http://www.w3.org/TR/2003/WD-xquery-20030502/#id-schema-imports

    SchemaImport ::= "import" "schema" SchemaPrefix? StringLiteral "at" StringLiteral?
    SchemaPrefix ::= ("namespace" NCName "=") | ("default" "element" "namespace" "=")

    A schema import imports the element, attribute, and type definitions from a named schema into the in-scope schema definitions. The string literals in a schema import must be valid URIs.
    The schema import specifies the target namespace of the schema to be imported, and optionally the location of the schema. A schema import may also bind a namespace prefix to the target namespace of the imported schema, or may declare that target namespace to be the default element namespace. The optional location indication can be disregarded by an implementation if it has another way to locate the given schema.

    The following example imports the schema for an XHTML document, specifying both its target namespace and its location, and binding the prefix xhtml to this namespace:

      import schema namespace xhtml="http://www.w3.org/1999/xhtml" at "http://example.org/xhtml/xhtml.xsd"

    The following example imports a schema by specifying only its target namespace, and makes it the default element namespace for the query:

      import schema default element namespace="http://example.org/abc"

This formulation seems in certain respects at odds with the schema terminology defined by the XML Schema Recommendation, and in other respects to be unnecessarily out of synch with the [26]mechanisms of XML schema composition.

For example, where the text above refers to a "named schema", we conjecture that it may well mean a "named schema document". If so, we believe it should be reformulated to say "named schema document"; if not, we believe it would be helpful to say more explicitly what forms of resource an XQuery processor may or must accept as sources of schema components. The terms "schema" and "schema document" are carefully distinguished in the XML Schema Recommendation; we believe the distinction should be observed in the specs related to XQuery and XSLT.

Since any schema document asserts the targetNamespace for which it is providing declarations, we think that XQuery needs to describe what should happen if the document referenced by the at clause is a schema document for a different namespace or for no namespace at all.
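
The three cases at issue (matching namespace, different namespace, no namespace) can be made concrete with a minimal Python sketch. The function names and the result labels "match", "mismatch", and "no-namespace" are hypothetical; nothing here is taken from the XQuery draft, which is precisely what leaves the second and third outcomes undefined.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"


def declared_target_namespace(schema_document_text):
    """Return the targetNamespace a schema document declares, or None if absent."""
    root = ET.fromstring(schema_document_text)
    if root.tag != f"{{{XSD_NS}}}schema":
        raise ValueError("not an XML Schema schema document")
    return root.get("targetNamespace")


def check_import(imported_namespace, schema_document_text):
    """Hypothetical check of the 'at' document against the imported namespace."""
    declared = declared_target_namespace(schema_document_text)
    if declared is None:
        return "no-namespace"
    return "match" if declared == imported_namespace else "mismatch"


# Illustrative schema documents (assumed, not real resources).
XHTML_DOC = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://www.w3.org/1999/xhtml"/>"""
NO_NS_DOC = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"/>"""
```

A processor has to decide what to do for each of the two non-matching results; the sketch only shows that they are easy to detect.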

The Query spec should also indicate the rules for handling <xsd:include>, <xsd:redefine>, and <xsd:import> in the schema documents (transitively) referenced by the query import. The rules may be as simple as "do what the XML Schema Recommendation requires, and wherever the Schema Recommendation provides latitude query processors have similar latitude", but an explicit statement should be made.

The XML Schema WG also has some concern about the formulation "A schema import imports the element, attribute, and type definitions...." This seems to suggest that other components (e.g. named model groups, named attribute groups, and the schema component itself) are not imported. Since the XML Schema Recommendation requires that a schema be available as a prerequisite to validation, the suggestion that the schema component is not imported is troubling.

We recommend that to the extent possible XQuery avoid restating the composition mechanisms of XML schema, but instead refer to them directly. We don't wish to prescribe a particular formulation, but we believe something along the following lines would be clearer and less prone to introducing interoperability problems with existing XML Schema processors:

* A query processor MUST identify schema documents or other sources of schema components for each namespace named in a query import. Where at is specified, the schema document named MAY be used, and if used, that document MUST declare the targetNamespace specified (if not, error XXX is thrown).
* A schema is constructed as described in the XML Schema Recommendation. That schema consists of the components described by the schema documents (if any) identified in step 1 as well as any components identified through other means (i.e. conveyed in forms other than schema documents). It is an error if the resulting schema does not meet all constraints on schemas as defined in the XML Schema Recommendation.
* All schema validations performed using this query context are performed with respect to the schema thus constructed (though different validations may specify different complexTypes or element declarations from the schema to be used as the basis for validation.)

[26] http://www.w3.org/TR/xmlschema-1/#composition

The quoted section goes on to say:

    It is a static error to import two schemas that both define the same name in the same symbol space and in the same scope. For instance, a query may not import two schemas that include top-level element declarations for two elements with the same expanded name.

This appears to be a tentative and very incomplete foray into redefining or at least restating the rules for schema assembly. It would seem more appropriate to say that when constructing a schema using the mechanisms of XQuery (such as XQuery import), the resulting schema must conform to all the Constraints on Schema and other normative requirements of the XML Schema Recommendation. That would pick up the constraint quoted above -- and many more.

As noted above: it seems to us that greater clarity is needed to make explicit the expected behavior of XQuery and XSLT processors vis-a-vis the exploitation of the mechanisms provided by XML schema. For example, is it the intention of the XQuery group that all user-supplied schema information necessarily be in the form of schema documents? That should be your call, but there seems to be no good reason for such a restriction. Query processors have always seemed to us a particularly promising area for the deployment of binary representations of schema components. The specifications should also comment on the handling of schemaLocation hints in XML instances, the handling of <xsd:include> and <xsd:redefine>, and so on.

Overall, it appears that XML schema lays an effective foundation to meet the needs of Query, but a bit more work is needed to make all the details explicit.
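
The single constraint that the quoted text singles out (no two definitions with the same name in the same symbol space) is sketched below in Python, to show how small a part of schema assembly it is. This is a deliberately incomplete illustration: the function names are hypothetical, only top-level named components are examined, notations and identity constraints are left out, and <xsd:include>, <xsd:import>, and <xsd:redefine> are ignored entirely.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

# Symbol space for each top-level component kind.
# Simple and complex type definitions share one symbol space.
SYMBOL_SPACE = {
    XSD + "element": "element",
    XSD + "attribute": "attribute",
    XSD + "simpleType": "type",
    XSD + "complexType": "type",
    XSD + "group": "model group",
    XSD + "attributeGroup": "attribute group",
}


def top_level_names(schema_document_text):
    """Yield (symbol space, target namespace, local name) for top-level components."""
    root = ET.fromstring(schema_document_text)
    target_ns = root.get("targetNamespace")
    for child in root:
        space = SYMBOL_SPACE.get(child.tag)
        if space is not None and child.get("name") is not None:
            yield (space, target_ns, child.get("name"))


def find_collisions(schema_documents):
    """Return the keys that are defined more than once across the documents."""
    seen, collisions = set(), []
    for text in schema_documents:
        for key in top_level_names(text):
            if key in seen:
                collisions.append(key)
            seen.add(key)
    return collisions


# Illustrative documents (assumed): both declare a top-level element "note".
DOC_A = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://example.org/abc">
  <xsd:element name="note"/>
  <xsd:complexType name="note"/>
</xsd:schema>"""
DOC_B = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://example.org/abc">
  <xsd:element name="note"/>
</xsd:schema>"""
```

Note that the element and the complex type both named "note" in the first document do not collide, because they live in different symbol spaces; only the repeated element declaration does.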
(It would be useful, for example, to mention the various resource resolution methods outlined in Part 2 of [27]http://www.w3.org/People/cmsmcq/2001/schema-resolution and to make clear what constraints, if any, XQuery places on the strategy to be followed by a query processor.)

To some degree these concerns are covered in section 2.6.2:

[27] http://www.w3.org/People/cmsmcq/2001/schema-resolution

    2.6.2 Schema Import Feature

    The Schema Import Feature removes the limitations specified by Rules 1 through 6 of Basic XQuery. During the analysis phase, in-scope schema definitions are derived from schemas named in Schema Import clauses in the Prolog. If more than one schema is imported, the definitions contained in these schemas are collected into a single pool of definitions. This pool of definitions must satisfy the conditions for schema validity set out in Sections 3 and 5 of [XML Schema] Part 1. In brief, the definitions must be valid, they must be complete and they must be unique--that is, the pool of definitions must not contain two or more schema components with the same name and target namespace. If any of these conditions is violated, a static error must be raised.

The term "pool of definitions" is not defined by any normative specification. We believe it would be helpful to make explicit that it is an informal way of referring to the set of schema components which go to make up a schema -- if, that is, that is what it refers to.

A minority of the XML Schema Working Group believed that the term "pool of definitions" corresponds not to the set of all schema components in a schema, but specifically to the top-level component describing the schema as a whole.

2.5. Data model lacks normative reference to anyAtomicType?

The Data Model document uses the anyAtomicType as the return type for the dm:typedValue accessor.
Unless we have overlooked it, the draft provides no introduction or normative reference for this type (we believe the reference would be to section 2.4.1 of [28]XQuery, which gives a definition for the type.) In any case, it seems that a normative reference is needed in the Data Model document.

[28] http://www.w3.org/TR/2003/WD-xquery-20030502/#d0e1328

2.6. Plans for CR/PR/REC

As far as we can tell, the Last Call drafts of the data model and functions and operators documents do not indicate whether the XML Query and XSL Working Groups intend, after Last Call, to advance the documents to Candidate Recommendation or to Proposed Recommendation. It is also not explicit whether these two drafts will advance ahead of the main specification documents which depend on them. We respectfully suggest to our colleagues that:

* It would be a mistake to advance the Data Model and/or F&O specs ahead of XQuery and XSLT. These specs are of minimal use in isolation. As the main documents proceed through the review process, it is important to maximize freedom of action in resolving issues. Advancing the Data Model or F&O specs to CR or REC ahead of the main documents themselves would seem to limit such freedom insofar as it makes changes to those documents more difficult. We recommend that all the interconnected specs advance to CR and PR together.
* It might be helpful if future drafts were more explicit about the Working Groups' plans for them (i.e. whether they are intended for CR, PR, or whether there is an intention to make a determination after gathering feedback).
* Given the complexity of this set of inter-related specifications, we strongly recommend that all go through a CR phase. For comparison, SOAP was a much simpler specification, with large numbers of deployed commercial implementations of earlier versions, and substantial implementation of version 1.2 features.
  Nonetheless, a CR period was required to demonstrate at least two interoperable implementations of each feature. We feel that many of the interactions with Schema in particular will become clearer during implementation testing, and hence we have particular reason to recommend that there be a CR review period. We leave it to the Query group (and W3C staff) to determine whether a relatively short or a longer CR period will be appropriate to gather the necessary implementation experience.

3. Issues of moderate importance

The XML Schema WG has not discussed the following comments; they were suggested by our reviewers and we include them in case they are useful to the Query and XSL Working Groups.

3.1. Duration types

[29]Section 2.4.1 introduces xdt:dayTimeDuration. This section really should make clear that a schema document explicitly referencing this type MUST contain an import for the xdt namespace, even if resolution of the import is built in by schema processors. (An alternative would be for these types to be in the XML Schema 1.1 type system. The WGs should discuss this possibility.)

[29] http://www.w3.org/TR/2003/WD-xquery-20030502/#d0e1328

3.2. Attribute lexical forms, values, and types

2.3.2 Currently says:

    "The typed value of an attribute node with any other type annotation is derived from its string value and type annotation in a way that is consistent with schema validation."

This seems to cover the case where a parsed document contains characters for the attribute, from which values can be derived. It does not seem to cover the case in which the data value is known first (e.g. because it was computed). Perhaps it would be better to describe the relationship as more symmetric:

    "The typed value of an attribute node is always related to its string value by the mechanisms of XML Schema."
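
A minimal Python sketch of the two directions, using xs:integer as the example type, may make the asymmetry concrete. The function names are hypothetical. The sketch assumes XML Schema's whiteSpace="collapse" normalization is applied before the lexical form is mapped to a value, and it assumes that a computed value is serialized in the canonical form (no leading zeros, no leading "+"); which lexical form a query processor actually produces for a computed value is exactly the point the current wording leaves open.

```python
import re


def collapse_whitespace(text):
    """whiteSpace="collapse": tab, LF, CR become spaces; runs merge; ends are trimmed."""
    replaced = re.sub(r"[\t\n\r]", " ", text)
    return re.sub(r" +", " ", replaced).strip(" ")


def integer_value(string_value):
    """String value -> typed value for xs:integer, after whitespace normalization."""
    lexical = collapse_whitespace(string_value)
    if not re.fullmatch(r"[+-]?[0-9]+", lexical):
        raise ValueError(f"not an xs:integer lexical form: {lexical!r}")
    return int(lexical)


def integer_string(value):
    """Typed value -> string value, assuming the canonical form is wanted."""
    return str(value)
```

For example, integer_value(" \n 007\t") yields 7 while integer_string(7) yields "7", so the round trip preserves the value but not the original characters.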

Also: some members of the Schema WG believe we should encourage you to give a bit of attention to whitespace handling, in order to avoid unnecessary inconsistencies with XML Schema and/or user expectations. In XML Schema, whitespace normalization happens during the process of identifying a lexical form, not as part of the lexical -> value mapping; it is handled by Structures, not Datatypes. If a given simple type has a whitespace facet of "collapse", how does a query processor deal with that? The appropriate normative references to XML Schema should be made. The potential issues with "i18n-collapse" make coordination on whitespace all the more important.

3.3. On URIs

Section 4.2 Currently says:

    "The string literal used in a namespace declaration must be a valid URI, and may not be a zero-length string."

What does the term "valid URI" mean? Normative references should be provided for any such terminology, and any constraints clearly explained.

Possible specific recommendations: The string literal used in a namespace declaration must be of non-zero length and must

* be a valid lexical form per the definition of xsd:anyURI (N.B. this puts very few constraints on the string) or
* be (lexically identical to) a `namespace name' as specified in [30]http://www.w3.org/TR/REC-xml-names/#ns-decl

[30] http://www.w3.org/TR/REC-xml-names/#ns-decl

3.4. More specific types

On the general use of the term "more specific (type)": Currently, we believe that this phrase is used to denote a derived type. It would be better not to use "more specific" if in fact "derived" is what is meant. On the other hand, if "more specific" doesn't mean "derived", a definition of what it does mean would help.

4. Minor comments (typos, etc.)

2 (Basics) Currently says:

    "...so function names appearing without a namespace prefix can be assumed to be in this namespace."
This would be clearer as:

    "...so function names appearing in examples or definitions can be assumed to be in the namespace of XPath/XQuery functions."

2.1.2 Currently says:

    "...these functions always returns..."

Should be:

    "...these functions always return..."

2.3 Currently says:

    "...the value of an element is represented by the children..."

The term "represented" doesn't seem right. "Constituted" seems to work better, but doesn't adequately deal with mixed content.

2.3.1 Currently says:

    "The relative order among free-floating nodes (those not in a document) is also implementation-defined but stable."

Does it intend to say:

    "The relative order among free-floating nodes (those not in a document) is also implementation-defined but stable with respect to themselves and all other nodes."?

The original left it a bit unclear whether the order is total, or only among the free floating nodes.

2.4.2 (type checking) Currently says:

    para 3: "The static type of an expression may be either a named type or a structural description"

    para 4: "The dynamic type of a value may be either a structural type (such as `sequence of integers') or a named type"

The juxtaposition of the terms "structural description" and "structural type" is a little confusing at first. Perhaps another term could be found for one of them?

3.5.1 Shows an example:

    The following comparison is true because the two constructed nodes have the same value after atomization, even though they have different identities:

      <a>5</a> eq <a>5</a>

It might be interesting to additionally comment on:

      <a>5</a> eq <b>5</b>

Are these two also equal? They would seem to have the same value after atomization.
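
The reading behind that last question can be modeled in a few lines of Python. This is a sketch of one interpretation only, not a statement of what the XQuery draft specifies: it assumes that the comparison looks solely at the atomized value of each untyped element (here, the concatenated text content) and ignores both the element name and node identity. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def atomize_untyped(xml_text):
    """String value of an untyped element: the concatenation of its descendant text."""
    return "".join(ET.fromstring(xml_text).itertext())


def value_equal(left_xml, right_xml):
    """Compare two constructed elements by atomized value only (assumed reading)."""
    return atomize_untyped(left_xml) == atomize_untyped(right_xml)


same_name = value_equal("<a>5</a>", "<a>5</a>")
different_name = value_equal("<a>5</a>", "<b>5</b>")
```

Under this model both comparisons come out the same way, which is why an explicit remark in the draft about the second case would be useful.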
Received on Monday, 14 July 2003 19:02:19 UTC