notes from XML Schema WG from C. M. Sperberg-McQueen on 2003-07-14 (public-qt-comments@w3.org from July 2003)

From: C. M. Sperberg-McQueen <cmsmcq@acm.org>
Date: Mon, 14 Jul 2003 16:01:58 -0700
To: public-qt-comments@w3.org
Message-Id: <5.1.0.14.1.20030714155816.02a9a618@localhost>
An initial batch of notes from XML Schema on XQuery, the data
model, and functions and operators is on the Web at
http://www.w3.org/XML/Group/2003/07/xmlschema-query-notes.html

An ASCII version follows for those who read email away from the
Web.

-C. M. Sperberg-McQueen
  for the XML Schema WG


W3C XML Schema Working Group

Comments on Query Documents

14 July 2003

      _________________________________________________________________

      * 1. Background: documents reviewed
      * 2. Major issues
           + 2.1. Time zones
           + 2.2. The type anyAtomicType
           + 2.3. The type untypedAtomic
           + 2.4. Schema Access and Construction
           + 2.5. Data model lacks normative reference to
             anyAtomicType?
           + 2.6. Plans for CR/PR/REC
      * 3. Issues of moderate importance
           + 3.1. Duration types
           + 3.2. Attribute lexical forms, values, and types
           + 3.3. On URIs
           + 3.4. More specific types
      * 4. Minor comments (typos, etc.)

      _________________________________________________________________

    This document contains some initial comments by the W3C XML Schema
    Working Group on the current set of documents issued by the XSL and
    XML Query Working Groups. The XML Schema WG is continuing to study the
    relevant documents and may make further comments.
    First and foremost, the XML Schema WG congratulates the XML Query and
    XSL Working Groups on the high quality and great utility of the work
    reflected in your documents.
    We are gratified to see the deep integration of the XML Schema type
    system into your data model and we are very happy to note that with
    the passage of time your drafts have been increasingly well harmonized
    with XML Schema. We do have some comments, some of which raise serious
    concerns which will require substantial work to resolve. We look
    forward to working with you to resolve them.

1. Background: documents reviewed

    The comments below arose primarily from a review of [21]XQuery 1.0: An
    XML Query Language; the reviewers also noticed and raised a few issues
    in the [22]XQuery 1.0 and XPath 2.0 Data Model. For various reasons
    including some confusion, our reviewers performed their detailed
    review on a version of XQuery dated February 2003, the status and
    history of which is a bit murky. It is labeled "Working Draft", but it
    has an erroneous "This Version" URI. In the meantime, a [23]May
    working draft has been published, which does not list the February
    version among previous working drafts. We have hastily checked our
    comments against the May draft and believe that they remain relevant,
    and we allow ourselves to express the hope that the XML Query and XSL
    Working Groups and their editors can find ways of managing their
    internal and public drafts in such a way as to reduce the likelihood
    of this kind of confusion in the future.

      [21] http://lists.w3.org/Archives/Member/w3c-archive/2003Feb/att-0103/02-xquery.html
      [22] http://www.w3.org/TR/xpath-datamodel/
      [23] http://www.w3.org/TR/2003/WD-xquery-20030502/

2. Major issues

2.1. Time zones

    The [24]query data model construes timezones as significant in the
    value as well as the lexical forms of xs:dateTime, xs:date, and
    xs:time. The XML Schema specification does not forbid applications to
    take timezone information into account; the timezone information is
    visible in the lexical forms of the post-schema-validation information
    set. That said, however, the data model's value space for this type is
    definitely not XML Schema's value space, and the situation is at best
    confusing for users.
    We believe that the discrepancy between the XML Query and XML Schema
    accounts of these types is untenable, because it will place an
    unacceptable burden on users of the two languages. As a Working Group,
    we oppose the progression of the Query/XSL specifications while this
    discrepancy persists.
    We believe the three Working Groups must discuss this and related
    questions and reach consensus, and changes must be made either in one
    of the two type systems or the other, or in both. We are willing to
    make appropriate changes in XML Schema 1.1 to achieve this
    harmonization if together we reach consensus that that is what is
    needed.
    It may be noted that some members of the Schema WG favored including
    the timezone in a valuespace tuple in the first place. Others believe
    that there is a serious problem with any evaluation mechanism which
    does not realize that "5 p.m. Eastern Time" and "2 p.m. Pacific Time"
    are different ways of denoting the same moment of time.

      [24] http://www.w3.org/TR/xpath-datamodel/#timezones
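
    To make the two readings concrete, here is a minimal sketch in Python
    (our illustration only; it is not taken from any of the drafts). It
    assumes fixed offsets of -04:00 and -07:00 for the Eastern and Pacific
    times mentioned above, and the helper names timeline_value and
    tuple_value are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed offsets (daylight time in July); not real zone rules.
EASTERN = timezone(timedelta(hours=-4))
PACIFIC = timezone(timedelta(hours=-7))

five_pm_eastern = datetime(2003, 7, 14, 17, 0, tzinfo=EASTERN)
two_pm_pacific = datetime(2003, 7, 14, 14, 0, tzinfo=PACIFIC)


def timeline_value(dt):
    """Reading 1: the value is a point on the UTC timeline; the offset is dropped."""
    return dt.astimezone(timezone.utc).replace(tzinfo=None)


def tuple_value(dt):
    """Reading 2: the value is a (timeline point, offset) tuple."""
    return (timeline_value(dt), dt.utcoffset())


# Same moment when only the timeline point counts ...
same_moment = timeline_value(five_pm_eastern) == timeline_value(two_pm_pacific)
# ... but different values when the offset is part of the value.
same_tuple = tuple_value(five_pm_eastern) == tuple_value(two_pm_pacific)
```

    Under the first reading the two lexical forms denote one value; under
    the second they denote two. A user moving data between a schema
    processor and a query processor would need to know which reading
    applies.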

2.2. The type anyAtomicType

    A type named anyAtomicType is introduced as a subtype of
    xs:anySimpleType; the new type is introduced as an ancestor to
    builtins such as xs:integer. We have some concerns here; they include:
    (a) this changes the type hierarchy and is incompatible with the
    simple types as published in XML Schema (because those types
    explicitly name their base types), and (b) the derivation of
    anyAtomicType appears to be `magic' and thus outside the scope of
    derivations expressible in XML Schema 1.0. Several members of the
    Schema WG believe that there does seem to be a real need for this type in
    Query, but it appears to us that some coordination is needed among the
    responsible Working Groups.
    Changes of this kind to the type hierarchy create clear
    interoperability problems because different schema-aware processors
    will produce different and incompatible results when asked for
    information about xs:integer or other primitive types. Because this
    type as defined is necessarily magic, the problems cannot be resolved
    by providing a conventional declaration for it. The XML Schema Working
    Group opposes the progression of any Query/XSL-related specification
    until this incompatibility has been resolved.
    As with other discrepancies between our type system and your use of
    it, we are ready to modify our type hierarchy in XML Schema 1.1 if the
    responsible Working Groups can reach consensus.
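
    The interoperability concern can be sketched as follows, in Python.
    The two tables cover only a small subset of the built-in types; the
    second reflects our reading of where the Query drafts place
    anyAtomicType, which is an assumption of this sketch. The function
    name ancestors is hypothetical.

```python
# Base-type links for a small subset of XML Schema 1.0 built-ins.
SCHEMA_1_0 = {
    "xs:integer": "xs:decimal",
    "xs:decimal": "xs:anySimpleType",
    "xs:anySimpleType": "xs:anyType",
}

# The same subset with anyAtomicType inserted above the primitive type
# (assumed placement, for illustration only).
QUERY_DRAFT = dict(SCHEMA_1_0)
QUERY_DRAFT["xs:decimal"] = "xdt:anyAtomicType"
QUERY_DRAFT["xdt:anyAtomicType"] = "xs:anySimpleType"


def ancestors(type_name, hierarchy):
    """Return the chain of base types above type_name, nearest first."""
    chain = []
    current = hierarchy.get(type_name)
    while current is not None:
        chain.append(current)
        current = hierarchy.get(current)
    return chain


schema_chain = ancestors("xs:integer", SCHEMA_1_0)
query_chain = ancestors("xs:integer", QUERY_DRAFT)
```

    Two processors asked for the base type of xs:decimal would give
    different answers, which is the incompatibility described above.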

2.3. The type untypedAtomic

    The type xdt:untypedAtomic is introduced for untyped nodes "such as
    text in schemaless documents." The query LC draft says "It has no
    subtypes". It's not clear whether this type is ever to be used by
    schema processing, or is only visible in the query system. We believe
    this raises compatibility issues vis-a-vis XML Schema. It might be
    argued that versions of XML Schema should assign this type in the PSVI
    to information items which currently receive no type information
    properties (e.g. because they matched a wildcard with
    processContents="skip"), as that would seem to maximize compatibility
    with Query.
    We believe the three Working Groups should work to achieve consensus
    on this topic. We believe you should not progress your documents until
    we do so.

2.4. Schema Access and Construction

    We have concerns regarding both the mechanisms provided for and the
    terminology used to describe access to XML schema documents. The
    pertinent XQuery mechanisms are outlined primarily in [25]section 4.4.
    For example, this section opens with the statement that:

      [25] http://www.w3.org/TR/2003/WD-xquery-20030502/#id-schema-imports

SchemaImport ::= "import" "schema" SchemaPrefix?
                 StringLiteral ("at" StringLiteral)?
SchemaPrefix ::= ("namespace" NCName "=")
               | ("default" "element" "namespace" "=")

    A schema import imports the element, attribute, and type definitions
    from a named schema into the in-scope schema definitions. The string
    literals in a schema import must be valid URIs. The schema import
    specifies the target namespace of the schema to be imported, and
    optionally the location of the schema. A schema import may also bind a
    namespace prefix to the target namespace of the imported schema, or
    may declare that target namespace to be the default element namespace.
    The optional location indication can be disregarded by an
    implementation if it has another way to locate the given schema.

    The following example imports the schema for an XHTML document,
    specifying both its target namespace and its location, and binding the
    prefix xhtml to this namespace:
import schema namespace xhtml="http://www.w3.org/1999/xhtml"
             at "http://example.org/xhtml/xhtml.xsd"

    The following example imports a schema by specifying only its target
    namespace, and makes it the default element namespace for the query:
import schema default element namespace="http://example.org/abc"

    This formulation seems in certain respects at odds with the schema
    terminology defined by the XML Schema Recommendation, and in other
    respects to be unnecessarily out of synch with the [26]mechanisms of
    XML schema composition. For example, where the text above refers to a
    "named schema", we conjecture that it may well mean a "named schema
    document". If so, we believe it should be reformulated to say "named
    schema document"; if not, we believe it would be helpful to say more
    explicitly what forms of resource an XQuery processor may or must
    accept as sources of schema components. The terms "schema" and "schema
    document" are carefully distinguished in the XML Schema
    Recommendation; we believe the distinction should be observed in the
    specs related to XQuery and XSLT.
    Since any schema document asserts the targetNamespace for which it is
    providing declarations, we think that XQuery needs to describe what
    should happen if the document referenced by the at clause is a schema
    document for a different namespace or for no namespace at all.
    The Query spec should also indicate the rules for handling
    <xsd:include>, <xsd:redefine>, and <xsd:import> in the schema
    documents (transitively) referenced by the query import. The rules may
    be as simple as "do what the XML Schema Recommendation requires, and
    wherever the Schema Recommendation provides latitude query processors
    have similar latitude", but an explicit statement should be made.
    The XML Schema WG also has some concern about the formulation "A
    schema import imports the element, attribute, and type
    definitions...." This seems to suggest that other components (e.g.
    named model groups, named attribute groups, and the schema component itself)
    are not imported. Since the XML Schema Recommendation requires that a
    schema be available as a prerequisite to validation, the suggestion
    that the schema component is not imported is troubling. We recommend
    that to the extent possible XQuery avoid restating the composition
    mechanisms of XML schema, but instead refer to them directly. We don't
    wish to prescribe a particular formulation, but we believe something
    along the following lines would be clearer and less prone to
    introducing interoperability problems with existing XML Schema
    processors:
      * A query processor MUST identify schema documents or other sources
        of schema components for each namespace named in a query import.
        Where at is specified, the schema document named MAY be used, and
        if used, that document MUST declare the targetNamespace specified
        (if not, error XXX is thrown).
      * A schema is constructed as described in the XML Schema
        Recommendation. That schema consists of the components described
        by the schema documents (if any) identified in step 1 as well as
        any components identified through other means (i.e. conveyed in
        forms other than schema documents). It is an error if the
        resulting schema does not meet all constraints on schemas as
        defined in the XML Schema Recommendation.
      * All schema validations performed using this query context are
        performed with respect to the schema thus constructed (though
        different validations may specify different complexTypes or
        element declarations from the schema to be used as the basis for
        validation.)
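
    The targetNamespace check in the first point might look roughly like
    the following Python sketch. It is an illustration under stated
    assumptions, not a proposed implementation: SchemaImportError stands
    in for the unnamed "error XXX", check_at_document is a hypothetical
    name, and the schema document is supplied as a string rather than
    fetched from the at location.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"


class SchemaImportError(Exception):
    """Hypothetical stand-in for the 'error XXX' mentioned above."""


def check_at_document(imported_namespace, schema_document_text):
    """Verify that a schema document declares the imported namespace."""
    root = ET.fromstring(schema_document_text)
    if root.tag != "{%s}schema" % XSD_NS:
        raise SchemaImportError("not a schema document")
    # The attribute is absent (None) for a no-namespace schema document.
    declared = root.get("targetNamespace")
    if declared != imported_namespace:
        raise SchemaImportError(
            "document declares %r, import names %r"
            % (declared, imported_namespace)
        )
    return declared


XHTML_XSD = (
    '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" '
    'targetNamespace="http://www.w3.org/1999/xhtml"/>'
)

accepted = check_at_document("http://www.w3.org/1999/xhtml", XHTML_XSD)

try:
    check_at_document("http://example.org/abc", XHTML_XSD)
    mismatch_rejected = False
except SchemaImportError:
    mismatch_rejected = True
```

    A real processor would go on to assemble the schema and check all
    Constraints on Schemas; the sketch covers only the namespace
    mismatch case.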

      [26] http://www.w3.org/TR/xmlschema-1/#composition

    The quoted section goes on to say:

    It is a static error to import two schemas that both define the same
    name in the same symbol space and in the same scope. For instance, a
    query may not import two schemas that include top-level element
    declarations for two elements with the same expanded name.

    This appears to be a tentative and very incomplete foray into
    redefining or at least restating the rules for schema assembly. It
    would seem more appropriate to say that when constructing a schema
    using the mechanisms of XQuery (such as XQuery import), the resulting
    schema must conform to all the Constraints on Schema and other
    normative requirements of the XML Schema Recommendation. That would
    pick up the constraint quoted above -- and many more.
    As noted above: it seems to us that greater clarity is needed to make
    explicit the expected behavior of XQuery and XSLT processors vis-a-vis
    the exploitation of the mechanisms provided by XML schema. For
    example, is it the intention of the XQuery group that all
    user-supplied schema information necessarily be in the form of schema
    documents? That should be your call, but there seems to be no good
    reason for such a restriction. Query processors have always seemed to
    us a particularly promising area for the deployment of binary
    representations of schema components.
    The specifications should also comment on the handling of
    schemaLocation hints in XML instances, the handling of <xsd:include>
    and <xsd:redefine>, and so on. Overall, it appears that XML schema
    lays an effective foundation to meet the needs of Query, but a bit
    more work is needed to make all the details explicit. (It would be
    useful, for example, to mention the various resource resolution
    methods outlined in Part 2 of
    [27]http://www.w3.org/People/cmsmcq/2001/schema-resolution and to make
    clear what constraints, if any, XQuery places on the strategy to be
    followed by a query processor.)
    To some degree these concerns are covered in section 2.6.2:

      [27] http://www.w3.org/People/cmsmcq/2001/schema-resolution

    2.6.2 Schema Import Feature The Schema Import Feature removes the
    limitations specified by Rules 1 through 6 of Basic XQuery.

    During the analysis phase, in-scope schema definitions are derived
    from schemas named in Schema Import clauses in the Prolog. If more
    than one schema is imported, the definitions contained in these
    schemas are collected into a single pool of definitions. This pool of
    definitions must satisfy the conditions for schema validity set out in
    Sections 3 and 5 of [XML Schema] Part 1. In brief, the definitions
    must be valid, they must be complete and they must be unique--that is,
    the pool of definitions must not contain two or more schema components
    with the same name and target namespace. If any of these conditions is
    violated, a static error must be raised.

    The term "pool of definitions" is not defined by any normative
    specification. We believe it would be helpful to make explicit that it
    is an informal way of referring to the set of schema components which
    go to make up a schema -- if, that is, that is what it refers to. A
    minority of the XML Schema Working Group believed that the term "pool
    of definitions" corresponds not to the set of all schema components in
    a schema, but specifically to the top-level component describing the
    schema as a whole.
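
    If "pool of definitions" does mean the set of components, the
    uniqueness condition could be sketched as below (Python, illustration
    only). Each component is reduced to a hypothetical triple of symbol
    space, target namespace, and local name; the XML Schema
    Recommendation's actual constraints are considerably richer than this
    single check.

```python
from collections import Counter


def duplicate_components(*schemas):
    """Return components that occur more than once in the pooled schemas.

    Each schema is an iterable of (symbol_space, target_namespace,
    local_name) triples; this representation is an assumption of the sketch.
    """
    counts = Counter(component for schema in schemas for component in schema)
    return sorted(c for c, n in counts.items() if n > 1)


ABC = "http://example.org/abc"

schema_a = [("element", ABC, "order"), ("type", ABC, "order")]
schema_b = [("element", ABC, "order"), ("element", ABC, "invoice")]

clashes = duplicate_components(schema_a, schema_b)
```

    The element and the type named "order" do not clash with each other,
    because they live in different symbol spaces; the two element
    declarations do.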

2.5. Data model lacks normative reference to anyAtomicType?

    The Data Model document uses the anyAtomicType as the return type for
    the dm:typedValue accessor. Unless we have overlooked it, the draft
    provides no introduction or normative reference for this type (we
    believe the reference would be to section 2.4.1 of [28]XQuery, which
    gives a definition for the type.) In any case, it seems that a
    normative reference is needed in the Data Model document.

      [28] http://www.w3.org/TR/2003/WD-xquery-20030502/#d0e1328

2.6. Plans for CR/PR/REC

    As far as we can tell, the Last Call drafts of the data model and
    functions and operators documents do not indicate whether the XML
    Query and XSL Working Groups intend, after Last Call, to advance the
    documents to Candidate Recommendation or to Proposed Recommendation.
    It is also not explicit whether these two drafts will advance ahead of
    the main specification documents which depend on them. We respectfully
    suggest to our colleagues that:
      * It would be a mistake to advance the Data Model and/or F&O specs
        ahead of XQuery and XSLT. These specs are of minimal use in
        isolation. As the main documents proceed through the review
        process, it is important to maximize freedom of action in
        resolving issues. Advancing the Data Model or F&O specs to CR or
        REC ahead of the main documents themselves would seem to limit
        such freedom insofar as it makes changes to those documents more
        difficult. We recommend that all the interconnected specs advance
        to CR and PR together.
      * It might be helpful if future drafts were more explicit about the
        Working Groups' plans for them (i.e. whether they are intended for
        CR, PR, or whether there is an intention to make a determination
        after gathering feedback).
      * Given the complexity of this set of inter-related specifications,
        we strongly recommend that all go through a CR phase. For
        comparison, SOAP was a much simpler specification, with large
        numbers of deployed commercial implementations of earlier
        versions, and substantial implementation of version 1.2 features.
        Nonetheless, a CR period was required to demonstrate at least two
        interoperable implementations of each feature. We feel that many
        of the interactions with Schema in particular will become clearer
        during implementation testing, and hence we have particular reason
        to recommend that there be a CR review period. We leave it to the
        Query group (and W3C staff) to determine whether a relatively
        short or a longer CR period will be appropriate to gather the
        necessary implementation experience.

3. Issues of moderate importance

    The XML Schema WG has not discussed the following comments; they were
    suggested by our reviewers and we include them in case they are useful
    to the Query and XSL Working Groups.

3.1. Duration types

    [29]Section 2.4.1 introduces xdt:dayTimeDuration. This section really
    should make clear that a schema document explicitly referencing this
    type MUST contain an import for the xdt namespace, even if resolution
    of the import is built in by schema processors.
    (An alternative would be for these types to be in the XML Schema 1.1
    type system. The WGs should discuss this possibility.)

      [29] http://www.w3.org/TR/2003/WD-xquery-20030502/#d0e1328

3.2. Attribute lexical forms, values, and types

    Section 2.3.2 currently says: "The typed value of an attribute node with any
    other type annotation is derived from its string value and type
    annotation in a way that is consistent with schema validation." This
    seems to cover the case where a parsed document contains characters
    for the attribute, from which values can be derived. It does not seem
    to cover the case in which the data value is known first (e.g. because
    it was computed). Perhaps it would be better to describe the
    relationship as more symmetric: "The typed value of an attribute node
    is always related to its string value by the mechanisms of XML
    Schema."
    Also: some members of the Schema WG believe we should encourage you to
    give a bit of attention to whitespace handling, in order to avoid
    unnecessary inconsistencies with XML Schema and/or user expectations.
    In XML Schema, whitespace normalization happens during the process of
    identifying a lexical form, not as part of the lexical -> value
    mapping; it is handled by Structures, not Datatypes. If a given simple
    type has a whitespace facet of "collapse", how does a query processor
    deal with that? The appropriate normative references to XML Schema
    should be made. The potential issues with "i18n-collapse" make
    coordination on whitespace all the more important.
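
    The two-stage process described above (whitespace normalization
    first, lexical-to-value mapping second) can be sketched in Python as
    follows. The function name normalize_whitespace is hypothetical, and
    int() merely stands in for the lexical-to-value mapping of an integer
    type.

```python
import re


def normalize_whitespace(text, facet):
    """Apply an XML Schema whiteSpace facet value to a string."""
    if facet == "preserve":
        return text
    # "replace": each tab, line feed, and carriage return becomes a space.
    replaced = re.sub(r"[\t\n\r]", " ", text)
    if facet == "replace":
        return replaced
    if facet == "collapse":
        # "collapse": after replacement, runs of spaces shrink to one
        # space and leading and trailing spaces are removed.
        return re.sub(r" +", " ", replaced).strip(" ")
    raise ValueError("unknown whiteSpace facet value: %r" % facet)


raw = "  \t42\n "

# Stage 1: identify the lexical form.
lexical = normalize_whitespace(raw, "collapse")
# Stage 2: map the lexical form to a value.
value = int(lexical)

replaced_only = normalize_whitespace("a \t b\n", "replace")
collapsed = normalize_whitespace("a \t b\n", "collapse")
```

    The question for a query processor is at which point, if any, stage 1
    is applied when a value is constructed or cast inside a query rather
    than obtained from validation.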

3.3. On URIs

    Section 4.2 currently says: "The string literal used in a namespace
    declaration must be a valid URI, and may not be a zero-length string."
    What does the term "valid URI" mean? Normative references should be
    provided for any such terminology, and any constraints clearly
    explained. Possible specific recommendations:

    The string literal used in a namespace declaration must be of non-zero
    length and must
      * be a valid lexical form per the definition of xsd:anyURI (N.B.
        this puts very few constraints on the string)

    or
      * be (lexically identical to) a `namespace name' as specified in
        [30]http://www.w3.org/TR/REC-xml-names/#ns-decl

      [30] http://www.w3.org/TR/REC-xml-names/#ns-decl

3.4. More specific types

    On the general use of the term "more specific (type)": Currently, we
    believe that this phrase is used to denote a derived type. It would be
    better not to use "more specific" if in fact "derived" is what is
    meant. On the other hand, if "more specific" doesn't mean "derived", a
    definition of what it does mean would help.

4. Minor comments (typos, etc.)

    2 (Basics) currently says: "...so function names appearing without a
    namespace prefix can be assumed to be in this namespace." This would
    be clearer as: "...so function names appearing in examples or
    definitions can be assumed to be in the namespace of XPath/XQuery
    functions."

    2.1.2 currently says: "...these functions always returns..." Should
    be: "...these functions always return..."

    2.3 currently says: "...the value of an element is represented by the
    children..." The term "represented" doesn't seem right. "Constituted"
    seems to work better, but doesn't adequately deal with mixed content.

    2.3.1 currently says: "The relative order among free-floating nodes
    (those not in a document) is also implementation-defined but stable."
    Does it intend to say: "The relative order among free-floating nodes
    (those not in a document) is also implementation-defined but stable
    with respect to themselves and all other nodes."? The original left it
    a bit unclear whether the order is total, or only among the free
    floating nodes.

    2.4.2 (type checking) currently says:
    para 3: "The static type of an expression may be either a named type
    or a structural description"
    para 4: "The dynamic type of a value may be either a structural type
    (such as `sequence of integers') or a named type"
    The juxtaposition of the terms "structural description" and
    "structural type" is a little confusing at first. Perhaps another term
    could be found for one of them?

    3.5.1 shows an example:

    The following comparison is true because the two constructed nodes
    have the same value after atomization, even though they have different
    identities:
         <a>5</a> eq <a>5</a>

    It might be interesting to additionally comment on:
         <a>5</a> eq <b>5</b>

    Are these two also equal? They would seem to have the same value after
    atomization.
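
    The question can be made concrete with a small Python model. It
    assumes that eq compares only the atomized values and that the
    element name plays no part; whether the draft intends this is exactly
    what we are asking. The names Element, atomize, and value_eq are
    hypothetical.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # eq=False keeps identity-based comparison for nodes
class Element:
    name: str
    text: str


def atomize(node):
    """Reduce a node to its value (here simply its text content)."""
    return node.text


def value_eq(left, right):
    """Model of 'eq' under the stated assumption: compare atomized values."""
    return atomize(left) == atomize(right)


first_a = Element("a", "5")
second_a = Element("a", "5")
b = Element("b", "5")

same_value_same_name = value_eq(first_a, second_a)
same_identity = first_a is second_a
same_value_other_name = value_eq(first_a, b)
```

    In this model both comparisons by value succeed, while the nodes
    remain distinct by identity. A sentence in the draft confirming or
    correcting this reading would help.
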
Received on Monday, 14 July 2003 19:02:19 UTC