XML Schema WG comments on Data Model

Dear colleagues:

The XML Schema Working Group congratulates the XML Query and XSL
Working Groups on their progress, and in particular on the Last Call
draft of "XQuery 1.0 and XPath 2.0 Data Model".

We have now reviewed the last call draft, and our comments are at
http://www.w3.org/XML/Group/2003/08/xmlschema-datamodel-comments.html
(an ASCII version is reproduced below for the convenience of those
with access to their email but not to the Web).

We apologize for the tardy arrival of these notes.

-C. M. Sperberg-McQueen, for the W3C XML Schema WG



W3C XML Schema WG

Notes on XQuery 1.0 and XPath 2.0 Data Model

1 August 2003
      _________________________________________________________________

      * 1. [7]Schema-related issues
           + 1.1. [8]The term type
           + 1.2. [9]Derivation of simple types
           + 1.3. [10]Items and singleton sequences
           + 1.4. [11]The implications of [validity] != valid
           + 1.5. [12]Anonymous local types
           + 1.6. [13]Target namespaces
           + 1.7. [14]Lexical spaces, reference, containment
      * 2. [15]Other technical issues
           + 2.1. [16]Atomic values and singleton sequences
           + 2.2. [17]Node identity
           + 2.3. [18]Names in namespace nodes
           + 2.4. [19]Elements labeled xs:anyType in the PSVI
           + 2.5. [20]Minor items
                o 2.5.1. [21]Infoset-only processing
                o 2.5.2. [22]Prefix property
                o 2.5.3. [23]Sequences in sequences
                o 2.5.4. [24]Synthetic data models
      * 3. [25]Editorial notes
           + 3.1. [26]Comments reviewed by the Working Group
           + 3.2. [27]Comments not reviewed by the Working Group
      _________________________________________________________________

    This document contains comments on the Last Call draft of 2 May 2003
    of [28]XQuery 1.0 and XPath 2.0 Data Model (hereinafter DM) from the
    XML Schema Working Group. These comments were prepared by an ad hoc
    Task Force and most of them were reviewed and revised by the XML
    Schema Working Group at its teleconference of 1 August 2003. The
    editorial comments included in [29]section 3.2 were not reviewed by
    the XML Schema Working Group.
    In addition to the comments below, please note that several of the
    [30]general comments sent on 14 July relate to the data model
    specification. Some of those comments sent earlier overlap with some
    comments below.

1. Schema-related issues

    The comments in this section relate to the use of XML Schema in DM
    and thus to the particular area of responsibility borne by the XML
    Schema WG.

1.1. The term type

    DM appears to use the term type for several related but different
    concepts; we believe it would be desirable if you were to clarify the
    meaning of the term, or at least if you called the reader's attention
    to its overloading.
    The Data Model specification appeals to the Formal Semantics
    specification, which says types are XML Schema types. However, XML
    Schema tries to avoid the term "type", instead using "type
    definition".
    Among the uses of "type" we have noticed are:
     1. T1. a name (for example, as used by the dm:type accessor).
     2. T2. a set of values (this sense is used by XML Schema's internal
        work on a formalization, which includes a "Type Lattice").
     3. T3. an XML Schema Type Definition component (simple or complex).
        Defines a set of values and certain properties, such as [name],
        [baseType], etc.
     4. T4. an OO class. Defines a set of values, inheritance info, and
        operators.

    Specifically, we suggest that the dm:type accessor be renamed to
    dm:type-name and that "type" be explicitly defined. If "type" is just
    a synonym for "type definition", say so in the definition of "type".

1.2. Derivation of simple types

    Section 5 Atomic Values reads in part:

      An XML Schema simple type [XMLSchema Part 2] may be primitive or
      derived by restriction, list, or union.

    We think it will help avoid confusion among users and implementors,
    and (not least) in discussion among Working Groups, if you use XML
    Schema terminology here. Perhaps:

      An XML Schema simple type definition [XMLSchema Part 2] has a
      [variety], which may be atomic, list or union. If [variety] is
      atomic, the type definition may be primitive or derived by
      restriction.

    The XML Schema WG wishes to de-emphasize the use of the term "derived
    by" in XML Schema Part 2 in describing union and list construction. The
    term "derived by" is used only colloquially there and is unfortunately
    easily confused with derivation in the proper sense (i.e. restriction
    and extension). All non-primitive simple types are derived by
    restriction. List types may be restrictions of xs:anySimpleType or of
    other lists; similarly for union types. Please don't propagate the
    confusion we created.
    [We are aware that it would be useful to have a simple term other than
    derivation to describe the relation between a list type and its item
    type, or that between a union type and its member types; we need it as
    much as you do. Suggestions are welcome.]
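    To make the distinction concrete, here is a minimal Python sketch that
    keeps the restriction relation ({base type definition}) separate from
    the list/item and union/member relations. The class, the helper, and
    the user-defined type names integerList, shortIntegerList and
    integerOrString are hypothetical, chosen only for illustration; the
    sketch models just the few component properties it needs.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SimpleTypeDef:
    """Hypothetical model of a simple type definition; only the properties used here."""
    name: str
    variety: Optional[str]                   # "atomic", "list" or "union"; None for xs:anySimpleType here
    base: Optional["SimpleTypeDef"] = None   # {base type definition}: the restriction relation
    item_type: Optional["SimpleTypeDef"] = None                          # list types only
    member_types: List["SimpleTypeDef"] = field(default_factory=list)    # union types only


def restriction_ancestors(t: SimpleTypeDef) -> List[str]:
    """Names of the definitions t is derived from by restriction, nearest first."""
    names = []
    while t.base is not None:
        t = t.base
        names.append(t.name)
    return names


any_simple = SimpleTypeDef("anySimpleType", None)
decimal = SimpleTypeDef("decimal", "atomic", base=any_simple)
integer = SimpleTypeDef("integer", "atomic", base=decimal)
string = SimpleTypeDef("string", "atomic", base=any_simple)

# Hypothetical user-defined types: list and union types restrict xs:anySimpleType
# (or another list/union); their item and member types are a separate relation.
integer_list = SimpleTypeDef("integerList", "list", base=any_simple, item_type=integer)
short_list = SimpleTypeDef("shortIntegerList", "list", base=integer_list, item_type=integer)
int_or_str = SimpleTypeDef("integerOrString", "union", base=any_simple,
                           member_types=[integer, string])

print(restriction_ancestors(short_list))                # ['integerList', 'anySimpleType']
print("integer" in restriction_ancestors(integer_list))  # False: the item type is not a base
```

    In this model the only "derived by" relation is restriction; the item
    type of a list and the member types of a union are recorded as
    separate properties, which is the distinction the paragraph above asks
    DM to preserve.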

1.3. Items and singleton sequences

    Section 6 Sequences reads in part:

      An important characteristic of the data model is that there is no
      distinction between an item (a node or an atomic value) and a
      singleton sequence containing that item.

    One consequence of this characteristic is that the types xs:integer
    and a list of xs:integer with length constrained to 1 have exactly the
    same value space in the Data Model. That is, each value in the value
    space is a sequence of a single xs:integer. This is different from the
    XML Schema value spaces for the two types. Might this cause a problem
    for functions or other uses of the Data Model?
    We believe further discussion is needed here.
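    The concern can be illustrated with a small Python sketch in which
    Python integers stand in for xs:integer values and tuples stand in for
    both list values and Data Model sequences (an assumption made only for
    illustration). The helper dm_value is hypothetical; it applies the rule
    that an item and the singleton sequence containing it are the same.

```python
from typing import Tuple, Union

Item = Union[int, str]   # stand-in for atomic values in this sketch


def dm_value(value) -> Tuple[Item, ...]:
    """Map a value into a Data Model sequence (hypothetical helper).

    Because an item is indistinguishable from the singleton sequence that
    contains it, a bare item and a one-item tuple map to the same result.
    """
    if isinstance(value, tuple):
        return value
    return (value,)


# XML Schema view: an xs:integer value and a length-1 list value are distinct.
schema_integer = 5
schema_list_of_one = (5,)

print(schema_integer == schema_list_of_one)                      # False
print(dm_value(schema_integer) == dm_value(schema_list_of_one))  # True
```

    Two values that XML Schema keeps apart end up equal once mapped into
    the Data Model; whether any function or other use of the Data Model
    depends on that difference is the open question raised above.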

1.4. The implications of [validity] != valid

    Section 3.6 para 2 reads in part: "The only information that can be
    inferred from an invalid or not known validity value is that the
    information item is well-formed."
    This is not true in the general case: the values of the properties
    [validity] and [validation attempted] interact, so that some
    inferences beyond well-formedness can be made. (If [validity] is
    'notKnown', for example, we can infer without examining the PSVI that
    [validation attempted] is not 'full'. If for some node N [validity] is
    'invalid', we can infer that declarations are available for at least
    some element or attribute information items in the subtree rooted in
    N.) The data model doesn't have to be interested in those inferences,
    but it is simply incorrect to say that they don't exist.
    On the whole, we believe that the data model misses an opportunity by
    failing to exploit more fully the information contained in the
    [validity] and [validation attempted] properties.

1.5. Anonymous local types

    Section 3.6 has an extended list of cases describing how the namespace
    and local name of a type are found. This list reads in part:

      * If the [validity] property exists and is `valid':
           + ...
           + If the [type definition] property exists and its {name}
             property is present:
                o the {target namespace} and {name} properties of the
                  [type definition] property.
           + ...
           + If [type definition anonymous] exists:
                o If it is false: the [type definition namespace] and the
                  [type definition name]
                o Otherwise, the namespace and local name of the
                  appropriate anonymous type name.

    The above structure does not handle the case of an anonymous type when
    the schema processor provides the [type definition] property instead
    of the [type definition name] property and its fellows.
    We think the [type definition] rule can readily be rephrased so that
    the result is parallel to the case when the upstream schema processor
    provides [type definition name] instead of [type definition]:

      * If the [validity] property exists and is `valid':
           + ...
           + If the [type definition] property exists[DEL: and its {name}
             property is present :DEL] :
                o [INS: If the [type definition]'s {name} property exists:
                  :INS] the {target namespace} and {name} properties of
                  the [type definition] property.
                o [INS: Otherwise, the namespace and local name of the
                  appropriate anonymous type name. :INS]
           + ...
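    The proposed rewording can be read as the following decision procedure,
    sketched in Python for an item whose [validity] is 'valid'. The PSVI is
    modeled as a plain dictionary keyed by property name, and TypeDefinition
    and type_name are hypothetical names. "The appropriate anonymous type
    name" is passed in rather than constructed, since the sketch does not
    presume how DM assigns it, and the cases elided ("...") in the quoted
    list are not modeled.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

ExpandedName = Tuple[Optional[str], str]   # (namespace URI or None, local name)


@dataclass
class TypeDefinition:
    """Hypothetical stand-in for a schema type definition component."""
    name: Optional[str]              # {name}; None for an anonymous type
    target_namespace: Optional[str]  # {target namespace}; None if absent


def type_name(psvi: dict, anonymous_name: ExpandedName) -> Optional[ExpandedName]:
    """Sketch of the proposed rule; the other cases in DM section 3.6 are omitted."""
    td = psvi.get("type definition")
    if td is not None:
        if td.name is not None:                  # named type: {target namespace}, {name}
            return (td.target_namespace, td.name)
        return anonymous_name                    # anonymous type: the case the rewording adds
    if "type definition anonymous" in psvi:
        if not psvi["type definition anonymous"]:
            return (psvi.get("type definition namespace"), psvi["type definition name"])
        return anonymous_name
    return None                                  # remaining cases are not modeled here


anon = ("urn:example:anon", "anon1")   # hypothetical anonymous type name
print(type_name({"type definition": TypeDefinition(None, "urn:example")}, anon))
# ('urn:example:anon', 'anon1')
```

    With the rewording, the two ways a processor can report type
    information yield the same answer for anonymous types, which the
    original wording does not guarantee.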

1.6. Target namespaces

    Section 3.4 Types reads in part:

      Since named types in XML Schema are global, an expanded-QName
      uniquely identifies such a type. The namespace name of the
      expanded-QName is the target namespace of the schema and its local
      name is the name of the type.

    A schema does not have a target namespace; a schema document has a
    target namespace.
    One possible repair would be:

      Since named types in XML Schema are global, an expanded-QName
      uniquely identifies such a type. The namespace name of the
      expanded-QName is the {target namespace} property of the type
      definition, and its local name is the {name} property of the type
      definition.

    Another might be:

      Since named types in XML Schema are global, an expanded-QName
      uniquely identifies such a type within a schema.

    We believe this to be relatively important.

1.7. Lexical spaces, reference, containment

    Section 2 refers to: "the lexical space referring to constructs of the
    form prefix:local-name". Perhaps substitute "the lexical space
    containing ..." Lexical forms may, with a certain investment of time
    and energy, be thought of as `referring to' values, but the lexical
    space as a whole does not refer. The lexical space of QName does
    contain, even if it does not refer to, constructs of the form
    prefix:local-name.

2. Other technical issues

    The comments in this section relate to technical issues other than the
    use of XML Schema in DM; the XML Schema WG claims no particular
    responsibility or expertise on these questions but raises them
    because they seem to need attention.

2.1. Atomic values and singleton sequences

    In section 2 Notation, after indicating how to represent Node and Item
    in the syntax, DM says "Some accessors can accept or return
    sequences."
    This may need clarification; elsewhere we had been led to think that
    everything is a sequence. Please emphasize that Node, Item, and atomic
    values in the syntax correspond to singleton sequences, and that some
    accessors accept less-constrained sequences.
    Some members of the XML Schema WG add that DM seems to conflate the
    notions of list and sequence, which are distinct and should not be
    confused.

2.2. Node identity

    Sections 3.1 and 3.2 raise the question of node identity and stable
    ordering.
    Does a node maintain its identity on being modified? On being added to
    another tree? If so, wouldn't its position in the ordering change?

2.3. Names in namespace nodes

    Section 4.3 Elements lists, among the constraints that element nodes
    must satisfy:
    7. The namespace nodes of an element must have distinct names.
    This requirement contradicts the definition of dm:name for namespace
    nodes, for processors that choose not to preserve prefix information:
    all of their namespace nodes will have the same name, namely the
    empty sequence.
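    A minimal Python sketch of the conflict, assuming a namespace node's
    dm:name is modeled as a tuple standing for a sequence: a one-item tuple
    holding the prefix when prefixes are preserved, and the empty tuple
    (the empty sequence) when they are not. The function name and the
    prefixes used are hypothetical.

```python
from typing import List, Tuple

# dm:name of a namespace node, modeled as a sequence: ("xs",) when the
# prefix is preserved, () for the empty sequence when it is not.
Name = Tuple[str, ...]


def names_are_distinct(names: List[Name]) -> bool:
    """Constraint 7 of DM section 4.3: an element's namespace nodes have distinct names."""
    return len(set(names)) == len(names)


preserving = [("xs",), ("ex",)]   # two namespace nodes, prefixes kept
non_preserving = [(), ()]         # the same two nodes, both named by the empty sequence

print(names_are_distinct(preserving))      # True
print(names_are_distinct(non_preserving))  # False
```

    Under this model, any element with two or more namespace nodes built by
    a non-preserving processor fails the constraint.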

2.4. Elements labeled xs:anyType in the PSVI

    Section 4.3.2 says in part:

      If the element node's type is xs:anyType, the dm:typed-value
      accessor returns the node's string value as xs:anySimpleType.

    This seems to contradict section 4.1.6:

      If the node is an element node with type xs:anyType, then its typed
      value is equal to its string value, as an instance of
      xdt:untypedAtomic.

2.5. Minor items

2.5.1. Infoset-only processing

    Section 3.6 says, under the heading "Infoset-only processing":

      Note that this processing is only performed if no part of the
      subtree that contains the node was schema validated. In particular,
      Infoset-only processing does not apply to subtrees that are "skip"
      validated in a document.

    Which subtree is "the" subtree? A given node is contained by many
    subtrees. Perhaps read "if no part of any subtree containing the node
    was schema validated"?

2.5.2. Prefix property

    Section 4.3.4 says:

      An implementation must construct the value of the [prefix] property
      as if the following algorithm was applied: if the element has at
      least one namespace node whose namespace URI is the same as the
      namespace name of the xs:QName returned by the dm:node-name
      accessor ...

    Please be clear about the meaning of "namespace URI" of the namespace
    node. Is it the [uri] property of the namespace node or the namespace
    URI part of the node-name property of the namespace node?

2.5.3. Sequences in sequences

    Section 2 reads in part:

      In a sequence, V may be a Node or AtomicValue, or the union
      (choice) of several categories of Items.

    It's not immediately clear to all readers what this means. At first
    glance it appears to say that if V*, V?, or V+ appear in (the
    description of) a sequence, then V may be or denote a Node, an
    AtomicValue, or a union. But if sequences cannot appear in sequences,
    and V*, V?, and V+ all denote sequences (as specified in the list
    immediately above), then a sequence S described using V*, V?, or V+
    would appear to violate the rule that sequences cannot contain other
    sequences. (Unless "In a sequence" means `When appearing as the
    description of a sequence'.)
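    One possible reading that avoids nested sequences is the flattening
    behavior of XPath 2.0 sequence construction, in which (1, (2, 3))
    denotes the same sequence as (1, 2, 3): a V* occurring inside a
    sequence contributes its items rather than itself. A minimal Python
    sketch of that reading follows, with tuples standing in for sequences;
    the helper name is hypothetical, and whether DM intends this reading is
    exactly what the question above asks.

```python
from typing import Any, Tuple


def make_sequence(*parts: Any) -> Tuple[Any, ...]:
    """Build a flat sequence, splicing in the items of any nested sequence.

    Tuples stand for sequences; anything else is treated as a single item.
    """
    result = []
    for part in parts:
        if isinstance(part, tuple):
            result.extend(make_sequence(*part))   # a nested sequence contributes its items
        else:
            result.append(part)
    return tuple(result)


print(make_sequence(1, (2, 3), (), ((4,),)))   # (1, 2, 3, 4)
```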

2.5.4. Synthetic data models

    Section 3.3, para 2 reads:

      Although we describe construction of a data model in terms of
      infoset properties, an infoset is not an absolutely necessary
      precondition for building an instance of the Data Model. Purely
      synthetic data model instances are entirely appropriate as long as
      they obey all of the constraints described in this document.

    We agree that it is worthwhile to point out that synthetic instances
    of the Data Model are possible, and need not derive from some
    pre-existing XML document or information set. Some members of the XML
    Schema WG believe, however, that the formulation just quoted does not
    do full justice to the abstract nature of the infoset as a concept.
    Any process which can create an instance of the Data Model clearly has
    access to the set of information defined by the Infoset Rec and can
    thus be thought to have, or be, an infoset itself. On this line of
    thinking, the construction of a synthetic Data Model instance is
    itself a sufficient demonstration that the necessary information, and
    thus the necessary infoset, is available.
    Two possible fixes may be worth suggesting:

      Although we describe construction of a data model in terms of
      infoset properties, a [INS: pre-existing :INS] infoset is not an
      absolutely necessary precondition for building an instance of the
      Data Model. Purely synthetic data model instances are entirely
      appropriate as long as they obey all of the constraints described
      in this document.

    Or

      Although we describe construction of a data model in terms of XML
      infoset properties, a [INS: pre-existing XML document :INS] is not
      an absolutely necessary precondition for building an instance of
      the Data Model. Purely synthetic data model instances are entirely
      appropriate as long as they obey all of the constraints described
      in this document.

3. Editorial notes

    In the course of our work, some editorial points were noted; we list
    them here for the use of the editors. We do not particularly expect
    formal responses on these comments.

3.1. Comments reviewed by the Working Group

     1. QNames. Section 2 Notation reads in part:

      [Definition: An expanded-QName is a pair of values consisting of a
      namespace URI and a local name. They belong to the value space of
      the XML Schema type xs:QName. When this document refers to xs:QName
      we always mean the value space, i.e. a namespace URI, local name
      pair (and not the lexical space referring to constructs of the form
      prefix:local-name).]
        Thank you for being specific about value space vs. lexical space.
        Please also specify whether the namespace URI can be absent.
     2. Section 3.3: The definition

      [Definition: A Post Schema Validation Infoset, or PSVI, is the
      augmented infoset produced by an XML Schema validation episode.].
        has an extra full stop at the end.
     3. Section 3.4 para 6:

      It returns xs:anyType or xs:anySimpleType if no type information
      exists, or if it failed W3C XML Schema validity assessment.
        Are "xs:anyType" and "xs:anySimpleType" expanded-QNames? They
        don't look like it.
     4. Section 4.1.1: We suggest using "[base-uri]" rather than
        "base-uri" when referring to the infoset property, to avoid
        confusion with the base-uri accessor. In general, we believe all
        references to infoset properties should use the brackets.
     5. Section 4.1.3:

      dm:node-name returns the qualified name of the element or
      attribute.
        The XML Infoset does not define a [qualified name] for items. For
        "qualified name" perhaps read "expanded QName".
     6. Section 4.1.6, bulleted list: Two of the bullets begin "If the
        item is" and the rest begin "If the node is". Why are these
        different? At first we thought the wording reflected a crucial
        difference in the tests being performed, but the entire list is
        about nodes; no items other than nodes are under discussion.
     7. Section 4.3.2, repeated in 4.4.2: the first bullet item says that
        under certain circumstances the result will be an "atomic value
        3.14 of type decimal". Should that be "xs:decimal"?

3.2. Comments not reviewed by the Working Group

    When the XML Schema Working Group reviewed the draft comments provided
    by our task force, we focused on substantive comments; the following
    editorial comments were not reviewed owing to lack of time. They are
    transmitted on behalf of the Working Group, but they do not
    necessarily carry the consensus of the Working Group.
     1. Section 3.3 para -1: "inconsistent data models are forbidden".
        There has not thus far been any definition of consistency for data
        models; if it's provided elsewhere, a forward reference might be
        in order. If it's not provided elsewhere, it needs to be.
     2. abstract. For "the data model of at least XPath 2.0 ... and any
        other specifications that reference it" perhaps read "the data
        model of XPath 2.0 ... and of any other specifications that
        reference it".
     3. Section 1 Introduction para 2: "... it defines precisely the
        information contained in the input to an XSLT or XQuery
        processor."
        Surely it specifies a minimum, by defining the information which
        must be contained, rather than specifying both a minimum and a
        maximum by forbidding any input to contain any other information.
        If one has concealed a coded message in a document by varying the
        amount of white space before the '>' characters which close the
        tags in an XML document, that coded message is certainly (a)
        information, and (b) present in the input to the processor and (c)
        not defined by this Data Model.
        It may make sense to say that this document defines precisely
        which of the information present in the input is relevant to
        XSLT or XQuery processors (although formulating this without
        falling into traps is also fraught with difficulty), but it seems
        simply wrong to deny that information other than what is defined
        here is present in the input.
     4. Section 2 Notation. Since this is to be a free-standing document,
        a short description of what the sample signature means would be
        useful. As it is, the combination of (a) the sample, clearly
        intended to help the reader understand the notation, with (b) the
        absence of any explication, manages to do a rather effective job
        of sapping the reader's will to continue reading.
     5. Section 3.3 para -1. "Validation is described conceptually as a
        process of ..." -- either insert a pointer to the section or
        document which provides this description or (if this is the
        description) read "Validation is a process of ..."
     6. Section 3.4 para 2. For "For named types, which includes ..." read
        "For named types, which include ..." (subject-verb agreement)
     7. section 3.4 para 6. "The data model defines ... It returns ... if
        it ..." The noun phrase "data model" is almost certainly not
        intended as the antecedent of either of the two occurrences of it,
        but syntactically it has a better claim than any other noun phrase
        around. For the first, perhaps read "The accessor"; for the
        second, perhaps "the node" or "the argument".
     8. section 3.4 para -1. For "The semantics of such operations, e.g.
        checking if a particular instance of an element node has a given
        type is defined in [Formal semantics]" read "... if a particular
        instance ... has a given type, is defined in ...".

References

    1. http://www.w3.org/
    2. http://www.w3.org/Architecture/
    3. http://www.w3.org/XML/Group
    4. http://www.w3.org/XML/Group/Schemas
    5. http://www.w3.org/Member/Eventscal.html
    6. http://www.w3.org/Member/#confidential
    7. http://www.w3.org/XML/Group/2003/08/xmlschema-datamodel-comments.html#d0e69
    8. http://www.w3.org/XML/Group/2003/08/xmlschema-datamodel-comments.html#d0e74
    9. http://www.w3.org/XML/Group/2003/08/xmlschema-datamodel-comments.html#d0e134
   10. http://www.w3.org/XML/Group/2003/08/xmlschema-datamodel-comments.html#d0e168
   11. http://www.w3.org/XML/Group/2003/08/xmlschema-datamodel-comments.html#d0e189
   12. http://www.w3.org/XML/Group/2003/08/xmlschema-datamodel-comments.html#d0e205
   13. http://www.w3.org/XML/Group/2003/08/xmlschema-datamodel-comments.html#d0e277
   14. http://www.w3.org/XML/Group/2003/08/xmlschema-datamodel-comments.html#d0e300
   15. http://www.w3.org/XML/Group/2003/08/xmlschema-datamodel-comments.html#d0e320
   16. http://www.w3.org/XML/Group/2003/08/xmlschema-datamodel-comments.html#d0e325
   17. http://www.w3.org/XML/Group/2003/08/xmlschema-datamodel-comments.html#d0e340
   18. http://www.w3.org/XML/Group/2003/08/xmlschema-datamodel-comments.html#d0e347
   19. http://www.w3.org/XML/Group/2003/08/xmlschema-datamodel-comments.html#d0e361
   20. http://www.w3.org/XML/Group/2003/08/xmlschema-datamodel-comments.html#d0e390
   21. http://www.w3.org/XML/Group/2003/08/xmlschema-datamodel-comments.html#d0e393
   22. http://www.w3.org/XML/Group/2003/08/xmlschema-datamodel-comments.html#d0e416
   23. http://www.w3.org/XML/Group/2003/08/xmlschema-datamodel-comments.html#d0e432
   24. http://www.w3.org/XML/Group/2003/08/xmlschema-datamodel-comments.html#d0e454
   25. http://www.w3.org/XML/Group/2003/08/xmlschema-datamodel-comments.html#d0e483
   26. http://www.w3.org/XML/Group/2003/08/xmlschema-datamodel-comments.html#d0e488
   27. http://www.w3.org/XML/Group/2003/08/xmlschema-datamodel-comments.html#d0e589
   28. http://www.w3.org/TR/xpath-datamodel/
   29. http://www.w3.org/XML/Group/2003/08/xmlschema-datamodel-comments.html
   30. http://www.w3.org/XML/Group/2003/07/xmlschema-query-notes.html

Received on Friday, 1 August 2003 15:46:27 UTC