- From: C. M. Sperberg-McQueen <cmsmcq@acm.org>
- Date: Fri, 01 Aug 2003 13:45:10 -0600
- To: public-qt-comments@w3.org
- Cc: W3C XML Schema IG <w3c-xml-schema-ig@w3.org>
Dear colleagues:

The XML Schema Working Group congratulates the XML Query and XSL Working Groups on their progress, and in particular on the Last Call draft of "XQuery 1.0 and XPath 2.0 Data Model". We have now reviewed the Last Call draft, and our comments are at http://www.w3.org/XML/Group/2003/08/xmlschema-datamodel-comments.html (an ASCII version is reproduced below for the convenience of those with access to their email but not to the Web). We apologize for the tardy arrival of these notes.

-C. M. Sperberg-McQueen, for the W3C XML Schema WG

W3C XML Schema WG Notes on XQuery 1.0 and XPath 2.0 Data Model

1 August 2003
_________________________________________________________________

  * 1. [7]Schema-related issues
      + 1.1. [8]The term type
      + 1.2. [9]Derivation of simple types
      + 1.3. [10]Items and singleton sequences
      + 1.4. [11]The implications of [validity] != valid
      + 1.5. [12]Anonymous local types
      + 1.6. [13]Target namespaces
      + 1.7. [14]Lexical spaces, reference, containment
  * 2. [15]Other technical issues
      + 2.1. [16]Atomic values and singleton sequences
      + 2.2. [17]Node identity
      + 2.3. [18]Names in namespace nodes
      + 2.4. [19]Elements labeled xs:anyType in the PSVI
      + 2.5. [20]Minor items
          o 2.5.1. [21]Infoset-only processing
          o 2.5.2. [22]Prefix property
          o 2.5.3. [23]Sequences in sequences
          o 2.5.4. [24]Synthetic data models
  * 3. [25]Editorial notes
      + 3.1. [26]Comments reviewed by the Working Group
      + 3.2. [27]Comments not reviewed by the Working Group
_________________________________________________________________

This document contains comments on the Last Call draft of 2 May 2003 of [28]XQuery 1.0 and XPath 2.0 Data Model (hereinafter DM) from the XML Schema Working Group. These comments were prepared by an ad hoc Task Force, and most of them were reviewed and revised by the XML Schema Working Group at its teleconference of 1 August 2003. The editorial comments included in [29]section 3.2 were not reviewed by the XML Schema Working Group.
In addition to the comments below, please note that several of the [30]general comments sent on 14 July relate to the data model specification. Some of those earlier comments overlap with comments below.

1. Schema-related issues

The comments in this section relate to the use of XML Schema in the DM specification and thus to the particular area of responsibility borne by the XML Schema WG.

1.1. The term type

DM appears to use the term "type" for several related but different concepts; we believe it would be desirable if you were to clarify the meaning of the term, or at least to call the reader's attention to its overloading. The Data Model specification appeals to the Formal Semantics specification, which says types are XML Schema types. However, XML Schema tries to avoid the term "type", instead using "type definition". Among the uses of "type" we have noticed are:

  T1. a name (for example, as used by the dm:type accessor).
  T2. a set of values (this sense is used by XML Schema's internal work on a formalization, which includes a "Type Lattice").
  T3. an XML Schema Type Definition component (simple or complex). Defines a set of values and certain properties, such as [name], [baseType], etc.
  T4. an OO class. Defines a set of values, inheritance info, and operators.

Specifically, we suggest that the dm:type accessor be renamed to dm:type-name and that "type" be explicitly defined. If "type" is just a synonym for "type definition", say so in the definition of "type".

1.2. Derivation of simple types

Section 5 Atomic Values reads in part:

    An XML Schema simple type [XMLSchema Part 2] may be primitive or derived by restriction, list, or union.

We think it will help avoid confusion among users and implementors, and (not least) in discussions among Working Groups, if you use XML Schema terminology here. Perhaps:

    An XML Schema simple type definition [XMLSchema Part 2] has a [variety], which may be atomic, list or union. If [variety] is atomic, the type definition may be primitive or derived by restriction.

The XML Schema WG wishes to de-emphasize the use of the term "derived by" in XML Schema Part 2 in describing union and list construction. The term "derived by" is used only colloquially there and is unfortunately confused with derivation in the proper sense (i.e. restriction and extension). All non-primitive simple types are derived by restriction: list types may be restrictions of xs:anySimpleType or of other lists, and the same holds for union types. Please don't propagate the confusion we created. [We are aware that it would be useful to have a simple term other than derivation to describe the relation between a list type and its item type, or that between a union type and its member types; we need it as much as you do. Suggestions are welcome.]

1.3. Items and singleton sequences

Section 6 Sequences reads in part:

    An important characteristic of the data model is that there is no distinction between an item (a node or an atomic value) and a singleton sequence containing that item.

One consequence of this characteristic is that the types xs:integer and a list of xs:integer with length constrained to 1 have exactly the same value space in the Data Model. That is, each value in the value space is a sequence of a single xs:integer. This differs from XML Schema, where the two types have distinct value spaces. Might this cause a problem for functions or other uses of the Data Model? We believe further discussion is needed here.

1.4. The implications of [validity] != valid

Section 3.6 para 2 reads in part: "The only information that can be inferred from an invalid or not known validity value is that the information item is well-formed."

This is not true in the general case: the values of the properties [validity] and [validation attempted] interact, so that some inferences beyond well-formedness can be made.
(If [validity] is 'notKnown', for example, we can infer without examining the PSVI that [validation attempted] is not 'full'. If for some node N [validity] is 'invalid', we can infer that declarations are available for at least some element or attribute information items in the subtree rooted in N.) The data model doesn't have to be interested in those inferences, but it is simply incorrect to say that they don't exist. On the whole, we believe that the data model misses an opportunity by failing to exploit more fully the information contained in the [validity] and [validation attempted] properties.

1.5. Anonymous local types

Section 3.6 has an extended list of cases describing how the namespace and local name of a type are found. This list reads in part:

  * If the [validity] property exists and is `valid':
      + ...
      + If the [type definition] property exists and its {name} property is present:
          o the {target namespace} and {name} properties of the [type definition] property.
      + ...
      + If [type definition anonymous] exists:
          o If it is false: the [type definition namespace] and the [type definition name]
          o Otherwise, the namespace and local name of the appropriate anonymous type name.

The above structure does not handle the case of an anonymous type when the schema processor provides the [type definition] property instead of the [type definition name] property and its fellows. We think the [type definition] rule can readily be rephrased so that the result is parallel to the case when the upstream schema processor provides [type definition name] instead of [type definition]:

  * If the [validity] property exists and is `valid':
      + ...
      + If the [type definition] property exists[DEL: and its {name} property is present :DEL]:
          o [INS: If the [type definition]'s {name} property exists: :INS] the {target namespace} and {name} properties of the [type definition] property.
          o [INS: Otherwise, the namespace and local name of the appropriate anonymous type name. :INS]
      + ...

1.6. Target namespaces

Section 3.4 Types reads in part:

    Since named types in XML Schema are global, an expanded-QName uniquely identifies such a type. The namespace name of the expanded-QName is the target namespace of the schema and its local name is the name of the type.

A schema does not have a target namespace; a schema document has a target namespace. One possible repair would be:

    Since named types in XML Schema are global, an expanded-QName uniquely identifies such a type. The namespace name of the expanded-QName is the {target namespace} property of the type definition, and its local name is the {name} property of the type definition.

Another might be:

    Since named types in XML Schema are global, an expanded-QName uniquely identifies such a type within a schema.

We believe this to be relatively important.

1.7. Lexical spaces, reference, containment

Section 2 refers to "the lexical space referring to constructs of the form prefix:local-name". Perhaps substitute "the lexical space containing ...". Lexical forms may, with a certain investment of time and energy, be thought of as `referring to' values, but the lexical space as a whole does not refer. The lexical space of QName does contain, even if it does not refer to, constructs of the form prefix:local-name.

2. Other technical issues

The comments in this section relate to technical issues other than the use of XML Schema in the DM specification; the XML Schema WG claims no particular responsibility or expertise on these questions but raises them because they seem to need attention.

2.1. Atomic values and singleton sequences

In Section 2 Notation, after indicating how to represent Node and Item in the syntax, DM says "Some accessors can accept or return sequences." This may need clarification; elsewhere we had been led to think that everything is a sequence. Please emphasize that Node, Item, and atomic values in the syntax correspond to singleton sequences, and that some accessors accept less-constrained sequences.
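The reading requested above can be made concrete with a small Python sketch. It is a toy model under stated assumptions, not anything DM defines: atomic values are stand-ins for Python scalars, nodes are omitted, every value is a flat tuple, and the helper names `seq` and `require_singleton` are hypothetical. It shows an item and the singleton sequence containing it coming out identical (so an xs:integer 42 and a one-item list of xs:integer would both be `(42,)`, the collapse of value spaces raised in 1.3 above), sequences never nesting, and an accessor typed as taking a single Item accepting exactly the sequences of length one.

```python
from typing import Tuple, Union

# Toy model, an assumption for illustration only (not DM's own definition):
# atomic values are plain Python scalars, nodes are left out, and every
# data-model value is represented as a flat tuple of items.
AtomicValue = Union[int, float, str, bool]
Sequence = Tuple[AtomicValue, ...]


def seq(*parts: Union[AtomicValue, tuple]) -> Sequence:
    """Build a sequence from items and/or existing sequences.

    A bare item is treated as the singleton sequence containing it, and any
    nested sequence is flattened, so a sequence never contains a sequence.
    """
    out = []
    for part in parts:
        if isinstance(part, tuple):
            out.extend(seq(*part))  # flatten nested sequences recursively
        else:
            out.append(part)  # a bare item contributes exactly itself
    return tuple(out)


def require_singleton(value: Sequence) -> AtomicValue:
    """Hypothetical argument check for an accessor whose signature says Item.

    Under the singleton-sequence reading, such an accessor accepts exactly the
    sequences of length one; an accessor typed Item* would accept any length.
    """
    if len(value) != 1:
        raise TypeError(f"expected exactly one item, got {len(value)}")
    return value[0]


# An item and the singleton sequence containing it compare equal.
assert seq(42) == seq((42,)) == (42,)
# Sequences never nest: (1, (2, (3,)), ()) flattens to (1, 2, 3).
assert seq(1, (2, (3,)), ()) == (1, 2, 3)
assert require_singleton(seq(42)) == 42
```

Normalizing at construction time is only one way to model the equivalence; a model could equally keep two internal representations, provided no operation can tell them apart.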
Some members of the XML Schema WG add that DM seems to conflate the notations of list and sequence, which are distinct and should not be confused.

2.2. Node identity

Sections 3.1 and 3.2 raise the question of node identity and stable ordering. Does a node maintain its identity when it is modified? When it is added to another tree? If so, wouldn't its position in document order change?

2.3. Names in namespace nodes

Section 4.3 Elements lists, among the constraints that element nodes must satisfy:

    7. The namespace nodes of an element must have distinct names.

This requirement contradicts the definition of dm:name for namespace nodes, for processors that choose not to preserve prefix information: all their namespace nodes will have the same name, namely the empty sequence.

2.4. Elements labeled xs:anyType in the PSVI

Section 4.3.2 says in part:

    If the element node's type is xs:anyType, the dm:typed-value accessor returns the node's string value as xs:anySimpleType.

This seems to contradict Section 4.1.6:

    If the node is an element node with type xs:anyType, then its typed value is equal to its string value, as an instance of xdt:untypedAtomic.

2.5. Minor items

2.5.1. Infoset-only processing

Section 3.6 says, under the heading "Infoset-only processing":

    Note that this processing is only performed if no part of the subtree that contains the node was schema validated. In particular, Infoset-only processing does not apply to subtrees that are "skip" validated in a document.

Which subtree is "the" subtree? A given node is contained by many subtrees. Perhaps read "if no part of any subtree containing the node was schema validated"?

2.5.2. Prefix property

Section 4.3.4 says:

    An implementation must construct the value of the [prefix] property as if the following algorithm was applied: if the element has at least one namespace node whose namespace URI is the same as the namespace name of the xs:QName returned by the dm:node-name accessor ...

Please be clear about the meaning of "namespace URI" of the namespace node. Is it the [uri] property of the namespace node, or the namespace URI part of the node-name property of the namespace node?

2.5.3. Sequences in sequences

Section 2 reads in part:

    In a sequence, V may be a Node or AtomicValue, or the union (choice) of several categories of Items.

It's not immediately clear to all readers what this means. At first glance it appears to say that if V*, V?, or V+ appears in (the description of) a sequence, then V may be or denote a Node or an AtomicValue or a union. But if sequences cannot appear in sequences, and V*, V?, and V+ all denote sequences (as specified in the list immediately above), then if V*, V?, or V+ appears in (the description of) a sequence S, S would appear to violate the rule that sequences cannot contain other sequences. (Unless "In a sequence" means `When appearing as the description of a sequence'.)

2.5.4. Synthetic data models

Section 3.3, para 2 reads:

    Although we describe construction of a data model in terms of infoset properties, an infoset is not an absolutely necessary precondition for building an instance of the Data Model. Purely synthetic data model instances are entirely appropriate as long as they obey all of the constraints described in this document.

We agree that it is worthwhile to point out that synthetic instances of the Data Model are possible, and need not derive from some pre-existing XML document or information set. Some members of the XML Schema WG believe, however, that the formulation just quoted does not do full justice to the abstract nature of the infoset as a concept. Any process which can create an instance of the Data Model clearly has access to the set of information defined by the Infoset Rec and can thus be thought to have, or be, an infoset itself.
On this line of thinking, the construction of a synthetic Data Model is itself a sufficient demonstration that the necessary information, and thus the necessary infoset, is available. Two possible fixes may be worth suggesting:

    Although we describe construction of a data model in terms of infoset properties, a [INS: pre-existing :INS] infoset is not an absolutely necessary precondition for building an instance of the Data Model. Purely synthetic data model instances are entirely appropriate as long as they obey all of the constraints described in this document.

Or:

    Although we describe construction of a data model in terms of XML infoset properties, a [INS: pre-existing XML document :INS] is not an absolutely necessary precondition for building an instance of the Data Model. Purely synthetic data model instances are entirely appropriate as long as they obey all of the constraints described in this document.

3. Editorial notes

In the course of our work, some editorial points were noted; we list them here for the use of the editors. We do not particularly expect formal responses on these comments.

3.1. Comments reviewed by the Working Group

  1. QNames. Section 2 Notation reads in part:

         [Definition: An expanded-QName is a pair of values consisting of a namespace URI and a local name. They belong to the value space of the XML Schema type xs:QName. When this document refers to xs:QName we always mean the value space, i.e. a namespace URI, local name pair (and not the lexical space referring to constructs of the form prefix:local-name).]

     Thank you for being specific about value space vs. lexical space. Please also be specific on whether the namespace URI can be absent or not.

  2. Section 3.3: The definition "[Definition: A Post Schema Validation Infoset, or PSVI, is the augmented infoset produced by an XML Schema validation episode.]." has an extra full stop at the end.

  3. Section 3.4 para 6: "It returns xs:anyType or xs:anySimpleType if no type information exists, or if it failed W3C XML Schema validity assessment." Are "xs:anyType" and "xs:anySimpleType" expanded-QNames? They don't look like it.

  4. Section 4.1.1: We suggest using "[base-uri]" rather than "base-uri" when referring to the infoset property, to avoid confusion with the base-uri accessor. In general, we believe all references to infoset properties should use the brackets.

  5. Section 4.1.3: "dm:node-name returns the qualified name of the element or attribute." The XML Infoset does not define a [qualified name] for items. For "qualified name" perhaps read "expanded QName".

  6. Section 4.1.6, bulleted list: Two of the bullets begin "If the item is" and the rest begin "If the node is". Why are these different? At first we thought the difference reflected a crucial difference in the tests being performed, but the entire list is about nodes; there are no items under discussion which are not nodes.

  7. Section 4.3.2, repeated in 4.4.2: The first bullet item says that under certain circumstances the result will be an "atomic value 3.14 of type decimal". Should that be "xs:decimal"?

3.2. Comments not reviewed by the Working Group

When the XML Schema Working Group reviewed the draft comments provided by our task force, we focused on substantive comments; the following editorial comments were not reviewed owing to lack of time. They are transmitted on behalf of the Working Group, but they do not necessarily carry the consensus of the Working Group.

  1. Section 3.3 para -1: "inconsistent data models are forbidden". There has not thus far been any definition of consistency for data models; if it is provided elsewhere, a forward reference might be in order. If it is not provided elsewhere, it needs to be.

  2. Abstract. For "the data model of at least XPath 2.0 ... and any other specifications that reference it" perhaps read "the data model of XPath 2.0 ... and of any other specifications that reference it".

  3. Section 1 Introduction para 2: "... it defines precisely the information contained in the input to an XSLT or XQuery processor." Surely it specifies a minimum, by defining the information which must be contained, rather than specifying both a minimum and a maximum by forbidding any input to contain any other information. If one has concealed a coded message in a document by varying the amount of white space before the '>' characters which close the tags in an XML document, that coded message is certainly (a) information, (b) present in the input to the processor, and (c) not defined by this Data Model. It may make sense to say that this document defines precisely which information present in the input is relevant to XSLT or XQuery processors (although formulating this without falling into traps is also fraught with difficulty), but it seems simply wrong to deny that information other than what is defined here is present in the input.

  4. Section 2 Notation. Since this is to be a free-standing document, a short description of what the sample signature means would be useful. As it is, the combination of (a) the sample, clearly intended to help the reader understand the notation, with (b) the absence of any explication, manages to do a rather effective job of sapping the reader's will to continue reading.

  5. Section 3.3 para -1. "Validation is described conceptually as a process of ..." Either insert a pointer to the section or document which provides this description or (if this is the description) read "Validation is a process of ...".

  6. Section 3.4 para 2. For "For named types, which includes ..." read "For named types, which include ..." (subject-verb agreement).

  7. Section 3.4 para 6. "The data model defines ... It returns ... if it ..."
     The noun phrase "data model" is almost certainly not intended as the antecedent of either of the two occurrences of "it", but syntactically it has a better claim than any other noun phrase around. For the first, perhaps read "The accessor"; for the second, perhaps "the node" or "the argument".

  8. Section 3.4 para -1. For "The semantics of such operations, e.g. checking if a particular instance of an element node has a given type is defined in [Formal semantics]" read "... if a particular instance ... has a given type, is defined in ...".

References

   1. http://www.w3.org/
   2. http://www.w3.org/Architecture/
   3. http://www.w3.org/XML/Group
   4. http://www.w3.org/XML/Group/Schemas
   5. http://www.w3.org/Member/Eventscal.html
   6. http://www.w3.org/Member/#confidential
   7. http://www.w3.org/XML/Group/2003/08/xmlschema-datamodel-comments.html#d0e69
   8. http://www.w3.org/XML/Group/2003/08/xmlschema-datamodel-comments.html#d0e74
   9. http://www.w3.org/XML/Group/2003/08/xmlschema-datamodel-comments.html#d0e134
  10. http://www.w3.org/XML/Group/2003/08/xmlschema-datamodel-comments.html#d0e168
  11. http://www.w3.org/XML/Group/2003/08/xmlschema-datamodel-comments.html#d0e189
  12. http://www.w3.org/XML/Group/2003/08/xmlschema-datamodel-comments.html#d0e205
  13. http://www.w3.org/XML/Group/2003/08/xmlschema-datamodel-comments.html#d0e277
  14. http://www.w3.org/XML/Group/2003/08/xmlschema-datamodel-comments.html#d0e300
  15. http://www.w3.org/XML/Group/2003/08/xmlschema-datamodel-comments.html#d0e320
  16. http://www.w3.org/XML/Group/2003/08/xmlschema-datamodel-comments.html#d0e325
  17. http://www.w3.org/XML/Group/2003/08/xmlschema-datamodel-comments.html#d0e340
  18. http://www.w3.org/XML/Group/2003/08/xmlschema-datamodel-comments.html#d0e347
  19. http://www.w3.org/XML/Group/2003/08/xmlschema-datamodel-comments.html#d0e361
  20. http://www.w3.org/XML/Group/2003/08/xmlschema-datamodel-comments.html#d0e390
  21. http://www.w3.org/XML/Group/2003/08/xmlschema-datamodel-comments.html#d0e393
  22. http://www.w3.org/XML/Group/2003/08/xmlschema-datamodel-comments.html#d0e416
  23. http://www.w3.org/XML/Group/2003/08/xmlschema-datamodel-comments.html#d0e432
  24. http://www.w3.org/XML/Group/2003/08/xmlschema-datamodel-comments.html#d0e454
  25. http://www.w3.org/XML/Group/2003/08/xmlschema-datamodel-comments.html#d0e483
  26. http://www.w3.org/XML/Group/2003/08/xmlschema-datamodel-comments.html#d0e488
  27. http://www.w3.org/XML/Group/2003/08/xmlschema-datamodel-comments.html#d0e589
  28. http://www.w3.org/TR/xpath-datamodel/
  29. http://www.w3.org/XML/Group/2003/08/xmlschema-datamodel-comments.html
  30. http://www.w3.org/XML/Group/2003/07/xmlschema-query-notes.html
Received on Friday, 1 August 2003 15:46:27 UTC