- From: Paul Cotton <paulcotton@alumni.uwaterloo.ca>
- Date: Mon, 29 May 2000 12:28:34 -0400
- To: www-xml-schema-comments@w3.org
- Cc: w3c-xml-query-wg@w3.org
Here is the second set of comments from the XML Query Working Group on the XML Schema last call Working Draft.

http://www.w3.org/TR/2000/WD-xmlschema-0-20000407/
http://www.w3.org/TR/2000/WD-xmlschema-1-20000407/
http://www.w3.org/TR/2000/WD-xmlschema-2-20000407/

In this version, we address the following issues:

2. XML Query data model related issues
   2.1 Treatment of anonymous types
   2.2 Schema for schemaless documents
   2.3 Treatment of collections
   2.4 Problems with minOccurs and maxOccurs
   2.5 Identity-constraints tables
   2.6 Referential mechanisms across multiple documents
   2.7 Internal representation of datatypes
   2.8 Infoset contributions for simple types
3. Algebra related issues
   3.1 Operations
   3.2 Treatment of NULLS

This list is not exhaustive and the XML Query WG will provide additional feedback at a later date.

- Paul Cotton, on behalf of the XML Query WG

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

2. XML Query data model related issues
--------------------------------------

2.1 Treatment of anonymous types
--------------------------------

XML Query will require access to explicit schema information for every element and attribute in order to know, e.g., what kind of operations are legal on those nodes. The current prescription in XML Schema Part 1: Structures, section 3.3, is that if the name of the actual type definition "is absent, schema processors may, but need not, provide a value unique to the {type definition} of the declaration." Apart from the fact that this unique value is rather mysterious, Query requires something that is both mandatory and consistent with the treatment of named types.

For the query language, we may not need the "identity" of an anonymous type - wouldn't it be sufficient to have the type definition itself? For anonymous types, equality of type can reasonably be defined as structural equivalence.
For heavy users of anonymous types, that would lead to enormous redundancy in the PSV-Infoset, and suggests that the infoset contributions should also include new Type Information Items (TIIs) that could be referenced from the element information items (EIIs). This would be advantageous for named types too, sparing users of the PSV-Infoset from having to locate schemas (except for more general schema investigation).

We would like to offer the following proposal for consideration:

===============================================================

Schema Infoset Contribution: Element Validated by Type (Structures 3.3):

First, insert a Type Information Item for the actual type definition into the set of TIIs (see below). Since the TIIs form a set, duplicates are not inserted. [Note that this requires some work on a detailed definition of equality of anonymous types. Also, namespaces must be added to named type definitions to avoid false elimination of apparent duplicates. However, for anonymous types one probably wants to ignore the namespace if the types are structurally the same.]

Then add the following to the EII:

  [type definition namespace]
  [type definition name]       - may be absent for anonymous types
  [type definition reference]  - reference to its TII

Schema Infoset Contribution: Type Information Item

The set of TIIs that need to be referenced within the PSV-Infoset (except for the built-in simple types - and other types defined within the Schema spec?). A TII has the structure of an EII in the Infoset for the schema that defines the corresponding <simpleType> or <complexType> element.

=================================================================

So navigating a TII would be equivalent to going to the schema and navigating the type definition. Basically, a user of the PSV-Infoset would always have the content of any type definition handy (or known already from the Schema spec if in that namespace), and would also have the names of named types for strong type checking where needed.
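The proposal can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a definition from either specification: every name here (TypeInfoItem, TypeInfoSet, ElementInfoItem, validated_by_type) is hypothetical, and the `structure` tuple is only a stand-in for a real structural comparison of type definitions, which the note in the proposal says still has to be worked out. Named types are keyed by (namespace, name) so that same-named types from different namespaces are not falsely merged; anonymous types are keyed by structure alone, ignoring the namespace.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TypeInfoItem:
    """Hypothetical TII. `structure` is an assumed hashable stand-in for
    the content of the <simpleType>/<complexType> definition."""
    namespace: Optional[str]
    name: Optional[str]        # None for anonymous types
    structure: Tuple

    def key(self):
        if self.name is not None:
            # Named type: identity includes the namespace.
            return ("named", self.namespace, self.name)
        # Anonymous type: structural equivalence, namespace ignored.
        return ("anonymous", self.structure)


class TypeInfoSet:
    """The set of TIIs referenced from the PSV-Infoset; duplicates are not inserted."""

    def __init__(self):
        self._items = {}

    def insert(self, tii: TypeInfoItem) -> TypeInfoItem:
        # Returns the canonical TII, reusing an equal one if already present.
        return self._items.setdefault(tii.key(), tii)

    def __len__(self):
        return len(self._items)


@dataclass
class ElementInfoItem:
    local_name: str
    type_namespace: Optional[str] = None        # [type definition namespace]
    type_name: Optional[str] = None             # [type definition name]
    type_ref: Optional[TypeInfoItem] = None     # [type definition reference]


def validated_by_type(eii: ElementInfoItem, tii: TypeInfoItem,
                      tiis: TypeInfoSet) -> ElementInfoItem:
    canonical = tiis.insert(tii)
    eii.type_ref = canonical
    eii.type_name = canonical.name
    # In this sketch the namespace is only recorded for named types.
    eii.type_namespace = canonical.namespace if canonical.name is not None else None
    return eii


if __name__ == "__main__":
    tiis = TypeInfoSet()
    address = (("element", "street", "string"), ("element", "city", "string"))
    a = validated_by_type(ElementInfoItem("shipTo"), TypeInfoItem("urn:a", None, address), tiis)
    b = validated_by_type(ElementInfoItem("billTo"), TypeInfoItem("urn:b", None, address), tiis)
    print(a.type_ref is b.type_ref)  # True: structurally equal anonymous types share one TII
    print(len(tiis))                 # 1
```

Because every EII points at a shared TII, a user of the PSV-Infoset gets the full type definition without consulting the schema, while the redundancy of repeating anonymous definitions on each element is avoided.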
The TII would carry the simple|complex information, so [type definition type] is not needed in the element SISC. Also, [type definition anonymous] can be omitted, since it is redundant with the absence or presence of a [type definition name].

2.2 Schema for schemaless documents
-----------------------------------

We do require a standard way to represent the "schema" of documents that have DTDs or do not have any schema at all. In particular, we need a representation for the ur-type.

2.3 Treatment of collections
----------------------------

In processing a query, sometimes the order of children in an element is relevant and sometimes it is not. When order is not relevant, additional optimizations may be performed. It would be helpful if Schema could provide some way to indicate whether the order of the children is significant. For instance, this might be done by giving a type an `ordered' property. Thus, just as the content of a non-empty element is always either mixed or elementOnly, it also might be either ordered or unordered.

2.4 Problems with minOccurs and maxOccurs
-----------------------------------------

A. The default for maxOccurs behaves counter-intuitively. When maxOccurs is not explicitly specified, it inherits the value of minOccurs (which defaults to 1 if not specified). This is confusing. For example, po.xsd in XML Schema Part 0 (Primer) contains the declaration

     <xsd:element ref="comment" minOccurs="0"/>

This effectively prohibits comments in the instance document, since maxOccurs then also takes the value 0. The XML Query Working Group suggests that Schema either require that minOccurs and maxOccurs occur together or normatively adopt the default rule mentioned in Appendix B of XML Schema Part 1: "maxOccurs defaults to 1 or minOccurs, whichever is greater".

B. The XML Query Working Group finds the different treatment of the properties minOccurs/maxOccurs, fixed, default, and value in the XML representation for element declarations and for attribute declarations confusing.
The XML Query Working Group suggests using the same representation for element declarations and attribute declarations, and constraining the allowed values of minOccurs and maxOccurs in attribute declarations to "0" or "1". This would allow queries such as "Select all attributes and elements that may occur at most once" to be evaluated more efficiently.

C. There is an inconsistency between "*" and "unbounded". The Primer uses "*" to mean infinity, and the Datatypes spec uses "*" in Appendix B, while other places in the spec use "unbounded".

2.5 Identity-constraints tables
-------------------------------

XML Schema Part 1: Structures, section 3.10, discusses the Infoset contributions for identity constraints. In order to verify that identity constraints are satisfied, it defines identity-constraint tables to be added to Element Information Items. These tables in effect would let a query processor find the element referred to by any keyref.

A. The note at the end of the section says, however, that these tables are optional. Conformant schema processors are *not* required to expose them. This means that a query processor working with a PSV Infoset created by a conformant processor that does not expose such tables may be forced to reconstruct some or all of them. That is possibly an expensive process, and clearly an unnecessary one, since the schema processor would have created them to check the identity constraints and then thrown them away! We suggest that all conformant XML Schema processors be able to expose the identity-constraint tables, but need not do so if requested otherwise.

B. We would like to request a reformulation as a single "identity-constraint index" from which it would also be easy to find all the elements whose keyrefs referred to a key. A simpler representation would promote interoperability of conformant XML Schema processors. We are thinking both of conceptual simplicity and of a corresponding API that could support transfer of this information in practice.
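One possible shape for the single index requested in item B is sketched below in Python. It is an illustration only: the class and method names are hypothetical, elements are represented by arbitrary placeholder objects, and key and keyref values are assumed to be hashable (for a multi-field key, a tuple of field values). The point is that one structure answers both directions, from a keyref value to the keyed element and from a key value back to every element that refers to it, and can also report keyrefs with no matching key.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple


class IdentityConstraintIndex:
    """Hypothetical single index over the key and keyref values of one document."""

    def __init__(self):
        self._keys: Dict[Tuple[str, Hashable], object] = {}
        self._refs: Dict[Tuple[str, Hashable], List[object]] = defaultdict(list)
        self._keyref_to_key: Dict[str, str] = {}

    def declare_keyref(self, keyref_name: str, key_name: str) -> None:
        # Records which key a keyref constraint refers to.
        self._keyref_to_key[keyref_name] = key_name

    def add_key(self, key_name: str, value: Hashable, element: object) -> None:
        entry = (key_name, value)
        if entry in self._keys:
            raise ValueError(f"duplicate value {value!r} for key {key_name!r}")
        self._keys[entry] = element

    def add_keyref(self, keyref_name: str, value: Hashable, element: object) -> None:
        key_name = self._keyref_to_key[keyref_name]
        self._refs[(key_name, value)].append(element)

    def referenced_element(self, keyref_name: str, value: Hashable):
        # Forward lookup: the element a keyref value points to (None if missing).
        return self._keys.get((self._keyref_to_key[keyref_name], value))

    def referring_elements(self, key_name: str, value: Hashable) -> List[object]:
        # Reverse lookup: every element whose keyref refers to this key value.
        return list(self._refs.get((key_name, value), []))

    def dangling_keyrefs(self) -> List[Tuple[str, Hashable]]:
        # Keyref values with no matching key, i.e. identity-constraint violations.
        return [entry for entry in self._refs if entry not in self._keys]


if __name__ == "__main__":
    idx = IdentityConstraintIndex()
    idx.declare_keyref("partRef", "partKey")
    idx.add_key("partKey", "p1", "<part id='p1'>")
    idx.add_keyref("partRef", "p1", "<item 1>")
    idx.add_keyref("partRef", "p1", "<item 2>")
    print(idx.referring_elements("partKey", "p1"))  # ['<item 1>', '<item 2>']
```

A schema processor could fill such an index while checking identity constraints and hand it to the query processor, rather than discarding the equivalent tables after validation.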
2.6 Referential mechanisms across multiple documents
-----------------------------------------------------

Query has a requirement to query across collections of documents, which implies that we will need referential mechanisms other than URI references (e.g., keys/keyRefs) across multiple documents. In version 1, the reference mechanisms defined by Schema are restricted to a single document. Mechanisms such as XPointer might address inter-document references if extended to support the keyRef datatype. We believe there is a future requirement for referential mechanisms between documents.

2.7 Internal representation of datatypes
-----------------------------------------

Schema defines datatypes in the PSV Infoset for Query to access. The PSV Infoset is extracted from the XML document by a PSV-enabled parser. The Query WG is interested in working together with the Schema WG and other working groups, e.g., DOM, to determine whether the physical representation of each schema primitive datatype (e.g., floating-point numbers) should be an optional PSV characteristic. This would increase interoperability by moving the conversion of datatypes into the realm of a PSV Schema processor. Some members of the Query WG believe that this comment encroaches on implementation details, but would like to discuss this issue further with the Schema WG.

2.8 Infoset contributions for simple types
------------------------------------------

There are differences in the infoset for simple types (datatypes) between Part 1 and Part 2 of the Schema spec:

A. The Part 1 spec has an [abstract] property. The Part 2 spec does not.

B. The Part 1 spec does not have the property [fundamental facets]. Except for "bounds", the other fundamental facets (equal, order, cardinality, numeric) are constant for a base datatype and its derived types. There is no need to represent this constant information in the PSV Infoset.

C. The Structures spec has two properties, [base type definition] and [primitive type definition].
The Datatypes spec has a single property, [base type definition]. The primitive type can be obtained by following the base type chain, but storing the primitive type is more efficient for certain kinds of type inference.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

3. Algebra related issues
-------------------------

3.1 Operations
--------------

There is a need for operations to be defined on base types. Schema doesn't define any built-in operations or provide any mechanism for user-defined operations on types. As a result, the Query WG needs to define these. The Query WG will also need to determine the types of the arguments to select the right operator (e.g., floating-point vs. integer arithmetic) and to perform the appropriate type coercion. The type coercion rules need to be defined. The Query WG intends to define these operations and looks forward to doing this in cooperation with the Schema WG.

3.2 Treatment of NULLS
----------------------

The Query WG has not reached a consensus regarding the definition of NULLs. We expect that the Query WG will submit comments regarding NULLs in the future, once we have determined their potential impact on the Query algebra. In the interim, we have asked individual members of the Query WG to send their comments regarding NULLs directly to the Schema WG.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Paul Cotton, Microsoft Canada
mailto:paulcotton@alumni.uwaterloo.ca
17 Eleanor Drive, Nepean, Ontario K2E 6A3
Tel: (613) 225-5445 Fax: (613) 226-6913
Received on Monday, 29 May 2000 12:28:28 UTC