- From: Jonathan Robie <jonathan.robie@datadirect.com>
- Date: Tue, 10 Feb 2004 18:07:46 -0500
- To: XML Query Comments <public-qt-comments@w3.org>
Untyped data is one of the significant challenges in the design of the XML Query type system. Two important criteria for the representation of untyped data are:

1. We need a way to identify data that is not schema processed.

It should be easy for a processor to identify documents or regions for which no schema processing is done, either because the instance was not schema validated or for nodes found in a skip-validated region of a schema. This allows a processor to know that no typed data occurs within the region.

One way to do this is to use xdt:untyped for elements that have not been schema-processed, and xdt:untypedAtomic for attributes that have not been schema processed, and to use the types assigned by XML Schema, including xs:anyType and xs:anySimpleType, when schema processing has been done.

2. Compatibility with the XML Schema type system.

If a document has been schema-validated, the types used in the document should be compatible with those given to it by XML Schema. This is listed explicitly as a goal in our charter:

    It is a goal of the XML Query work to be compatible with the work of the XML Schema Working group on XML Schema Part 2: Datatypes (XML Schema Part 2) and XML Schema Part 1: Structures (XML Schema Part 1). For example, it should be possible to base query predicates on the existing DTD or XML Schema Part 1 definition of the content of an XML document and on the new data types being defined as part of the XML Schema Part 2. In addition the XML Query work will take advantage of the formal description of the contents of XML Schema defined in XML Schema: Formal Description (XML Schema: Formal Description).

When schema processing is done, the Data Model should use the same type names as XML Schema.
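To make the two criteria concrete, here is a minimal sketch in Python of the annotation rule suggested above. The function name, the "element"/"attribute" strings, and the use of None to mean "no schema processing was done" are illustrative assumptions, not anything defined by the Data Model or XML Schema.

```python
from typing import Optional


def proposed_annotation(kind: str, psvi_type: Optional[str]) -> str:
    """Return the type name a node would carry under the rule sketched above.

    kind      -- "element" or "attribute" (hypothetical encoding of node kind)
    psvi_type -- the type name assigned by XML Schema, or None when the node
                 was not schema processed (hypothetical convention)
    """
    if kind not in ("element", "attribute"):
        raise ValueError("kind must be 'element' or 'attribute'")
    if psvi_type is None:
        # Criterion 1: unprocessed data gets a distinct, recognizable type.
        return "xdt:untyped" if kind == "element" else "xdt:untypedAtomic"
    # Criterion 2: keep whatever XML Schema assigned, including
    # xs:anyType and xs:anySimpleType.
    return psvi_type
```

Under this sketch, an element that XML Schema typed as xs:anyType keeps that name, while an element that was never schema processed is reported as xdt:untyped, so the two cases stay distinguishable.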
We currently map all instances of xs:anyType to xdt:untypedAny [1] and map all instances of xs:anySimpleType to xdt:untypedAtomic, which means that someone who understands XML Schema must also understand how our types differ from those in the XML Schema specification. If XML Schema assigns the type xs:anyType, the Data Model should use the same type. If XML Schema assigns the type xs:anySimpleType, this type should be preserved in the Data Model.

This is important not only for the comprehension of those poor souls who must understand both XML Schema and XQuery's type system, but also because XQuery and XSLT are not the only systems that use type information from XML Schema. Since we use different type names, software based on the PSVI has different type names for untyped data than software based on the Data Model. A Java or C++ program using a PSVI API will have the same type names as the Data Model for almost every other named data type, but not for these two - which means that someone using XQuery embedded in a Java program, or an XQuery that makes external calls to Java, must be aware of the two sets of type names and how they relate to each other. Also, a browser based on the PSVI representation reports different type names than a browser based on the Data Model, and debugging tools based on the two different representations report different type names.

This is especially important since many of us see the Data Model as an important simplification of the PSVI that may become the basis for many specifications. It must not be at odds with XML Schema.

Our charter asks us to design a language, not to change the type hierarchy used by XML Schema. If our language can't match a type in the type hierarchy or express the type for the purposes of static inference, the solution is to change the language, not the data model. Our status quo goes against the charter in a way that hurts interoperability among specifications and tools.
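The naming mismatch described above can be sketched in a few lines of Python. The table holds only the two status-quo mappings stated in this message; the function names and the sample list of PSVI type names are hypothetical, chosen for illustration.

```python
# Status-quo renaming as described in this message: only these two
# XML Schema type names are changed on the way into the Data Model.
STATUS_QUO = {
    "xs:anyType": "xdt:untypedAny",
    "xs:anySimpleType": "xdt:untypedAtomic",
}


def data_model_name(psvi_name: str) -> str:
    """Name the Data Model reports for a type the PSVI calls psvi_name."""
    return STATUS_QUO.get(psvi_name, psvi_name)


def mismatches(psvi_names):
    """List (PSVI name, Data Model name) pairs where the two disagree."""
    return [
        (name, data_model_name(name))
        for name in psvi_names
        if data_model_name(name) != name
    ]


# Hypothetical set of type names a PSVI-based tool might report.
sample = ["xs:integer", "xs:anyType", "xs:string", "xs:anySimpleType"]
for psvi, dm in mismatches(sample):
    print(f"PSVI reports {psvi}, Data Model reports {dm}")
```

A tool that bridges a PSVI API and the Data Model would need a table like STATUS_QUO; if the Data Model preserved the XML Schema names, the table would be empty and the bridge would need no translation.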
Jonathan

[1] This is called xdt:untyped in internal drafts that have not yet been released.
Received on Tuesday, 10 February 2004 18:44:33 UTC