XML Query Comments to XML Schema (1st part) from Jerome Simeon on 2000-05-18 (www-xml-schema-comments@w3.org from April to June 2000)

From: Jerome Simeon <simeon@research.bell-labs.com>
Date: Thu, 18 May 2000 11:47:39 -0400 (EDT)
To: www-xml-schema-comments@w3.org
CC: w3c-xml-query-wg@w3.org
Message-Id: <200005181547.LAA19801@starling.research.bell-labs.com>
Here is the first set of comments from the XML Query Working Group on
the XML Schema last call Working Draft.
    http://www.w3.org/TR/2000/WD-xmlschema-0-20000407/
    http://www.w3.org/TR/2000/WD-xmlschema-1-20000407/
    http://www.w3.org/TR/2000/WD-xmlschema-2-20000407/

In this version, we address the following issues:
   1.1  Complexity of the XML Schema specification
   1.2  Abstract Types
   1.3  Typing documents and queries with local types
   1.4  Partially validated Instance and lax validation

This list is not exhaustive and the XML Query WG will provide
additional feedback at a later date.

- Jerome Simeon, on behalf of the XML Query WG

======================================================================

0. Introduction: Usage of schema for queries
--------------------------------------------

There are many ways in which a schema can provide useful information
to a query language. Here are some of the uses of schema information
that the XML Query Working Group finds important. This part can be seen
as an XML Query use case for XML Schema.

A. query formulation: knowing the structure of the document can help
   the user write the appropriate query

B. query typing: knowing the structure of the document can be used to
   detect errors in the queries

C. query optimization: knowing the structure of the document can be
   used, for example, to avoid unnecessary navigation in certain
   portions of the document

D. querying the schema: one might want to query the schema information
   itself

E. query semantics: knowing the type of values can be used to choose
   necessary coercions, e.g., when performing comparisons.

The following comments are formulated with these scenarios in mind.

1.1 Complexity of the XML Schema specification
----------------------------------------------

The XML Query group is concerned with the difficulty in understanding
the XML Schema specification, both in terms of conceptual complexity
and in terms of presentation complexity. Notably, information is often
scattered throughout the document in a way that makes it almost
impossible to read sequentially. Commonality between schema
components is not explicitly captured, and there are no overview
tables to help with the problem. As a result, the naive reader finds
it difficult to answer even simple questions about the abstract data
model found in the Structures spec.

To facilitate the understanding of the document, we suggest it would
be useful to enumerate all aspects of each Schema component in a
single place. Notably, it would be useful to define what a complex
type is in a single place.

1.2 Abstract Types
------------------

Section 4.6 in XML Schema Part 0 describes the following use of
abstract types:

 <schema xmlns='http://www.w3.org/1999/XMLSchema'
         targetNamespace='http://cars.example.com/schema'
         xmlns:target='http://cars.example.com/schema'>
   <complexType name='Vehicle'   abstract='true'/>
   <complexType name='Car'       base='target:Vehicle' />
   <complexType name='Plane'     base='target:Vehicle' />
   <element     name='transport' type='target:Vehicle' />
 </schema>

On the other hand Section 3.4 in XML Schema Part 1 says:

"A complex type for which {abstract} is true must not appear as the
{type definition} of an Element Declaration (§2.2.2.1), and must not
be referenced from an xsi:type (§2.6.1) attribute in an instance
document; such abstract complex types can be used as {base type
definition}s, but they are never used directly to validate element
content."

This effectively forbids the schema in Section 4.6/XML Schema Part 0.
In addition, it does not seem to be compliant with the constraints on
Schemas in Section 5.2 (Element Declaration Properties Correct) and in
Section 5.11 (Complex Type Definition Properties Correct) - although
the latter is somewhat cyclic (referring back to Section 3.4).

The XML Query WG would like to use abstract types in element
declarations. Therefore, the XML Query WG finds the above paragraph
overly restrictive and asks to change it as follows:

"A complex type for which {abstract} is true must not be referenced
from an xsi:type (§2.6.1) attribute in an instance document; such
abstract complex types can be used as {base type definition}s and
{element type definition}s, but they are never used directly to
validate element content. Instead an xsi:type (§2.6.1) attribute must
specify explicitly the non-abstract derived type for every element
which is declared with an abstract type". Also the sentence "{type
definition} must not be an abstract type definition." should be
deleted from section 3.3.

For example, the following instance-fragment should be allowed

<transport xsi:type='target:Car'>Driving Directions ....</transport>

whereas, the following instance-fragment should not be allowed

<transport xsi:type='target:Vehicle'>Driving Directions and Flying
Directions ...</transport>
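
To make the intent of the proposed wording concrete, here is a minimal
sketch in Python of the check we have in mind. The type table, the
function names, and the treatment of an unknown xsi:type are our own
illustrative assumptions; none of them comes from the XML Schema
drafts.

```python
# Hypothetical type table mirroring the Vehicle/Car/Plane schema above.
# Each entry maps a type name to (base type or None, is_abstract).
TYPES = {
    "Vehicle": (None, True),
    "Car": ("Vehicle", False),
    "Plane": ("Vehicle", False),
}


def derives_from(type_name, ancestor):
    """Return True if type_name is ancestor or is derived from it."""
    while type_name is not None:
        if type_name == ancestor:
            return True
        type_name = TYPES[type_name][0]
    return False


def check_element(declared_type, xsi_type=None):
    """Apply the proposed rule to a single element occurrence.

    The element is acceptable only if its effective type (xsi:type when
    present, otherwise the declared type) is known, is derived from the
    declared type, and is not abstract.
    """
    effective = xsi_type if xsi_type is not None else declared_type
    if effective not in TYPES:
        return False  # assumption: unknown types are rejected
    if not derives_from(effective, declared_type):
        return False
    return not TYPES[effective][1]
```

Under these assumptions, check_element("Vehicle", "Car") accepts the
first fragment above, while check_element("Vehicle", "Vehicle") and
check_element("Vehicle") both reject an element that would be
validated directly against the abstract type.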

1.3 Typing documents and queries with local types
-------------------------------------------------

If one considers the following simple XML document:

  <authors>
     <author>Serge Abiteboul</author>
     <author>Peter Buneman</author>
     <author><first>Dan</first><last>Suciu</last></author>
  </authors>

This document is well-formed. It can easily be written by a user or
generated by a query. However, because XML Schema does not allow
distinct types for local elements with the same name, it is very
difficult to provide a schema for it. The best we could come up with
uses a mixed element type for authors, which loses a fair amount of
information. As a consequence, this particular limitation could make
type checking for queries quite difficult.
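
The difficulty can be seen by listing the distinct content shapes that
occur under the single element name 'author'. The following Python
sketch is only an illustration; the helper names content_shape and
author_shapes are ours, and the '#text' marker for text-only content
is an arbitrary convention.

```python
import xml.etree.ElementTree as ET

# The sample document discussed above, with its closing tags in place.
DOC = """<authors>
  <author>Serge Abiteboul</author>
  <author>Peter Buneman</author>
  <author><first>Dan</first><last>Suciu</last></author>
</authors>"""


def content_shape(elem):
    """Return the child element names, or ('#text',) for text-only content."""
    children = [child.tag for child in elem]
    return tuple(children) if children else ("#text",)


def author_shapes(xml_text):
    """Collect the distinct content shapes found under 'author' elements."""
    root = ET.fromstring(xml_text)
    return sorted({content_shape(a) for a in root.findall("author")})


SHAPES = author_shapes(DOC)
```

Here SHAPES contains two entries, one for text-only content and one
for a first/last pair. A precise schema would need one type per
shape, both attached to the same local element name.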

The XML Query group does not yet fully understand how best to solve
this issue. However, the following two concrete proposals are being
considered as a means to address some aspects of the problem.

Proposal 1: Removing limitations on local elements
--------------------------------------------------

The limitation comes from XML Schema Part 1: Structures, section 5.7:

 "If the {particles} contains, either directly, indirectly (that is,
  within the {particles} of a contained model group, recursively) or
  implicitly two or more element declaration particles with the same
  {name} and {target namespace}, all their {type definition}s must be
  the same."

Removing this limitation would address the problem, as it would allow
one to write, for instance, the following type for the above document:

    <xsd:element name="authors">
       <xsd:complexType>
          <xsd:sequence>
             <xsd:element name="author" type="xsd:string"/>
             <xsd:element name="author" type="xsd:string"/>
             <xsd:element name="author">
                <xsd:complexType>
                   <xsd:element name="first" type="xsd:string"/>
                   <xsd:element name="last" type="xsd:string"/>
                </xsd:complexType>
             </xsd:element>
          </xsd:sequence>
        </xsd:complexType>
     </xsd:element>

Proposal 2: Using abstract types
--------------------------------

Another approach could be to use abstract types along the lines
suggested in 1.2 above. With this approach, a schema for the above
instance could be constructed as follows:

 <schema xmlns='http://www.w3.org/1999/XMLSchema'
         targetNamespace='http://used.science.org/schema'
         xmlns:target='http://used.science.org/schema'>
   <complexType name='Author'        abstract='true'/>
   <annotation>This assumes that this is the ur-type
     definition</annotation>
   <complexType name='SimpleAuthor'
                base='target:Author' derivedBy='restriction' type='string'/>
   <annotation>This assumes that this is an allowed derivation from the
     ur-type</annotation>
   <complexType name='ComplexAuthor'
                base='target:Author' derivedBy='restriction'>
     <annotation>This assumes that this is an allowed derivation from the
       ur-type</annotation>
     <element name='first' type='string'/>
     <element name='last' type='string'/>
   </complexType>
   <element name='authors'>
     <complexType>
       <element name='author' minOccurs='0' maxOccurs='unbounded'
                type='target:Author'/>
     </complexType>
   </element>
 </schema>

'ComplexAuthor' and 'SimpleAuthor' are both (complex) types derived
from the abstract type 'Author'. In effect, 'author' in 'authors' can
take either the concrete type 'SimpleAuthor' or the concrete type
'ComplexAuthor'.

Note that the instance must be changed to indicate the concrete type
explicitly:

   <authors>
       <author xsi:type="target:SimpleAuthor">Serge Abiteboul</author>
       <author xsi:type="target:SimpleAuthor">Peter Buneman</author>
       <author xsi:type="target:ComplexAuthor">
          <first>Dan</first><last>Suciu</last>
       </author>
   </authors>
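
As an illustration of how a query processor might recover the concrete
type of each 'author', here is a short Python sketch that reads the
xsi:type attributes. The namespace declarations on the root element
are our assumption (the fragment above omits them): we assume the
1999 XMLSchema-instance URI for the xsi prefix and bind the target
prefix to the schema's targetNamespace. The function name
concrete_types is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the xsi prefix; not stated in the fragment above.
XSI = "http://www.w3.org/1999/XMLSchema-instance"

# The instance above, with the namespace declarations it needs to parse.
INSTANCE = """<authors xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
         xmlns:target="http://used.science.org/schema">
  <author xsi:type="target:SimpleAuthor">Serge Abiteboul</author>
  <author xsi:type="target:SimpleAuthor">Peter Buneman</author>
  <author xsi:type="target:ComplexAuthor">
    <first>Dan</first><last>Suciu</last>
  </author>
</authors>"""


def concrete_types(xml_text):
    """Return the raw xsi:type value of each 'author', in document order."""
    root = ET.fromstring(xml_text)
    key = "{%s}type" % XSI  # ElementTree spells qualified names as {uri}local
    return [author.get(key) for author in root.findall("author")]


AUTHOR_TYPES = concrete_types(INSTANCE)
```

The values are returned as written in the instance, prefix included;
resolving the prefix to a namespace is left out of this sketch.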

1.4 Partially validated Instance and lax validation
---------------------------------------------------

We wonder about the reasons for and details of lax schema validation.
Lax schema validation seems to allow for schema instances which make
arbitrary extensions to the structure allowed explicitly by a schema
in the form of additional elements or attributes.

Another issue is what type the query data model assumes for simple
types of lax elements or attributes. The ur-type?
Received on Thursday, 18 May 2000 11:48:11 UTC