Schema Examples for Primer

In fulfilment of my action item from the January 8th telcon[1],
here is a proposed set of schema examples for inclusion in the
WSDL 2.0 Primer.

If you can remember that far back, these examples are offered as
patterns for describing some common data structures in WSDL.
The intent is to give implementers simple patterns for exposing data
structures in schema and to facilitate round-tripping from code
onto the wire and back again.

The data structures under consideration are programming language and
environment agnostic:

  - Collection
  - Vector
  - Map



COLLECTION
----------

A collection of data items (an object, class, structure, record, etc.)
is best represented as a complexType with the individual items as
elements and/or attributes. Sadly, it needs saying that tools should
support all three compositors: "sequence", "all" *and* "choice".

  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="colour" type="xs:string"/>
    </xs:sequence>
    <xs:attribute name="id" type="xs:string" />
  </xs:complexType>
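To illustrate the round trip this pattern is meant to enable, here is a minimal sketch that binds ProductType to an in-memory record and back. The `Product` class, the `product` element name and both helper functions are hypothetical, since the schema above defines only the type. The `id` field maps to the attribute, and `name` and `colour` map to child elements in sequence order.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical record matching ProductType: "id" is the attribute,
# "name" and "colour" are the child elements of the xs:sequence.
@dataclass
class Product:
    id: str
    name: str
    colour: str

def product_to_xml(p: Product, tag: str = "product") -> ET.Element:
    elem = ET.Element(tag, {"id": p.id})
    # Children are emitted in the order the xs:sequence declares them.
    ET.SubElement(elem, "name").text = p.name
    ET.SubElement(elem, "colour").text = p.colour
    return elem

def product_from_xml(elem: ET.Element) -> Product:
    return Product(id=elem.get("id"),
                   name=elem.findtext("name"),
                   colour=elem.findtext("colour"))
```

`product_to_xml(Product("p1", "apple", "red"))` serialises as `<product id="p1"><name>apple</name><colour>red</colour></product>`, and `product_from_xml` recovers an equal record.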


Extending a Collection
----------------------

Explicit (active) extensibility of a collection may be expressed
using 'any' and 'anyAttribute' as in the following example adapted 
from the draft TAG finding on extensibility and versioning[2]:

  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="name" type="xs:string"
                  minOccurs="1" maxOccurs="1"/>
      <xs:element name="colour" type="xs:string"/>
      <xs:any processContents="lax"
              minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="id" type="xs:string"/>
    <xs:anyAttribute/>
  </xs:complexType>

The 'any' wildcard is only deterministic when applied to the end of
a 'sequence'; determinism is particularly important when generating
messages from a schema definition. Again, it needs saying that tools
should support the 'namespace' attribute, including the '##any',
'##other' and '##targetNamespace' values.
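One way a code binding might honour the trailing wildcard is to bind the declared children and keep everything else for re-emission. A minimal sketch follows, assuming the extended ProductType above. The helper names are hypothetical, and any attribute other than `id` is treated as matching xs:anyAttribute.

```python
import xml.etree.ElementTree as ET

KNOWN = ("name", "colour")  # children declared before the xs:any wildcard

def split_extensions(elem):
    """Return (declared values, extension children, extension attributes).

    Children not in KNOWN are taken to match the trailing xs:any, and
    attributes other than "id" to match xs:anyAttribute. Both are kept
    so that they survive a round trip.
    """
    known = {tag: elem.findtext(tag) for tag in KNOWN}
    extra_children = [c for c in elem if c.tag not in KNOWN]
    extra_attrs = {k: v for k, v in elem.attrib.items() if k != "id"}
    return known, extra_children, extra_attrs

def rebuild(elem_id, known, extra_children, extra_attrs):
    elem = ET.Element("product", {"id": elem_id, **extra_attrs})
    for tag in KNOWN:
        ET.SubElement(elem, tag).text = known[tag]
    elem.extend(extra_children)  # extensions go last, where xs:any sits
    return elem
```

Because the wildcard sits at the end of the sequence, re-appending the unknown children after the declared ones keeps the rebuilt element valid against the same type.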

I suggest we discuss Henry Thompson's findings on passive versioning
separately [3].


VECTOR
------

A vector is an ordered sequence of items of the same data type. This is
a very common construct in programming languages, appearing as an array
or list. Multi-dimensional arrays may be built by composing vectors within
vectors. Formalising more complex constructs such as sparse or jagged
matrices is beyond the scope of these examples.

The WSDL 1.1 note[4] suggested naming array types using the prefix
"ArrayOf" and offered the following schema extract for presenting arrays
in WSDL:

  <xs:complexType name="ArrayOfFloat">
    <xs:complexContent>
      <xs:restriction base="soapenc:Array">
         <xs:attribute ref="soapenc:arrayType"
                    wsdl:arrayType="xsd:float[]"/>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>

Some encoded tools rely upon the soapenc:arrayType annotation attribute
appearing on the wire, e.g.:

    <prices soapenc:arrayType="xs:float[3]">
      <price xsi:type="xs:float">0.34</price>
      <price xsi:type="xs:float">0.44</price>
      <price xsi:type="xs:float">0.21</price>
    </prices>

Other implementations require the array size to match
the number of items appearing in the container structure.
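A minimal sketch of the check such implementations perform on the encoded `prices` example, assuming the SOAP 1.1 encoding namespace. The function name is hypothetical. Only a single-dimension size such as `[3]` is compared. A missing annotation, or an unsized or multi-dimensional one, is simply reported as not matching.

```python
import re
import xml.etree.ElementTree as ET

SOAPENC = "http://schemas.xmlsoap.org/soap/encoding/"

def array_size_matches(container: ET.Element) -> bool:
    """True if soapenc:arrayType declares a size equal to the item count."""
    declared = container.get(f"{{{SOAPENC}}}arrayType")
    if declared is None:
        return False  # annotation missing: encoded tools may reject this
    m = re.search(r"\[(\d+)\]$", declared)
    if m is None:
        return False  # e.g. "xs:float[]" or "[2,3]": no single size to compare
    return int(m.group(1)) == len(container)  # len() counts child elements
```

A literal document processor has no use for this bookkeeping, which is part of the difficulty described next.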

This led to very good interoperability between SOAP Section 5-aware
tools, but introduced difficulties for document processors, which had to
nest the repeated elements in otherwise flat documents as well as
provide soapenc and xsi annotations. So the WS-I Basic Profile[5]
specifically prohibited using the SOAP Section 5 annotation, as well as
advising against using the "ArrayOf" prefix. The following schema extract
is given as an example of an array:

  <xsd:element name="MyArray1" type="tns:MyArray1Type"/>
  <xsd:complexType name="MyArray1Type">
    <xsd:sequence>
      <xsd:element name="x" type="xsd:string"
                   minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>

Serialised as:

  <MyArray1>
    <x>abcd</x>
    <x>efgh</x>
  </MyArray1>
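A minimal sketch of how a toolkit might map a plain list onto this wrapper pattern and back. The helper names are hypothetical. The element names default to the WS-I example's `MyArray1` and `x`. An empty list yields an empty wrapper, which the `minOccurs="0"` declaration allows.

```python
import xml.etree.ElementTree as ET

def list_to_wrapper(values, wrapper="MyArray1", item="x"):
    root = ET.Element(wrapper)
    for v in values:
        ET.SubElement(root, item).text = v  # one repeated element per item
    return root

def wrapper_to_list(root, item="x"):
    return [e.text for e in root.findall(item)]  # document order preserved
```

`list_to_wrapper(["abcd", "efgh"])` serialises as `<MyArray1><x>abcd</x><x>efgh</x></MyArray1>`, matching the serialisation above.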

I'd quite like to endorse the above pattern as an example of a
vector, as it's a common output from current tools when generating
document/literal WSDL 1.1 descriptions from code. But elements which may
be repeated outside of an array container should also be mapped
in code to an array, e.g.:

  <xs:complexType name="Address">
    <xs:sequence>
      <xs:element name="lines" type="xs:string" maxOccurs="unbounded"/>
      <xs:element name="telnos" type="xs:string" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>

The above example would be presented as a collection containing two
separate arrays. Unfortunately this isn't orthogonal: the WS-I example
array would flip into a collection containing an array if it were later
extended to contain another element. Not good. So we should offer:

  <xs:element name="products" type="tns:Product" maxOccurs="unbounded"/>

as an example of a vector, making the WS-I example present a collection
containing a single vector. This is actually what a number of existing
WSDL 1.1 toolkits do anyway.
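The orthogonality argument can be made concrete with a minimal sketch. It assumes the binding is told which child names are declared with maxOccurs greater than 1; the `repeated` parameter and the function name are hypothetical. Those children always become lists, whether they appear alone, alongside siblings, or not at all, so extending the type later does not change how an existing vector is bound.

```python
import xml.etree.ElementTree as ET

def bind_collection(elem, repeated):
    """Bind the children of a collection to a dict.

    Names in `repeated` (declared maxOccurs > 1) always become lists,
    even with zero or one occurrence; other children become scalars.
    """
    result = {name: [] for name in repeated}
    for child in elem:
        if child.tag in repeated:
            result[child.tag].append(child.text)
        else:
            result[child.tag] = child.text
    return result
```

Binding the Address example with `repeated={"lines", "telnos"}` yields two lists. A `products` vector stays a list whether or not another element such as a note sits beside it.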


MAP
---

A map is a common construct provided by many languages as a 'hash
table', 'dictionary', 'map', 'indexed table', 'associative
array', 'associative memory' etc.

SOAPBuilders discussed how to exchange map types several times. Looking
through the mailing list[6], it seems that the proposals were mainly aimed
at Section 5 encoding, using on-the-wire annotations, element naming
conventions or extensions of a common type. I didn't think these approaches
were suitable for a generalised literal context, so I looked for a common
schema pattern to express a map.


The Key
-------

A map may be viewed as a Vector in which each item is accessible via
a unique key:

  - The key is unique and identifies a collection of data items.
  - The key is often (but not always) a simpleType.

In the following example list of products, each item is identified
by a productId:

     '10203' => {
         name => 'apple',
         price => '35',
     },

     '10204' => {
         name => 'pear',
         price => '50',
     },

This could be represented in XML as:

     <product id="10203">
       <name>apple</name>
       <price>35</price>
     </product>

     <product id="10204">
       <name>pear</name>
       <price>50</price>
     </product>
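A minimal sketch of generating this XML from the in-memory map. The `products` wrapper element and the helper's parameters are hypothetical additions, since two sibling `product` elements need a container to form a document. Each dictionary key becomes the `id` attribute, and each field becomes a child element.

```python
import xml.etree.ElementTree as ET

# The example map from above, keyed by productId.
products = {
    "10203": {"name": "apple", "price": "35"},
    "10204": {"name": "pear", "price": "50"},
}

def map_to_xml(mapping, container="products", item="product", key_attr="id"):
    root = ET.Element(container)  # hypothetical wrapper element
    for key, fields in mapping.items():
        elem = ET.SubElement(root, item, {key_attr: key})
        for name, value in fields.items():
            ET.SubElement(elem, name).text = value
    return root
```

ElementTree's limited XPath can then look an item up by key, e.g. `map_to_xml(products).find("product[@id='10204']")`.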

To turn a Vector into a map, all that is required is to recognise
which element or attribute contained within the repeated element
is the key. W3C XML Schema offers a number of standard mechanisms to
describe uniqueness:

   - xs:ID
   - xs:unique
   - xs:key
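Without one of these declarations, a consumer can only guess at the key from the data. The minimal sketch below, with hypothetical helper names, shows why a guess is not enough. It lists the attributes (as `@name`) and single-occurrence child elements that are present and distinct on every item. With the two products above, `@id`, `name` and `price` all qualify. Adding a third "apple" at the same price leaves only `@id`.

```python
import xml.etree.ElementTree as ET

def _values(elem):
    # Attributes are reported as "@name", single child elements as "name".
    vals = {"@" + k: v for k, v in elem.attrib.items()}
    tags = [child.tag for child in elem]
    for child in elem:
        if tags.count(child.tag) == 1:  # a repeated child cannot be a key
            vals[child.tag] = child.text
    return vals

def candidate_keys(items):
    """Fields present on every item whose values are all distinct."""
    rows = [_values(e) for e in items]
    if not rows:
        return []
    common = set(rows[0]).intersection(*rows[1:])
    return sorted(n for n in common if len({r[n] for r in rows}) == len(rows))

if __name__ == "__main__":
    doc = ET.fromstring('<products><product id="10203"><name>apple</name></product>'
                        '<product id="10204"><name>pear</name></product></products>')
    print(candidate_keys(doc.findall("product")))
```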


xs:ID as a Key
--------------

xs:ID is similar to the DTD ID type with the following properties:

   - the type has the same lexical space as xs:NCName
   - the ID may be an attribute or an element
   - the ID value must be unique within the document
   - there may be one or more IDs associated with an element
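A minimal sketch of the two xs:ID rules a code binding would have to respect: NCName lexical form and uniqueness across the whole document. It assumes the ID-typed attribute is called `id`, which without a schema is a hypothetical choice. The NCName pattern is an ASCII-only approximation, because the real production also admits many non-ASCII characters. The returned index loosely mirrors the role of the PSVI ID/IDREF table.

```python
import re
import xml.etree.ElementTree as ET

# ASCII-only approximation of NCName (no leading digit, no colon).
NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*\Z")

def collect_ids(root, attr="id"):
    """Index elements by an ID-typed attribute, enforcing the xs:ID rules."""
    index = {}
    for elem in root.iter():  # the whole document, not just one container
        value = elem.get(attr)
        if value is None:
            continue
        if not NCNAME.match(value):
            raise ValueError(f"not an NCName: {value!r}")
        if value in index:
            raise ValueError(f"duplicate ID: {value!r}")
        index[value] = elem
    return index
```

Note that a bare product number such as "10203" is rejected, which foreshadows the NCName caveat discussed later.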

The PSVI contains an ID/IDREF table[7], which is an index pointing
to the nodes carrying an xs:ID element or attribute.

The new xml:id[8] attribute from the XML Core WG offers a neat way of
expressing an ID value for an element, targeted at recipients who may not
have access to a schema. A so-called 'disengaged' agent may more easily
recognise a map from a document containing xml:id attributes and present
the repeated element as an associative array, e.g.:

     <product xml:id="product_10203">
       <name>apple</name>
       <price>35</price>
     </product>

     <product xml:id="product_10204">
       <name>pear</name>
       <price>50</price>
     </product>
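A minimal sketch of such a disengaged agent, assuming the two products are wrapped in a hypothetical `products` element so that they form a well-formed document. No schema is consulted. ElementTree reports `xml:id` under the XML namespace in Clark notation, so any element carrying it can be indexed directly.

```python
import xml.etree.ElementTree as ET

# ElementTree's name for xml:id; the xml prefix is predeclared.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

def xml_id_map(root):
    """Index elements by xml:id without consulting any schema."""
    index = {}
    for elem in root.iter():
        key = elem.get(XML_ID)
        if key is not None:
            index[key] = elem
    return index
```

Applied to the example, the result maps "product_10203" and "product_10204" to their `product` elements, ready to present as an associative array.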

WSDL Schema for above xml:id example:

   <types>
     <xs:schema targetNamespace="http://www.w3.org/XML/1998/namespace">
       <xs:attribute name="id" type="xs:ID"/>
     </xs:schema>

     <xs:schema targetNamespace="http://www.openuri.org/">
       <xs:import namespace="http://www.w3.org/XML/1998/namespace"/>

       <xs:complexType name="ProductType">
         <xs:sequence>
           <xs:element name="name" type="xs:string"/>
           <xs:element name="price" type="xs:string"/>
         </xs:sequence>
         <xs:attribute ref="xml:id" use="required" />
       </xs:complexType>
        
       .....
     </xs:schema>
   </types>


xs:unique as a Key
------------------

xs:unique provides a means of expressing that an element or attribute
value is unique within a specified set of elements. A selector XPath
expression, evaluated relative to the element declaring the constraint,
picks out the set of elements; a field XPath expression then identifies,
for each selected element, the element or attribute whose value must be
unique across that set, e.g.:

  <xs:element name="products">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="product" type="ProductType" 
               maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>

    <xs:unique name="product">
      <xs:selector xpath="product"/>
      <xs:field xpath="@id"/>
    </xs:unique>

  </xs:element>

  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="price" type="xs:string"/>
    </xs:sequence>
    <xs:attribute name="id" type="xs:string" />
  </xs:complexType>
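To show what the constraint above asks of an instance, here is a minimal sketch of an xs:unique check. It supports only a hypothetical subset of the schema's XPath. The selector is an ElementTree path, and each field is either `@attr` or a child element name. As xs:unique permits, elements missing a field are skipped rather than rejected. Passing several fields checks a composite key.

```python
import xml.etree.ElementTree as ET

def _field(elem, field):
    # "@name" selects an attribute, anything else a child element's text.
    return elem.get(field[1:]) if field.startswith("@") else elem.findtext(field)

def check_unique(scope, selector, fields):
    """Emulate xs:unique: field-value tuples of selected elements are distinct."""
    seen = set()
    for elem in scope.findall(selector):
        values = tuple(_field(elem, f) for f in fields)
        if None in values:
            continue  # xs:unique ignores elements lacking a field
        if values in seen:
            return False
        seen.add(values)
    return True
```

For the schema above, the call would be `check_unique(products_elem, "product", ["@id"])`.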


xs:key as a Key
---------------

xs:key is very similar to xs:unique, but adds the constraint that every
field must be present for every selected element. xs:key has been
designed for use with xs:keyref, similar to ID/IDREF in a DTD, with
the difference that uniqueness is enforced within an XPath-defined scope.
xs:key is therefore preferable to xs:unique when describing a key.
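Because xs:key guarantees that every item carries its key, a binding can build the map directly. In the minimal sketch below, the function name and the simplified selector/field handling are hypothetical. A missing key is rejected, where xs:unique would simply have skipped the element, and so is a duplicate.

```python
import xml.etree.ElementTree as ET

def key_map(scope, selector, field):
    """Bind selected elements to a dict with xs:key semantics."""
    result = {}
    for elem in scope.findall(selector):
        key = elem.get(field[1:]) if field.startswith("@") else elem.findtext(field)
        if key is None:
            raise ValueError(f"{elem.tag} has no {field}: xs:key violated")
        if key in result:
            raise ValueError(f"duplicate key {key!r}: xs:key violated")
        result[key] = elem  # insertion order follows document order
    return result
```

Applied to the products example with `key_map(root, "product", "@id")`, this yields a dictionary keyed by "10203" and "10204".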

Map Summary
-----------

   - ID is an NCName datatype, which cannot begin with a digit. A raw
     number such as an ISBN, e.g. "0123456789", could be represented as
     "isbn_0123456789"; more complex types such as a Java object will
     require a hashed key value.

   - ID has to be unique within the entire document. A context-specific
     prefix could be added to the front of each key.

   - there may be more than one ID, xs:unique or xs:key value for
     a repeated element. Rules as to which element/attribute should be
     used as the primary key could be provided, or be left undefined.

   - IDs are presented in the PSVI ID/IDREF table, which may assist processing.

   - xs:unique and xs:key are more explicit in which containing elements
     form the map.

   - xs:unique and xs:key allow multiple fields to form a composite key

   - xs:unique and xs:key are complex and not well supported by current
     implementations

   - not all schema languages have a means of expressing a uniqueness
     constraint. xs:unique and xs:key don't map well to RELAX NG.
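The first two points can be combined in a minimal sketch of turning an arbitrary key into an ID value. The function name and the choice of SHA-1 are hypothetical. A context-specific prefix supplies a legal first character and keeps IDs from different maps in the same document apart. Keys that still fail the (ASCII-only, approximate) NCName test are replaced by a digest.

```python
import hashlib
import re

NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*\Z")  # ASCII-only approximation

def to_id(key, prefix):
    """Turn an arbitrary key into an xs:ID / xml:id value.

    `prefix` must itself start with a letter or underscore.
    """
    candidate = f"{prefix}_{key}"
    if NCNAME.match(candidate):
        return candidate
    # Not NCName-safe (spaces, colons, objects...): fall back to a hash.
    digest = hashlib.sha1(str(key).encode("utf-8")).hexdigest()
    return f"{prefix}_{digest}"
```

`to_id("0123456789", "isbn")` gives "isbn_0123456789". A hashed ID cannot be reversed, so the original key value would still need to travel in the item's content.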

I'd suggest using the simple xml:id as a good mechanism for generating
a map from code, with the caveats that the key has to be an NCName and
unique across the whole document.

I suggest we offer xml:id, xs:ID, xs:unique and xs:key as patterns for
recognising a map when binding schema to code.

Paul

--
Paul Sumner Downey
Web Services Integration
BT Exact


[1] minutes from January 8th Telcon:
http://tinyurl.com/2gflf
http://lists.w3.org/Archives/Public/www-ws-desc/2004Jan/att-0029/minutes_20040108.htm#item05

[2] David Orchard, Norman Walsh, "Versioning XML Languages":
http://www.w3.org/2001/tag/doc/versioning.html

[3] Henry Thompson, XML2004 "Versioning made easy with W3C Schema":
http://lists.w3.org/Archives/Public/www-ws-desc/2004Apr/0019.html
slides:
http://www.markuptechnology.com/XMLEu2004/

[4] WSDL 1.1 Note:
http://www.w3.org/TR/wsdl

[5] WS-I Basic Profile 1.0 Board Approved Draft, example array:
http://tinyurl.com/2o9x2
http://www.ws-i.org/Profiles/Basic/2003-03/BasicProfile-1.0-BdAD.html#IDAAN4GB

[6] SOAPBuilders long thread on map types
- includes contributions from Glen and Gudge!
http://groups.yahoo.com/group/soapbuilders/message/1331

[7] PSVI Set Contribution: ID/IDREF Table:
http://www.w3.org/TR/xmlschema-1/#sic-id

[8] xml:id Working Draft:
http://www.w3.org/TR/xml-id/

Received on Wednesday, 5 May 2004 12:25:18 UTC