straw proposal for mapping XML schema valid XML data to RIF frames

This is in response to ACTION-591.

Mapping XML Schema valid XML Data to RIF Frames
===============================================

This is a strawman for mapping XML documents whose structure is
described by an XML schema [1] to and from RIF Core frames.

An XML element has a type as defined by its schema.  The type can be
simple or complex.  Simple types can be atomic, lists, or unions.
Atomic types can be primitive (e.g. xs:string), or they can be
enumerations or restrictions of atomic types.  A complex type can have
attributes and content.  The content can be simple, or it can be a
sequence, choice, or "all" group of elements.  An attribute has a name
and a value.  The value has a simple type.  Types can be derived by
extension or restriction.
Elements and types can be named globally or locally.  A local element
can be defined by referring to a global element.  An element may be
defined by referring to a global type, or can include the type
definition in the content of its own definition.  There are many ways
to write the "same" schema.

XML Schema is thus quite complex.  Here, we limit our concern to
mapping elements of complex type to frames.  The only simple types we
will handle are the primitive types supported by RIF DTB.  Our
contribution is mainly to define how to construct the IRIs of class
constants and slot name constants from the XML schema.

This is a "strawman by example", so we start with an example document
(from [2]).

Example XML Document
--------------------

<shiporder orderid="889923" xmlns="http://example.org">
 <orderperson>John Smith</orderperson>
 <shipto>
  <name>Ola Nordmann</name>
  <address>Langgt 23</address>
  <city>4000 Stavanger</city>
  <country>Norway</country>
 </shipto>
 <item>
  <title>Empire Burlesque</title>
  <note>Special Edition</note>
  <quantity>1</quantity>
  <price>10.90</price>
 </item>
 <item>
  <title>Hide your heart</title>
  <quantity>1</quantity>
  <price>9.90</price>
 </item>
</shiporder>

We consider three ways to write a schema for the above document.

1. One big element
------------------

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.org"
           elementFormDefault="qualified">

<xs:element name="shiporder">
 <xs:complexType>
  <xs:sequence>
   <xs:element name="orderperson" type="xs:string"/>
   <xs:element name="shipto">
    <xs:complexType>
     <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="address" type="xs:string"/>
      <xs:element name="city" type="xs:string"/>
      <xs:element name="country" type="xs:string"/>
     </xs:sequence>
    </xs:complexType>
   </xs:element>
   <xs:element name="item" maxOccurs="unbounded">
    <xs:complexType>
     <xs:sequence>
      <xs:element name="title" type="xs:string"/>
      <xs:element name="note" type="xs:string" minOccurs="0"/>
      <xs:element name="quantity" type="xs:positiveInteger"/>
      <xs:element name="price" type="xs:decimal"/>
     </xs:sequence>
    </xs:complexType>
   </xs:element>
  </xs:sequence>
  <xs:attribute name="orderid" type="xs:string" use="required"/>
 </xs:complexType>
</xs:element>

</xs:schema>

Using the above schema, we represent the shiporder in RIF-PS as follows:

Prefix(tns http://example.org)

_obj1#<tns:/shiporder>
_obj1[<tns:/shiporder@orderid>     -> "889923"
      <tns:/shiporder/orderperson> -> "John Smith"
      <tns:/shiporder/shipto>      -> _obj2 
      <tns:/shiporder/item>        -> _obj3
      <tns:/shiporder/item>        -> _obj4
]

_obj2#<tns:/shiporder/shipto>
_obj2[<tns:/shiporder/shipto/name>    -> "Ola Nordmann"
      <tns:/shiporder/shipto/address> -> "Langgt 23"
      <tns:/shiporder/shipto/city>    -> "4000 Stavanger"
      <tns:/shiporder/shipto/country> -> "Norway"
]

_obj3#<tns:/shiporder/item>
_obj3[<tns:/shiporder/item/title>    -> "Empire Burlesque"
      <tns:/shiporder/item/note>     -> "Special Edition"
      <tns:/shiporder/item/quantity> -> 1
      <tns:/shiporder/item/price>    -> 10.90
]

_obj4#<tns:/shiporder/item>
_obj4[<tns:/shiporder/item/title>    -> "Hide your heart"
      <tns:/shiporder/item/quantity> -> 1
      <tns:/shiporder/item/price>    -> 9.90
]
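
The path-style naming above can be reproduced mechanically from the
instance alone.  The following is a minimal Python sketch, not part of
the proposal: it assumes that any element with children or attributes
is of complex type (a real mapping would consult the schema), it keeps
all leaf values as strings, and the names to_frames and local are made
up for illustration.  DOC is an abridged copy of the example document.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the example document.
DOC = """<shiporder orderid="889923" xmlns="http://example.org">
 <orderperson>John Smith</orderperson>
 <shipto>
  <name>Ola Nordmann</name>
 </shipto>
 <item>
  <title>Empire Burlesque</title>
 </item>
 <item>
  <title>Hide your heart</title>
 </item>
</shiporder>"""


def local(tag):
    """Strip the '{namespace}' part that ElementTree puts before a tag name."""
    return tag.rsplit("}", 1)[-1]


def to_frames(elem, path, frames, counter):
    """Record (object id, class IRI, slots) for elem and return the object id.

    Assumption: an element is treated as complex if it has children or
    attributes; every other element contributes its text as a string value.
    """
    counter[0] += 1
    obj = f"_obj{counter[0]}"
    slots = [(f"{path}@{name}", value) for name, value in elem.attrib.items()]
    frames.append((obj, path, slots))
    for child in elem:
        child_path = f"{path}/{local(child.tag)}"
        if len(child) or child.attrib:
            value = to_frames(child, child_path, frames, counter)
        else:
            value = child.text
        slots.append((child_path, value))
    return obj


root = ET.fromstring(DOC)
frames = []
to_frames(root, "tns:/" + local(root.tag), frames, [0])

# Print the frames in a RIF-PS-like layout (string values only).
for obj, cls, slots in frames:
    print(f"{obj}#<{cls}>")
    for slot, value in slots:
        print(f"  {obj}[<{slot}> -> {value!r}]")
```

Because the class IRI is just the path from the root, both item
elements end up as members of the same class, as in the RIF-PS above.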


2. Refs to Global Elements and Attributes
-----------------------------------------

A second equivalent schema uses the following style.

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://example.org"
           targetNamespace="http://example.org">

<!-- definition of simple elements -->
<xs:element name="orderperson" type="xs:string"/>
<xs:element name="name" type="xs:string"/>
<xs:element name="address" type="xs:string"/>
<xs:element name="city" type="xs:string"/>
<xs:element name="country" type="xs:string"/>
<xs:element name="title" type="xs:string"/>
<xs:element name="note" type="xs:string"/>
<xs:element name="quantity" type="xs:positiveInteger"/>
<xs:element name="price" type="xs:decimal"/>

<!-- definition of attributes (a global attribute is namespace-qualified
     in instance documents) -->
<xs:attribute name="orderid" type="xs:string"/>

<!-- definition of complex elements -->
<xs:element name="shipto">
 <xs:complexType>
  <xs:sequence>
   <xs:element ref="name"/>
   <xs:element ref="address"/>
   <xs:element ref="city"/>
   <xs:element ref="country"/>
  </xs:sequence>
 </xs:complexType>
</xs:element>
<xs:element name="item">
 <xs:complexType>
  <xs:sequence>
   <xs:element ref="title"/>
   <xs:element ref="note" minOccurs="0"/>
   <xs:element ref="quantity"/>
   <xs:element ref="price"/>
  </xs:sequence>
 </xs:complexType>
</xs:element>

<xs:element name="shiporder">
 <xs:complexType>
  <xs:sequence>
   <xs:element ref="orderperson"/>
   <xs:element ref="shipto"/>
   <xs:element ref="item" maxOccurs="unbounded"/>
  </xs:sequence>
  <xs:attribute ref="orderid" use="required"/>
 </xs:complexType>
</xs:element>

</xs:schema>

Using the above schema, we represent the shiporder in RIF-PS as follows:

Prefix(tns http://example.org)

_obj1#<tns:/shiporder>
_obj1[<tns:@orderid>     -> "889923"
      <tns:/orderperson> -> "John Smith"
      <tns:/shipto>      -> _obj2 
      <tns:/item>        -> _obj3
      <tns:/item>        -> _obj4
]

_obj2#<tns:/shipto>
_obj2[<tns:/name>    -> "Ola Nordmann"
      <tns:/address> -> "Langgt 23"
      <tns:/city>    -> "4000 Stavanger"
      <tns:/country> -> "Norway"
]

_obj3#<tns:/item>
_obj3[<tns:/title>    -> "Empire Burlesque"
      <tns:/note>     -> "Special Edition"
      <tns:/quantity> -> 1
      <tns:/price>    -> 10.90
]

_obj4#<tns:/item>
_obj4[<tns:/title>    -> "Hide your heart"
      <tns:/quantity> -> 1
      <tns:/price>    -> 9.90
]

3. Named Types
--------------

The third style of schema uses named types.

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://example.org"
           targetNamespace="http://example.org"
           elementFormDefault="qualified">

<xs:simpleType name="stringtype">
 <xs:restriction base="xs:string"/>
</xs:simpleType>

<xs:simpleType name="inttype">
 <xs:restriction base="xs:positiveInteger"/>
</xs:simpleType>

<xs:simpleType name="dectype">
 <xs:restriction base="xs:decimal"/>
</xs:simpleType>

<xs:simpleType name="orderidtype">
 <xs:restriction base="xs:string">
  <xs:pattern value="[0-9]{6}"/>
 </xs:restriction>
</xs:simpleType>

<xs:complexType name="shiptotype">
 <xs:sequence>
  <xs:element name="name" type="stringtype"/>
  <xs:element name="address" type="stringtype"/>
  <xs:element name="city" type="stringtype"/>
  <xs:element name="country" type="stringtype"/>
 </xs:sequence>
</xs:complexType>

<xs:complexType name="itemtype">
 <xs:sequence>
  <xs:element name="title" type="stringtype"/>
  <xs:element name="note" type="stringtype" minOccurs="0"/>
  <xs:element name="quantity" type="inttype"/>
  <xs:element name="price" type="dectype"/>
 </xs:sequence>
</xs:complexType>

<xs:complexType name="shipordertype">
 <xs:sequence>
  <xs:element name="orderperson" type="stringtype"/>
  <xs:element name="shipto" type="shiptotype"/>
  <xs:element name="item" maxOccurs="unbounded" type="itemtype"/>
 </xs:sequence>
 <xs:attribute name="orderid" type="orderidtype" use="required"/>
</xs:complexType>

<xs:element name="shiporder" type="shipordertype"/>

</xs:schema>

Using the above schema, we represent the shiporder in RIF-PS as follows:

Prefix(tns http://example.org)

_obj1#<tns:/shiporder>
<tns:/shiporder>##<tns:/shipordertype>
_obj1[<tns:/shipordertype@orderid>     -> "889923"
      <tns:/shipordertype/orderperson> -> "John Smith"
      <tns:/shipordertype/shipto>      -> _obj2 
      <tns:/shipordertype/item>        -> _obj3
      <tns:/shipordertype/item>        -> _obj4
]

_obj2#<tns:/shiptotype>
_obj2[<tns:/shiptotype/name>    -> "Ola Nordmann"
      <tns:/shiptotype/address> -> "Langgt 23"
      <tns:/shiptotype/city>    -> "4000 Stavanger"
      <tns:/shiptotype/country> -> "Norway"
]

_obj3#<tns:/itemtype>
_obj3[<tns:/itemtype/title>    -> "Empire Burlesque"
      <tns:/itemtype/note>     -> "Special Edition"
      <tns:/itemtype/quantity> -> 1
      <tns:/itemtype/price>    -> 10.90
]

_obj4#<tns:/itemtype>
_obj4[<tns:/itemtype/title>    -> "Hide your heart"
      <tns:/itemtype/quantity> -> 1
      <tns:/itemtype/price>    -> 9.90
]
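
In the named-type style the slot names hang off the type name rather
than the element path, and each global element contributes one subclass
axiom.  A small Python sketch of just that naming step follows.  The
two tables are a hand-written summary of the schema above, and the
helper names slot_names and subclass_axioms are hypothetical.

```python
# Hand-written summary of the named-type schema (an assumption of this
# sketch, not something extracted automatically):
# global complexType name -> its local elements and attributes.
COMPLEX_TYPES = {
    "shipordertype": {
        "elements": ["orderperson", "shipto", "item"],
        "attributes": ["orderid"],
    },
    "shiptotype": {
        "elements": ["name", "address", "city", "country"],
        "attributes": [],
    },
    "itemtype": {
        "elements": ["title", "note", "quantity", "price"],
        "attributes": [],
    },
}

# Global element name -> name of its global complexType.
GLOBAL_ELEMENTS = {"shiporder": "shipordertype"}


def slot_names(type_name):
    """Slot IRIs for a named complexType: attributes first, then elements."""
    decl = COMPLEX_TYPES[type_name]
    base = f"tns:/{type_name}"
    return ([f"{base}@{a}" for a in decl["attributes"]]
            + [f"{base}/{e}" for e in decl["elements"]])


def subclass_axioms():
    """One 'element class ## type class' axiom per global element."""
    return [f"<tns:/{e}>##<tns:/{t}>" for e, t in GLOBAL_ELEMENTS.items()]


for axiom in subclass_axioms():
    print(axiom)
for type_name in COMPLEX_TYPES:
    print(type_name, slot_names(type_name))
```

Two elements declared with the same named type share their slot names,
which is the main practical difference from the first style.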

General Rules
-------------

1. order of sequences is not preserved

2. cardinality (minOccurs, maxOccurs) is ignored

3. simple types are ignored

4. the IRI for an element e (IRI(e)) is given by
  a. <tns:/e> if e is a global element (or a ref to one), where tns is
     the targetNamespace of the schema
  b. <C/e> otherwise, where C is the IRI of the containing complexType
     of e

5. the IRI for an attribute a (IRI(a)) is given by
  a. <tns:@a> if a is a global attribute (or a ref to one)
  b. <C@a> otherwise, where C is the IRI of the containing complexType
     of a

6. the IRI for a complexType c (IRI(c)) is given by
  a. <tns:/c> if c is a global complexType
  b. E otherwise, where E is the IRI of the element containing c (a
     local complexType is anonymous, so it takes the element's IRI)

7. an instance of an XML element e with complexType c maps to an object
_o that is a member of IRI(e).  I.e., _o#IRI(e).  If IRI(e) != IRI(c),
then additionally we have the axiom IRI(e)##IRI(c).

8. an element f contained in e (whether in a sequence, choice, or all)
is a frame slot of _o named IRI(f).  E.g. _o[IRI(f)->...]

9. if complexType sub extends a complexType sup, then IRI(sub)##IRI(sup)
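
To make rules 4 to 6 concrete, here is a minimal Python sketch that
walks a schema and lists the slot IRIs for each complexType.  It is an
illustration under stated assumptions, not a complete mapping: ref
values are assumed to be unprefixed names, only elements directly
inside one sequence, choice or all group are handled, derivation by
extension (rule 9) is ignored, and an anonymous complexType is given
the IRI of its containing element, which is what the examples do.  The
function name slot_iris and the mixed-style SCHEMA fragment are made up
for this sketch.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical fragment mixing the three styles shown earlier.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.org">
<xs:element name="item" type="itemtype"/>
<xs:complexType name="itemtype">
 <xs:sequence>
  <xs:element name="title" type="xs:string"/>
 </xs:sequence>
</xs:complexType>
<xs:element name="shiporder">
 <xs:complexType>
  <xs:sequence>
   <xs:element name="orderperson" type="xs:string"/>
   <xs:element name="shipto">
    <xs:complexType>
     <xs:sequence>
      <xs:element name="name" type="xs:string"/>
     </xs:sequence>
    </xs:complexType>
   </xs:element>
   <xs:element ref="item" maxOccurs="unbounded"/>
  </xs:sequence>
  <xs:attribute name="orderid" type="xs:string" use="required"/>
 </xs:complexType>
</xs:element>
</xs:schema>"""


def slot_iris(schema_text):
    """Return {complexType IRI: [slot IRIs]} for the given schema text."""
    schema = ET.fromstring(schema_text)
    result = {}

    def visit_type(ctype, c_iri):
        slots = result.setdefault(c_iri, [])
        for attr in ctype.findall(XS + "attribute"):
            if "ref" in attr.attrib:
                slots.append("tns:@" + attr.get("ref"))        # rule 5a
            else:
                slots.append(f"{c_iri}@{attr.get('name')}")    # rule 5b
        for group in ("sequence", "choice", "all"):
            for decl in ctype.findall(f"{XS}{group}/{XS}element"):
                if "ref" in decl.attrib:
                    slots.append("tns:/" + decl.get("ref"))    # rule 4a
                else:
                    e_iri = f"{c_iri}/{decl.get('name')}"      # rule 4b
                    slots.append(e_iri)
                    visit_element(decl, e_iri)

    def visit_element(decl, e_iri):
        anon = decl.find(XS + "complexType")
        if anon is not None:
            # Assumption: an anonymous type takes its element's IRI.
            visit_type(anon, e_iri)

    for top in schema:
        if top.tag == XS + "element":
            visit_element(top, "tns:/" + top.get("name"))      # rule 4a
        elif top.tag == XS + "complexType":
            visit_type(top, "tns:/" + top.get("name"))         # rule 6a
    return result


iris = slot_iris(SCHEMA)
for c_iri, slots in iris.items():
    print(c_iri, slots)
```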

Issues
------

Slot names are not disjoint from class names.

We could of course map much more schema information to axioms.  E.g.
maxOccurs=1 could be expressed as ?x=?y :- _o[slot1->?x slot1->?y].
But that's not Core.  Are there other things expressible in Core that
we should map?
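
One way to see what is lost by ignoring cardinality is to check
maxOccurs=1 outside the rule language.  The Python sketch below does
that over a flat list of (object, slot, value) triples.  Note that it
is a check that reports violations, not the equality rule itself, and
the function name and the triple representation are assumptions of the
sketch.

```python
from collections import defaultdict


def max_occurs_violations(triples, single_valued):
    """Return sorted (object, slot) pairs that carry more than one distinct
    value for a slot listed in single_valued."""
    seen = defaultdict(set)
    for obj, slot, value in triples:
        if slot in single_valued:
            seen[(obj, slot)].add(value)
    return sorted(key for key, values in seen.items() if len(values) > 1)


triples = [
    ("_obj1", "tns:/shiporder/orderperson", "John Smith"),
    ("_obj1", "tns:/shiporder/orderperson", "Jane Smith"),  # made-up second value
    ("_obj1", "tns:/shiporder/item", "_obj3"),
    ("_obj1", "tns:/shiporder/item", "_obj4"),              # unbounded, so allowed
]
single_valued = {"tns:/shiporder/orderperson", "tns:/shiporder/shipto"}

violations = max_occurs_violations(triples, single_valued)
print(violations)
```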

Should we care if the trailing char of tns is '/'?  Or should we use
'#' instead of the first '/' in the curie?  Or should we use '/' or
'/@' instead of '@' for attributes?
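
To make the separator question concrete, the Python sketch below
expands the curies by plain concatenation, as the examples in this note
do, and then asks Python's urllib.parse how it reads the result.  The
helper name expand is made up, and the parse results say only how this
one library splits the strings.

```python
from urllib.parse import urlsplit

TNS = "http://example.org"


def expand(tns, local, marker):
    """Concatenate namespace, marker and local name, with no normalisation."""
    return tns + marker + local


# Attribute IRIs under the three candidate markers.
variants = {m: expand(TNS, "orderid", m) for m in ("@", "/@", "#")}
hosts = {m: urlsplit(iri).hostname for m, iri in variants.items()}

# A targetNamespace that already ends in '/' yields a doubled slash.
doubled = expand(TNS + "/", "shiporder", "/")

for marker, iri in variants.items():
    print(marker, iri, hosts[marker])
print(doubled)
```

With a bare '@' directly after the authority, this parser treats
"example.org" as user info and "orderid" as the host, which is one
argument for '/@' or '#'.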

Neither '#' nor '##' is legal in a rule conclusion in Core.  Probably
they should be allowed in ground facts.

Because we don't capture all the schema constraints, it may be
impossible to serialize a collection of frames computed by a seemingly
consistent ruleset into a schema-valid XML document.

Fully-striped XML data doesn't need a schema, and probably should
follow an RDF-style mapping (not covered here).


[1] http://www.w3.org/TR/xmlschema-0/
[2] http://www.w3schools.com/Schema/schema_example.asp

Received on Monday, 13 October 2008 23:07:32 UTC