Re: Simplifying XML Schema

Philip Wadler writes:

>> Thanks for this comment.  Indeed, it is very difficult
>> to fit xsi:type into existing type systems, and this
>> is one reason for my worry.  This is directly relevant
>> to the work of Query, because we are hoping to have
>> a typed query algebra.

There are indeed some good reasons to question xsi:type, but I don't agree
that mismatch to existing type systems is one.

Consider the following Java fragments:

     class BaseClass { ... }

     class Derived1 extends BaseClass { ... }
     class Derived2 extends BaseClass { ... }

     class C {
          BaseClass member1;
          BaseClass member2;
     }

     C SomeC = new C();
     SomeC.member1 = new Derived1();
     SomeC.member2 = new Derived2();

One obvious serialization is to use element names for instances, and Schema
types for Java types (for simplicity, I've skipped namespaces in all of the
following) :

     <schema ....>
          <!-- types for members -->
          <complexType name="BaseClass">
             .....
          </complexType>

          <complexType name="Derived1"
                     base="BaseClass"
                     derivedBy="extension">
            .....
          </complexType>
          <complexType name="Derived2"
                     base="BaseClass"
                     derivedBy="extension">
             .....
          </complexType>

          <!-- define C and SomeC -->
          <complexType name="C">
               <element name="member1"
                       type="BaseClass"/>
               <element name="member2"
                       type="BaseClass"/>
          </complexType>

          <element name="SomeC" type= "C"/>
     </schema>

corresponding to the following instance (among many others):

     <SomeC>
          <member1 xsi:type="Derived1"> .... </member1>
          <member2 xsi:type="Derived2"> .... </member2>
     </SomeC>
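
To make the correspondence concrete, here is a minimal Java sketch of how a
deserializer might use xsi:type to pick the subclass for each member.  It is
an illustration under stated assumptions, not part of any Schema tooling: the
class XsiTypeSketch and its helper methods are hypothetical, the lookup table
is hard-coded, and the namespace URI is the one from the later XML Schema
Recommendation (the draft under discussion used a different URI).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class XsiTypeSketch {
    // Assumption: namespace URI of the final Recommendation, not the 2000 draft.
    static final String XSI = "http://www.w3.org/2001/XMLSchema-instance";

    static class BaseClass {}
    static class Derived1 extends BaseClass {}
    static class Derived2 extends BaseClass {}

    static class C {
        BaseClass member1;
        BaseClass member2;
    }

    // Choose the Java class from xsi:type; fall back to the declared type
    // (BaseClass) when the attribute is absent or unrecognized.
    static BaseClass instantiate(Element e) {
        String type = e.getAttributeNS(XSI, "type"); // "" if absent
        switch (type) {
            case "Derived1": return new Derived1();
            case "Derived2": return new Derived2();
            default:         return new BaseClass();
        }
    }

    // Parse a <SomeC> instance and build the corresponding C object.
    static C parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Element root = doc.getDocumentElement();
        C c = new C();
        c.member1 = instantiate((Element) root.getElementsByTagName("member1").item(0));
        c.member2 = instantiate((Element) root.getElementsByTagName("member2").item(0));
        return c;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<SomeC xmlns:xsi='" + XSI + "'>"
                   + "<member1 xsi:type='Derived1'/>"
                   + "<member2 xsi:type='Derived2'/>"
                   + "</SomeC>";
        C c = parse(xml);
        System.out.println(c.member1.getClass().getSimpleName());
        System.out.println(c.member2.getClass().getSimpleName());
    }
}
```

Both fields keep the static type BaseClass, just as both elements keep the
declared type BaseClass in the schema; xsi:type plays the role of the run-time
class of the object.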

Similar examples could be given for C++ (common single inheritance
structures...we didn't bite off multiple) and other single inheritance
object type systems.  This correspondence between XML schemas and existing
type systems was intentional.  So, I am a little confused by your concern
that  "it is very difficult to fit xsi:type into existing type systems".
Again, xsi:type does represent a layer of complexity and some added power.
Whether it is on balance appropriate is a good question.  On the other
hand, it was added exactly because it does model certain reasonable idioms
for modeling existing type systems.  I feel like I'm missing something.

------------------------------------------------------------------------
Noah Mendelsohn                                    Voice: 1-617-693-4036
Lotus Development Corp.                            Fax: 1-617-693-8676
One Rogers Street
Cambridge, MA 02142
------------------------------------------------------------------------

From:    Philip Wadler <wadler@research.bell-labs.com>
Sent by: www-xml-schema-comments-request@w3.org
Date:    05/18/00 02:51 PM
To:      petsa@us.ibm.com
cc:      Philip Wadler <wadler@research.bell-labs.com>,
         www-xml-schema-comments@w3.org,
         Mary Fernandez <mff@research.att.com>,
         simeon@research.bell-labs.com,
         (bcc: Noah Mendelsohn/CAM/Lotus)
Subject: Re: Simplifying XML Schema

> Phil:
> We spent a lot of time debating the two options you discuss:
>
>     <billTo xsi:type="ipo:US-Address">
>         <name>Robert Smith</name>
>         <street>8 Oak Avenue</street>
>         <city>Old Town</city>
>         <state>PA</state>
>         <zip>95819</zip>
>     </billTo>
>
>     <billTo>
>         <US-Address>
>             <name>Robert Smith</name>
>             <street>8 Oak Avenue</street>
>             <city>Old Town</city>
>             <state>PA</state>
>             <zip>95819</zip>
>          </US-Address>
>     </billTo>
>
> When you declared US-Address and UK-Address you specified that
> they could appear wherever Address was declared.  In the instance
> you need to tell the validator exactly what type to validate for.
> The xsi:type and the additional level of nesting are two different
> ways of conveying that information.
>
> To me it came down to an aesthetic choice and we chose not to add
> the extra level of nesting.  I'm not a deep type theorist so I
> could not see a strong argument either way.  Perhaps you can.
>
> Regards, Ashok

Thanks for this comment.  Indeed, it is very difficult to fit
xsi:type into existing type systems, and this is one reason for
my worry.  This is directly relevant to the work of Query,
because we are hoping to have a typed query algebra.

Everyone is familiar with types.  For instance, in Java you can
declare that a method receives arguments of certain types and returns
results of a given type, and the compiler will warn you if this is not
the case.  Much the same happens in many programming languages we are
familiar with (scripting languages like Perl being a notable
exception).

If one uses the DOM, this is not much help.  In Java, every XML tree
is represented by the DOM type Node.  For a method that processes XML,
saying that the inputs are of type Node and the output of type Node is
not an enormous help (essentially, it reduces the type system of Java
to give you no more help than you get in Perl).
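
A minimal sketch of the problem, assuming a hypothetical helper firstAuthor
that is meant to take a book element and return its first author element.
The class and method names are invented for illustration; only the standard
org.w3c.dom and javax.xml.parsers calls are real.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

public class UntypedDomSketch {
    // Intended contract: takes a <book>, returns its first <author>.
    // The signature can only say Node -> Node, so the compiler cannot
    // check that contract.
    static Node firstAuthor(Node book) {
        for (Node child = book.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE
                    && "author".equals(child.getNodeName())) {
                return child;
            }
        }
        return null; // no <author> child found
    }

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        Node book = parse("<book><title>T</title><author>A</author></book>")
            .getDocumentElement();
        Node author = firstAuthor(book);
        System.out.println(author.getNodeName());

        // Passing an author where a book is expected still compiles;
        // the mistake shows up only at run time, here as a null result.
        System.out.println(firstAuthor(author));
    }
}
```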

If one uses a more sophisticated system, like the XML Bean Generator
from IBM Alphaworks or SOX from Commerce One, then the situation is
much better.  The Bean Generator takes a DTD as input, and produces a
collection of Java class and interface declarations as output.  Each
element type in the DTD (book, author, address, UK-address, what have
you) yields a corresponding Java class declaration.  The generated
classes include methods that automatically parse XML and yield
corresponding Java objects, or serialize objects as XML.  Now one can
have a method that, say, takes a Book and returns an Author, which is
much more useful than taking a Node and returning a Node.
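
For comparison, here is a sketch of what programming against such generated
classes might look like.  Book and Author below are hand-written stand-ins,
not actual output of the Bean Generator or SOX, and their fields are
assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;

public class TypedBindingSketch {
    // Hypothetical stand-in for a class generated from an <author> element type.
    static final class Author {
        final String name;
        Author(String name) { this.name = name; }
    }

    // Hypothetical stand-in for a class generated from a <book> element type.
    static final class Book {
        final String title;
        final List<Author> authors;
        Book(String title, List<Author> authors) {
            this.title = title;
            this.authors = authors;
        }
    }

    // The signature now states the contract: takes a Book, returns an Author.
    static Author firstAuthor(Book book) {
        return book.authors.get(0);
    }

    public static void main(String[] args) {
        Book book = new Book("T", Arrays.asList(new Author("A")));
        Author author = firstAuthor(book);
        System.out.println(author.name);

        // firstAuthor(author);  // rejected at compile time: Author is not a Book
    }
}
```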

Similarly, XML programming languages may be typed or untyped.
Xduce is a typed language, XSLT is an untyped language.  So in
Xduce one can specify that a procedure takes a Book and returns
an Author, and the compiler will check this for you.  There is
no way to get the compiler to check similar information in XSLT.

The first requirement of a type system is that it satisfies a type
safety result: if a method or procedure is declared to have arguments
and results of certain types, and the arguments passed actually have
the given types, then the result must have the declared type.  There is a
long history of work on mathematically formulating and proving such
safety results, stretching back to the 1930's.  Recently, there has
been a lot of work on proving type safety for Java.  This is of
particular interest because the security properties of Java depend
crucially on type safety, and indeed some bugs found in Java security
are intimately tied to the type system.

Java, DTDs, and Xduce all fit into this mathematical framework.  I do
not know how to fit xsi:type into this framework, and there are reasons
to suspect it would be hard to do so.  The xsi:type construct looks a
lot like what type theorists call a `dependent type' --- the type of
one part of the data depends on the type of another part of the data.
While some type systems involving dependent types are known, they are
generally considered to be much harder to deal with than other type
systems.   (I can elaborate, if you want to hear me go on about
complete inference systems and the like ...)

So, the bottom line is that there are many well-understood properties
of types.  I know how to apply these to DTDs or to the type system of
Xduce, but I do not know how to apply them to xsi:type.

The algebra proposal being designed by the LA group is typed, and
pretty much the same considerations that apply to programming
languages also apply to it.  Among other things, we are hoping to
replace our earlier hybrid proposal (which had tuples and lists as
well as Node) by a unified proposal (where everything is XML data) ---
this only seems feasible to us if there is a typed system, since the
types carry information that was previously carried by non-XML
components like tuples.  So xsi:type may throw a real monkey-wrench
into the works, as far as our work on the algebra is concerned.

Cheers,  -- P

Received on Friday, 19 May 2000 13:28:40 UTC