Re: Simplifying XML Schema

> Phil:
> We spent a lot of time debating the two options you discuss:
> 
>     <billTo xsi:type="ipo:US-Address">
>         <name>Robert Smith</name>
>         <street>8 Oak Avenue</street>
>         <city>Old Town</city>
>         <state>PA</state>
>         <zip>95819</zip>
>     </billTo>
> 
>     <billTo>
>         <US-Address>
>             <name>Robert Smith</name>
>             <street>8 Oak Avenue</street>
>             <city>Old Town</city>
>             <state>PA</state>
>             <zip>95819</zip>
>          </US-Address>
>     </billTo>
> 
> When you declared US-Address and UK-Address you specified that
> they could appear wherever Address was declared.  In the instance
> you need to tell the validator exactly what type to validate for.
> The xsi:type and the additional level of nesting are two different
> ways of conveying that information.
> 
> To me it came down to an aesthetic choice and we chose not to add
> the extra level of nesting.  I'm not a deep type theorist so I
> could not see a strong argument either way.  Perhaps you can.
> 
> Regards, Ashok

Thanks for this comment.  Indeed, it is very difficult to fit
xsi:type into existing type systems, and this is one reason for
my worry.  This is directly relevant to the work of Query,
because we are hoping to have a typed query algebra.

Everyone is familiar with types.  For instance, in Java you can
declare that a method receives arguments of certain types and returns
results of a given type, and the compiler will warn you if this is not
the case.  Much the same happens in many programming languages we are
familiar with (scripting languages like Perl being a notable
exception).

If one uses the DOM, this is not much help.  In Java, every XML tree
is represented by the DOM type Node.  For a method that processes XML,
saying that the inputs are of type Node and the output of type Node is
not an enormous help (essentially, it reduces the type system of Java
to give you no more help than you get in Perl).
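
To make the point concrete, here is a minimal Java sketch using the
standard DOM API.  The class DomUntyped and the helpers firstAuthor
and parseRoot are names invented for this illustration, not part of
any library.  The signature of firstAuthor says only "Node in, Node
out", so the compiler cannot object when it is handed an address
instead of a book.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class DomUntyped {
    // Hypothetical helper: intended for <book> elements, but the
    // signature cannot say so. Returns the first <author> child, or null.
    static Node firstAuthor(Node book) {
        NodeList children = book.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE
                    && "author".equals(child.getNodeName())) {
                return child;
            }
        }
        return null;
    }

    // Hypothetical helper: parse a string and return its root element.
    static Node parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(
                            xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        Node book = parseRoot("<book><title>T</title><author>A</author></book>");
        Node address = parseRoot("<address><city>Old Town</city></address>");

        System.out.println(firstAuthor(book).getTextContent()); // prints A
        // Passing an address is a mistake, yet it compiles; the error
        // only shows up at run time, as a null result.
        System.out.println(firstAuthor(address));                // prints null
    }
}
```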

If one uses a more sophisticated system, like the XML Bean Generator
from IBM Alphaworks or SOX from Commerce One, then the situation is
much better.  The Bean Generator takes a DTD as input, and produces a
collection of Java class and interface declarations as output.  Each
element type in the DTD (book, author, address, UK-address, what have
you) yields a corresponding Java class declaration.  The generated
classes include methods that automatically parse XML and yield
corresponding Java objects, or serialize objects as XML.  Now one can
have a method that, say, takes a Book and returns an Author, which is
much more useful than taking a Node and returning a Node.
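
A rough sketch of what this buys you, again in Java.  The classes
below are hand-written stand-ins that I made up for illustration;
they are not the actual output of the Bean Generator or SOX, and the
parsing and serialization methods are omitted.

```java
import java.util.Collections;
import java.util.List;

public class TypedBinding {
    // Hypothetical stand-ins for classes generated from a DTD,
    // one class per element type.
    static final class Author {
        final String name;
        Author(String name) { this.name = name; }
    }

    static final class Book {
        final String title;
        final List<Author> authors;
        Book(String title, List<Author> authors) {
            this.title = title;
            this.authors = authors;
        }
    }

    static final class Address {
        final String city;
        Address(String city) { this.city = city; }
    }

    // The signature now documents, and the compiler enforces,
    // "takes a Book and returns an Author".
    static Author firstAuthor(Book book) {
        return book.authors.get(0);
    }

    public static void main(String[] args) {
        Book book = new Book("T", Collections.singletonList(new Author("A")));
        System.out.println(firstAuthor(book).name); // prints A

        // The following line is rejected at compile time, because
        // an Address is not a Book:
        // firstAuthor(new Address("Old Town"));
    }
}
```

Compare this with the Node-to-Node signature: the mistake of passing
an address is now caught by the compiler rather than at run time.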

Similarly, XML programming languages may be typed or untyped.
Xduce is a typed language, XSLT is an untyped language.  So in
Xduce one can specify that a procedure takes a Book and returns
an Author, and the compiler will check this for you.  There is
no way to get the compiler to check similar information in XSLT.

The first requirement of a type system is that it satisfies a type
safety result: if a method or procedure is declared to have arguments
and results of certain types, and the arguments passed actually have
the given types, then the result must have the declared type.  There is a
long history of work on mathematically formulating and proving such
safety results, stretching back to the 1930s.  Recently, there has
been a lot of work on proving type safety for Java.  This is of
particular interest because the security properties of Java depend
crucially on type safety, and indeed some bugs found in Java security
are intimately tied to the type system.

Java, DTDs, and Xduce all fit into this mathematical framework.  I do
not know how to fit xsi:type into this framework, and there are reasons
to suspect it would be hard to do so.  The xsi:type construct looks a
lot like what type theorists call a `dependent type' --- the type of
one part of the data depends on the value of another part of the data.
While some type systems involving dependent types are known, they are
generally considered to be much harder to deal with than other type
systems.  (I can elaborate, if you want to hear me go on about
complete inference systems and the like ...)
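
Here is an informal Java analogy for the difficulty; it is only a
sketch, not a formal account of dependent types.  The class names,
the helper bindBillTo, and the UK postcode field are assumptions of
mine; only the attribute value ipo:US-Address and the US fields come
from the example above.  The static type of the result can only be
Address, because which subtype you get is decided by the run-time
value of the xsi:type attribute.

```java
import java.util.HashMap;
import java.util.Map;

public class XsiTypeSketch {
    static class Address {
        final String name;
        Address(String name) { this.name = name; }
    }

    static final class USAddress extends Address {
        final String state;
        final String zip;
        USAddress(String name, String state, String zip) {
            super(name);
            this.state = state;
            this.zip = zip;
        }
    }

    static final class UKAddress extends Address {
        final String postcode; // hypothetical field, for illustration
        UKAddress(String name, String postcode) {
            super(name);
            this.postcode = postcode;
        }
    }

    // Hypothetical binder for <billTo>: the type of the content depends
    // on the value of the xsi:type attribute, which is ordinary data.
    static Address bindBillTo(String xsiType, Map<String, String> fields) {
        if ("ipo:US-Address".equals(xsiType)) {
            return new USAddress(fields.get("name"),
                    fields.get("state"), fields.get("zip"));
        }
        if ("ipo:UK-Address".equals(xsiType)) {
            return new UKAddress(fields.get("name"), fields.get("postcode"));
        }
        return new Address(fields.get("name"));
    }

    public static void main(String[] args) {
        Map<String, String> fields = new HashMap<>();
        fields.put("name", "Robert Smith");
        fields.put("state", "PA");
        fields.put("zip", "95819");

        Address billTo = bindBillTo("ipo:US-Address", fields);
        // billTo.zip does not compile: the compiler knows only Address.
        // Recovering the zip needs a run-time test and a downcast.
        if (billTo instanceof USAddress) {
            System.out.println(((USAddress) billTo).zip); // prints 95819
        }
    }
}
```

The run-time test and downcast are exactly the kind of check that a
static type system is supposed to make unnecessary.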

So, the bottom line is that there are many well-understood properties
of types.  I know how to apply these to DTDs or to the type system of
Xduce, but I do not know how to apply them to xsi:type.

The algebra proposal being designed by the LA group is typed, and
pretty much the same considerations that apply to programming
languages also apply to it.  Among other things, we are hoping to
replace our earlier hybrid proposal (which had tuples and lists as
well as Node) by a unified proposal (where everything is XML data) ---
this only seems feasible to us if there is a typed system, since the
types carry information that was previously carried by non-XML
components like tuples.  So xsi:type may throw a real monkey-wrench
into the works, as far as our work on the algebra is concerned.

Cheers,  -- P

Received on Thursday, 18 May 2000 14:52:06 UTC