Re: change the question slightly maybe...schemas, leveraging their object orientedness??

Dean Hiller writes:

>> If XSD was object-oriented, it would only care about the superclasses
>> unless Company C had companyB's xsd so they could validate the new feature
>> too.

>> I think XSD would be so much easier to go from version to version of all
>> the different protocols out there if XSD was more object oriented.

I am the first to agree that versioning of vocabularies is crucial to the 
success of XML, and I have for three years been outspoken in my 
disappointment that the W3C community hasn't done a better job of tackling 
that issue.  I think that (a) doing this right is a very, very hard
problem, (b) there is quite a range of important use cases, ranging from
handling of small bug fixes, such as allowing a new attribute or new
attribute value, to rather major changes to parts of a vocabulary, and (c)
there is some reason to suspect that in an ideal world, the solutions 
would involve not just schema, but perhaps changes to the way namespaces 
work, and perhaps new models of processing XML.

That said, I wouldn't leap too quickly to the assumption that "if only 
schema had exploited the well known principles of object orientation, we'd 
all be fine by now."  First of all, months of study were given to the use 
of object orientation in schemas, and when you look carefully, there 
really are a number of reasons that people use object orientation, and 
interop across versions is just one.   For example, object orientation is 
commonly used to allow for reuse and evolution at the source level.  I 
believe that XML schema succeeds at this to a degree, both in the type 
system and with substitution groups.

The more fundamental point I wanted to make is that I think your statement 
misses a key point about object orientation.  Indeed, it probably took a 
year of work on XML and schema for me to learn or at least notice this: in 
fundamental ways, XML (not just schema) and object orientation are at 
odds.  The fundamental premise of object orientation in programming is 
that state is hidden behind behavior.  You don't expose data, you expose 
methods, and typically you expose those methods  by name.   In certain 
respects, XML is a reaction to object orientation:  when people tried to 
loosely couple the method-oriented object structure of their systems using 
COM and CORBA, the resulting contracts were too detailed for some
purposes.  It turns out that when I want to order a book from Amazon, I 
don't want to know about the object structure (if any) of their
implementation, I just want to send them a document representing a book 
order.  Indeed, many different operations can be performed on that 
document including signing it, filing it, updating it with costs and 
expected shipping dates etc, but unlike an object, a document carries no 
behavior.

Why does this relate to the versioning issue?  Well, look at how object 
oriented systems achieve the behavior to which you refer.  They impose a 
level of indirection, often referred to as virtual method calling, between 
those looking for data and those providing it.  Thus, subclasses can come 
and go, as long as the contract at the method level is preserved. If you 
have interfaces, then sometimes only the pertinent subset of the contract 
needs to be preserved as the system evolves (i.e., you need to support the
interface of interest).
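
A minimal sketch of that indirection, in Python.  The class and function
names (Order, DiscountedOrder, invoice_line) are hypothetical and chosen
only for illustration.  The caller is written against the method contract,
so a subclass that appears later can add hidden state without breaking it.

```python
class Order:
    """Base contract: callers may rely only on total()."""

    def __init__(self, unit_price, quantity):
        self._unit_price = unit_price
        self._quantity = quantity

    def total(self):
        return self._unit_price * self._quantity


class DiscountedOrder(Order):
    """A later subclass: new hidden state, same total() contract."""

    def __init__(self, unit_price, quantity, discount):
        super().__init__(unit_price, quantity)
        self._discount = discount

    def total(self):
        # Dispatch reaches this override even when the caller
        # was written before this class existed.
        return super().total() - self._discount


def invoice_line(order):
    """'Old' caller: it sees behavior (total), never the stored fields."""
    return "Amount due: %d" % order.total()
```

The old caller never learns that a discount field exists; it only asks for
the total, and the newer class answers through the same method.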

XML is completely different.  I can write all the schemas I want to check 
your XML document, but when it's handed to your application the data is 
right there for you to see.  It's not just the schema language that needs 
to adapt to versions, it's your application.  Consider a simple example 
where someone extends a phone-number element to include a country code 
attribute.  With some work one can write a schema for the original 
application that will accept the new, previously unknown country code
attribute when it shows up.  The question is, what does the app do with it?
The schema may ignore it, but a dialing application had better not.  Indeed,
in the US, it would need to know to prefix it with an 011 if the code is 
other than country code 1, and dial that country code.  In an 
object-oriented programming system, you'd have a new dial method that
would override the old one, and would preserve the interface.
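
The phone-number case can be sketched in Python with the standard
xml.etree.ElementTree module.  The element name, the country-code
attribute name, the sample numbers, and both functions are assumptions
made for illustration; they are not taken from any real vocabulary.  The
point is that the version 1 application parses the new document without
complaint and then dials the wrong thing.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents: the second one uses the later extension.
OLD_DOC = '<phone-number>6175550100</phone-number>'
NEW_DOC = '<phone-number country-code="44">2079460000</phone-number>'


def old_dial_string(xml_text):
    """Version 1 application: written before country-code existed.

    It reads only the element text, so the new attribute is silently
    ignored even though the document parses fine.
    """
    elem = ET.fromstring(xml_text)
    return elem.text


def new_dial_string(xml_text):
    """Version 2 application, dialing from the US.

    Assumes a missing attribute means country code 1, and prefixes
    011 plus the country code for anything else.
    """
    elem = ET.fromstring(xml_text)
    code = elem.get("country-code", "1")
    if code == "1":
        return elem.text
    return "011" + code + elem.text
```

Here old_dial_string(NEW_DOC) returns the bare local number, which is the
failure described above: a permissive schema lets the document through,
but only a changed application knows what the attribute means.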

Now you can see one of the reasons we didn't do multiple inheritance. With 
single inheritance we can at least have the (somewhat messy) convention 
that extensions are at the end of the structure.  What do you do with 
multiple inheritance?  How do you help an application keep track of where 
one base class is contributing content and then another?  I'm not saying 
it's impossible, but I am saying the issues are quite different from what 
you find in an OO system, where behavior wraps data.

Finally, I believe this is one of the reasons that refinement and 
extension are called out separately in schema, something which is less 
common in OO systems.  Java doesn't particularly distinguish a subclass 
which just provides a subset of the data of its base (e.g. restricts 
integers to values < 65535) from subclasses that provide extended 
behavior;  both are declared in the same way.  In a data system like 
schema, it's much more helpful to call out "this subclass is really a 
subset of the base", because that's the case where you know that an old 
application can handle the new data.
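
A rough Python sketch of why the distinction matters.  Everything here is
hypothetical: the base type is assumed to be "non-negative integer", the
restriction uses the < 65535 bound from the example above, and the record
layout stands in for an extended content model.

```python
def base_accepts(value):
    """Base type as the old application understands it (assumed here
    to be a non-negative integer)."""
    return isinstance(value, int) and value >= 0


def restricted_accepts(value):
    """Restriction: same kind of value, narrower value space."""
    return base_accepts(value) and value < 65535


# Fields the old application was written to read (assumed).
BASE_FIELDS = {"number"}


def fields_old_app_ignores(record):
    """Extension: report the content an old application never looks at."""
    return set(record) - BASE_FIELDS
```

Every value that passes restricted_accepts also passes base_accepts, so
old code is safe by construction.  With extension there is no such
guarantee: fields_old_app_ignores shows what the old code would drop on
the floor, and whether that is harmless depends on the application.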

So, I agree versioning is important.  I'm less convinced that the answer 
for XML is primarily to be found in the world of object orientation.  XML 
is data without behavior, somewhat the opposite of OO.

------------------------------------------------------------------
Noah Mendelsohn                              Voice: 1-617-693-4036
IBM Corporation                                Fax: 1-617-693-8676
One Rogers Street
Cambridge, MA 02142
------------------------------------------------------------------

Received on Tuesday, 21 October 2003 11:11:10 UTC