Re: 3rd try on versioning question

From: Anthony B. Coates <abcoates@idmm.co.uk>
Date: Mon, 18 Oct 2004 19:41:29 +0100
To: xmlschema-dev@w3.org
Message-ID: <opsf2wvfft6mihty@idmm-vaio-vgn>

I think the only thing we could all agree on is that there is no industry  
standard for how to manage versioning and extensibility in XML.  Part of  
the problem is that it really depends on which 'industry' you come from.   
Let's take an example.

The W3C guideline on namespaces and versioning (last I saw) is that  
namespace URIs should not change with versions of a vocabulary.  There  
should be versions of the schema, but all for the same namespace URI.  The  
example was that in (X)HTML, just because you bring out a new version of  
the XHTML Schema(/DTD), the behaviour of the <p> tag won't change.  This  
sounded good at the time, but in practice, it was the wrong call for most  
of us.  HTML is really something of an 'edge' case, because HTML browsers  
are incredibly lenient, and have reasonably well understood rules for how  
to interpret invalid markup (like swapped end tags or unknown  
elements/attributes).  This is only possible because of the huge audience  
that HTML has.  In normal application contexts, there is never a big  
enough audience to justify the cost of engineering this kind of  
flexibility (and in researching all of the things that could go wrong, and  
how to handle them).  Indeed, for transactional messaging, which is  
typically the antithesis of HTML and other information distribution  
formats, you really don't want any implicit conversion between versions or  
support of invalid markup, because that introduces the risk of semantic  
misinterpretations that could cost big money (I do a lot of work in the  
finance world, where a single transaction could involve an 8- or 9-digit  
sum).

The other problem with not versioning namespaces is that the XML Schema  
spec itself suggests that applications can select the Schema to validate a  
message with based solely on the namespace URI.  That pretty much forces  
you to put version information in the namespace URI for most  
applications.  If the world was different, and the Schema spec provided  
support for selecting a Schema based on the namespace URI *and* one or  
more attributes on the top-level element, it would be different.  That  
isn't the case, and so versioning the namespace URI seems to be the way  
forward if you are doing something other than (X)HTML.
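To make the point about namespace-based selection concrete, here is a minimal sketch in Python of an application choosing a schema from the namespace URI alone. The namespace URIs, the file names and the function names are all hypothetical, invented for illustration; the sketch only selects a schema and leaves the validation itself to whatever validator you use.

```python
import xml.etree.ElementTree as ET

# Hypothetical registry: one schema file per versioned namespace URI.
SCHEMA_BY_NAMESPACE = {
    "http://example.com/ns/orders/1.0": "orders-1.0.xsd",
    "http://example.com/ns/orders/2.0": "orders-2.0.xsd",
}


def root_namespace(xml_text):
    """Return the namespace URI of the document element ('' if none)."""
    tag = ET.fromstring(xml_text).tag  # ElementTree spells it '{uri}local'
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def select_schema(xml_text):
    """Pick the schema for a message from its namespace URI alone."""
    namespace = root_namespace(xml_text)
    try:
        return SCHEMA_BY_NAMESPACE[namespace]
    except KeyError:
        # Unknown version: reject rather than guess at a "close enough" schema.
        raise ValueError(f"no schema registered for namespace {namespace!r}")
```

Because the version lives in the namespace URI, a message in an unregistered version fails at the lookup, before any application code touches it.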

On a slightly different tack, I'm led to believe that the TAG has had  
proposals along the lines of supporting a Schema style where (in simple  
terms) 'ANY' content can be added almost anywhere as a single-switch  
setting, to future-proof Schemas.  I have co-authored Schemas that do  
this kind of thing (e.g. MDDL, http://www.mddl.org/), but it's only  
appropriate for information distribution.  If Schemas supported this as an  
easily-switched on option, transactional XML users would have to  
continually scan their Schemas to make sure that it was switched off.  If  
someone sends you a message with *any* data that you are not expecting,  
you want to reject it, in case that extra data has a semantic impact on  
the meaning of the desired transaction.  It's not something that is worth  
taking a risk on.
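The "continually scan their Schemas" chore could be automated along these lines. This is only a sketch in Python with hypothetical function names: it looks for xsd:any and xsd:anyAttribute in a single schema document, and as written it does not follow xsd:include or xsd:import, nor does it catch elements typed as xsd:anyType.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"


def find_wildcards(schema_text):
    """Return every xsd:any / xsd:anyAttribute element in one schema document."""
    root = ET.fromstring(schema_text)
    wildcards = []
    for local_name in ("any", "anyAttribute"):
        wildcards.extend(root.iter(f"{{{XSD_NS}}}{local_name}"))
    return wildcards


def assert_closed(schema_text):
    """Raise if the schema leaves any wildcard extension point open."""
    found = find_wildcards(schema_text)
    if found:
        raise ValueError(f"schema contains {len(found)} wildcard(s)")
```

A check like this could run whenever a new schema version is received, so that an open content model is noticed before it reaches a transactional system.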

I suspect that the driving reason for people wanting this 'solution' to  
versioning/extensibility is because schema 'compilers' have proven very  
popular with Java/C#/etc. developers who don't want to have to understand  
XML.  The problem with them is that they encourage developers to tightly  
couple their application code to a particular version of a schema.  All  
hell then breaks loose when somebody has the temerity to release a new  
version of the schema; the sound of code breaking can be heard for miles.   
This is a long-winded way of saying that some of the issues here are  
really not to do with Schemas, but with how people bind them to their  
code.  The "ANY content anywhere" solution to future-proofing is really  
just a sop to schema compilers, in my opinion.  What you actually need is  
for the application to have its own internal data model, and for a tool to  
provide easy support for mapping from the schema model to the application  
model.  That support should include support for mappings from multiple  
schema versions to the same application data model (where feasible), and  
it should allow an existing mapping to be easily modified when the change  
between two versions of a schema is small.
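As a rough illustration of the mapping idea, here is a hand-written sketch in Python in which two schema versions map onto one internal model. Everything in it is assumed for the example: the namespaces, the two message layouts and the Payment class. It also assumes each message has already been validated against the schema for its version.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from decimal import Decimal

NS_V1 = "http://example.com/ns/payment/1.0"  # hypothetical
NS_V2 = "http://example.com/ns/payment/2.0"  # hypothetical


@dataclass
class Payment:
    """The application's own model, independent of any schema version."""
    amount: Decimal
    currency: str


def _from_v1(root):
    # Assumed v1 layout: <payment><amount currency="GBP">12.50</amount></payment>
    amount = root.find(f"{{{NS_V1}}}amount")
    return Payment(Decimal(amount.text), amount.get("currency"))


def _from_v2(root):
    # Assumed v2 layout: <payment><value>12.50</value><currency>GBP</currency></payment>
    value = root.findtext(f"{{{NS_V2}}}value")
    currency = root.findtext(f"{{{NS_V2}}}currency")
    return Payment(Decimal(value), currency)


# One mapper per supported version, keyed by the qualified document element.
MAPPERS = {f"{{{NS_V1}}}payment": _from_v1, f"{{{NS_V2}}}payment": _from_v2}


def to_payment(xml_text):
    """Map a validated message of any supported version to the internal model."""
    root = ET.fromstring(xml_text)
    mapper = MAPPERS.get(root.tag)
    if mapper is None:
        raise ValueError(f"unsupported document element {root.tag!r}")
    return mapper(root)
```

The rest of the application only ever sees Payment, so a new schema version means adding one mapper rather than regenerating and re-fitting bound classes.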

So, my (long-winded) answer is that while there is no single industry  
standard on versioning and extensibility, I would suggest the following:

1. Version your namespace URIs, unless you can really justify the cost of  
building a compliant and version-tolerant XML processing module into your  
application.

2. Where what you are doing is information distribution, where adding new  
information does not affect the semantics of the existing information, use  
'xsd:any' content (usually with the '##other' option) at all points where  
extensibility is required.  Post-processing the Schema with an XSLT script  
is a way of approaching this that avoids the quality control pitfalls of  
manual enhancement, but be aware that XML Schema allows some constructs to  
be expressed in multiple ways, which complicates the post-processing  
script.

3. Where you are doing transactional XML, and no possible  
misinterpretation of semantics can be allowed, just create separate  
versions of the Schemas and plan to use a mapping layer between each  
version of the Schema and the application's data model.
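For point 2, the post-processing step might look something like the sketch below. It is written in Python rather than XSLT, purely to keep the example short. The function name is hypothetical, and the choice of processContents="lax" with minOccurs="0" and maxOccurs="unbounded" is an assumption. It handles only xsd:sequence; content models expressed through xsd:choice, xsd:all, groups or type derivation would need further cases, which is the complication mentioned in that point.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"
# Assumption: the input schema uses the 'xsd' prefix. ElementTree does not
# track prefixes used inside attribute values such as type="xsd:string",
# so a schema written with a different prefix would need extra handling.
ET.register_namespace("xsd", XSD_NS)


def add_extension_points(schema_text):
    """Append an xsd:any wildcard to every xsd:sequence that lacks one."""
    root = ET.fromstring(schema_text)
    # Materialise the list first so the tree is not modified mid-iteration.
    for seq in list(root.iter(f"{{{XSD_NS}}}sequence")):
        if seq.find(f"{{{XSD_NS}}}any") is not None:
            continue  # already has a wildcard
        ET.SubElement(seq, f"{{{XSD_NS}}}any", {
            "namespace": "##other",
            "processContents": "lax",
            "minOccurs": "0",
            "maxOccurs": "unbounded",
        })
    return ET.tostring(root, encoding="unicode")
```

Generating the extensible schema from a strict master copy keeps the hand-maintained schema free of wildcards, and running the script twice leaves the output unchanged.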

I wish the answer was neater than that, but it isn't in my experience.

	Cheers,
		Tony.

On Sat, 16 Oct 2004 11:28:54 -0700 (PDT), <dean@xsoftware.biz> wrote:

> I have asked a few times, and have not quite given up just yet.  I am
> looking for an industry standard on how to do versioning and
> extensibility.  It would be great if there was a standard way of doing
> this.  I have used some standards that kind of messed up and we couldn't
> extend it.

-- 
Anthony B. Coates, Director
Information Design, Messaging and Management
mailto:abcoates@idmm.co.uk
Mobile/Cell: +44 (79) 0543 9026
--
MDDL Editor (Market Data Definition Language)
http://www.mddl.org/
FpML AWG Member (Financial Products Markup Language)
http://www.fpml.org/
Received on Monday, 18 October 2004 18:42:05 UTC
