- From: Anthony B. Coates <abcoates@idmm.co.uk>
- Date: Mon, 18 Oct 2004 19:41:29 +0100
- To: xmlschema-dev@w3.org
I think the only thing we could all agree on is that there is no industry standard for how to manage versioning and extensibility in XML. Part of the problem is that it really depends on which 'industry' you come from.

Let's take an example. The W3C guideline on namespaces and versioning (last I saw) is that namespace URIs should not change with versions of a vocabulary. There should be versions of the schema, but all for the same namespace URI. The example was that in (X)HTML, just because you bring out a new version of the XHTML Schema(/DTD), the behaviour of the <p> tag won't change.

This sounded good at the time, but in practice it was the wrong call for most of us. HTML is really something of an 'edge' case, because HTML browsers are incredibly lenient, and have reasonably well understood rules for how to interpret invalid markup (like swapped end tags or unknown elements/attributes). This is only possible because of the huge audience that HTML has. In normal application contexts, there is never a big enough audience to justify the cost of engineering this kind of flexibility (and of researching all of the things that could go wrong, and how to handle them).

Indeed, for transactional messaging, which is typically the antithesis of HTML and other information distribution formats, you really don't want any implicit conversion between versions or support for invalid markup, because that introduces the risk of semantic misinterpretations that could cost big money (I do a lot of work in the finance world, where a single transaction could involve an 8 or 9 digit sum).

The other problem with not versioning namespaces is that the XML Schema spec itself suggests that applications can select the Schema used to validate a message based solely on the namespace URI. That pretty much forces you to put version information in the namespace URI for most applications.
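
To make the point about namespace-only selection concrete, here is a minimal sketch of a receiver that picks a Schema using nothing but the root element's namespace URI. The URIs and file names are made up for illustration, and no real validation is performed; it only shows that once the version lives in the namespace URI, the lookup is a plain table.

```python
import xml.etree.ElementTree as ET

# Hypothetical versioned namespace URIs mapped to hypothetical schema files.
SCHEMA_BY_NAMESPACE = {
    "http://example.org/trade/v1": "trade-v1.xsd",
    "http://example.org/trade/v2": "trade-v2.xsd",
}


def namespace_of(tag):
    """Return the namespace URI from an ElementTree '{uri}local' tag, or ''."""
    if tag.startswith("{"):
        return tag[1:].split("}", 1)[0]
    return ""


def select_schema(xml_text):
    """Pick a schema using only the root element's namespace URI."""
    root = ET.fromstring(xml_text)
    uri = namespace_of(root.tag)
    if uri not in SCHEMA_BY_NAMESPACE:
        # Transactional stance: reject unknown versions rather than guess.
        raise ValueError(f"unknown namespace {uri!r}; message rejected")
    return SCHEMA_BY_NAMESPACE[uri]
```

If every version shared one namespace URI, this table would have a single entry and the receiver would need some other, non-standard signal to tell the versions apart.
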
If the world was different, and the Schema spec provided support for selecting a Schema based on the namespace URI *and* one or more attributes on the top-level element, it would be different. That isn't the case, and so versioning the namespace URI seems to be the way forward if you are doing something other than (X)HTML.

On a slightly different tack, I'm led to believe that the TAG has had proposals along the lines of supporting a Schema style where (in simple terms) 'ANY' content can be added almost anywhere as a single-switch setting, to future-proof Schemas. I have co-authored Schemas that do this kind of thing (e.g. MDDL, http://www.mddl.org/), but it's only appropriate for information distribution. If Schemas supported this as an easily switched-on option, transactional XML users would have to continually scan their Schemas to make sure that it was switched off. If someone sends you a message with *any* data that you are not expecting, you want to reject it, in case that extra data has a semantic impact on the meaning of the desired transaction. It's not something that is worth taking a risk on.

I suspect that the driving reason for people wanting this 'solution' to versioning/extensibility is that schema 'compilers' have proven very popular with Java/C#/etc. developers who don't want to have to understand XML. The problem with them is that they encourage developers to tightly couple their application code to a particular version of a schema. All hell then breaks loose when somebody has the temerity to release a new version of the schema; the sound of code breaking can be heard for miles. This is a long-winded way of saying that some of the issues here are really not to do with Schemas, but with how people bind them to their code. The "ANY content anywhere" solution to future-proofing is really just a sop to schema compilers, in my opinion.
What you actually need is for the application to have its own internal data model, and for a tool to provide easy support for mapping from the schema model to the application model. That support should include mappings from multiple schema versions to the same application data model (where feasible), and it should allow an existing mapping to be easily modified when the change between two versions of a schema is small.

So, my (long-winded) answer is that while there is no single industry standard on versioning and extensibility, I would suggest the following:

1. Version your namespace URIs, unless you can really justify the cost of building a compliant and version-tolerant XML processing module into your application;

2. Where what you are doing is information distribution, where adding new information does not affect the semantics of the existing information, use 'xsd:any' content (usually with the '##other' option) at all points where extensibility is required. Post-processing the Schema with an XSLT script is a way of approaching this that avoids the quality control pitfalls of manual enhancement, but be aware that XML Schema allows some constructs to be expressed in multiple ways, which complicates the post-processing script.

3. Where you are doing transactional XML, and no possible misinterpretation of semantics can be allowed, just create separate versions of the Schemas and plan to use a mapping layer between each version of the Schema and the application's data model.

I wish the answer was neater than that, but it isn't in my experience.

Cheers, Tony.

On Sat, 16 Oct 2004 11:28:54 -0700 (PDT), <dean@xsoftware.biz> wrote:

> I have asked a few times, and have not quite given up just yet. I am
> looking for an industry standard on how to do versioning and
> extensibility. It would be great if there was a standard way of doing
> this. I have used some standards that kind of messed up and we couldn't
> extend it.

--
Anthony B. Coates, Director
Information Design, Messaging and Management
mailto:abcoates@idmm.co.uk
Mobile/Cell: +44 (79) 0543 9026
--
MDDL Editor (Market Data Definition Language)
http://www.mddl.org/
FpML AWG Member (Financial Products Markup Language)
http://www.fpml.org/
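
As a rough illustration of point 3 above, the mapping layer might look something like the sketch below: one internal data model, and one small mapper per schema version, chosen by the versioned namespace URI. Everything here is hypothetical (the `Trade` class, the two namespace URIs, and the element names `amount`/`currency` versus `notional`/`ccy`), and it stands in for what a real mapping tool would generate rather than describing any existing product.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical versioned namespace URIs for two releases of one vocabulary.
V1 = "http://example.org/trade/v1"
V2 = "http://example.org/trade/v2"


@dataclass(frozen=True)
class Trade:
    """The application's own data model, independent of any schema version."""
    trade_id: str
    amount: Decimal
    currency: str


def _fields(root, ns, expected):
    """Read child elements into a dict, rejecting anything unexpected."""
    prefix = "{" + ns + "}"
    found = {}
    for child in root:
        name = child.tag[len(prefix):] if child.tag.startswith(prefix) else None
        if name not in expected or name in found:
            # Unexpected or repeated data might change the meaning: reject.
            raise ValueError(f"unexpected element {child.tag!r}")
        found[name] = (child.text or "").strip()
    missing = expected - set(found)
    if missing:
        raise ValueError(f"missing elements: {sorted(missing)}")
    return found


def from_v1(root):
    f = _fields(root, V1, {"id", "amount", "currency"})
    return Trade(f["id"], Decimal(f["amount"]), f["currency"])


def from_v2(root):
    # In this made-up v2, 'amount' became 'notional' and 'currency' became 'ccy'.
    f = _fields(root, V2, {"id", "notional", "ccy"})
    return Trade(f["id"], Decimal(f["notional"]), f["ccy"])


# One mapper per schema version, keyed by the qualified root element name.
MAPPERS = {"{%s}trade" % V1: from_v1, "{%s}trade" % V2: from_v2}


def parse_trade(xml_text):
    """Map a message of any supported version onto the internal model."""
    root = ET.fromstring(xml_text)
    if root.tag not in MAPPERS:
        raise ValueError(f"unsupported message type {root.tag!r}")
    return MAPPERS[root.tag](root)
```

The application code only ever sees `Trade`, so a new schema version means adding one mapper and one table entry, not touching the code that consumes the data.
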
Received on Monday, 18 October 2004 18:42:05 UTC