Re: Impact of XML on Data Modeling

On Fri, 01 Feb 2008 18:28:46 -0000, Jack Lindsey <tuquenukem@hotmail.com>  
wrote:

> I now need to define an approach to XML reuse in an organization that is  
> typically the 800-pound gorilla in the room, rather than a peer in a  
> data exchange community (i.e. no appetite for UBL, OAGIS, etc), but  
> where the management of the different business lines like to maintain  
> fences between themselves, encouraged by the IT funding model.  I assume  
> your apprehension about tying everything together with namespace  
> includes/imports is a lack of flexibility in practice?  Could you expand  
> on that, please?

The problem here is particularly with Schema includes, rather than
imports.  In a nutshell, when you include a Schema, you usually aren't
aware of exactly which definitions you have included.  People tend to add
an include in order to access a particular type/element/attribute/etc.
definition, but when you come back later to maintain the Schema, you
don't know which includes were added for which types/etc.

Why is this an issue?  The worst problem is when you get a circular set of
includes, e.g. A includes B, B includes C, C includes A (where A, B, and C
are Schema files).  Once you have a circular relationship, you get a very
tight coupling which makes those files behave like one large file for the
purposes of maintenance.  If you were relying on separate files to give
you smaller groups of definitions that can be edited independently, you
lose that once you have a circular inclusion relationship.
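
As an illustration of the A/B/C case, here is a minimal Python sketch that
reads the xs:include elements of each Schema and reports one circular chain
if there is one.  The function names are made up for this example, and it
assumes the schemaLocation values have already been mapped to consistent
file names.

```python
import xml.etree.ElementTree as ET

XS_INCLUDE = "{http://www.w3.org/2001/XMLSchema}include"


def read_includes(schema_text):
    """Return the schemaLocation of each xs:include in one Schema document."""
    root = ET.fromstring(schema_text)
    return [inc.get("schemaLocation") for inc in root.findall(XS_INCLUDE)]


def find_include_cycle(includes):
    """Return one circular chain of includes as a list of names, or None.

    `includes` maps each Schema file name to the list of files it includes.
    """
    in_progress, finished = set(), set()
    chain = []  # files on the current depth-first path

    def visit(name):
        in_progress.add(name)
        chain.append(name)
        for target in includes.get(name, []):
            if target in in_progress:
                # We have come back to a file that is still being visited.
                return chain[chain.index(target):] + [target]
            if target not in finished:
                cycle = visit(target)
                if cycle:
                    return cycle
        chain.pop()
        in_progress.discard(name)
        finished.add(name)
        return None

    for start in includes:
        if start not in finished:
            cycle = visit(start)
            if cycle:
                return cycle
    return None
```

For the example above, find_include_cycle({"A.xsd": ["B.xsd"], "B.xsd":
["C.xsd"], "C.xsd": ["A.xsd"]}) returns the chain A.xsd, B.xsd, C.xsd,
A.xsd.  This only works at the per-file level; it does not tell you which
definitions are responsible for each include.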

Additionally, sometimes the same definitions are made available via
multiple include paths.  This makes it hard to resolve circular inclusion
relationships: you think you've broken the circle, but then you find it
still exists via another chain of includes.

This is not just a theoretical issue.  I spent quite a lot of time for a
client writing scripts to analyse these dependencies (which have to be
analysed at the per-definition level, not just the per-file level) so I
could resolve them in a production set of Schemas.  People had added
includes over time to access particular definitions, and the result was
these circular inclusion relationships; it's easy to add includes without
realising that you have created a circular dependency.
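
To show what a per-definition check might look like, here is a simplified
Python sketch (not the scripts mentioned above).  It lists which names
referenced in one Schema are defined at the top level of a Schema it
includes.  The helper names are hypothetical.  It only looks at unprefixed
values of the type, ref and base attributes, it does not follow nested
includes, and it does not distinguish types from elements, so treat it as
a starting point only.

```python
import xml.etree.ElementTree as ET

REFERENCE_ATTRIBUTES = ("type", "ref", "base")


def defined_names(schema_text):
    """Names of the top-level definitions in one Schema document."""
    root = ET.fromstring(schema_text)
    return {child.get("name") for child in root if child.get("name")}


def referenced_names(schema_text):
    """Unprefixed names used in type/ref/base attributes in the document."""
    names = set()
    for element in ET.fromstring(schema_text).iter():
        for attribute in REFERENCE_ATTRIBUTES:
            value = element.get(attribute)
            # Simplification: skip prefixed QNames such as xs:string.
            if value and ":" not in value:
                names.add(value)
    return names


def names_supplied(including_text, included_text):
    """Definitions in the included Schema that the including Schema uses."""
    return referenced_names(including_text) & defined_names(included_text)
```

If names_supplied() returns an empty set for an include, that include is a
candidate for removal, subject to checking anything it pulls in
indirectly.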

Another issue is that, if the same file is included multiple times via
multiple paths (e.g. A includes B which includes C, and A also includes D
which includes C), then depending on the relative paths, the Schema
validator won't always realise that "C" is the same file in both cases,
and will issue errors about the same definitions (apparently) being
defined twice.  You don't see this problem if all Schema files are in the
same directory, but I have had it when the Schema files have been
distributed among a hierarchy of directories (for organisational
purposes).  I think this issue only occurs for include paths containing
"..", i.e. include paths which explicitly or implicitly reference a
parent or ancestor directory.
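
Whether a given validator normalises such paths is specific to that
validator, but you can check for yourself whether two differently spelled
include paths point at the same file.  A minimal Python sketch, with a
made-up function name and directory layout, using POSIX-style paths:

```python
import posixpath


def resolve_location(including_file, schema_location):
    """Resolve a relative schemaLocation against the file that contains it."""
    base_dir = posixpath.dirname(including_file)
    # normpath collapses "dir/../" segments so equal files compare equal.
    return posixpath.normpath(posixpath.join(base_dir, schema_location))


# Hypothetical layout: A.xsd at the top, D.xsd in lines/, C.xsd in common/.
via_a = resolve_location("A.xsd", "common/C.xsd")
via_d = resolve_location("lines/D.xsd", "../common/C.xsd")
print(via_a == via_d)  # prints True
```

Both references resolve to common/C.xsd, even though the raw
schemaLocation strings differ.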

These particular problems go away if, instead of using includes, you
generate Schemas without includes, i.e. each Schema file (for a particular
namespace) is generated with its own full set of the definitions that it
uses.  So that you retain appropriate centralised management of common
definitions, you need to generate such Schemas from a source model of
some sort.
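
A rough Python sketch of that idea follows.  The model format here (a
dictionary mapping each definition name to an XSD fragment plus the names
it uses) is purely hypothetical, as are the function names and the example
namespace; a real source model would be richer.

```python
XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical source model: name -> (XSD fragment, names it depends on).
MODEL = {
    "Amount": ('<xs:simpleType name="Amount">'
               '<xs:restriction base="xs:decimal"/></xs:simpleType>', []),
    "Currency": ('<xs:simpleType name="Currency">'
                 '<xs:restriction base="xs:string"/></xs:simpleType>', []),
    "Money": ('<xs:complexType name="Money"><xs:sequence>'
              '<xs:element name="value" type="Amount"/>'
              '<xs:element name="currency" type="Currency"/>'
              '</xs:sequence></xs:complexType>', ["Amount", "Currency"]),
    "PartyName": ('<xs:simpleType name="PartyName">'
                  '<xs:restriction base="xs:string"/></xs:simpleType>', []),
}


def definitions_needed(model, roots):
    """Return the sorted names of the roots plus everything they use."""
    seen, pending = set(), list(roots)
    while pending:
        name = pending.pop()
        if name not in seen:
            seen.add(name)
            pending.extend(model[name][1])
    return sorted(seen)


def generate_schema(model, roots, target_namespace):
    """Emit one self-contained Schema, with no xs:include elements."""
    lines = [f'<xs:schema xmlns:xs="{XS}" xmlns="{target_namespace}" '
             f'targetNamespace="{target_namespace}">']
    for name in definitions_needed(model, roots):
        lines.append("  " + model[name][0])
    lines.append("</xs:schema>")
    return "\n".join(lines)
```

Calling generate_schema(MODEL, ["Money"], "urn:example:payments") emits
Amount, Currency and Money, and leaves out PartyName because nothing in
that Schema uses it.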

Cheers, Tony.
-- 
Anthony B. Coates
Senior Partner
Miley Watts LLP
Experts In Data
UK: +44 (20) 8816 7700, US: +1 (239) 344 7700
Mobile/Cell: +44 (79) 0543 9026
Data standards participant: genericode, ISO 20022 (ISO 15022 XML),  
UN/CEFACT, MDDL, FpML, UBL.
http://www.mileywatts.com/

Received on Sunday, 3 February 2008 12:07:15 UTC