Re: Impact of XML on Data Modeling

On Wed, 30 Jan 2008 04:33:28 -0000, Tsao, Scott <scott.tsao@boeing.com>  
wrote:

> If these observations are correct, my next question would be: Is the W3C
> XML Schema the best choice on the market today for data modeling in the
> XML world?  (why or why not)

If your only concern is a single technology, then you can get away with
only using a physical model.  Which is to say that if XML is your only
concern, you could do your data modelling in an XML schema language.
Introducing a logical model might not be very beneficial in practice:
there is a cost to using layered models, and you generally only get a
payback on that cost if you need to implement the same data model across
multiple technologies, e.g. databases and Java/C# as well as XML.

As for which is best, my personal rule of thumb is that W3C XML Schema is  
the best choice where you are dealing with "data-oriented" XML, i.e. XML  
where there isn't much mixed content, and the sequencing of XML child  
elements within a parent element is often not important to the  
interpretation of the data.  By contrast, for "document-oriented" XML,
i.e. XML where there is a significant amount of mixed content, and the  
sequencing of XML elements is usually important, I would suggest RELAX NG  
(but I say that as someone who works almost exclusively in the  
"data-oriented" world).

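As a rough illustration of the "data-oriented" versus "document-oriented"
distinction, here is a minimal Python sketch that measures how much mixed
content a document contains.  The function names and the two sample
documents are invented for this example, and the ratio is only a crude
heuristic, not a rule for choosing a schema language.

```python
import xml.etree.ElementTree as ET


def has_mixed_content(elem):
    """True if elem has child elements and also non-whitespace text around them."""
    if len(elem) == 0:
        return False  # leaf element: text only, so not mixed
    if elem.text and elem.text.strip():
        return True  # text before the first child
    # text following any child (ElementTree stores it in child.tail)
    return any(child.tail and child.tail.strip() for child in elem)


def mixed_ratio(xml_text):
    """Fraction of parent elements in the document that have mixed content."""
    root = ET.fromstring(xml_text)
    parents = [e for e in root.iter() if len(e) > 0]
    if not parents:
        return 0.0
    return sum(has_mixed_content(e) for e in parents) / len(parents)


# Hypothetical samples: a record-like document and a prose-like document.
DATA_XML = "<order><id>42</id><qty>3</qty></order>"
DOC_XML = "<p>Use <em>RELAX NG</em> for <b>documents</b>.</p>"

if __name__ == "__main__":
    print(mixed_ratio(DATA_XML))
    print(mixed_ratio(DOC_XML))
```

A ratio near zero suggests record-like data; a high ratio suggests prose
with inline markup, where element order and interleaved text matter.
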
That said, I've worked with customers who have large numbers of complex  
W3C XML Schemas, and if there are lots of "includes" and "imports" that  
introduce dependencies between those Schemas (as there often are), they  
can become difficult to understand and maintain using XML Schema editors.   
When things get to that scale, I find it works better to introduce a  
higher-level model of some sort, so that the set of XML Schemas becomes  
more like a repository of re-usable XML types.  Some UML tools now do a  
good job of this, and I also had a lot of real-world success using IONA  
Artix Data Services to create a repository of types from which I generated  
hundreds of Schemas which shared types at the repository level, but didn't  
have any Schema "includes", making them easier to deploy and understand.
Note that this repository isn't a logical model, it's a physical model  
that abstracts away one particular physical issue (which type is defined  
in which file).
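
The general idea of a type repository can be sketched in a few lines of
Python.  This is not how Artix Data Services or any UML tool actually
works; the repository layout, the type names and the function names are
all assumptions made up for this example.  Each type records which other
types it depends on, and every generated Schema inlines the types it
needs, so no "include" is required.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical repository: type name -> (XSD definition, types it depends on).
REPOSITORY = {
    "Currency": (
        '<xs:simpleType name="Currency"><xs:restriction base="xs:string">'
        '<xs:length value="3"/></xs:restriction></xs:simpleType>',
        [],
    ),
    "Money": (
        '<xs:complexType name="Money"><xs:sequence>'
        '<xs:element name="amount" type="xs:decimal"/>'
        '<xs:element name="currency" type="Currency"/>'
        '</xs:sequence></xs:complexType>',
        ["Currency"],
    ),
    "Address": (
        '<xs:complexType name="Address"><xs:sequence>'
        '<xs:element name="city" type="xs:string"/>'
        '</xs:sequence></xs:complexType>',
        [],
    ),
}


def closure(names):
    """Return the requested types plus their dependencies, dependencies first."""
    visited, order = set(), []

    def visit(name):
        if name in visited:
            return  # already handled (this also stops on cyclic references)
        visited.add(name)
        for dep in REPOSITORY[name][1]:
            visit(dep)
        order.append(name)

    for name in names:
        visit(name)
    return order


def generate_schema(root_types):
    """Emit one self-contained schema that inlines every type it needs."""
    body = "\n".join("  " + REPOSITORY[n][0] for n in closure(root_types))
    return f'<xs:schema xmlns:xs="{XS}">\n{body}\n</xs:schema>'


if __name__ == "__main__":
    print(generate_schema(["Money"]))
```

Here the question of which type lives in which file is decided at
generation time rather than being hand-maintained in the Schemas
themselves.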

Perhaps that's a long way of saying that for larger scale projects, it  
isn't just about the modelling language that you choose, it's also about  
your methodology for working with large models with complex  
interrelationships between types and other definitions.

Cheers, Tony.
-- 
Anthony B. Coates
London, UK
UK: +44 (20) 8816 7700, US: +1 (239) 344 7700
Mobile/Cell: +44 (79) 0543 9026
abcoates.work@yahoo.co.uk

Received on Wednesday, 30 January 2008 21:08:07 UTC