- From: Michael Kay <mike@saxonica.com>
- Date: Tue, 29 Jan 2008 10:13:48 -0000
- To: "'Tsao, Scott'" <scott.tsao@boeing.com>, <xmlschema-dev@w3.org>
ST> I recently read an article, "A Few Thoughts on Data Modeling and Kids' Soccer - An Interview with William G. Smith" <http://www.wilshireconferences.com/interviews/smith.htm>. In this article Mr. Smith advocates a 3-schema architecture for data modeling, i.e., Conceptual, Logical, and Physical data models. I understand that this architecture was a popular approach among information architects in the late 80's to early 90's, under the banner of Data or Information Resource Management (DRM or IRM).

MK> I think the three-schema architecture is rather older than that - mid 70s; it was introduced in a monumentally unreadable report known as the ANSI-SPARC architecture (SPARC = Standards Planning and Requirements Committee, the equivalent of W3C's TAG?). I don't think it was ever popular in engineering practice, but it was very much talked about by the kind of people who speak at conferences.

MK> The best definition I ever heard for "Conceptual Schema" was "A schema written in a language that hasn't been invented yet". The Conceptual Schema was a holy grail: as soon as you wrote concrete syntax everyone claimed it was a Logical Schema and not conceptual at all. Though I think that as the idea evolved, and relational implementations became more popular, people started to adopt the label "conceptual schema" to refer to the analysis layer (the entity-relationship model or UML class model or whatever) as distinct from the relational database table design: basically, models that were used for communication between people rather than to control the behaviour of software.

MK> Of course in the mid 70s everyone was obsessed with database design as the central point of an information architecture. By the mid 80s many people had started to grasp that databases were usually encapsulated within individual applications and it was the messaging backbone that you really needed to get control of.

MK> Information Resource Management is a term I see as somewhat orthogonal.
It was adopted as a grander name for what had earlier been called "data dictionaries", when the relational database vendors appropriated the name data dictionary to describe their simple metadata catalogs. I think it's a concept we desperately need to reinvent. When I do consulting with people struggling to define a portfolio of 400 application-to-application messages, all sharing common data elements, I remember fondly the days when we modelled this all in a data dictionary, with links from the detailed message formats to the data flow models of the applications they were implementing.

ST> It seems to me that, other than the Conceptual data model, the Logical and Physical data models no longer directly apply to the XML approach of data modeling (e.g., using the W3C XML Schema). For example, Mr. Smith talks about a 3NF logical data model, which only applies when one is taking a strictly 'relational' approach. So, I was wondering if there is a parallel set of data models in the XML-based data modeling world.

MK> Well, I think it was always part of the ANSI/SPARC approach that the Logical model would have to be expressed in some concrete modelling language, and that there are multiple candidates for this based on different paradigms - 3NF relational models and XML schemas being two of the many candidates (others being object models, entity-relationship models, etc). You can think of these as metamodels - the "model" is a description of the user data, the "metamodel" is a description of the model. No-one should be allowed to claim that the approach depends on a single universal metamodel at this level, let alone that the universal metamodel is relational 3NF. One of the aims of the IRDS standards activity in the 1980s was to define a single metametamodel that was capable of describing all possible metamodels, but I think it became clear that any metamodel of sufficient power could also be used as the metametamodel.

MK> 3NF models of course are much more suitable at the database level than at the data interchange level. Message design needs hierarchic models, which is what makes XML so suitable. And the data interchange level these days has much more strategic importance than the database level, because it's all to do with optimizing business processes and value chains.

ST> Based on my (limited) understanding of the purpose and techniques mentioned in the article, I would say that an XML schema would correspond to the logical data model, and an XML binding (to a particular database or programming language) would correspond to the physical data model.

MK> My view of the physical model has always been that it is concerned with things that don't affect the logic of the application, only its 'ilities (availability, dependability, performance, security, potential for change etc). That is, it gets into issues such as partitioning of data on disk and allocation of indexes. I'd say that an XML binding, by contrast, is a mapping between two different logical models of the same data.

Regards,

Michael Kay
Saxonica
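
The closing point about bindings can be made concrete with a minimal sketch. It is not taken from the message above, and the order data, the element and attribute names, and the functions `rows_to_message` and `message_to_rows` are all hypothetical. It assumes one small order held in two 3NF-style relations, and shows a binding as a pair of mappings between that relational logical model and a hierarchic XML message carrying the same data. Nothing in it touches indexes or storage layout, which is what would make a model physical in the sense described above.

```python
import xml.etree.ElementTree as ET

# Hypothetical 3NF-style logical model: two flat relations linked by order_id.
orders = [{"order_id": 1, "customer": "ACME"}]
order_lines = [
    {"order_id": 1, "line_no": 1, "sku": "A-100", "qty": 2},
    {"order_id": 1, "line_no": 2, "sku": "B-200", "qty": 5},
]


def rows_to_message(order, lines):
    """Map the relational model to a hierarchic XML message."""
    root = ET.Element("order", id=str(order["order_id"]), customer=order["customer"])
    own_lines = [l for l in lines if l["order_id"] == order["order_id"]]
    for line in sorted(own_lines, key=lambda l: l["line_no"]):
        # The foreign key disappears: nesting expresses the relationship,
        # and document order stands in for line_no.
        ET.SubElement(root, "line", sku=line["sku"], qty=str(line["qty"]))
    return root


def message_to_rows(root):
    """Map the hierarchic message back to the relational model."""
    order_id = int(root.get("id"))
    order = {"order_id": order_id, "customer": root.get("customer")}
    lines = [
        {"order_id": order_id, "line_no": n, "sku": el.get("sku"), "qty": int(el.get("qty"))}
        for n, el in enumerate(root.findall("line"), start=1)
    ]
    return order, lines


message = rows_to_message(orders[0], order_lines)
xml_text = ET.tostring(message, encoding="unicode")
round_trip = message_to_rows(ET.fromstring(xml_text))
```

Because both directions are lossless for this data, `round_trip` reproduces the original rows: the two representations are alternative logical models of the same information, and the binding is only the mapping between them.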
Received on Tuesday, 29 January 2008 10:14:02 UTC