RE: Impact of XML on Data Modeling

ST> I recently read an article, "A Few Thoughts on Data Modeling and Kids'
Soccer - An Interview with William G. Smith"
<http://www.wilshireconferences.com/interviews/smith.htm>. In this article
Mr. Smith advocates a 3-schema architecture for data modeling, i.e.,
Conceptual, Logical, and Physical data models. I understand that this
architecture was a popular approach among information architects in the
late 80s to early 90s, under the banner of Data or Information Resource
Management (DRM or IRM).

MK> I think the three-schema architecture is rather older than that - mid
70s; it was introduced in a monumentally unreadable report known as the
ANSI/SPARC architecture (SPARC = Standards Planning and Requirements
Committee, the equivalent of W3C's TAG?). I don't think it was ever popular
in engineering practice, but it was very much talked about by the kind of
people who speak at conferences.

MK> The best definition I ever heard for "Conceptual Schema" was "A schema
written in a language that hasn't been invented yet". The Conceptual Schema
was a holy grail: as soon as you wrote concrete syntax everyone claimed it
was a Logical Schema and not conceptual at all. Though I think that as the
idea evolved, and relational implementations became more popular, people
started to adopt the label "conceptual schema" to refer to the analysis
layer (the entity-relationship model or UML class model or whatever) as
distinct from the relational database table design: basically, models that
were used for communication between people rather than to control the
behaviour of software.

MK> Of course in the mid 70s everyone was obsessed with database design as
the central point of an information architecture. By the mid 80s many people
had started to grasp that databases were usually encapsulated within
individual applications and it was the messaging backbone that you really
needed to get control of.

MK> Information Resource Management is a term I see as somewhat orthogonal.
It was adopted as a grander name for what had earlier been called "data
dictionaries", when the relational database vendors appropriated the name
data dictionary to describe their simple metadata catalogs. I think it's a
concept we desperately need to reinvent. When I do consulting with people
struggling to define a portfolio of 400 application-to-application messages,
all sharing common data elements, I remember fondly the days when we
modelled this all in a data dictionary, with links from the detailed message
formats to the data flow models of the applications they were implementing.
	 
ST> It seems to me that, other than the Conceptual data model, the Logical
and Physical data models no longer directly apply to the XML approach of
data modeling (e.g., using the W3C XML Schema).  For example, Mr. Smith
talks about a 3NF logical data model, which only applies when one is taking
a strictly 'relational' approach.  So, I was wondering if there is a
parallel set of data models in the XML-based data modeling world.

MK> Well, I think it was always part of the ANSI/SPARC approach that the
Logical model would have to be expressed in some concrete modelling
language, and that there are multiple candidates for this based on different
paradigms - 3NF relational models and XML schemas being two of the many
candidates (others being object models, entity-relationship models, etc).
You can think of these as metamodels - the "model" is a description of the
user data, the "metamodel" is a description of the model. No-one should be
allowed to claim that the approach depends on a single universal metamodel
at this level, let alone that the universal metamodel is relational 3NF. One
of the aims of the IRDS standards activity in the 1980s was to define a
single metametamodel that was capable of describing all possible metamodels,
but I think it became clear that any metamodel of sufficient power could
also be used as the metametamodel.
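
To make the model/metamodel layering concrete, here is a minimal sketch in
Python. Everything in it is hypothetical and invented for illustration (the
conforms() helper, the "Entity" record shape, the Team example); it is not
taken from IRDS or ANSI/SPARC. It only shows that one description format can
describe user data, describe a model, and describe itself.

```python
def conforms(instance, description):
    """Return True if every record in `instance` has exactly the fields
    that `description` lists for that record's kind."""
    fields_by_kind = {e["name"]: set(e["fields"]) for e in description["Entity"]}
    return all(
        kind in fields_by_kind and set(rec) == fields_by_kind[kind]
        for kind, recs in instance.items()
        for rec in recs
    )

# Metamodel (hypothetical): a model is a list of Entity records,
# each with a name and a list of field names.
metamodel = {"Entity": [{"name": "Entity", "fields": ["name", "fields"]}]}

# Model: a description of the user data, written in terms of the metamodel.
model = {"Entity": [{"name": "Team", "fields": ["name", "coach"]}]}

# User data, described by the model.
data = {"Team": [{"name": "Rovers", "coach": "Pat"}]}

data_ok = conforms(data, model)            # data checked against the model
model_ok = conforms(model, metamodel)      # model checked against the metamodel
self_ok = conforms(metamodel, metamodel)   # metamodel reused as its own metametamodel
```

In this toy case all three checks succeed, and the last one is the point: the
same format that describes models is expressive enough to describe itself.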

MK> 3NF models of course are much more suitable at the database level than
at the data interchange level. Message design needs hierarchic models which
is what makes XML so suitable. And the data interchange level these days has
much more strategic importance than the database level, because it's all to
do with optimizing business processes and value chains.
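
As a rough illustration of the difference, the sketch below holds an order
in 3NF-style tables and then assembles the same facts into one hierarchic
XML message. The table layouts, element names, and the order_message()
function are assumptions made up for this example, using only Python's
standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# Hypothetical 3NF-style tables: each fact is stored once and linked by keys.
customers = {1: {"name": "Acme Ltd"}}
orders = {100: {"customer_id": 1, "date": "2008-01-29"}}
order_lines = [
    {"order_id": 100, "line_no": 1, "sku": "A-1", "qty": 2},
    {"order_id": 100, "line_no": 2, "sku": "B-7", "qty": 5},
]

def order_message(order_id):
    """Join the flat tables into one self-contained hierarchic message."""
    order = orders[order_id]
    root = ET.Element("order", id=str(order_id), date=order["date"])
    # The customer is nested inside the order instead of referenced by key.
    ET.SubElement(root, "customer").text = customers[order["customer_id"]]["name"]
    lines = [l for l in order_lines if l["order_id"] == order_id]
    for line in sorted(lines, key=lambda l: l["line_no"]):
        ET.SubElement(root, "line", sku=line["sku"], qty=str(line["qty"]))
    return ET.tostring(root, encoding="unicode")

message = order_message(100)
```

The receiver of the message needs no keys or joins to read it: the nesting
carries the relationships that the tables express through foreign keys.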

TS> Based on my (limited) understanding of the purpose and techniques
mentioned in the article, I would say that an XML schema would correspond to
the logical data model, and an XML binding (to a particular database or
programming language) would correspond to the physical data model.  

MK> My view of the physical model has always been that it is concerned with
things that don't affect the logic of the application, only its 'ilities
(availability, dependability, performance, security, potential for change
etc). That is, it gets into issues such as partitioning of data on disk and
allocation of indexes. I'd say that an XML binding, by contrast, is a
mapping between two different logical models of the same data.
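
A small sketch of what that means in practice, assuming a made-up <player>
document and a hand-written binding to a Python dataclass (the Player class,
from_xml() and to_xml() are all hypothetical). Both sides are logical
descriptions of the same data; nothing here touches storage, indexing, or
any other physical concern.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass
class Player:
    """Object-model view of the data."""
    name: str
    shirt_number: int

def from_xml(text):
    """Map the XML view onto the object view."""
    elem = ET.fromstring(text)
    return Player(name=elem.findtext("name"), shirt_number=int(elem.get("shirt")))

def to_xml(player):
    """Map the object view back onto the XML view."""
    elem = ET.Element("player", shirt=str(player.shirt_number))
    ET.SubElement(elem, "name").text = player.name
    return ET.tostring(elem, encoding="unicode")

original = '<player shirt="9"><name>Sam</name></player>'
player = from_xml(original)
round_tripped = from_xml(to_xml(player))
```

The round trip preserves the information, which is what one would expect of
a mapping between two logical models of the same data.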

Regards,

Michael Kay	
Saxonica

Received on Tuesday, 29 January 2008 10:14:02 UTC