An argument for bridging information models and ontologies at the syntactic level from Ogbuji, Chimezie on 2008-03-26 (public-hcls-coi@w3.org from January to March 2008)

From: Ogbuji, Chimezie <OGBUJIC@ccf.org>
Date: Tue, 25 Mar 2008 21:07:19 -0400
To: public-hcls-coi@w3.org, public-semweb-lifesci@w3.org
Message-ID: <2702D0EBA4F0A749968E52E8644184EA2888DA@CCHSCLEXMB59.cc.ad.cchs.net>

For some time I have had a concern about a theme in the more common approaches to bridging information models and ontologies as a path towards bringing the advantages of Semantic Web technologies to 'legacy' healthcare terminology systems.

I have wanted to speak on this topic for some time but have hesitated, mostly because my thoughts were not fully baked and (in addition) I thought this anti-pattern was an anomaly. Today's conversation during the COI teleconference suggested that I should speak up about it.

To get right to the point: 1) I consider approaches that attempt to perform this bridging directly between information models and ontologies to be examples of this 'anti-pattern'; 2) I think that performing the bridging at the syntactic level addresses the important problem of properly separating the two in a way that emphasizes their strengths.

I would like to offer an alternative viewpoint because I think the lack of consensus on this particular topic is a significant roadblock to a clear path for moving healthcare terminology systems towards formal knowledge representation (where they need to be) without doing so at the expense of the strengths of information models and conceptual models ('models of meaning', ontologies, etc.).

Information models are better equipped to handle messaging, data manipulation, validation, and document management (including structured, controlled data entry) than most (I'd venture to say 'all') formal knowledge representations, and knowledge representations are better equipped to handle expressive conceptualizations of the real world and inference.  Neither should attempt to do the job of the other, and doing so seems fundamentally problematic to me.

In a perfect world, a messaging dialect (such as HL7 RIM or even Atom for that matter) would be developed with a formal conceptualization as part of its specification.  This conceptualization would be captured in a formal knowledge representation (such as some particular fragment of FOL, for instance) as a way to reach consensus on the 'real world' entities that the messages refer to.  

Such a conceptualization would re-use philosophical precedent in categorizing these real world entities in a well understood (and fairly rigorous) way.  This could bottom out in an alignment with a particular (high fidelity) upper ontology (Cyc, DOLCE, and BFO come to mind) and fleshing out specializations relevant to the particular domain associated with the messages (healthcare in the case of HL7 RIM and “syndication of web content” in the case of Atom).

Consensus on this formal, conceptual model would happen first and then would soon be followed by a process for defining what the syntax would look like (independent of what instances of the syntax denote in the conceptual model).  This separation minimizes interference between concerns about data structures and characteristics of the relevant categories of real world entities that the data structures represent.  

I consider this separation a good practice, and it is (perhaps) no surprise that this is how most Semantic Web knowledge representation dialects are formulated (OWL 1.1 and RIF, for instance): first there is consensus on their semantics, then there is a dialog about how the language is serialized.  Even if the two don't happen in that particular order, they typically happen independently.

Unfortunately, with regard to healthcare terminologies, we have a situation where there is a large, well-deployed (or at least widely adopted) information model for messaging that was developed without a rigorous (formal) semantics but that is fairly robust with respect to data structures, messaging, syntax, and such.

There are two ways to skin this cat, IMHO.  You can attempt to capture both the information model and the conceptualization (or ontology) in a formal knowledge representation (which seems to be the more common approach).  Or you can leave the information model as it is and instead map its (XML) serializations into a corresponding knowledge representation serialization (RDF) that conforms either to a pre-existing conceptual model of healthcare (expressed in OWL) or to one that was developed in order to formalize the conceptualization of the real world implicitly referenced by the information model.  In the latter case (where, for example, a 'custom' model of meaning for HL7 RIM is developed and expressed formally in OWL) I think it is incredibly important that such a model does not inherit any notions of data constructs, validation, etc., since the syntactic mapping completely removes the need for them.
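
To make the second option concrete, here is a rough sketch of what I mean by a syntactic mapping. Everything in it is made up for illustration: the XML message is a toy and not a real HL7 RIM serialization, the example.org namespaces stand in for a healthcare ontology and for instance data, and the property names (ofPatient, ofQuantity, hasValue, hasUnit) are hypothetical. The point is only that the target triples talk about a measurement and a patient, not about elements, attributes, or cardinalities.

```python
import xml.etree.ElementTree as ET

# Toy message for illustration only; NOT a real HL7 RIM serialization.
MESSAGE = """
<observation id="obs-1">
  <patient id="pat-42"/>
  <code>systolic-bp</code>
  <value unit="mmHg">128</value>
</observation>
"""

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Hypothetical namespaces: a stand-in ontology and a stand-in data space.
ONT = "http://example.org/clinical-ontology#"
DATA = "http://example.org/data/"


def iri(value):
    """Wrap an IRI in angle brackets, N-Triples style."""
    return f"<{value}>"


def literal(value):
    """Quote a plain literal, escaping backslashes and double quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def observation_to_triples(xml_text):
    """Map one toy observation element to (subject, predicate, object) triples."""
    obs = ET.fromstring(xml_text)
    subject = iri(DATA + obs.get("id"))
    value = obs.find("value")
    return [
        (subject, iri(RDF_TYPE), iri(ONT + "Measurement")),
        (subject, iri(ONT + "ofPatient"), iri(DATA + obs.find("patient").get("id"))),
        (subject, iri(ONT + "ofQuantity"), iri(ONT + obs.findtext("code"))),
        (subject, iri(ONT + "hasValue"), literal(value.text)),
        (subject, iri(ONT + "hasUnit"), literal(value.get("unit"))),
    ]


def to_ntriples(triples):
    """Serialize triples one per line, each terminated by ' .'."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    print(to_ntriples(observation_to_triples(MESSAGE)))
```

Validation of the message (required elements, allowed units, and so on) stays with the information model and its own tooling; the mapping simply assumes a well-formed input and says nothing about structure in its output.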

There are many parallels between the question of how you deal with HL7 in this way and questions that the GRDDL WG discussed about how Atom syndication content (of which there is plenty in the wild) could be mapped to RDF using a syntactic transformation (which is all GRDDL really is when you boil it down).  Would this involve reusing an already existing ontology of web content (independent of Atom) as the target RDF vocabulary, or would an ontology specifically crafted for Atom (which inherits all the idiosyncrasies of Atom) be adopted instead?
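
The Atom question can be sketched the same way. In the snippet below the same syntactic transformation is pointed at two different target vocabularies. The Atom namespace and the Dublin Core terms (dcterms:title, dcterms:modified) are real; the "atom-ontology" namespace is a hypothetical stand-in for an Atom-specific ontology, and the Python function is only a stand-in for the kind of transformation GRDDL would associate with a document (in practice usually XSLT), not GRDDL itself.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# A minimal, made-up Atom entry used only for illustration.
ENTRY = """
<entry xmlns="http://www.w3.org/2005/Atom">
  <id>http://example.org/posts/1</id>
  <title>Hello</title>
  <updated>2008-03-25T21:07:19-04:00</updated>
</entry>
"""

# Option 1: reuse an existing vocabulary that is independent of Atom.
GENERIC = {
    "title": "http://purl.org/dc/terms/title",
    "updated": "http://purl.org/dc/terms/modified",
}

# Option 2: a hypothetical ontology that mirrors Atom's own element names.
ATOM_SPECIFIC = {
    "title": "http://example.org/atom-ontology#title",
    "updated": "http://example.org/atom-ontology#updated",
}


def entry_to_triples(xml_text, vocabulary):
    """Map an Atom entry to (subject, predicate, object) triples.

    `vocabulary` maps Atom element names to predicate IRIs, so the
    choice of target ontology is separate from the transformation.
    """
    entry = ET.fromstring(xml_text)
    subject = entry.findtext(f"{ATOM}id")
    return [
        (subject, predicate, entry.findtext(f"{ATOM}{name}"))
        for name, predicate in vocabulary.items()
    ]


if __name__ == "__main__":
    for triple in entry_to_triples(ENTRY, GENERIC):
        print(triple)
    for triple in entry_to_triples(ENTRY, ATOM_SPECIFIC):
        print(triple)
```

The transformation itself does not change between the two runs; only the target vocabulary does, which is why I see the choice of ontology as a question separate from the syntax.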

In short, I think developing a syntactic mapping eliminates the need to basically bastardize a knowledge representation into doing what it was never designed to do (capture structural, representational, and data-oriented constraints).  Leave that to the originating model (which, by all accounts, has done that particular job quite well).  My belief that this is a better practice has been the main reason why most of my attempts to demonstrate the value of aligning HL7 to 'reference ontologies' for healthcare have been through the use of syntactic mappings (via GRDDL, for instance) rather than trying to bite off the unnecessarily large chunk of capturing both an information model and a model of meaning in a single framework.

My $0.02 (and more)

Chimezie (chee-meh) Ogbuji
Lead Systems Analyst
Thoracic and Cardiovascular Surgery
Cleveland Clinic Foundation
9500 Euclid Avenue/ W26
Cleveland, Ohio 44195
Office: (216)444-8593
ogbujic@ccf.org

Received on Wednesday, 26 March 2008 01:08:20 UTC