RE: An argument for bridging information models and ontologies at the syntactic level from Miller, Michael D (Rosetta) on 2008-03-27 (public-hcls-coi@w3.org from January to March 2008)

From: Miller, Michael D (Rosetta) <Michael_Miller@Rosettabio.com>
Date: Thu, 27 Mar 2008 14:19:31 -0700
To: "jim herber" <jimherber@gmail.com>, "Booth, David (HP Software - Boston)" <dbooth@hp.com>
cc: "Ogbuji, Chimezie" <OGBUJIC@ccf.org>, public-hcls-coi@w3.org, public-semweb-lifesci@w3.org
Message-ID: <C9EDB84D403E654CB78E37A506E406AF01CC3A12@ussemx1101.merck.com>
hi all,
 
yes, i also agree that these are great points, except for a quibble.
 
"Data models like schemas, structures, and data formats are
implementation details"
 
in Model Driven Architecture (MDA), the Platform Independent Model data
model is free of implementation details.  as a developer who works
primarily with data models, whenever i deal with ontologies, it is at
the implementation level.  that is, if i have a
transcript, i wish to know what gene ontology terms it is associated
with so that i can make inferences about other transcripts with regard to
their gene expression under certain conditions.
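
A rough sketch of that implementation-level use, in Python. The
transcript ids, the GO ids attached to them, and the function name are
all made up for illustration; real annotations would come from an
annotation source, not a hard-coded table.

```python
# Hypothetical annotation table: transcript id -> set of GO term ids.
# The ids are placeholders and are not real annotations of any transcript.
ANNOTATIONS = {
    "tx1": {"GO:0000001", "GO:0000002"},
    "tx2": {"GO:0000001"},
    "tx3": {"GO:0000003"},
}


def transcripts_sharing_terms(transcript_id, annotations):
    """Return {other transcript id: shared GO terms} for every other
    transcript that has at least one GO term in common with transcript_id."""
    terms = annotations.get(transcript_id, set())
    shared = {}
    for other, other_terms in annotations.items():
        if other == transcript_id:
            continue
        common = terms & other_terms
        if common:
            shared[other] = common
    return shared


if __name__ == "__main__":
    print(transcripts_sharing_terms("tx1", ANNOTATIONS))
```

The data model only holds references to the ontology terms; whatever
inference is made about the related transcripts happens on the ontology
side.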
 
i also thought it worth mentioning that in the development of the data
model FuGE for functional genomics, we purposely developed the ontology
package so that objects would reference into ontologies cleanly to
maintain the separation in the points below.
 
cheers,
michael
Michael Miller 
Lead Software Developer 
Rosetta Biosoftware Business Unit 
www.rosettabio.com 


________________________________

	From: public-semweb-lifesci-request@w3.org
[mailto:public-semweb-lifesci-request@w3.org] On Behalf Of jim herber
	Sent: Wednesday, March 26, 2008 10:22 AM
	To: Booth, David (HP Software - Boston)
	Cc: Ogbuji, Chimezie; public-hcls-coi@w3.org;
public-semweb-lifesci@w3.org
	Subject: Re: An argument for bridging information models and
ontologies at the syntactic level
	
	
	Chimezie, excellent observation.  Agree with the principles you are
articulating.
	
	I would add:
	
	1. .
	2. Concept models operate at many levels.  As an example,
concept models may represent the entire data model as a concept, or they
may point at an element within a data model as a concept.
	3. Different concept models that are unrelated or loosely
related may reference the same data model.
	4. Keeping the two (data models and conceptual models) separate
allows them to evolve independently.
	5. Pulling out the mapping versus attempting to represent
mapping and data model in conceptual language fits a basic tenet of
engineering, that is "loosely coupled modules with highly
cohesive functionality".
	
	David, do you like "data model to conceptual mapping" better?
	
	
	Jim Herber
	Independent Consultant
	jimherber_at_ gmail.com
	
	
	On Wed, Mar 26, 2008 at 11:47 AM, Booth, David (HP Software -
Boston) <dbooth@hp.com> wrote:
	


		+1.  Except I find the term "syntactic mapping" somewhat
misleading, because to my mind, the anti-pattern you are describing
involves the encoding of syntactic-level concerns into the ontology,
which as you point out, shouldn't be there.  So personally I would have
been more inclined to call it "semantic mapping", but maybe someone else
has a better idea.
		
		
		David Booth, Ph.D.
		HP Software
		+1 617 629 8881 office  |  dbooth@hp.com
		http://www.hp.com/go/software
		
		Opinions expressed herein are those of the author and do
not represent the official views of HP unless explicitly stated
otherwise.
		


		> -----Original Message-----
		> From: public-semweb-lifesci-request@w3.org
		> [mailto:public-semweb-lifesci-request@w3.org] On
Behalf Of
		> Ogbuji, Chimezie
		> Sent: Tuesday, March 25, 2008 9:07 PM
		> To: public-hcls-coi@w3.org;
public-semweb-lifesci@w3.org
		> Subject: An argument for bridging information models
and
		> ontologies at the syntactic level
		>
		> For some time I have had a concern about a theme in
the more
		> common approaches to bridging information models and
		> ontologies as a path towards bringing the advantages
of the
		> Semantic Web technologies to 'legacy' healthcare
terminology systems.
		>
		> I wanted to speak on this topic  for some time but
have
		> hesitated mostly because my thoughts were not fully
baked and
		> (in addition) I thought this anti-pattern was an
anomaly, but
		> today's conversation during the COI teleconference
suggested
		> that I should speak up about it.
		>
		> To get right to the point, 1) I consider approaches
that
		> attempt to perform this bridging directly between
information
		> models and ontologies as examples of this
'anti-pattern.' 2)
		> I think that performing this bridging at the syntactic
level
		> addresses the important problem of properly separating
these
		> two  in a way that emphasizes their strengths.
		>
		> I would like to offer an alternative viewpoint
because I
		> think consensus on this particular topic is a
significant
		> roadblock to a clear path for moving healthcare
terminology
		> systems more towards formal knowledge representation
(where
		> they need to be) in a way that doesn't do so at the
expense
		> of the strengths of information models and conceptual
models
		> ('models of meaning' or ontologies, etc.).
		>
		> Information models are better equipped to handle
messaging,
		> data manipulation, validation, document management
(and
		> structured, controlled data entry) than most (I'd
venture to
		> say 'all') formal knowledge representations and
knowledge
		> representations are better equipped to handle
expressive
		> conceptualizations of the real world and inference.
Neither
		> should attempt to do the job of the other and doing so
seems
		> fundamentally problematic to me.
		>
		> In a perfect world, a messaging dialect (such as HL7
RIM or
		> even Atom for that matter) would be developed with a
formal
		> conceptualization as part of its specification.  This
		> conceptualization would be captured in a formal
knowledge
		> representation (such as some particular fragment of
FOL, for
		> instance) as a way to reach consensus on the 'real
world'
		> entities that the messages refer to.
		>
		> Such a conceptualization would re-use philosophical
precedent
		> in categorizing these real world entities in a well
		> understood (and fairly rigorous) way.  This could
bottom out
		> in an alignment with a particular (high fidelity)
upper
		> ontology (Cyc, DOLCE, and BFO come to mind) and
fleshing out
		> specializations relevant to the particular domain
associated
		> with the messages (healthcare in the case of HL7 RIM
and
		> "syndication of web content" in the case of Atom).
		>
		> Consensus on this formal, conceptual model would
happen first
		> and then would soon be followed by a process for
defining
		> what the syntax would look like (independent of what
		> instances of the syntax denote in the conceptual
model).
		> This separation minimizes interference between
concerns about
		> data structures and characteristics of the relevant
		> categories of real world entities that the data
structures represent.
		>
		> I consider this separation a good practice and it is
		> (perhaps) no surprise that this is how most Semantic
Web
		> knowledge representation dialects are formulated (OWL
1.1 and
		> RIF for instance): First there is consensus on their
		> semantics then there is a dialog about how the
language is
		> serialized.  Even if they don't happen in that
particular
		> order they typically happen independently.
		>
		> Unfortunately, with regard to healthcare
terminologies, we
		> have a situation where there is a large, well-deployed
(or at
		> least widely adopted) information model for messaging
that
		> was developed without a rigorous (formal) semantics
but that
		> is fairly robust with respect to data structures,
messaging,
		> syntax, and such.
		>
		> There are two ways to skin this cat, IMHO.  You can
attempt
		> to capture both the information model as well as the
		> conceptualization (or ontology) in a formal knowledge
		> representation (which seems to be the more common
approach).
		> Or you can leave the information model as it is and
instead
		> map its (XML) serializations into a corresponding
knowledge
		> representation serialization (RDF) that conforms to
either a
		> pre-existing conceptual model of healthcare (expressed
in
		> OWL) or one that was developed in order to formalize the
		> conceptualization of the real world implicitly
referenced by
		> the information model.  In the latter case (where, for
		> example, a 'custom' model of meaning for HL7 RIM is
developed
		> and expressed formally in OWL) I think it is
incredibly
		> important that such a model does not inherit any
notions of
		> data constructs, validation, etc. since the necessity
of this
		> is completely removed by the syntactic mapping.
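
To make the second option concrete, here is a minimal sketch of a
syntactic mapping in Python. Everything specific in it is an assumption
for illustration: the XML message is not real HL7, the example.org
namespaces and property names stand in for whatever conceptual model is
the target, and the output is plain N-Triples text rather than the
result of a GRDDL transform.

```python
import xml.etree.ElementTree as ET

# Hypothetical message; the element names are invented, not HL7.
MESSAGE = """<observation id="obs1">
  <patient ref="pat42"/>
  <code>SystolicBloodPressure</code>
  <value>120</value>
</observation>"""

# Hypothetical namespaces for instance data and for the target ontology.
BASE = "http://example.org/data/"
ONT = "http://example.org/ontology#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def to_ntriples(xml_text):
    """Map one XML message to N-Triples lines.

    The mapping reads the structure of the message but emits only
    statements about what it denotes; nothing about elements,
    attributes, or validation reaches the RDF side.
    (No literal escaping is done; this is a sketch, not a serializer.)
    """
    root = ET.fromstring(xml_text)
    subject = "<" + BASE + root.get("id") + ">"
    patient = "<" + BASE + root.find("patient").get("ref") + ">"
    kind = "<" + ONT + root.findtext("code") + ">"
    value = '"' + root.findtext("value") + '"'
    return [
        subject + " <" + RDF_TYPE + "> <" + ONT + "Observation> .",
        subject + " <" + ONT + "aboutPatient> " + patient + " .",
        subject + " <" + ONT + "measures> " + kind + " .",
        subject + " <" + ONT + "hasValue> " + value + " .",
    ]


if __name__ == "__main__":
    for line in to_ntriples(MESSAGE):
        print(line)
```

The information model keeps responsibility for the shape of the
message, and the ontology only ever sees the triples.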
		>
		> There are many parallels between the question of how
you deal
		> with HL7 in this way and questions that the GRDDL WG
		> discussed about how Atom syndication content (for
which there
		> is plenty in the wild) could be mapped to RDF using a
		> syntactic transformation (which is all GRDDL really is
when
		> you boil it down).  Would this involve reusing an
already
		> existing ontology of web content (independent of Atom)
as the
		> target RDF syntax or would an ontology specifically
crafted
		> for Atom (which inherits all the idiosyncrasies of
Atom) be
		> adopted instead?
		>
		> In short, I think developing a syntactic mapping
eliminates
		> the need to basically bastardize a knowledge
representation
		> into doing what it was never designed to do (capture
		> structural, representational, and data-oriented
constraints).
		>  Leave that to the originating model (which, by all
accounts,
		> has done that particular job quite well).  My belief
that
		> this is a better practice has been the main reason why
most
		> of my attempts to demonstrate the value of aligning
HL7 to
		> 'reference ontologies' for healthcare have been
through the
		> use of syntactic mappings (via GRDDL for instance)
rather than to
		> try to bite off an unnecessarily large chunk of
capturing
		> both an information model and a model of meaning in a
single
		> framework.
		>
		> My $0.02 (and more)
		>
		> Chimezie (chee-meh) Ogbuji
		> Lead Systems Analyst
		> Thoracic and Cardiovascular Surgery
		> Cleveland Clinic Foundation
		> 9500 Euclid Avenue/ W26
		> Cleveland, Ohio 44195
		> Office: (216)444-8593
		> ogbujic@ccf.org
Received on Thursday, 27 March 2008 21:20:47 UTC