Re: Versioning vs Temporal modeling of Patient State from William Bug on 2007-01-11 (public-semweb-lifesci@w3.org from January 2007)

From: William Bug <William.Bug@DrexelMed.edu>
Date: Thu, 11 Jan 2007 18:31:59 -0500
To: "Xiaoshu Wang" <wangxiao@musc.edu>
Cc: "'w3c semweb hcls'" <public-semweb-lifesci@w3.org>
Message-Id: <6DA30B8C-0CB5-4037-9FC1-1B3AF0434161@DrexelMed.edu>
On Jan 11, 2007, at 5:30 PM, Xiaoshu Wang wrote:

> That is why I consider the OBO Foundry's wording "the original URI
> should still point to the old term or concept, even if it is  
> deprecated"
> (From William Bug) is a bit self-contradicted.

Just to be clear, I was quoting Dirk here from his post to this  
thread, not the OBO Foundry.  The wording of the related OBO Foundry  
principles (http://obofoundry.org/) is:

"3. The ontology possesses a unique identifier space within OBO."

"4. The ontology provider has procedures for identifying distinct  
successive versions."

This wording was kept purposefully vague, because there was
apparently much discussion but no real consensus regarding exactly
what to recommend as an implementation strategy, and strategies might
differ according to the formalism being used (OBO vs. OWL vs. RDF, for
instance).  At least that was my understanding.

This is very similar to the strategies that have been devised over
the last several decades for managing controlled vocabularies (CVs).
The A&I industry, which uses CVs to index academic literature, has
had to deal with term deprecation for years.  One of the problems
with CVs is that they lack a consistent, formally sound underlying
semantic framework.  Even if you've developed a means of tracking
changes in the CV, and all the annotations you've created with its
terms (annotations of the scientific literature or of experimental
data repositories) have been dated so you can track the chronology of
usage relative to the evolution of the CV, there is still no clear
way to address the issue Vipul mentioned, namely how a change made to
a term in the CV has affected the semantic entailments.  CV and
thesaurus curators have tried to develop best practices over the
years to address this specific question (some of which are in SKOS),
but just as Vipul mentions, these are for human consumption at best
(and, for that matter, they don't do a very effective or consistent
job of communicating the changes in entailed meaning to humans).
They most definitely do not provide what is needed for automatic
reasoners to negotiate the change in semantic entailment.
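
A minimal sketch of the dated-annotation bookkeeping described above,
in Python.  Everything in it is an assumption made for illustration:
the term identifier "CV:0001", the labels, the dates, and the names
TermVersion, Annotation, and version_in_force do not come from any
real vocabulary or tool.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Sequence


@dataclass(frozen=True)
class TermVersion:
    """One dated state of a CV term (hypothetical record layout)."""
    term_id: str
    label: str
    valid_from: date
    deprecated_on: Optional[date] = None  # None means still current


@dataclass(frozen=True)
class Annotation:
    """A dated use of a CV term on some document or data set."""
    target: str
    term_id: str
    made_on: date


def version_in_force(history: Sequence[TermVersion],
                     ann: Annotation) -> Optional[TermVersion]:
    """Return the term version that was current when `ann` was made."""
    for v in history:
        if v.term_id != ann.term_id:
            continue
        not_yet_deprecated = (v.deprecated_on is None
                              or ann.made_on < v.deprecated_on)
        if v.valid_from <= ann.made_on and not_yet_deprecated:
            return v
    return None


# Made-up history: the same identifier carried two labels over time.
HISTORY = [
    TermVersion("CV:0001", "tumor", date(1990, 1, 1), date(2000, 1, 1)),
    TermVersion("CV:0001", "neoplasm", date(2000, 1, 1)),
]
OLD_ANNOTATION = Annotation("article-123", "CV:0001", date(1995, 6, 1))

found = version_in_force(HISTORY, OLD_ANNOTATION)
print(found.label if found else "no version in force")
```

The lookup recovers which wording of the term an annotator saw, and
that is all it recovers.  Nothing in these records says what followed
from the term before and after the change, which is exactly the gap a
reasoner would need filled.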

Explicit semantic frameworks are designed to overcome some of this
deficit, with their focus on DEFINITIONS.  Many terms in CVs lacked
definitions completely or, when they had them, lacked a consistent
means of expressing the intended meaning and usage of the term.  The
biomedical ontology community has slowly come to recognize - with
input from applied, formal ontologists, medical informaticists, and
various C.S. investigators such as logic programming & DL experts and
computational linguists contributing to this field - that primacy
must be given to definitions (think "defined class" in OWL).  Of
course, we are still working on the preferred means of providing
consistent definitions, and it's not at all clear where the field
currently stands on this issue.  Practitioners from the fields I list
in the previous sentence have different ways of specifying what a
definition is - though one does hope they can all ultimately be kept
commensurate (a pipe dream?).

The point Trish was making regarding how this is dealt with  
effectively across formalisms (OBO vs. OWL, for instance) is well  
taken.  I don't believe this has been worked on directly.

However, creating metadata tags to track the evolution of the
semantic graph is a task OBI has begun to tackle directly
(https://www.cbil.upenn.edu/fugowiki/index.php/RepresentationalUnitMetadataTable).
There are many properties - still under discussion, as Trish says;
this is very much a work in progress - specifically designed to track
the evolution of the underlying semantic graph.  If you search on
"Bill Bug follow-up", you'll clearly see what I'm referring to.  I
group these as the AnnotationProperties concerned with details
related to "CLASS_ID/CLASS AXIOMS//CLASS ASSERTIONS".  The more I've
thought through what is presented on that OBO Wiki page - much of it
coming from the MSI, NCIT, BIRN, & MAGE communities within OBI - the
more I think there's a need for even a few more of these properties
to track a bit more detail on the semantic graph as it evolves over
time.
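
A rough sketch of what tracking class-level change could look like,
in Python.  It assumes each release stores a snapshot of a class's
definition text and its asserted axioms as plain strings.  The names
ClassSnapshot and diff_snapshots, the "EX:" identifiers, and the
record fields are all hypothetical; they are not the properties
listed on the OBI wiki page.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class ClassSnapshot:
    """State of one class in one release (hypothetical record layout)."""
    class_id: str
    version: str
    definition: str
    axioms: FrozenSet[str]  # asserted axioms, kept as plain strings


def diff_snapshots(old: ClassSnapshot, new: ClassSnapshot) -> Dict[str, object]:
    """Summarize what changed for one class between two releases."""
    if old.class_id != new.class_id:
        raise ValueError("snapshots describe different classes")
    return {
        "class_id": old.class_id,
        "from_version": old.version,
        "to_version": new.version,
        "definition_changed": old.definition != new.definition,
        "axioms_added": sorted(new.axioms - old.axioms),
        "axioms_removed": sorted(old.axioms - new.axioms),
    }


# Made-up example: the definition text is untouched, but one asserted
# superclass was added between releases.
OLD = ClassSnapshot(
    "EX:0000042", "v1", "A device that measures mass.",
    frozenset({"SubClassOf EX:Instrument"}),
)
NEW = ClassSnapshot(
    "EX:0000042", "v2", "A device that measures mass.",
    frozenset({"SubClassOf EX:Instrument", "SubClassOf EX:MeasuringDevice"}),
)

change = diff_snapshots(OLD, NEW)
for key, value in change.items():
    print(f"{key}: {value}")
```

Keeping the definition and the axioms as separate fields is the point
of the sketch: a class can keep its human-readable definition while
its asserted axioms, and therefore what a reasoner infers about it,
shift underneath.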

I'm pretty much convinced there's a need to provide this highly  
granular versioning within the TBox.  I completely agree with  
Chimezie's point.  In the ABox, things are completely different,  
though as Kei pointed out, versioning of accession numbers for  
GENBANK entries is certainly a major issue.  To my mind, you can  
think of a GENBANK entry as having a place both in the ABox and the  
TBox, depending on how you believe the implied semantics in that  
artifact are best represented.  There is the experimental evidence  
that led to the original submission of a particular sequence of  
whatever sort and creation of the GENBANK record.  That evidence is  
in the ABox, as I see it.  Then there is the record itself, which  
subsequently is referred to by many who derive their own sequence  
evidence through experiments and point to that record as the  
defining, granular type for their particular piece of evidence.  I  
know this issue has engendered a great deal of discussion amongst  
those working on the Sequence Ontology and within the GO Consortium  
and GO user community.  I can't say my view is representative of  
those discussions, and my sense is they did not necessarily reach a  
consensus opinion on the issue.  There has been significant work on
this issue, but I don't think there is complete agreement on exactly
where the boundary lies between the TBox and the ABox.  I'm
speaking in the abstract here, of course.  If someone creates a set  
of assertions in a particular DL formalism representing a GENBANK  
record and all its many referents, it will be quite clear from the
entailments of that formalism where the boundary lies.  What I'm
saying is I don't believe there is a consensus on how one should  
formalize this issue.

Oops - gotta go, or I'll miss my 10 year old's school concert!

Cheers,
Bill


Bill Bug
Senior Research Analyst/Ontological Engineer

Laboratory for Bioimaging  & Anatomical Informatics
www.neuroterrain.org
Department of Neurobiology & Anatomy
Drexel University College of Medicine
2900 Queen Lane
Philadelphia, PA    19129
215 991 8430 (ph)
610 457 0443 (mobile)
215 843 9367 (fax)


Please Note: I now have a new email - William.Bug@DrexelMed.edu
Received on Thursday, 11 January 2007 23:32:08 UTC