Re: Versioning vs Temporal modeling of Patient State

Just a quick addendum:

I would point out that the issues I mention below regarding efforts to
improve F-measure through human annotation effort don't even begin to
address the enormous amount of relevant work from the Text Mining
field (various approaches to improving on the standard IR sparse-matrix
term representation, NLP, LSI, text summarization, text
categorization, etc.).  There is also the work Bob Futrelle has
brought up before regarding "hedging" and other issues that very much
affect our ability to automatically perform KE/KR/KM on unstructured
text.  This is not unrelated to work on "speech acts" as applied to
the formalization of clinical records, of course.
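
Just to make the LSI/sparse-matrix point concrete, here is a toy
sketch (the documents and parameters are made up purely for
illustration, and it assumes scikit-learn is available):

# Toy sketch: standard sparse IR term representation, then an LSI-style
# low-rank projection of it.  Documents are invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

docs = [
    "patient record revised after a lab error was corrected",
    "patient state changed after treatment",
    "ontology term definition revised by a curator",
]

tfidf = TfidfVectorizer()            # sparse term-document matrix (standard IR)
X = tfidf.fit_transform(docs)

lsi = TruncatedSVD(n_components=2)   # project into a low-rank "latent semantic" space
Z = lsi.fit_transform(X)             # dense document vectors in topic space
print(Z)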

It's not completely clear to me how or whether that specific work can  
be put to use on this versioning problem.

Cheers,
Bill


On Jan 12, 2007, at 7:40 AM, William Bug wrote:

> The IFOMIS work Dirk, Kirsten, and others have cited on referent  
> tracking is definitely important work to review in this light.  I'd  
> not been familiar with the model theoretic work Bijan mentions, but  
> clearly that is important.
>
> Werner Ceusters also has a list - a Google list I believe - on  
> referent tracking.
>
> This work - and related work on "speech acts" - is most definitely
> relevant to this discussion and is very specifically designed to
> address the ABox.  As the citations given indicate, most of this work
> has been done in the clinical domain with a focus on patient
> records, which was the origin of this thread and is directly
> relevant to the Use Case Nigam put out there.
>
> Some of that work has begun to seep into the discussions regarding
> the sort of GENBANK issues Kei mentioned, but it's still really just
> discussion to my knowledge.  As you could tell from the way I
> couched my description of that problem, referent tracking is clearly
> a big part of what must be accommodated in that domain as well -
> both in terms of the actual content and evolution of a record in
> GENBANK, TrEMBL, etc., and in terms of the many ways in which
> researchers link to and reference such records.
>
> Also - the work I was mentioning regarding TBox-focused, highly
> granular revisions has been informally discussed by NCBO folks
> including Chris Mungall, Fabian Neuhaus, Barry, and others - again
> with an eye toward providing reasoning services to support
> requirements of the sort Bijan, Dirk, and others mention
> below.  This is associated with the discussions on this topic
> amongst BIRN, OBI, and NCIT participants, but it has all been very
> informal so far - AFAIK.
>
> One of the things I would point out regarding the metadata
> properties I was referring to is that this was really meant to be
> just a simple, "low-hanging fruit" approach to a much more
> complicated problem.  No thought was given to how one would actually
> construct automatic means to mediate reasoning over - or even just
> represent - the evolving semantic graph.  The idea was simply this:
> many biomedical ontology development projects have begun to notice
> the pressing need for version control, which appears to be required
> at a very granular level.  Standard source version control systems
> - e.g., CVS, SVN, etc. - just make the problem worse, in my
> opinion.  This is where I'd differ with the point Vipul makes.
> It's not that there are NO aspects of the software versioning
> process relevant to this issue.  It's just that I believe there are
> complex issues in this domain - some of which Bijan mentioned, some
> of which I mention below regarding application of the traditional
> approach to employing CVs for literature annotation - that extend
> well beyond what common practice in software version control is
> intended to support.  In that domain, highly granular version
> management has been required, and I believe something like it will
> be required in the ontology development space as well.  Perhaps
> that's just a qualification and rewording of the point Vipul was
> trying to make.
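>
> To be a bit more concrete about the granularity I have in mind, here
> is a rough sketch of a per-term change record (this is not any
> project's actual schema - the identifiers and field names are
> invented for illustration):
>
> # Rough sketch: one change record per ontology term, capturing the
> # reason for the change, rather than a file-level diff a la CVS/SVN.
> from dataclasses import dataclass
> from datetime import date
>
> @dataclass
> class TermChange:
>     term_id: str      # ontology term identifier (made up here)
>     change_type: str  # "definition edited", "term obsoleted", "parent moved", ...
>     reason: str       # change in the *world* vs. change in our *knowledge*
>     when: date
>     editor: str
>
> change_log = [
>     TermChange("EX:0000123", "definition edited",
>                "revision: curator corrected an error", date(2007, 1, 12), "wbug"),
> ]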
>
> SKOS, as I mentioned, does try to absorb some of what has been done
> on this issue in the A&I/library science world in relation to CV
> application to the literature annotation process.  This has long
> been recognized in that field as extremely important to the proper
> curation of a CV/taxonomy/classification scheme/thesaurus.  If you
> step back a bit from the details and ask what the intended purpose
> of a CV in that domain is, the answer clearly is to improve both
> precision and recall (F-measure from standard IR) for boolean,
> term-based queries.  Anyone who has used MEDLINE over the years has
> learned the utility of this approach - and its limitations (the
> barrage of false positives and the unknown number of false negatives
> that typically still affect query results).  Looked at empirically,
> there is no doubt that having the people who annotate the literature
> use a CV greatly improves the F-measure of the search system used to
> mine the resulting inverted indexes.  However, I know from time
> spent working with the creators of Biological Abstracts that it took
> months of training for the "indexers" to get good at consistently
> applying CV terms - and a lot of QA/QC was still needed to
> constantly monitor the output.  The reason really comes down to the
> fact that the lack of complete, detailed definitions and of a
> formal semantic graph left way too much leeway for indexers, even
> when a moderate amount of effort was dedicated to incentivizing
> them.  Having said that, when highly specific definitions were used,
> it was found that indexers' annotation output slowed greatly AND
> their use of CV terms went way down, both of which are really at
> odds with the intended goal of the process (back to F-measure),
> which is to provide maximal annotation according to a CV.  Even with
> this work, BIOSIS (publisher of Biological Abstracts) and really all
> the A&I vendors I knew of still required a huge educational staff
> that would constantly travel the world providing demos and updates
> to librarians, so they could be kept informed on how best to use the
> resulting CV indexes.
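>
> For concreteness, by F-measure I just mean the usual harmonic mean
> of precision and recall from IR; the numbers below are invented
> purely to illustrate the calculation:
>
> # F-measure (F1): harmonic mean of precision and recall (toy numbers).
> def f_measure(true_pos, false_pos, false_neg):
>     precision = true_pos / (true_pos + false_pos)
>     recall = true_pos / (true_pos + false_neg)
>     return 2 * precision * recall / (precision + recall)
>
> # e.g. a query returning 80 relevant and 20 irrelevant hits, while
> # missing 40 relevant records:
> print(f_measure(80, 20, 40))   # precision 0.80, recall ~0.67, F ~0.73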
>
> It was still clearly an art to maximize F-measure - one that very
> much depended on the quality and structure of the CV/classification
> scheme/taxonomy, the talents of the indexers applying the CVs in
> the annotation process, and the talents of the info. retrieval
> experts/librarians in constructing queries.  By far the most
> confounding aspect of this process was the need to alter indexer
> and searcher practice as CV changes were introduced - as was of
> course inevitable - both due to changes in the *world* and changes
> in *knowledge*, as Bijan describes it below.  It was partly because
> of this that various CV curatorial practices were developed that,
> again, are partially represented in SKOS - fields such as "scope
> notes", "history notes", etc., which all relate to the versioning
> issue in this context but, of course, are designed for human
> consumption and are not particularly useful to KE/KR algorithms.
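>
> For those who haven't looked at SKOS, a minimal sketch of these
> documentation properties using rdflib follows; the concept URI and
> note text are made up for illustration:
>
> # Sketch: SKOS scope/history notes attached to a concept (toy data).
> from rdflib import Graph, Literal, Namespace
> from rdflib.namespace import RDF, SKOS
>
> EX = Namespace("http://example.org/cv#")
> g = Graph()
>
> concept = EX.NeuronMigration
> g.add((concept, RDF.type, SKOS.Concept))
> g.add((concept, SKOS.prefLabel, Literal("neuron migration", lang="en")))
> g.add((concept, SKOS.scopeNote,
>        Literal("Use only for migration of neuronal cell bodies.", lang="en")))
> g.add((concept, SKOS.historyNote,
>        Literal("Definition broadened in the 2006 revision.", lang="en")))
>
> print(g.serialize(format="turtle"))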
>
> My sense - as you can see in that OBI Wiki page I cited - is that
> there is a need to provide such curation support in the ontology
> development process, both to address the lexical issues, as has
> historically been done in info. science/library science, and to
> address semantic graph evolution.  Both of these requirements arise
> due both to changes in the *world* and to QA/QC performed on the KR
> (changes in *knowledge*).  My sense is that in providing this first
> simple step - a shared collection of AnnotationProperties used
> across the community when building OWL-based ontologies - we
> provide the structure required to develop software tools to help
> automate the process.  Nothing extending to the complexity of
> automatic reasoning, but just something to address the need quickly
> - a structured model for these processes, if you will, that can
> evolve toward the more complex "referent tracking" and "speech act"
> formalisms.  This stop-gap isn't nearly enough to fully address this
> complex issue, but it should be relatively easy to implement and to
> put into practice (with a minimal amount of automated support for
> ontology curators), and if done correctly, it should be something
> that can migrate to the more complex approach later.  Providing too
> complex a strategy for addressing this versioning issue now might
> prohibitively slow the ontology development process as it is being
> carried out by various community biomed. ontology development
> projects.
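>
> To give a flavor of what such a shared set of AnnotationProperties
> might look like (the property names and term below are invented for
> illustration - this is NOT the actual set OBI or BIRNLex adopted),
> again using rdflib:
>
> # Sketch: curation-metadata annotation properties attached to a class.
> from rdflib import Graph, Literal, Namespace
> from rdflib.namespace import OWL, RDF, RDFS
>
> EX = Namespace("http://example.org/curation#")
> g = Graph()
>
> # Declare the shared annotation properties once, for reuse across ontologies.
> for prop in (EX.changeReason, EX.changeNote, EX.curationStatus):
>     g.add((prop, RDF.type, OWL.AnnotationProperty))
>
> term = EX.PyramidalNeuron
> g.add((term, RDF.type, OWL.Class))
> g.add((term, RDFS.label, Literal("pyramidal neuron")))
> # Record *why* the term changed: world change (update) vs. knowledge
> # change (revision), per the distinction Bijan describes below.
> g.add((term, EX.changeReason, Literal("revision: definition corrected by curator")))
> g.add((term, EX.curationStatus, Literal("pending final vetting")))
>
> print(g.serialize(format="turtle"))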
>
> As you can tell, this is just a suggestion that OBI, BIRNLex, and a
> few other ontology development efforts have only recently begun to
> implement, so it is most definitely a work in progress.
>
> Having a review of the topic, as Vipul suggests, at this stage in  
> the game by the several folks who've provided valuable pointers and  
> feedback, would be a wonderful idea, I think.
>
> Cheers,
> Bill
>
>
> On Jan 12, 2007, at 6:26 AM, Kashyap, Vipul wrote:
>
>>
>>
>> Is there any work in the literature related to:
>>
>> - Defining what and when a version is?
>> - Do all updates necessarily lead to a new version?
>> - Is there a utility to instance versioning?
>>
>> The observation about the utility of knowledge base update and  
>> revision is an
>> astute one. IMHO the utility of instance versioning is not clear  
>> either.
>>
>> Just my 2 cents,
>>
>> ---Vipul
>>
>>
>>> -----Original Message-----
>>> From: public-semweb-lifesci-request@w3.org [mailto:public-semweb- 
>>> lifesci-
>>> request@w3.org] On Behalf Of Bijan Parsia
>>> Sent: Friday, January 12, 2007 5:28 AM
>>> To: dirk.colaert@agfa.com
>>> Cc: wangxiao@musc.edu; 'w3c semweb hcls'; public-semweb-lifesci-
>>> request@w3.org
>>> Subject: Re: Versioning vs Temporal modeling of Patient State
>>>
>>>
>>> On Jan 12, 2007, at 9:36 AM, dirk.colaert@agfa.com wrote:
>>>
>>>> Recently I had an interesting conversation with Werner Ceusters,
>>>> professor in Buffalo and colleague of Barry Smith. He has a
>>>> theory about ontology maintenance and versioning, and it considers
>>>> both "classes" and "instances". Both can change either because you
>>>> made an error, because your view of the world changed, or because
>>>> the world changed. It turns out that you can only handle changes
>>>> if you know, for each change, exactly what the reason for the
>>>> change was. That reason should be documented in the system.
>>> [snip]
>>>
>>> The standard lingo for this is that a change to the knowledge base
>>> due to a change in the *world* is called an *update* whereas a  
>>> change
>>> in your knowledge base due to a change in *your knowledge* of the
>>> (current static) world is called a *revision*. The locus classicus
>>> for this, IMHO, is:
>>> 	<http://citeseer.ist.psu.edu/417296.html>
>>>
>>> Following these model-theoretic accounts, there is a spate of work
>>> defining reasoning services that compute the updated or revised
>>> knowledge base given a proposed update or revision. E.g., recently:
>>> 	<http://lat.inf.tu-dresden.de/~clu/papers/archive/kr06c.pdf>
>>>
>>> The utility of model oriented revision and update for expressive
>>> logics is, IMHO, not fully established, though it is conceptually
>>> useful in my experience. There is, of course, a large chunk of work
>>> on revising (and even updating) belief *bases*, that is, attending
>>> primarily to the *asserted* set of formulae.
>>>
>>> Hope this helps.
>>>
>>> Cheers,
>>> Bijan.
>>>
>>>
>>
>

Bill Bug
Senior Research Analyst/Ontological Engineer

Laboratory for Bioimaging  & Anatomical Informatics
www.neuroterrain.org
Department of Neurobiology & Anatomy
Drexel University College of Medicine
2900 Queen Lane
Philadelphia, PA    19129
215 991 8430 (ph)
610 457 0443 (mobile)
215 843 9367 (fax)


Please Note: I now have a new email - William.Bug@DrexelMed.edu

Received on Friday, 12 January 2007 12:55:16 UTC