- From: William Bug <William.Bug@DrexelMed.edu>
- Date: Wed, 14 Jun 2006 11:02:39 -0400
- To: kc28 <kei.cheung@yale.edu>
- Cc: John Rumble <jumbleusa@earthlink.net>, Phillip Lord <phillip.lord@newcastle.ac.uk>, Eric Neumann <eneumann@teranode.com>, w3c semweb hcls <public-semweb-lifesci@w3.org>, Jack Park <jack.park@sri.com>
By all means - versioning is crucial - and all knowledge maps/association files/annotations referencing nodes in an ontology MUST include the version number.

For an example of how biomedical ontology curators deal with the issue of versioning, see the Gene Ontology Consortium web site pages describing their SOP on this issue:

http://www.geneontology.org/GO.usage.shtml#obsoletions

See "Obsoleting Terms" and "Merges, Splits, Movements".

All of the OBO Foundry ontologies are set up in a source control system, have an official "release" policy, and have associated mailing lists to request changes/corrections and announce new releases.

Generally, as is the case with evolving software API & format specs, new versions are backward compatible - e.g., annotations citing terms/concepts/entities/nodes as they existed in a previous graph can be resolved in the more recent versions. However, since we are talking about "theories of reality" here - and as many have pointed out, our descriptions of reality evolve in often non-monotonic ways - the mapping across versions from a node in one version to the "equivalent" node(s) in other versions may be a far from trivial process. Sometimes the mappings can be given using DL rules; others simply require a deterministic look-up table.

The curation process of deciding how to "migrate" nodes as changes/corrections are required can be quite complex, as can be seen when you review what GO curators are required to do to keep knowledge maps/association files current when they originally referenced nodes in prior versions of GO (see refs above). I realize this may sound hideously complex, very labor intensive, and "fragile", but the process actually works.

Here, too, I think it's important to remember what the original requirements are for a given knowledge resource. In the case of GO, the core curation process has focussed on mapping occurrences of specific biomolecular and subcellular entities as they occur in the literature.
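The "deterministic look-up table" case mentioned above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the version labels, the term IDs (deliberately not real GO IDs), the table layout, and the `resolve` helper are hypothetical and do not reflect how GO or any OBO Foundry ontology actually stores its migration data.

```python
# Hypothetical release order of an ontology (illustrative labels only).
VERSION_ORDER = ["v1", "v2", "v3"]

# Hypothetical look-up table: (version, term_id) -> term IDs in the NEXT version.
# A term missing from the table is assumed to be unchanged in the next release.
MIGRATIONS = {
    ("v1", "EX:0001"): ["EX:0010"],             # merged into / renamed as another term
    ("v1", "EX:0002"): ["EX:0020", "EX:0021"],  # split into two terms
    ("v2", "EX:0010"): [],                      # obsoleted with no replacement
}


def resolve(term_id, annotated_version, target_version):
    """Map a term, as cited in annotated_version, to its equivalents in target_version.

    Returns a set of term IDs. An empty set means the table offers no
    replacement, so the annotation needs manual curation.
    """
    start = VERSION_ORDER.index(annotated_version)
    end = VERSION_ORDER.index(target_version)
    if start > end:
        raise ValueError("this sketch only migrates forward in time")

    current = {term_id}
    # Walk release by release, applying each version's migrations in turn.
    for version in VERSION_ORDER[start:end]:
        migrated = set()
        for term in current:
            migrated.update(MIGRATIONS.get((version, term), [term]))
        current = migrated
    return current


if __name__ == "__main__":
    for term in ("EX:0001", "EX:0002", "EX:0003"):
        print(term, "->", sorted(resolve(term, "v1", "v3")))
```

Note that the look-up only works because each annotation records the version it was made against, which is the point of requiring a version number on every annotation. Cases that need DL rules or curator judgment are not covered by a table like this.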
A significant portion of the GO curation process still revolves around explicitly tracking entity occurrences in the literature. Of course, a whole slew of powerful tools and valuable research has grown around GO, especially as its formal specificity has improved over the last 5 years or so. Many of these are designed to use GO to organize/pool/analyze primary research data, as opposed to focusing on its "representation" in the literature.

I think this is where ontology practice is most likely to provide the greatest benefit in the coming decade - as applied to primary data repositories. It is here, too, where Semantic Web technologies are most likely to be relevant and provide a powerful, flexible formalism for representing semantic info associated with scientific observations - with explicit links to various knowledge resources across the formal semantic spectrum (from flat term lists through to thorough, computable and relatively complete theories of reality).

The following two threads of activity in biomedical KR are important to understand as related, yet distinct:

1) KR applied to existing descriptions of research data: From repositories of primary data such as GENBANK and GEO on through the highly reduced representations found in the STM literature. Analysis of the semantic and lexical content of the latter has been going on since the 1940s & 1950s (at least in the info/library science fields) and more recently (since the 1960s) in the converging C.S./Linguistics fields (e.g., Comp. Linguistics and Info Retrieval). Only in the last 15 years have ontologies played any significant role in these pursuits. TextPresso - the text mining framework recommended by the model organism database consortium (http://www.gmod.org/home & http://www.textpresso.org/) - is a good example of this approach coming from the bioinformatics community, but there are other examples using much more powerful Comp. Linguistic techniques.
2) Use in creating NEW descriptions of primary data: Here, ontologies along with SW tech and other KR tools (such as the Topic Maps Reference Model (TMRM) Jack Park and his colleagues at SRI are working on) and C.S. techniques for federating interrelated data repositories can be combined to transform our ability to compute across large swaths of data. In this case, the first digital representation of research data derives from a formally sound, computable framework.

It is this latter approach, combined with the armamentarium of informatics tools accumulated over the last 30 years from various fields, that will bring the bulk of biomedical researchers forward from the still 19th-century approach of forcing all contributions to the evolving biomed knowledge base to pass through a human brain for knowledge extraction, to one where human cognitive capacity is truly being augmented via automation (in the sense espoused by Doug Engelbart and Vannevar Bush) and all new scientific descriptions can be automatically analyzed in the context of all relevant prior knowledge. I consider this transformation very much like the one that has taken place over the last 30 years to augment our tools for observation (automated, high throughput sequencing; molecular imaging and all forms of microscopy; microarrays; etc.).

I think some of the disagreement/confusion on the topic of the accuracy and effectiveness of biomedical ontologies derives from collapsing these two approaches to KR, which, though highly interrelated, bring with them distinct approaches, limits, caveats, and capabilities.

Just my $0.02.

Cheers,
Bill

On Jun 13, 2006, at 10:00 PM, kc28 wrote:

> This brings up an interesting issue -- how ontological evolution
> would impact mapping or integration of overlapping ontologies. I
> believe it's quite a research challenge. We might need to
> incorporate the notion of versioning into the ontological
> structure.
> For example, what versions of the protein classes/instances
> can be mapped between two ontologies. Just my two-cent thought.
>
> Cheers,
>
> -Kei
>
> John Rumble wrote:
>
>> An unwritten rule about higher level ontologies is that they
>> reflect our knowledge today, not tomorrow. As knowledge evolves,
>> the upper level ontologies, especially, must also evolve. The
>> example of the concept "protein" is very apropos here. We can view
>> it from functional, structural, integrative angles, and I am sure
>> there are a bunch more. Then think about how our "concept" of a
>> protein in each of those views has evolved over the last 10 years,
>> 20 years, 75 years. The problem is evident.
>>
>> At whatever level an ontology is developed, someone smarter or
>> with more insight or standing on the shoulders of giants will use
>> that ontology as a building block for a new and better higher
>> level view of nature. We have not reached the end of science yet.
>>
>> In my days of leading similar standards developments, some of the
>> best progress we made was when we banned discussions of (1)
>> higher-level ontologies (though we called them something else back
>> in those old days) and (2) acronyms.
>>
>> For those of you who have requested more references on my
>> previous e-mail about experiment description, it will have to wait
>> a few more days. Unfortunately bioinformatics has not solved my
>> kidney stone issues, which severely limit my ability to pull the
>> requested information together.
>>
>> John
>>
>> Dr. John Rumble
>> Technical Director
>> Information International Associates
>> Oak Ridge TN
>> www.infointl.com
>> jrumble@iiaweb.com
>> jumbleusa@earthlink.net
>> 301 963 7903 (Home Office)
>> 301 502 5729 (Cell)
>> 865 298 1251 (Oak Ridge Office)

Bill Bug
Senior Analyst/Ontological Engineer
Laboratory for Bioimaging & Anatomical Informatics
www.neuroterrain.org
Department of Neurobiology & Anatomy
Drexel University College of Medicine
2900 Queen Lane
Philadelphia, PA 19129
215 991 8430 (ph)
610 457 0443 (mobile)
215 843 9367 (fax)

Please Note: I now have a new email - William.Bug@DrexelMed.edu
Received on Wednesday, 14 June 2006 15:03:07 UTC