- From: William Bug <William.Bug@DrexelMed.edu>
- Date: Tue, 11 Jul 2006 02:40:27 -0400
- To: Alan Ruttenberg <alanruttenberg@gmail.com>
- Cc: Trish Whetzel <whetzel@pcbi.upenn.edu>, Alan Rector <rector@cs.man.ac.uk>, w3c semweb hcls <public-semweb-lifesci@w3.org>, Phillip Lord <phillip.lord@newcastle.ac.uk>
- Message-Id: <444832A0-A05A-4FF5-A42C-77FD87D2D3FB@DrexelMed.edu>
Hi Trish,

I too would be interested in hearing more about what Chris M. has been doing with alphanumeric IDs in translating between OBO format & OWL. As I've mentioned earlier, I'm more comfortable with the sort of URI Alan presents below than with one where term strings are used as IDs. In essence, the string 'GO' becomes a namespace ID, even if XML namespaces are not explicitly involved here. In LSID-speak, I suppose 'GO' would be the AuthorityNamespaceID of the Namespace Specific String, '0000001' would be the ObjectID, and it would be followed by any available RevisionID.

The reason I care about this is that on the BIRN project we expect to follow the OBO Foundry/NCBO recommendations: use a shared set of foundational ontological entities, and shared relations such as the growing set defined in the OBO Relation Ontology, to construct the more complex domain ontology we need within BIRN. I wouldn't call it an application ontology per se (we will eventually be building those too), but we will also need to add granularity & additional branches to some of the existing OBO Foundry ontologies just to create a core 'is_a' graph. For instance, we will have to define many instruments that are neither currently in FuGO nor within its immediate scope, e.g., fMRI, MRM, EM, LSCM, etc. These are mostly imaging techniques, where device settings, specimen/subject preparation, and the details of image provenance will be critical for performing large-scale meta-analysis across the entire repository of data in BIRN.

What does this have to do with IDs? As I mentioned, we are already trying to follow the OBO Foundry approach, which requires re-use of entities from other ontologies and, as mentioned above, use of shared upper-level ontologies (e.g., BFO/UBO), including an ontology of relations (OBO RO).
By definition, this implies the ontology we build will have to reference entities from ontologies external to the BIRN ontology. As someone with extensive experience in the design and implementation of RDBMS repositories, OO frameworks, and distributed computing systems such as those built on web services, my inclination is simply to refer to those nodes in other ontological graphs rather than "hard code" any of the artifacts themselves in the BIRN ontology. Since we're currently using OWL to build the more fundamental subsumption hierarchy of classes, there are facilities in RDF/XML (e.g., namespaces and URIs) that allow references to an external resource or collection of entities. My concern is that there doesn't appear to be a universally accepted way to do this sort of distributed, OWL-based ontology development.

Liju Fan (Ontology Works, LLC), who participates in the FuGO effort, pointed me to TopBraid Composer (http://www.topbraidcomposer.com/) as a tool that can support this sort of activity. It is built on the Eclipse platform using the Jena RDF API & the Pellet OWL reasoner, with Holger Knublauch of Protégé-OWL fame as Product Technical Director. It looks very promising, and I certainly intend to use it the next time I work on the BIRNLex knowledge resource we are building according to the above principles. I've not had a chance to work on this since Liju pointed me to the tool last week, so I don't yet know whether it will meet all our requirements. Unfortunately, at ~$1000/seat (when purchased in qty 10), it is completely impractical for the bulk of the work we need to do on BIRN and other shared neuroinformatics projects. However, there doesn't appear to be a means within the OBO/NCBO community for doing this sort of distributed ontology design right now.
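To make the "refer, don't copy" approach concrete, here is a minimal Python sketch using only the standard library's xml.etree.ElementTree. It emits an RDF/XML fragment in which a local class points at its parent in another ontology purely by URI (rdfs:subClassOf with rdf:resource), and the ontology header names the external document with owl:imports. The helper name and all URIs are hypothetical placeholders, not real BIRNLex or FuGO identifiers.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# Give the standard vocabularies their usual prefixes in the serialized output.
for prefix, uri in (("rdf", RDF), ("rdfs", RDFS), ("owl", OWL)):
    ET.register_namespace(prefix, uri)


def reference_external_parent(ontology_uri: str, imported_uri: str,
                              local_class_uri: str, external_parent_uri: str) -> str:
    """Return RDF/XML declaring a local class whose parent lives elsewhere.

    Hypothetical helper: the external parent is referenced by URI only, so
    none of its labels, definitions, or axioms are copied into this file.
    """
    root = ET.Element(f"{{{RDF}}}RDF")

    # Ontology header: owl:imports names the external ontology document.
    header = ET.SubElement(root, f"{{{OWL}}}Ontology", {f"{{{RDF}}}about": ontology_uri})
    ET.SubElement(header, f"{{{OWL}}}imports", {f"{{{RDF}}}resource": imported_uri})

    # Local class; its superclass is only a pointer into the other graph.
    cls = ET.SubElement(root, f"{{{OWL}}}Class", {f"{{{RDF}}}about": local_class_uri})
    ET.SubElement(cls, f"{{{RDFS}}}subClassOf", {f"{{{RDF}}}resource": external_parent_uri})

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(reference_external_parent(
        "http://example.org/birnlex",               # hypothetical
        "http://example.org/fugo.owl",              # hypothetical
        "http://example.org/birnlex#fMRI_scanner",  # hypothetical
        "http://example.org/fugo#instrument",       # hypothetical
    ))
```

Producing such a file is the easy part; whether a given editor resolves the owl:imports target and displays the external parent alongside local classes is exactly the tooling question raised in this message.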
Two of the tools in widespread use, Protégé and OBO-Edit, are really not designed to support distributed, shared development of the kind you'd find in a typical distributed architecture, whether that be a standard client-server RDBMS-based approach, one using some "active pages" technology (PHP, Zope, Ruby on Rails, Java Servlet/Portlet frameworks, etc.), or a more asynchronous approach using messaging and/or web services to assemble the required components from the various authoritative sources. There is a web-based version of Protégé that is slowly moving forward, and OBO-Edit has a powerful, modular data-adapter approach to accessing ontology content, but distributed development of a complex ontology (e.g., one composed of sub-graphs from different sources) is clearly not the norm right now.

Absent an effective technical solution to this problem, my feeling is that the easiest course is to do what we are currently doing with BIRNLex:

1) Import the pieces you need from elsewhere into your OWL file;
2) Use 'source' and 'version' properties to state clearly from where and when you derived those entities;
3) Import all the associated properties, especially definitions;
4) Use alphanumeric IDs concatenating a source acronym (e.g., FUGO, UBO, CHEBI, GO, BIRN) with the unique ID from that source (preferably an integer that is unique within the namespace of that source).

As I stated above, this is not my preferred approach, but it seems a workable solution for the time being. I do expect the software tools supporting ontology development will have to address these requirements in time, given the OBO Foundry Principles and the general call across the field to encourage re-use of and references to shared upper- & middle-level ontologies.
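The four-step workaround above can be sketched as a small record type. This is a minimal Python sketch under stated assumptions: the field names, the annotation property names, and the underscore joining the source acronym to its ID are illustrative choices, not an agreed OBO or BIRNLex convention. The underscore is picked with Alan's qname caveat (quoted below) in mind: an XML qname's local part must be an NCName, which may not contain a colon or start with a digit, so a bare '0000001' cannot serve as one while 'GO_0000001' can. The NCName check is a simplified ASCII-only approximation.

```python
import re
from dataclasses import dataclass
from typing import Dict

# Simplified, ASCII-only approximation of the XML NCName production
# (the real rule also admits many non-ASCII letters).
NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_ncname(name: str) -> bool:
    """True if `name` could be the local part of an XML qname (ASCII subset)."""
    return NCNAME_ASCII.fullmatch(name) is not None


@dataclass(frozen=True)
class ImportedTerm:
    """A term copied into a local OWL file from an external ontology (step 1)."""
    source: str      # step 4: source acronym, e.g. "GO", "FUGO", "CHEBI"
    source_id: str   # step 4: ID unique within the source; a string keeps zero padding
    version: str     # step 2: release of the source the term was taken from
    label: str
    definition: str  # step 3: carry the definition along with the term

    @property
    def local_id(self) -> str:
        # Assumed convention: underscore separator, so the result is a valid NCName.
        return f"{self.source}_{self.source_id}"

    def annotations(self) -> Dict[str, str]:
        # Hypothetical annotation property names for steps 2 and 3.
        return {
            "source": self.source,
            "version": self.version,
            "label": self.label,
            "definition": self.definition,
        }


if __name__ == "__main__":
    term = ImportedTerm("GO", "0000001", "example-release",
                        "example label", "example definition")
    print(term.local_id, is_ncname(term.local_id), is_ncname("0000001"))
```

Keeping `source` and `version` as explicit fields, rather than encoding them only in the ID, is what lets a later tool re-check the copied term against the authoritative release.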
As I said, I realize RDF, with its intrinsic use of URIs, can support this sort of distributed approach to ontology construction, but the tools I've seen so far (primarily OBO-Edit & Protégé) are not quite up to it yet. I'm also convinced that once we move to this paradigm for developing ontologies, version control systems like CVS & SVN can be replaced with systems capable of managing version control of individual ontology entities more efficiently. As much as I'm convinced my life would be hell without SVN to manage all the code in our lab, and despite these systems having proven extremely useful in supporting community ontology development so far, I think they are far from ideal. (Can you say "diff"?)

How does the plan above sound to other folks: a reasonable compromise for now, or a recipe for disaster? I'd really appreciate input from others on this topic. I'm especially interested to know whether the issues Chris M. is addressing, as Trish mentions below, are in any way related to the issues I describe here.

Cheers,
Bill

On Jul 10, 2006, at 11:28 PM, Alan Ruttenberg wrote:

>
> Hi Trish,
>
> What was the specifics of the argument for alphanumeric versus
> numeric identifiers?
>
> If you check out the go-format list I recently sent some examples
> that use identifiers of the form
>
> http://www.bioontologies.org/2006/02/obo/GO#0000001
>
> Details are in http://sourceforge.net/mailarchive/message.php?msg_id=24431577
>
> BTW, all of them are alphanumeric in the sense that they are URIs.
> But a little care needs to be taken because of qnames, etc. used
> in xml. Nothing that can't be worked around in a reasonable manner.
>
> Regards,
> Alan
>
> On Jul 10, 2006, at 12:23 PM, Trish Whetzel wrote:
>
>> As one note, I wanted to mention that it seems as though
>> alphanumeric versus solely numeric identifiers would be preferred
>> based on viewing preliminary work by Chris Mungall in efforts to
>> translate OBO format ontologies to OWL.
>>
>> Trish

Bill Bug
Senior Analyst/Ontological Engineer

Laboratory for Bioimaging & Anatomical Informatics
www.neuroterrain.org
Department of Neurobiology & Anatomy
Drexel University College of Medicine
2900 Queen Lane
Philadelphia, PA 19129
215 991 8430 (ph)
610 457 0443 (mobile)
215 843 9367 (fax)

Please Note: I now have a new email - William.Bug@DrexelMed.edu

This email and any accompanying attachments are confidential. This information is intended solely for the use of the individual to whom it is addressed. Any review, disclosure, copying, distribution, or use of this email communication by others is strictly prohibited. If you are not the intended recipient please notify us immediately by returning this message to the sender and delete all copies. Thank you for your cooperation.
Received on Tuesday, 11 July 2006 06:40:56 UTC