Ontology entity IDs from William Bug on 2006-07-11 (public-semweb-lifesci@w3.org from July 2006)

From: William Bug <William.Bug@DrexelMed.edu>
Date: Tue, 11 Jul 2006 02:40:27 -0400
To: Alan Ruttenberg <alanruttenberg@gmail.com>
Cc: Trish Whetzel <whetzel@pcbi.upenn.edu>, Alan Rector <rector@cs.man.ac.uk>, w3c semweb hcls <public-semweb-lifesci@w3.org>, Phillip Lord <phillip.lord@newcastle.ac.uk>
Message-Id: <444832A0-A05A-4FF5-A42C-77FD87D2D3FB@DrexelMed.edu>
Hi Trish,

I too would be interested in hearing more about what Chris M. has  
been doing with alphanumeric IDs in translating between OBO format &  
OWL.

As I've mentioned earlier, I'm more comfortable with the sort of URI
Alan presents below than with one where term strings are used as IDs.
In essence, the string 'GO' becomes a namespace ID, even if XML
namespaces are not explicitly involved here.
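
To make the "namespace ID plus local ID" reading concrete, here is a minimal sketch in Python. It assumes the URI layout quoted from Alan further down in this message (ontology acronym as the last path segment, numeric ID as the fragment); the function name split_term_uri is hypothetical and not part of any OBO tool.

```python
from typing import Tuple
from urllib.parse import urldefrag


def split_term_uri(uri: str) -> Tuple[str, str]:
    """Split a term URI into (namespace ID, local ID).

    Assumes the layout <base>/<ACRONYM>#<local id>, as in the URI quoted
    later in this thread. Other layouts are not handled.
    """
    base, local_id = urldefrag(uri)          # separate the '#fragment'
    namespace_id = base.rstrip("/").rsplit("/", 1)[-1]  # last path segment
    return namespace_id, local_id


print(split_term_uri("http://www.bioontologies.org/2006/02/obo/GO#0000001"))
```

Under that assumed layout, 'GO' plays the role of the namespace ID and '0000001' the ID that only needs to be unique within it.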

In LSID-speak, I suppose the 'GO' would be the AuthorityNamespaceID  
of the Namespace Specific String, '0000001' would be the ObjectID,  
and it would be followed by any available RevisionID.

The reason I care about this is that, on the BIRN project, we expect
to follow the OBO Foundry/NCBO recommendation of using a shared set
of foundational ontological entities - and shared relations, as per
the growing set of relations defined in the OBO Relation Ontology - in
order to construct the more complex domain ontology we need within
BIRN.  I wouldn't call it an application ontology per se, though we
will eventually be building those too.  We will also need to add
granularity & additional branches to some of the existing OBO Foundry
ontologies just to create a core 'is_a' graph.  For instance, we will
need to define many instruments that are neither currently in FuGO
nor within its immediate scope - e.g., fMRI, MRM, EM, LSCM, etc. -
lots of imaging techniques, basically, where device settings,
specimen/subject preparation, and the details of image provenance
will be critical for performing large-scale meta-analysis across the
entire repository of data in BIRN.

What does this have to do with IDs?

Already, as I mentioned, we are trying to use the OBO Foundry
approach, which requires re-use of entities from other ontologies and
- as mentioned above - use of shared upper-level ontologies (e.g.,
BFO/UBO), including an ontology of relations (OBO RO).  By
definition, this implies the ontology we build will have to reference  
entities from other ontologies external to the BIRN ontology.  As  
someone with extensive experience in the design and implementation of  
RDBMS repositories, OO frameworks, and distributed computing systems  
such as those built using web services, my penchant is to simply  
refer to those nodes in other ontological graphs, and not "hard code"  
any of the artifacts themselves in the BIRN ontology.  Since we're  
using OWL right now to build the more fundamental, subsumption  
hierarchy of classes, there are facilities in RDF/XML (e.g.,  
namespaces, and URIs) which allow for making references to an  
external resource or collection of entities.
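
As an illustration of referring to an external node rather than copying it, the sketch below embeds a tiny RDF/XML fragment in Python and lists the superclasses that live outside the local ontology. The local namespace http://example.org/birnlex# and the class BIRN_0000001 are placeholders invented for the example, and the external URI simply reuses the form quoted later in this thread; no claim is made that these classes are actually related.

```python
import xml.etree.ElementTree as ET
from typing import List

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# Placeholder local namespace (an assumption, not the real BIRNLex URI).
LOCAL = "http://example.org/birnlex#"

# A local class whose parent is referenced by URI only, not redefined here.
DOC = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}" xmlns:owl="{OWL}">
  <owl:Class rdf:about="{LOCAL}BIRN_0000001">
    <rdfs:subClassOf
        rdf:resource="http://www.bioontologies.org/2006/02/obo/GO#0000001"/>
  </owl:Class>
</rdf:RDF>"""


def external_parents(rdf_xml: str, local_prefix: str) -> List[str]:
    """Return subClassOf targets whose URI lies outside local_prefix."""
    root = ET.fromstring(rdf_xml)
    parents = []
    for sub in root.iter(f"{{{RDFS}}}subClassOf"):
        target = sub.get(f"{{{RDF}}}resource")
        if target and not target.startswith(local_prefix):
            parents.append(target)
    return parents


print(external_parents(DOC, LOCAL))
```

The point of the sketch is only that the external entity appears as a URI reference; nothing about it is "hard coded" in the local file.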

My concern is that there doesn't appear to be a ubiquitously accepted
way to do this sort of distributed OWL-based ontology development.
Liju Fan (Ontology Works, LLC), who participates in the FuGO effort,
pointed me to TopBraid Composer (http://www.topbraidcomposer.com/) as
a tool that can support this sort of activity.  Built on the Eclipse  
platform using the Jena RDF API & Pellet OWL Reasoner, and with  
Holger Knublauch of Protégé-OWL fame as the Product Technical  
Director, this looks very promising, and I certainly intend to use it  
next time I work on the BIRNLex knowledge resource we are building  
according to the above principles.  I've not had a chance to work on  
this since last week when Liju pointed me to the tool, so I don't  
know whether it will fit all the requirements, but it does sound very  
promising.

Unfortunately, at ~$1000/seat (when purchased in qty 10), this is  
completely impractical for the bulk of the work we need to do on BIRN  
and other shared, neuroinformatics projects.

However, there doesn't appear to be a means within the OBO/NCBO
community for doing this sort of distributed ontology design right
now.  Two of the tools in widespread use - Protégé and OBO-Edit - are
really not designed to support distributed and shared development,
such as you'd find in a typical distributed architecture - whether it
be a standard client-server RDBMS-based approach, one using some
"active pages" technology such as PHP, Zope, Ruby on Rails, Java
Servlet/Portlet frameworks, etc. - or a more asynchronous approach
using messaging and/or web services to assemble the required
components from the various authoritative sources.  There is a
web-based version of Protégé that is slowly moving forward, and
OBO-Edit has a powerful, modular data adapter approach to accessing
ontology content, but this sort of distributed development of a
complex ontology (e.g., one composed of sub-graphs from different
sources) is clearly not the norm right now.

Absent an effective technical solution to this problem right now, my  
feeling is the easiest solution is to do what we are currently doing  
with BIRNLex:
	1) Import the pieces you need from elsewhere into your OWL file;
	2) Use 'source' and 'version' properties to clearly state from where  
and when you derived those entities;
	3) Import all the associated properties - especially definitions;
	4) Use alphanumeric IDs concatenating a source acronym - e.g. FUGO,  
UBO, CHEBI, GO, BIRN - with the unique ID from that source (which  
will preferably be an integer that is unique within the namespace of  
that source).
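
The four steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name ImportedEntity, the underscore separator, and the sample version and definition strings are all hypothetical; the steps themselves only say that the source acronym and the source's unique ID are concatenated and that 'source' and 'version' are recorded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImportedEntity:
    """An entity copied into the local OWL file from another ontology."""
    source: str      # source acronym, e.g. "GO", "CHEBI", "FUGO" (step 2)
    local_id: str    # ID unique within that source, ideally an integer
    version: str     # when/which release it was derived from (step 2)
    definition: str  # imported along with the entity (step 3)

    @property
    def entity_id(self) -> str:
        # Step 4: concatenate source acronym and source-local ID.
        # The "_" separator is an assumption made for this sketch.
        return f"{self.source.upper()}_{self.local_id}"


term = ImportedEntity(
    source="GO",
    local_id="0000001",
    version="2006-07-11",               # placeholder value
    definition="(definition text copied from the source)",  # placeholder
)
print(term.entity_id)
```

One side effect of leading with the acronym is that the resulting ID starts with a letter, so it can also serve as an XML local name, which a purely numeric ID cannot.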

As I state above, this is not preferred, but it seems like a
workable solution for the time being.  I do expect the software
tools supporting ontology development will have to address these
requirements in time, given the OBO Foundry Principles and the
general call across the field to encourage re-use of, and references
to, shared upper & middle level ontologies.  As I said, I realize RDF,
with its intrinsic use of URIs, can support this sort of distributed
approach to ontology construction, but the tools I've seen so far -
primarily OBO-Edit & Protégé - are not quite up to it yet.  I'm also
convinced that once we move to this paradigm for developing
ontologies, version control systems like CVS & SVN can be replaced
with a system capable of more efficiently managing ontology entity
version control.  As much as I'm convinced my life would be hell
without SVN to manage all the code in our lab - and despite these
systems having proven to be extremely useful in supporting community
ontology development so far - I think they are far from ideal.  (Can
you say "diff"?)

How does the plan above sound to other folks - a reasonable  
compromise for now - or a recipe for disaster?

I'd really appreciate input from others on this topic.  I'm  
especially interested to know whether the issues Chris M. is  
addressing as referred to below by Trish are in any way related to  
the issues I describe here.

Cheers,
Bill


On Jul 10, 2006, at 11:28 PM, Alan Ruttenberg wrote:

>
> Hi Trish,
>
> What were the specifics of the argument for alphanumeric versus  
> numeric identifiers?
>
> If you check out the go-format list I recently sent some examples  
> that use identifiers of the form
>
> http://www.bioontologies.org/2006/02/obo/GO#0000001
>
> Details are in
> http://sourceforge.net/mailarchive/message.php?msg_id=24431577
>
> BTW, all of them are alphanumeric in the sense that they are URIs.
> But a little care needs to be taken because of QNames, etc. used
> in XML. Nothing that can't be worked around in a reasonable manner.
>
> Regards,
> Alan
>
> On Jul 10, 2006, at 12:23 PM, Trish Whetzel wrote:
>
>> As one note, I wanted to mention that it seems as though  
>> alphanumeric versus solely numeric identifiers would be preferred  
>> based on viewing preliminary work by Chris Mungall in efforts to  
>> translate OBO format ontologies to OWL.
>>
>> Trish
>
>

Bill Bug
Senior Analyst/Ontological Engineer

Laboratory for Bioimaging  & Anatomical Informatics
www.neuroterrain.org
Department of Neurobiology & Anatomy
Drexel University College of Medicine
2900 Queen Lane
Philadelphia, PA    19129
215 991 8430 (ph)
610 457 0443 (mobile)
215 843 9367 (fax)


Please Note: I now have a new email - William.Bug@DrexelMed.edu







Received on Tuesday, 11 July 2006 06:40:56 UTC