Re: Distributed ontology development (was Ontology entity IDs) from William Bug on 2006-07-17 (public-semweb-lifesci@w3.org from July 2006)

From: William Bug <William.Bug@DrexelMed.edu>
Date: Mon, 17 Jul 2006 00:36:26 -0400
To: Mark Musen <musen@Stanford.EDU>
Cc: Alan Ruttenberg <alanruttenberg@gmail.com>, Trish Whetzel <whetzel@pcbi.upenn.edu>, Alan Rector <rector@cs.man.ac.uk>, w3c semweb hcls <public-semweb-lifesci@w3.org>, Phillip Lord <phillip.lord@newcastle.ac.uk>
Message-Id: <FA06DB10-F016-4368-98D2-B850DCFEAA65@DrexelMed.edu>
Many thanks for the info, Mark.  That sounds very promising.

Are you referring to the JDBC Protégé
(http://protege.cim3.net/cgi-bin/wiki.pl?JdbcDatabaseBackend), or are
there other ways of connecting Protégé to an RDBMS backend?

It certainly is a Herculean task to work out the O-R mapping in a way
that is flexible enough to accommodate all the graphs someone might
construct either in Protege-Frames or Protege-OWL, so if this is
already implemented and working, it behooves all of us who need to
support this sort of community ontology curation to re-use what's
being constructed by SMI and/or NCI.

I'd actually been poking around the Protégé Wiki for some time
(http://protege.cim3.net/) and was aware this system existed.  We've
been discussing on the BIRN Ontology Task Force Tcons - with input
from Daniel - how we might be able to construct such a shared system.

I'm a strong proponent of this approach.  On the BIRN Ontology Task
Force, we'll need to label classes and attributes with various levels
of curation status (e.g., "fully vetted", "good graph location; poor
definition", "temporary graph location; needs to move eventually",
etc.).  We need to be able to release "versions" of the ontology based
on these status tags and other attributes.  Ultimately, we'll also
want to have node- and edge-level unique IDs for the published version
of the ontology (which will likely be used to create node-level URIs).
All of this will be easier to manage from an RDBMS than it will be by
issuing versions in CVS or SVN, as is typically done.  Actually, I
think managing the elements down to that level will be nearly
impossible within CVS/SVN.
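
To make the status-tag idea concrete, here is a minimal sketch of how
a release could be cut from an RDBMS by filtering on curation status.
Everything in it is an assumption for illustration only: the table
names (ontology_class, release_member), the column names, the sample
class labels and the use of SQLite are hypothetical and are not the
Protégé database schema.  Only the status strings come from the
examples above.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical, simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE ontology_class (
            id INTEGER PRIMARY KEY,          -- node-level unique ID
            label TEXT NOT NULL,
            curation_status TEXT NOT NULL    -- status tag set by curators
        );
        CREATE TABLE release_member (
            release_tag TEXT NOT NULL,       -- e.g. a version label
            class_id INTEGER NOT NULL REFERENCES ontology_class(id)
        );
    """)
    # Made-up sample rows; the labels are placeholders, not curated content.
    conn.executemany(
        "INSERT INTO ontology_class (id, label, curation_status) VALUES (?, ?, ?)",
        [
            (1, "Neuron", "fully vetted"),
            (2, "Purkinje cell", "good graph location; poor definition"),
            (3, "Glial cell", "temporary graph location; needs to move eventually"),
        ],
    )
    return conn


def cut_release(conn, release_tag, allowed_statuses):
    """Record every class whose status is allowed as a member of the release,
    then return the labels in that release, ordered by class ID."""
    placeholders = ", ".join("?" for _ in allowed_statuses)
    conn.execute(
        "INSERT INTO release_member (release_tag, class_id) "
        "SELECT ?, id FROM ontology_class "
        f"WHERE curation_status IN ({placeholders})",
        [release_tag, *allowed_statuses],
    )
    rows = conn.execute(
        "SELECT c.label FROM release_member r "
        "JOIN ontology_class c ON c.id = r.class_id "
        "WHERE r.release_tag = ? ORDER BY c.id",
        (release_tag,),
    ).fetchall()
    return [label for (label,) in rows]


if __name__ == "__main__":
    demo = build_demo_db()
    print(cut_release(demo, "v1.0", ["fully vetted"]))
```

The point of the sketch is only that a release becomes a query over
per-node status rows, which is the kind of element-level bookkeeping
that is awkward to express as file revisions in CVS/SVN.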

The only problem is creating an efficient means to support this sort
of community curation, and the sharing of ontologies from other
sources: a direct JDBC connection isn't going to work well.  There
will be firewall issues, which I believe will add way too much to each
individual's overhead of bringing this capability online.  When the
group is supported by a single IT staff and working within the same
LAN environment (including those who'd connect via VPN), this can be
a viable approach, but outside of that, it will probably be too much
trouble for all the folks who need access.

This is why we've been talking with Daniel about expanding the web
version of Protégé developed in your group so as to "open" it and
release it from the JDBC port requirements using a combination of a
service-oriented architecture (web services) and the Java Portlet
framework.  In our lab, we've implemented very simple WSDL web
service request/response pairs to provide a generic SQL interface
via web services to meet this need.  It works extremely well, even
for fairly complicated queries, and can even be used to return binary
objects (in our case histological images) via SOAP + attachments.
This all runs over relatively firewall-friendly ports, such as those
used by HTTP and the Tomcat Java Servlet framework.
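
As a rough illustration of the request/response idea, the sketch below
shows a handler that accepts a query as a message and returns the rows
as a message, so that the client never opens a database port itself.
It is not our lab's implementation: it swaps JSON in for WSDL/SOAP to
stay short, uses SQLite in place of a real RDBMS, and the function name
handle_sql_request and the message fields (sql, params, ok, columns,
rows, error) are all hypothetical.  In a real deployment a function
like this would sit behind an HTTP endpoint.

```python
import json
import sqlite3


def handle_sql_request(conn, request_body):
    """Run one read-only query described by a JSON request.

    Hypothetical request shape:  {"sql": "...", "params": [...]}
    Hypothetical response shape: {"ok": true, "columns": [...], "rows": [...]}
                             or: {"ok": false, "error": "..."}
    """
    request = json.loads(request_body)
    sql = request["sql"]
    params = request.get("params", [])

    # Crude gate for the sketch: accept only statements starting with SELECT.
    # A production service would need real authorization and validation.
    if not sql.lstrip().lower().startswith("select"):
        return json.dumps({"ok": False, "error": "only SELECT statements are accepted"})

    try:
        cursor = conn.execute(sql, params)
    except sqlite3.Error as exc:
        return json.dumps({"ok": False, "error": str(exc)})

    columns = [description[0] for description in cursor.description]
    return json.dumps({"ok": True, "columns": columns, "rows": cursor.fetchall()})


if __name__ == "__main__":
    # Made-up table and row, purely to exercise the handler.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE term (id INTEGER PRIMARY KEY, label TEXT)")
    db.execute("INSERT INTO term VALUES (1, 'example term')")
    request = json.dumps({"sql": "SELECT id, label FROM term WHERE id = ?", "params": [1]})
    print(handle_sql_request(db, request))
```

Because the client only exchanges messages over HTTP, the database
port stays inside the server's own network, which is what removes the
firewall burden from each contributor.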

I assume the NCIT community curation you mention is a project being
developed/hosted/supported by the NCI Bioinformatics group as part of
the caBIG project?  I know they are very committed to using Apache
implementations of various Java specs and web-based architectural
tools.  By any chance is the work they are doing with the
Protégé-RDBMS shared ontology environment (CODS - Collaborative
Ontology Development Server (or Collaborative Ontology Development
Service project)) taking this approach to make the system less
reliant on running JDBC over the net and through firewalls?  I saw on
one of the Protégé CODS server configuration pages that ports
4020 - 4039 were used.  Given that these ports do have public
assignments for proprietary applications
(http://www.iana.org/assignments/port-numbers), they can again be
difficult to use unless all contributors are being hosted by the same
IT staff and/or are on the same LAN (even if it's a VLAN).

Are there pages on the Protégé Wiki where more complete documentation  
discusses some of these details for the NCI CODS project?

Many thanks again for the info, Mark.

Cheers,
Bill

On Jul 16, 2006, at 12:53 AM, Mark Musen wrote:

> On Jul 10, 2006, at 11:40 PM, William Bug wrote:
>> However, there doesn't appear to be a means within the OBO/NCBO  
>> community for doing this sort of distributed ontology design right  
>> now.  Two of the tools in widespread use - Protégé and OBO-Edit -
>> are really not designed to support distributed and shared  
>> development, such as you'd find in a typical distributed  
>> architecture - whether it be a standard client-server RDBMS-based  
>> approach, one using some "active pages" technology such as php,  
>> Zope, Ruby on Rails, Java Servlet/Portlet frameworks, etc. - or a  
>> more asynchronous approach using messaging and/or web services to  
>> assemble the required components from the various authoritative  
>> sources.
>
> Bill,
>
> I hate to sound like a salesperson, but Protégé in its multi-user  
> mode (using the relational database backend) would seem to be just  
> what you are looking for.  Protégé (both the frames and the OWL
> facility) allows distributed users to work simultaneously on an
> ontology stored on a remote server.  As the ontology is updated,  
> all the Protégé clients refresh automatically to display the changes.
>
> NCI currently is experimenting with this architecture for the  
> development of the NCI Thesaurus in OWL, and they have developers  
> stationed all across the country.  I'm told that Perot Systems,  
> using the frame-based representation, has nearly 100 Protégé users  
> working on the same ontology simultaneously.
>
> Mark
>
> P.S. While I'm plugging Protégé, don't forget that the Ninth Annual
> Protégé Conference takes place at Stanford next week (see
> http://protege.stanford.edu/conference/2006/).
>
>

Bill Bug
Senior Analyst/Ontological Engineer

Laboratory for Bioimaging  & Anatomical Informatics
www.neuroterrain.org
Department of Neurobiology & Anatomy
Drexel University College of Medicine
2900 Queen Lane
Philadelphia, PA    19129
215 991 8430 (ph)
610 457 0443 (mobile)
215 843 9367 (fax)


Please Note: I now have a new email - William.Bug@DrexelMed.edu

Received on Monday, 17 July 2006 04:37:06 UTC