
Re: Distributed ontology development (was Ontology entity IDs)

From: Mark Musen <musen@stanford.edu>
Date: Mon, 17 Jul 2006 03:57:03 -0700
Message-Id: <73D73E67-8B39-49C8-893D-E0F31AFCCBE9@Stanford.EDU>
Cc: Alan Ruttenberg <alanruttenberg@gmail.com>, Trish Whetzel <whetzel@pcbi.upenn.edu>, Alan Rector <rector@cs.man.ac.uk>, w3c semweb hcls <public-semweb-lifesci@w3.org>, Phillip Lord <phillip.lord@newcastle.ac.uk>
To: William Bug <William.Bug@DrexelMed.edu>
On Jul 16, 2006, at 9:36 PM, William Bug wrote:
> Are you referring to the JDBC Protégé backend
> (http://protege.cim3.net/cgi-bin/wiki.pl?JdbcDatabaseBackend), or are
> there other ways of connecting Protégé to an RDBMS backend?

That's the JDBC backend, Bill.

> It certainly is a herculean task to work out the O-R mapping in a
> way that is flexible enough to accommodate all the graphs someone
> might construct in either Protégé-Frames or Protégé-OWL, so if this
> is already implemented and working, it behooves all of us who need to
> support this sort of community ontology curation to re-use what's
> being constructed by SMI and/or NCI.

Yes, see http://protege.cim3.net/cgi-bin/wiki.pl?MultiUserTutorial

> The only problem is creating an efficient means to support this
> sort of community curation (and the sharing of ontologies from other
> sources): a direct JDBC connection isn't going to work well.
> There will be firewall issues, which I believe will add far too much
> to each individual's overhead of bringing this capability online.
> When the group is supported by a single IT staff and works within
> the same LAN environment (including those who connect via VPN),
> this can be a viable approach, but outside of that, it will
> probably be too much trouble for all the folks who need access.

We've been putting an enormous amount of energy into enhancing the  
performance of the thick Protégé client the past few months for  
precisely this reason.  In supporting NCI, we indeed have to deal  
with the significant latencies imposed by firewalls and VPNs.  The
enhancements we have made have led to remarkably improved database  
performance and transaction processing.  For example, some  
transaction times have been improved by two orders of magnitude or  
more.  We will be migrating all these changes back into the main  
Protégé release over the next several months.

> This is why we've been talking with Daniel about expanding the web
> version of Protégé developed in your group so as to "open" it and
> release it from the JDBC port requirements, using a combination of a
> service-oriented architecture (web services) and the Java Portlet
> framework.  In our lab, we've built very simple WSDL web service
> request/response pairs that provide a generic SQL interface via web
> services to meet this need.  It works extremely well, even for fairly
> complicated queries, and can even return binary objects (in our case
> histological images) via SOAP with attachments.  All of this runs
> over relatively firewall-friendly ports, such as those used by HTTP
> and the Tomcat Java Servlet framework.
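For readers unfamiliar with the pattern Bill describes, the sketch below puts a database behind a single HTTP POST endpoint, so that only a web port has to pass the firewall. It is a minimal sketch under stated assumptions, not the BIRN implementation: it swaps SOAP/WSDL for plain JSON over HTTP, uses SQLite from the Python standard library in place of the lab's RDBMS, and base64-encodes BLOB columns as a stand-in for SOAP attachments. The names `run_query`, `QueryHandler`, `images.db` and port 8080 are hypothetical.

```python
import base64
import json
import sqlite3
from http.server import BaseHTTPRequestHandler, HTTPServer


def run_query(conn: sqlite3.Connection, request_body: bytes) -> bytes:
    """Run one parameterized SELECT described by a JSON request body
    ({"sql": ..., "params": [...]}) and return the result set as JSON bytes."""
    request = json.loads(request_body)
    if not isinstance(request, dict) or not isinstance(request.get("sql"), str):
        raise ValueError('expected {"sql": "...", "params": [...]}')
    sql = request["sql"]
    # Crude read-only guard for the sketch; the connection is also opened read-only below.
    if not sql.lstrip().upper().startswith("SELECT"):
        raise ValueError("only SELECT statements are accepted")
    cursor = conn.execute(sql, request.get("params", []))
    columns = [d[0] for d in cursor.description]
    rows = []
    for row in cursor.fetchall():
        values = []
        for value in row:
            # BLOBs come back as bytes; base64 stands in for SOAP attachments.
            if isinstance(value, bytes):
                values.append({"base64": base64.b64encode(value).decode("ascii")})
            else:
                values.append(value)
        rows.append(dict(zip(columns, values)))
    return json.dumps({"columns": columns, "rows": rows}).encode("utf-8")


class QueryHandler(BaseHTTPRequestHandler):
    conn: sqlite3.Connection  # assigned before the server starts

    def do_POST(self) -> None:
        try:
            length = int(self.headers.get("Content-Length", 0))
            body = run_query(self.conn, self.rfile.read(length))
            status = 200
        except (ValueError, TypeError, sqlite3.Error, sqlite3.Warning) as exc:
            body = json.dumps({"error": str(exc)}).encode("utf-8")
            status = 400
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # Hypothetical database file, opened read-only via an SQLite URI.
    QueryHandler.conn = sqlite3.connect("file:images.db?mode=ro", uri=True)
    # Port 8080 is an arbitrary HTTP-style port chosen for illustration.
    HTTPServer(("", 8080), QueryHandler).serve_forever()
```

A client would POST a body such as `{"sql": "SELECT name, data FROM images WHERE id = ?", "params": [1]}`. The SELECT-prefix check and read-only connection are deliberately simple safeguards; a real service would also need authentication and a much stricter query policy.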

We'd certainly like to know more about what you are doing within  
BIRN.  The "Web version" of Protégé, alas, was a student project
that I think will need a bit more work to be stable.  We do intend to  
put additional effort into enhancing "Web Protégé" in the next few  

The community should note that Stanford recently submitted a proposal  
to the National Library of Medicine for ongoing support of the  
Protégé resource.  One of our key objectives in the new phase of our  
work is to engineer a true thin-client version of Protégé that adopts  
a services-oriented architecture.  We would welcome input from the
entire community as we move forward with these plans, assuming that  
we get funded to work on them.

> I assume when you mention the NCIT community curation this is a  
> project being developed/hosted/supported by the NCI Bioinformatics  
> group as a part of the caBIG project?

Although the NCI Thesaurus is an important resource for caBIG, our  
work with NCI predates caBIG and comes directly from the NCI Center  
for Bioinformatics.  We work directly with the folks at NCI  
developing the caCORE resources.

> By any chance is the work they are doing with the Protégé-RDBMS
> shared ontology environment (CODS, the Collaborative Ontology
> Development Server or Collaborative Ontology Development Service
> project) taking this approach to make the system less reliant on
> running JDBC over the net and through firewalls?  I saw on one of
> the Protégé CODS server configuration pages that ports 4020-4039
> were used.  Since these ports have public assignments for proprietary
> applications (http://www.iana.org/assignments/port-numbers), they can
> be difficult to use unless all contributors are hosted by the same
> IT staff and/or are on the same LAN (even if it's a VLAN).
> Are there pages on the Protégé Wiki where more complete  
> documentation discusses some of these details for the NCI CODS  
> project?
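Bill's concern about non-standard ports can be checked empirically from each contributor's machine. The sketch below, assuming Python 3.9+ and only the standard `socket` module, attempts a TCP connection to every port in a range and reports which ones get through. The host name `cods.example.org` is a placeholder and `reachable_ports` is a hypothetical helper, not part of Protégé or CODS. It treats refusals, timeouts and name-resolution failures alike as "blocked".

```python
import socket


def reachable_ports(host: str, ports: range, timeout: float = 1.0) -> list[int]:
    """Return the ports in `ports` that accept a TCP connection from this machine."""
    open_ports = []
    for port in ports:
        try:
            # create_connection raises OSError (or a subclass) on any failure.
            with socket.create_connection((host, port), timeout=timeout):
                open_ports.append(port)
        except OSError:
            # Refused, filtered (timed out), or unresolvable: treat as blocked.
            pass
    return open_ports


if __name__ == "__main__":
    host = "cods.example.org"   # hypothetical server name
    wanted = range(4020, 4040)  # 4020-4039 inclusive, as on the CODS configuration page
    ok = reachable_ports(host, wanted)
    blocked = [p for p in wanted if p not in ok]
    print(f"reachable: {ok}")
    print(f"blocked:   {blocked}")
```

Running this from inside and outside the server's LAN gives a quick picture of how much firewall negotiation each contributor would face.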

The CODS project is not supported by NCI, but rather by CIM3  
Engineering.  The goal of CODS is to make the multi-user version of  
Protégé publicly available so that users can experiment with creating  
and maintaining a shared ontology library (see http:// 

> Many thanks again for the info, Mark.

My pleasure!

> Cheers,
> Bill
> On Jul 16, 2006, at 12:53 AM, Mark Musen wrote:
>> On Jul 10, 2006, at 11:40 PM, William Bug wrote:
>>> However, there doesn't appear to be a means within the OBO/NCBO
>>> community for doing this sort of distributed ontology design
>>> right now.  Two of the tools in widespread use, Protégé and
>>> OBO-Edit, are really not designed to support distributed and shared
>>> development, such as you'd find in a typical distributed
>>> architecture, whether it be a standard client-server RDBMS-based
>>> approach, one using some "active pages" technology such as PHP,
>>> Zope, Ruby on Rails, Java Servlet/Portlet frameworks, etc., or a
>>> more asynchronous approach using messaging and/or web services to
>>> assemble the required components from the various authoritative
>>> sources.
>> Bill,
>> I hate to sound like a salesperson, but Protégé in its multi-user  
>> mode (using the relational database backend) would seem to be just  
>> what you are looking for.  Protégé (both the frames and the OWL  
>> facility) allows distributed users to work simultaneously on an
>> ontology stored on a remote server.  As the ontology is updated,  
>> all the Protégé clients refresh automatically to display the changes.
>> NCI currently is experimenting with this architecture for the  
>> development of the NCI Thesaurus in OWL, and they have developers  
>> stationed all across the country.  I'm told that Perot Systems,  
>> using the frame-based representation, has nearly 100 Protégé users  
>> working on the same ontology simultaneously.
>> Mark
>> P.S. While I'm plugging Protégé, don't forget that the Ninth  
>> Annual Protégé Conference takes place at Stanford next week (see  
>> http://protege.stanford.edu/conference/2006/).
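Mark's description of clients that "refresh automatically to display the changes" can be illustrated with a generic pattern: the server keeps a numbered, append-only log of edits, and each client asks for everything after the last sequence number it has seen. This is a hypothetical sketch, not Protégé's actual synchronization mechanism, which the thread does not describe. `SharedOntologyLog`, `Client` and the example edits are invented for illustration, and a real multi-user server would also need conflict handling and a network transport.

```python
from dataclasses import dataclass, field
from threading import Lock


@dataclass(frozen=True)
class Change:
    seq: int          # position in the server's change log, starting at 1
    author: str
    description: str


@dataclass
class SharedOntologyLog:
    """Hypothetical server-side, append-only log of ontology edits."""
    _changes: list[Change] = field(default_factory=list)
    _lock: Lock = field(default_factory=Lock, repr=False)

    def record(self, author: str, description: str) -> int:
        with self._lock:
            change = Change(len(self._changes) + 1, author, description)
            self._changes.append(change)
            return change.seq

    def since(self, last_seen: int) -> list[Change]:
        """Return every change with seq > last_seen."""
        with self._lock:
            return self._changes[last_seen:]


class Client:
    """Hypothetical client that keeps a local view in sync by polling."""

    def __init__(self, name: str, log: SharedOntologyLog) -> None:
        self.name = name
        self.log = log
        self.last_seen = 0
        self.view: list[str] = []

    def edit(self, description: str) -> None:
        self.log.record(self.name, description)

    def refresh(self) -> list[Change]:
        new = self.log.since(self.last_seen)
        for change in new:
            self.view.append(f"{change.author}: {change.description}")
        if new:
            self.last_seen = new[-1].seq
        return new


if __name__ == "__main__":
    log = SharedOntologyLog()
    alice, bob = Client("alice", log), Client("bob", log)
    alice.edit("add class Neuron")
    bob.edit("add class PyramidalCell as a subclass of Neuron")
    alice.refresh()
    print(alice.view)
```

Asking only for changes after a sequence number keeps each refresh proportional to the number of new edits rather than to the size of the ontology, which matters for clients behind high-latency firewalled or VPN links.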
> Bill Bug
> Senior Analyst/Ontological Engineer
> Laboratory for Bioimaging  & Anatomical Informatics
> www.neuroterrain.org
> Department of Neurobiology & Anatomy
> Drexel University College of Medicine
> 2900 Queen Lane
> Philadelphia, PA    19129
> 215 991 8430 (ph)
> 610 457 0443 (mobile)
> 215 843 9367 (fax)
> Please Note: I now have a new email - William.Bug@DrexelMed.edu

Received on Monday, 17 July 2006 10:59:02 UTC
