Re: Distributed ontology development (was Ontology entity IDs)

Hello All:

I did some work with Peter Karp on collaborative KB development, which works
quite well in practice.  This work is described at:

I would be happy to discuss it further as needed.

Keep Smiling!

Mark Musen wrote:

> On Jul 16, 2006, at 9:36 PM, William Bug wrote:
>> Are you referring to the JDBC Protégé 
>> (, or are 
>> there other ways of connecting Protégé to an RDBMS backend?  
> That's the JDBC backend, Bill.
>> It certainly is a herculean task to work out the O-R mapping in a way 
>> that is flexible enough to accommodate all the graphs someone might 
>> construct either in Protégé-Frames or Protégé-OWL, so if this is 
>> already implemented and working, it behooves all of us who need to 
>> support this sort of community ontology curation to re-use what's being 
>> constructed by SMI and/or NCI.
> Yes, see
>> The only problem is creating an efficient means to support this sort 
>> of community curation - and sharing of ontologies from other sources 
>> - a direct JDBC connection isn't going to work well.  There'll be 
>> firewall issues which I believe will add way too much to each 
>> individual's overhead of bringing this capability online.  When the 
>> group is supported by a single IT staff and working within the same 
>> LAN environment (including those who'd connect via VPN), this can be 
>> a viable approach, but outside of that, it will probably be too much 
>> trouble for all the folks who need access.
> We've been putting an enormous amount of energy into enhancing the 
> performance of the thick Protégé client the past few months for 
> precisely this reason.  In supporting NCI, we indeed have to deal with 
> the significant latencies imposed by firewalls and VPNs.  The 
> enhancements we have made have led to remarkably improved database 
> performance and transaction processing.  For example, some transaction 
> times have been improved by two orders of magnitude or more.  We will 
> be migrating all these changes back into the main Protégé release over 
> the next several months.
>> This is why we've been talking with Daniel about expanding the web 
>> version of Protégé developed in your group so as to "open" it and 
>> release it from the JDBC port requirements using a combination of a 
>> service-oriented architecture (web services) and the Java Portlet 
>> framework.  In our lab, we've implemented very simple WSDL web 
>> service response/request pairs to implement a generic SQL interface 
>> via web services to meet this need.  It works extremely well, even 
>> for fairly complicated queries and can even be used to return binary 
>> objects (in our case histological images) via SOAP + attachments.  
>> This is all running over relatively firewall friendly ports such as 
>> are used by HTTP and the Tomcat Java Servlet framework.
> We'd certainly like to know more about what you are doing within 
> BIRN.  The "Web version" of Protégé, alas, was a student project that 
> I think will need a bit more work to be stable.  We do intend to put 
> additional effort into enhancing "Web Protégé" in the next few months.
> The community should note that Stanford recently submitted a proposal 
> to the National Library of Medicine for ongoing support of the Protégé 
> resource.  One of our key objectives in the new phase of our work is 
> to engineer a true thin-client version of Protégé that adopts a 
> services-oriented architecture.  We would welcome input from the entire 
> community as we move forward with these plans, assuming that we get 
> funded to work on them.
>> I assume when you mention the NCIT community curation this is a 
>> project being developed/hosted/supported by the NCI Bioinformatics 
>> group as a part of the caBIG project? 
> Although the NCI Thesaurus is an important resource for caBIG, our 
> work with NCI predates caBIG and comes directly from the NCI Center 
> for Bioinformatics.  We work directly with the folks at NCI developing 
> the caCORE resources.
>> By any chance is the work they are doing with the Protégé-RDBMS 
>> shared ontology environment (CODS - Collaborative 
>> Ontology Development Server (or Collaborative Ontology Development 
>> Service project)) taking this approach to make the system less 
>> reliant on running JDBC over the net and through firewalls?  I saw on 
>> one of the Protégé CODS server configuration pages ports 4020 - 4039 
>> were used, which again, given these do have public assignments for 
>> proprietary applications 
>> ( can be difficult to 
>> use, unless all contributors are being hosted by the same IT staff 
>> and/or are on the same LAN (even if it's a VLAN).
>> Are there pages on the Protégé Wiki where more complete documentation 
>> discusses some of these details for the NCI CODS project?
> The CODS project is not supported by NCI, but rather by CIM3 
> Engineering.  The goal of CODS is to make the multi-user version of 
> Protégé publicly available so that users can experiment with creating 
> and maintaining a shared ontology library 
> (see
>> Many thanks again for the info, Mark.
> My pleasure!
>> Cheers,
>> Bill
>> On Jul 16, 2006, at 12:53 AM, Mark Musen wrote:
>>> On Jul 10, 2006, at 11:40 PM, William Bug wrote:
>>>> However, there doesn't appear to be a means within the OBO/NCBO 
>>>> community for doing this sort of distributed ontology design right 
>>>> now.  Two of the tools in widespread use - Protégé and OBO-Edit - 
>>>> are really not designed to support distributed and shared 
>>>> development, such as you'd find in a typical distributed 
>>>> architecture - whether it be a standard client-server RDBMS-based 
>>>> approach, one using some "active pages" technology such as php, 
>>>> Zope, Ruby on Rails, Java Servlet/Portlet frameworks, etc. - or a 
>>>> more asynchronous approach using messaging and/or web services to 
>>>> assemble the required components from the various authoritative 
>>>> sources.
>>> Bill,
>>> I hate to sound like a salesperson, but Protégé in its multi-user 
>>> mode (using the relational database backend) would seem to be just 
>>> what you are looking for.  Protégé (both the frames and the OWL 
>>> facility) allows distributed users to work simultaneously on an 
>>> ontology stored on a remote server.  As the ontology is updated, all 
>>> the Protégé clients refresh automatically to display the changes.
>>> NCI currently is experimenting with this architecture for the 
>>> development of the NCI Thesaurus in OWL, and they have developers 
>>> stationed all across the country.  I'm told that Perot Systems, 
>>> using the frame-based representation, has nearly 100 Protégé users 
>>> working on the same ontology simultaneously.
>>> Mark
>>> P.S. While I'm plugging Protégé, don't forget that the Ninth Annual 
>>> Protégé Conference takes place at Stanford next week (see 
>> Bill Bug
>> Senior Analyst/Ontological Engineer
>> Laboratory for Bioimaging  & Anatomical Informatics
>> Department of Neurobiology & Anatomy
>> Drexel University College of Medicine
>> 2900 Queen Lane
>> Philadelphia, PA    19129
>> 215 991 8430 (ph)
>> 610 457 0443 (mobile)
>> 215 843 9367 (fax)
>> Please Note: I now have a new email - 
>> <>

Received on Wednesday, 26 July 2006 16:06:40 UTC