Re: Distributed ontology development (was Ontology entity IDs) from Vinay Chaudhri on 2006-07-26 (public-semweb-lifesci@w3.org from July 2006)

From: Vinay Chaudhri <Vinay.Chaudhri@sri.com>
Date: Wed, 26 Jul 2006 09:00:45 -0700
To: w3c semweb hcls <public-semweb-lifesci@w3.org>
Message-ID: <44C791AD.80309@sri.com>
Hello All:

I had done some work with Peter Karp on collaborative KB development
which works quite well in practice.  This work is described at:

http://www.ai.sri.com/pub_list/390

I will be happy to discuss it further in light of your requirements.

Keep Smiling!
Vinay.

Mark Musen wrote:

> On Jul 16, 2006, at 9:36 PM, William Bug wrote:
>
>> Are you referring to the JDBC Protégé backend 
>> (http://protege.cim3.net/cgi-bin/wiki.pl?JdbcDatabaseBackend), or are 
>> there other ways of connecting Protégé to an RDBMS backend?  
>
>
> That's the JDBC backend, Bill.
>
>
>> It certainly is a herculean task to work out the O-R mapping in a way 
>> that is flexible enough to accommodate all the graphs someone might 
>> construct either in Protégé-Frames or Protégé-OWL, so if this is 
>> already implemented and working, it behooves all of us who need to 
>> support this sort of community ontology curation to re-use what's being 
>> constructed by SMI and/or NCI.
>>
>
> Yes, see http://protege.cim3.net/cgi-bin/wiki.pl?MultiUserTutorial
>
>> The only problem is that a direct JDBC connection isn't going to work 
>> well as an efficient means of supporting this sort of community 
>> curation (and sharing of ontologies from other sources).  There will be 
>> firewall issues which I believe will add way too much to each 
>> individual's overhead of bringing this capability online.  When the 
>> group is supported by a single IT staff and working within the same 
>> LAN environment (including those who'd connect via VPN), this can be 
>> a viable approach, but outside of that, it will probably be too much 
>> trouble for all the folks who need access.
>>
>
> We've been putting an enormous amount of energy into enhancing the 
> performance of the thick Protégé client the past few months for 
> precisely this reason.  In supporting NCI, we indeed have to deal with 
> the significant latencies imposed by firewalls and VPNs.  The 
> enhancements we have made have led to remarkably improved database 
> performance and transaction processing.  For example, some transaction 
> times have been improved by two orders of magnitude or more.  We will 
> be migrating all these changes back into the main Protégé release over 
> the next several months.
>
>
>> This is why we've been talking with Daniel about expanding the web 
>> version of Protégé developed in your group so as to "open" it and 
>> release it from the JDBC port requirements using a combination of a 
>> service-oriented architecture (web services) and the Java Portlet 
>> framework.  In our lab, we've implemented very simple WSDL web 
>> service response/request pairs to implement a generic SQL interface 
>> via web services to meet this need.  It works extremely well, even 
>> for fairly complicated queries and can even be used to return binary 
>> objects (in our case histological images) via SOAP + attachments.  
>> This is all running over relatively firewall-friendly ports, such as 
>> those used by HTTP and the Tomcat Java Servlet framework.
>>
>
> We'd certainly like to know more about what you are doing within 
> BIRN.  The "Web version" of Protégé, alas, was a student project that, 
> I think, will need a bit more work to be stable.  We do intend to put 
> additional effort into enhancing "Web Protégé" in the next few months.
>
> The community should note that Stanford recently submitted a proposal 
> to the National Library of Medicine for ongoing support of the Protégé 
> resource.  One of our key objectives in the new phase of our work is 
> to engineer a true thin-client version of Protégé that adopts a 
> services-oriented architecture.  We would welcome input from the entire 
> community as we move forward with these plans, assuming that we get 
> funded to work on them.
>
>
>> I assume when you mention the NCIT community curation this is a 
>> project being developed/hosted/supported by the NCI Bioinformatics 
>> group as a part of the caBIG project? 
>
>
> Although the NCI Thesaurus is an important resource for caBIG, our 
> work with NCI predates caBIG and comes directly from the NCI Center 
> for Bioinformatics.  We work directly with the folks at NCI developing 
> the caCORE resources.
>
>
>> By any chance is the work they are doing with the Protégé-RDBMS 
>> shared ontology environment (CODS - Collaborative 
>> Ontology Development Server (or Collaborative Ontology Development 
>> Service project)) taking this approach to make the system less 
>> reliant on running JDBC over the net and through firewalls?  I saw on 
>> one of the Protégé CODS server configuration pages that ports 4020-4039 
>> were used, which, given that these have public assignments for 
>> proprietary applications 
>> (http://www.iana.org/assignments/port-numbers), can be difficult to 
>> use unless all contributors are hosted by the same IT staff 
>> and/or are on the same LAN (even if it's a VLAN).
>>
>> Are there pages on the Protégé Wiki where more complete documentation 
>> discusses some of these details for the NCI CODS project?
>
>
> The CODS project is not supported by NCI, but rather by CIM3 
> Engineering.  The goal of CODS is to make the multi-user version of 
> Protégé publicly available so that users can experiment with creating 
> and maintaining a shared ontology library 
> (see http://protege.cim3.net/cgi-bin/wiki.pl?CODS).
>
>
>>
>> Many thanks again for the info, Mark.
>>
>
> My pleasure!
>
>
>> Cheers,
>> Bill
>>
>> On Jul 16, 2006, at 12:53 AM, Mark Musen wrote:
>>
>>> On Jul 10, 2006, at 11:40 PM, William Bug wrote:
>>>
>>>> However, there doesn't appear to be a means within the OBO/NCBO 
>>>> community for doing this sort of distributed ontology design right 
>>>> now.  Two of the tools in widespread use, Protégé and OBO-Edit, 
>>>> are really not designed to support distributed and shared 
>>>> development, such as you'd find in a typical distributed 
>>>> architecture - whether it be a standard client-server RDBMS-based 
>>>> approach, one using some "active pages" technology such as PHP, 
>>>> Zope, Ruby on Rails, Java Servlet/Portlet frameworks, etc. - or a 
>>>> more asynchronous approach using messaging and/or web services to 
>>>> assemble the required components from the various authoritative 
>>>> sources.
>>>
>>>
>>> Bill,
>>>
>>> I hate to sound like a salesperson, but Protégé in its multi-user 
>>> mode (using the relational database backend) would seem to be just 
>>> what you are looking for.  Protégé (both the frames and the OWL 
>>> facility) allows distributed users to work simultaneously on an 
>>> ontology stored on a remote server.  As the ontology is updated, all 
>>> the Protégé clients refresh automatically to display the changes.
>>>
>>> NCI currently is experimenting with this architecture for the 
>>> development of the NCI Thesaurus in OWL, and they have developers 
>>> stationed all across the country.  I'm told that Perot Systems, 
>>> using the frame-based representation, has nearly 100 Protégé users 
>>> working on the same ontology simultaneously.
>>>
>>> Mark
>>>
>>> P.S. While I'm plugging Protégé, don't forget that the Ninth Annual 
>>> Protégé Conference takes place at Stanford next week (see 
>>> http://protege.stanford.edu/conference/2006/).
>>>
>>>
>>
>> Bill Bug
>> Senior Analyst/Ontological Engineer
>>
>> Laboratory for Bioimaging & Anatomical Informatics
>> www.neuroterrain.org
>> Department of Neurobiology & Anatomy
>> Drexel University College of Medicine
>> 2900 Queen Lane
>> Philadelphia, PA    19129
>> 215 991 8430 (ph)
>> 610 457 0443 (mobile)
>> 215 843 9367 (fax)
>>
>>
>> Please Note: I now have a new email - William.Bug@DrexelMed.edu 
>> <mailto:William.Bug@DrexelMed.edu>
>>
>>
>
Received on Wednesday, 26 July 2006 16:06:40 UTC