Re: LSIDs and ontology segmentation from Chimezie Ogbuji on 2006-07-13 (public-semweb-lifesci@w3.org from July 2006)

From: Chimezie Ogbuji <ogbujic@bio.ri.ccf.org>
Date: Thu, 13 Jul 2006 14:43:40 -0400 (EDT)
To: Mark Wilkinson <markw@illuminae.com>
cc: public-semweb-lifesci@w3.org
Message-ID: <Pine.GSO.4.60.0607131359220.11590@joplin.bio.ri.ccf.org>
> In a publication that will be available soon [1] we (briefly) discuss
> the problem of actually *using* the currently available ontologies in a
> "real" Semantic Web setting - i.e. dynamically downloading whatever
> ontologies are necessary given the predicates that you find in some
> discovered RDF instance document.  The OWL representation of GO is
> over 10 Meg... for heavens sake!... and GO is a small ontology compared
> to things like the NCI Metathesaurus.
>
> The problem with using document#fragment URLs to identify ontology nodes
> is that the defined behaviour for resolving such an identifier is to
> drop the fragment (since that isn't available server-side anyway) and to
> return the entire document... all 10Meg's of GO... each time...  We
> would argue, therefore, that the URL (if you adopt its default
> behaviour) is not only a bit of a nuisance, it is a blocker in some/many
> cases.

I don't think this particular case has much to do with URLs themselves; it
has more to do with how an ontology author wishes to distribute his/her
ontology.  The behavior you mention only arises if the ontology terms are
URLs - i.e., they are locators as well as identifiers.  Even for ontologies
of small size, I would consider this a bad practice for ontology
distribution.  There are many consequences of resolving terms from an
ontology out of context, the primary one being that in doing so you may
not have enough closure to facilitate reasoning.
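
The default behaviour described in the quoted text can be illustrated
with a minimal sketch.  The ontology URL and term names below are
hypothetical; the only library call used is Python's standard
urllib.parse.urldefrag.  The point is that the fragment never reaches
the server, so two different term URLs turn into the same request for
the whole document.

```python
from urllib.parse import urldefrag


def url_to_fetch(term_url: str) -> str:
    """Return the URL a client actually requests for a term URL.

    The '#fragment' part is stripped client-side, so the server only
    ever sees the document URL.
    """
    document_url, _fragment = urldefrag(term_url)
    return document_url


# Hypothetical term URLs in a single large ontology document.
terms = [
    "http://example.org/ontologies/go.owl#GO_0008150",
    "http://example.org/ontologies/go.owl#GO_0003674",
]

# Every term resolves to the same (entire) document.
requests_made = [url_to_fetch(t) for t in terms]
print(requests_made)
```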

Automatically attempting to dereference vocabulary terms in an instance
graph in order to tie them in with their defining ontology is one of many
options.  In an earlier thread, it was pointed out that more 'controlled'
mechanisms can be used to do this.  For one thing, interpreting a Semantic
Web in this way assumes that the terms are URLs specifically - which
is not practical (for reasons you've pointed out as well as the issues
with reasoning).

I would think that an author of an ontology of this size 
would want to consider fragmenting the ontology (perhaps by 
sub-domains) and linking them with owl:imports.  In such a scenario, the 
terms could simply be identifiers asserted within each ontology fragment 
and only the ontology fragments would need URLs for dynamic resolution.
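
A rough sketch of that scenario, under stated assumptions: the
fragment URLs, the term identifiers, and the in-memory FRAGMENTS table
are all hypothetical stand-ins for documents an agent would fetch, and
the imports lists stand in for owl:imports statements.  An agent finds
the fragment that asserts a term, then follows imports transitively to
collect only the fragments it needs.

```python
from collections import deque

# Hypothetical fragments: URL -> (terms asserted there, URLs it imports).
FRAGMENTS = {
    "http://example.org/go/core.owl": ({"GO_0008150"}, []),
    "http://example.org/go/process.owl": (
        {"GO_0009987"},
        ["http://example.org/go/core.owl"],
    ),
    "http://example.org/go/function.owl": (
        {"GO_0003824"},
        ["http://example.org/go/core.owl"],
    ),
}


def fragment_for(term, fragments):
    """Return the URL of the fragment asserting `term`, or None."""
    for url, (terms, _imports) in fragments.items():
        if term in terms:
            return url
    return None


def imports_closure(start_url, fragments):
    """Return start_url plus everything it imports, breadth-first."""
    seen = [start_url]
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        # Unknown URLs are treated as fragments with no imports.
        _terms, imports = fragments.get(url, (set(), []))
        for imported in imports:
            if imported not in seen:
                seen.append(imported)
                queue.append(imported)
    return seen


start = fragment_for("GO_0009987", FRAGMENTS)
needed = imports_closure(start, FRAGMENTS)
print(needed)
```

Here only two of the three fragments are collected; the 'function'
fragment is never touched because nothing in the closure imports it.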

>
> There's been some exciting work in the domain of ontology segmentation
> [2,3,4,5] that, we believe, is perhaps a more rational way of working
> with these massive ontologies when you need to get on-the-fly access to
> only the portions of the ontology that are relevant to your Blackberry's
> agent at that moment.

I think the combination of fragmenting ontologies using terms
that were meant to suit this purpose, together with more controlled
mechanisms for calculating web closure, addresses this issue.

>  I know that others (e.g. Damian Gessler and
> collaborators at NCGR, but I don't have the reference to his submitted
> manuscript at hand right now... sorry Damian!) are also working on the
> problem of segmentation by passing a self-inflating "flattened" ontology
> fragment.  The problem is that there is no Semantic Web-style protocol
> available to specify that this is the behaviour you want, or for the
> agent to know that this is the behaviour to expect.

I'm curious about your thoughts on:

http://esw.w3.org/topic/HCLS/WebClosureSocialConvention

> Here is where I think the LSID could really shine!  Unlike a URL, the
> LSID does not have to return an entire document in response to a
> getMetaData call.  Thus, if an LSID were used as the identifier for an
> ontology node, the behaviour of the getMetadata call could be, by
> convention or by standard, to return only the relevant ontology
> fragment, where that fragment was generated by e.g. the Rector
> Segmentation generator in the background.

Determining such a fragment depends heavily on the relationships between
terms as well as on decidability / complexity issues (some ontologies
specifically partition out the parts that would cause them to be
OWL-Full).  Issues such as these are best addressed directly by the author
of an ontology, and there are existing tools for doing so; what is lacking
is a protocol to guide agents.

I'm not familiar with the full mechanics of LSID resolution, but it sounds
to me like the behavior you suggest for calling getMetadata on terms in an
ontology can be achieved by distributing fragments of an ontology (grouped
logically or by levels of complexity - OWL-DL/OWL-Lite, etc.), ontology
linking terms, and a set of protocols for 'guided' web closure that agents
can follow.

Chimezie Ogbuji
Lead Systems Analyst
Thoracic and Cardiovascular Surgery
Cleveland Clinic Foundation
9500 Euclid Avenue/ W26
Cleveland, Ohio 44195
Office: (216)444-8593
ogbujic@ccf.org


>
>
> [1] Good, B, Wilkinson, M. (in press). The Life Sciences Semantic Web is
> Full of Creeps!  Briefings in Bioinformatics.
> [2] Noy, N, Musen, M. Specifying Ontology Views by Traversal. 2004.
> [3] Alani, H, Harris, S, O'Neil, B. Ontology Winnowing: A Case Study on
> the AKT Reference Ontology. 2005.
> [4] Seidenberg, J, Rector, A (2006), 'Web Ontology Segmentation:
> Analysis, Classification and Use', World Wide Web, ACM, Edinburgh,
> Scotland.
> [5] Stuckenschmidt, H, Klein, M. Structure-Based Partitioning of Large
> Concept Hierarchies. 2004.
>
>
>
>
> --
> Mark Wilkinson
> Asst. Professor, Dept. of Medical Genetics
> University of British Columbia
> PI in Bioinformatics, iCAPTURE Centre
> St. Paul's Hospital, Rm. 166, 1081 Burrard St.
> Vancouver, BC, V6Z 1Y6
> tel: 604 682 2344 x62129
> fax: 604 806 9274
>
> "Since the point of a definition is to explain the meaning of a term to
>   someone who is unfamiliar with its proper application, the use of
> language that doesn't help such a person learn how to apply the term is
> pointless. Thus, "happiness is a warm puppy" may be a lovely thought,
>                     but it is a lousy definition."
>                                                                Köhler et al, 2006
>
>
>
Received on Thursday, 13 July 2006 18:43:56 UTC