RE: LSIDs and ontology segmentation from Miller, Michael D (Rosetta) on 2006-07-13 (public-semweb-lifesci@w3.org from July 2006)

From: Miller, Michael D (Rosetta) <Michael_Miller@Rosettabio.com>
Date: Thu, 13 Jul 2006 12:55:18 -0700
To: "Chimezie Ogbuji" <ogbujic@bio.ri.ccf.org>, "Mark Wilkinson" <markw@illuminae.com>
cc: public-semweb-lifesci@w3.org
Message-ID: <E1G17Ha-0004bs-RH@maggie.w3.org>

Hi All,

> I would think that an author of an ontology of this size would want to
> consider fragmenting the ontology (perhaps by sub-domains) and linking
> them with owl:imports.  In such a scenario, the terms could simply be
> identifiers asserted within each ontology fragment and only the ontology
> fragments would need URLs for dynamic resolution.

My impression of the GO ontology (the example given) is that it can definitely be divided into three partitions, Molecular Function, Biological Process and Cellular Component, but beyond that any partitioning would be entirely arbitrary.  GO and the Taxon ontology are essentially a DAG and a simple tree, respectively, so the only thing of interest for the huge majority of current use cases is traversing their paths. That can easily be done by incremental fetches, given a starting term from a gene, without reading in the entire ontologies.
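
As a rough illustration of the incremental-fetch idea, here is a minimal Python sketch. Everything in it is an assumption made for the example: the term IDs, the toy parent table, and the fetch_parents function, which stands in for whatever per-term lookup a real service might offer. It walks up the DAG from one starting term and only ever asks for the terms it actually reaches.

```python
from collections import deque
from typing import Callable, Dict, List, Set

# Toy parent table standing in for a remote ontology (made-up IDs, not real GO terms).
TOY_PARENTS: Dict[str, List[str]] = {
    "T:5": ["T:3", "T:4"],
    "T:4": ["T:2"],
    "T:3": ["T:2"],
    "T:2": ["T:1"],
    "T:1": [],
    "T:9": ["T:1"],  # a branch that is never reached when starting from T:5
}

# Records which terms were requested, so the demo can show what was fetched.
fetch_log: List[str] = []


def fetch_parents(term_id: str) -> List[str]:
    """Hypothetical per-term lookup; a real one would make one small network call."""
    fetch_log.append(term_id)
    return TOY_PARENTS.get(term_id, [])


def ancestors(start: str, fetch: Callable[[str], List[str]]) -> Set[str]:
    """Breadth-first walk towards the root, fetching each reached term once."""
    visited: Set[str] = {start}
    queue = deque([start])
    while queue:
        term = queue.popleft()
        for parent in fetch(term):
            if parent not in visited:
                visited.add(parent)
                queue.append(parent)
    return visited - {start}


if __name__ == "__main__":
    print(sorted(ancestors("T:5", fetch_parents)))
    print(fetch_log)  # T:9 never appears: the unrelated branch was not read
```

The visited set matters because in a DAG a term can be reached along more than one path; without it the same parent would be fetched repeatedly.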

The other point I would want to make is that many times one doesn't want to do any reasoning. The search is simply for objects that have been annotated with individuals of a class, or of one of its subclasses, in some ontology, with the definitions from the ontology displayed to the users. It is also of interest to take a particular object and ask what other objects out in the wide world are annotated with individuals of the same classes.
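
To make the "no reasoning" lookup concrete, here is a small Python sketch under loud assumptions: the tables are toy data, ToySubDesignType and ToyOtherType are made-up classes, the definition strings are placeholders, and only ExperimentDesignType and cellular_modification_design are names taken from the MGED Ontology. The only "inference" is walking the asserted subclass chain.

```python
from typing import Dict, List, Optional, Tuple

# Asserted subclass links: class -> direct superclass (None marks a root).
SUBCLASS_OF: Dict[str, Optional[str]] = {
    "ExperimentDesignType": None,
    "ToySubDesignType": "ExperimentDesignType",  # made-up intermediate class
    "ToyOtherType": None,                        # made-up unrelated class
}

# Individual -> the class it is asserted to belong to (toy assignment).
INDIVIDUAL_CLASS: Dict[str, str] = {
    "cellular_modification_design": "ToySubDesignType",
    "toy_other_individual": "ToyOtherType",
}

# Placeholder text only; real definitions would come from the ontology itself.
DEFINITIONS: Dict[str, str] = {
    "cellular_modification_design": "(placeholder definition)",
    "toy_other_individual": "(placeholder definition)",
}

# Annotated objects -> the individuals they carry.
ANNOTATIONS: Dict[str, List[str]] = {
    "design-1": ["cellular_modification_design"],
    "design-2": ["toy_other_individual"],
    "design-3": ["cellular_modification_design", "toy_other_individual"],
}


def is_a(cls: str, target: str) -> bool:
    """True if cls is target or sits below it in the asserted hierarchy."""
    current: Optional[str] = cls
    while current is not None:
        if current == target:
            return True
        current = SUBCLASS_OF.get(current)
    return False


def annotated_with(target_class: str) -> List[Tuple[str, str, str]]:
    """Return (object, individual, definition) for every matching annotation."""
    hits = []
    for obj, individuals in ANNOTATIONS.items():
        for ind in individuals:
            if is_a(INDIVIDUAL_CLASS[ind], target_class):
                hits.append((obj, ind, DEFINITIONS[ind]))
    return hits


if __name__ == "__main__":
    for obj, ind, definition in annotated_with("ExperimentDesignType"):
        print(obj, ind, definition)
```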

So if I have an ExperimentDesign annotated with the MGED Ontology individual cellular_modification_design of the class ExperimentDesignType, I might like to find similar ExperimentDesigns.  How far I would want a tool to go in traversing what is similar is a function of time and resources. I might want a tool that did exact matches; a tool that searched for linked terms in other ontologies and objects associated with those terms; a tool that understood that a linked term in another ontology leads through a relationship in that ontology to other terms; and so on. But in the end I am only interested in like ExperimentDesigns, not in the ontologies themselves.
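
One way to picture the "how far to go" trade-off is a hop limit on term expansion. The Python sketch below is only that, a sketch: the link table, the "other:" terms, the design names and the max_hops parameter are all invented for illustration. With max_hops=0 it does exact matches, with 1 it also follows links to terms in another ontology, and with larger values it keeps following relationships from there.

```python
from typing import Dict, Set

# Made-up links between terms: cross-ontology mappings and in-ontology relations.
LINKS: Dict[str, Set[str]] = {
    "mged:cellular_modification_design": {"other:cell_mod"},
    "other:cell_mod": {"other:genetic_mod"},
    "other:genetic_mod": set(),
}

# Made-up ExperimentDesigns and the terms they are annotated with.
ANNOTATIONS: Dict[str, Set[str]] = {
    "design-A": {"mged:cellular_modification_design"},
    "design-B": {"mged:cellular_modification_design"},
    "design-C": {"other:cell_mod"},
    "design-D": {"other:genetic_mod"},
    "design-E": {"other:unrelated"},
}


def reachable_terms(start_terms: Set[str], max_hops: int) -> Set[str]:
    """Expand the starting terms along LINKS, at most max_hops steps."""
    reached = set(start_terms)
    frontier = set(start_terms)
    for _ in range(max_hops):
        frontier = {n for t in frontier for n in LINKS.get(t, set())} - reached
        if not frontier:
            break  # nothing new to follow
        reached |= frontier
    return reached


def similar_designs(design: str, max_hops: int) -> Set[str]:
    """Other designs sharing at least one term within the hop budget."""
    terms = reachable_terms(ANNOTATIONS[design], max_hops)
    return {d for d, ts in ANNOTATIONS.items() if d != design and ts & terms}


if __name__ == "__main__":
    for hops in (0, 1, 2):
        print(hops, sorted(similar_designs("design-A", hops)))
```

Note that the result is always a set of designs, never a piece of ontology: the terms are just the route used to get from one design to the others.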

This use case, which is what is happening ad hoc now, is what I would love to see the semantic web support initially.  I don't think the life sciences need complex reasoning enabled as of yet, because they don't even have the simple cases hooked up to the semantic web.

cheers,
Michael

Michael Miller
Lead Software Developer
Rosetta Biosoftware Business Unit
www.rosettabio.com

> -----Original Message-----
> From: public-semweb-lifesci-request@w3.org 
> [mailto:public-semweb-lifesci-request@w3.org] On Behalf Of 
> Chimezie Ogbuji
> Sent: Thursday, July 13, 2006 11:44 AM
> To: Mark Wilkinson
> Cc: public-semweb-lifesci@w3.org
> Subject: Re: LSIDs and ontology segmentation
> 
> 
> 
> 
> > In a publication that will be available soon [1] we (briefly) discuss
> > the problem of actually *using* the currently available ontologies in a
> > "real" Semantic Web setting - i.e. dynamically downloading whatever
> > ontologies are necessary given the predicates that you find in some
> > discovered RDF instance document.  The OWL representation of GO is over
> > 10 Meg... for heavens sake!... and GO is a small ontology compared to
> > things like the NCI Metathesaurus.
> >
> > The problem with using document#fragment URLs to identify ontology nodes
> > is that the defined behaviour for resolving such an identifier is to
> > drop the fragment (since that isn't available server-side anyway) and to
> > return the entire document... all 10Meg's of GO... each time...  We
> > would argue, therefore, that the URL (if you adopt its default
> > behaviour) is not only a bit of a nuisance, it is a blocker in some/many
> > cases.
> 
> I don't think this particular case has much to do with URLs themselves but
> rather with how an ontology author wishes to distribute his/her ontology.
> The behavior you mention is only the case if the ontology terms are URLs -
> i.e., they are locators as well as identifiers.  Even for ontologies of
> small size, I would consider this a bad practice for ontology
> distribution.  There are many consequences for resolving terms from an
> ontology out of context, the primary one being that in doing so you may
> not have enough closure to facilitate reasoning.
> 
> Automatically attempting to dereference vocabulary terms in an instance
> graph in order to tie them in with their defining ontology is one of many
> options.  In an earlier thread, it's been pointed out that more 'controlled'
> mechanisms can be used to do this.  For one thing, interpreting a Semantic
> Web in this way assumes that the terms are URLs specifically - which
> is not practical (for reasons you've pointed out as well as the issues
> with reasoning).
> 
> I would think that an author of an ontology of this size would want to
> consider fragmenting the ontology (perhaps by sub-domains) and linking
> them with owl:imports.  In such a scenario, the terms could simply be
> identifiers asserted within each ontology fragment and only the ontology
> fragments would need URLs for dynamic resolution.
> 
> >
> > There's been some exciting work in the domain of ontology segmentation
> > [2,3,4,5] that, we believe, is perhaps a more rational way of working
> > with these massive ontologies when you need to get on-the-fly access to
> > only the portions of the ontology that are relevant to your Blackberry's
> > agent at that moment.
> 
> I think the combination of fragmenting ontologies using terms that were
> meant to suit this purpose, as well as more controlled mechanisms for
> calculating web closure, addresses this issue.
> 
> >  I know that others (e.g. Damian Gessler and
> > collaborators at NCGR, but I don't have the reference to his submitted
> > manuscript at hand right now... sorry Damian!) are also working on the
> > problem of segmentation by passing a self-inflating "flattened" ontology
> > fragment.  The problem is that there is no Semantic Web-style protocol
> > available to specify that this is the behaviour you want, or for the
> > agent to know that this is the behaviour to expect.
> 
> I'm curious about your thoughts on:
> 
> http://esw.w3.org/topic/HCLS/WebClosureSocialConvention
> 
> > Here is where I think the LSID could really shine!  Unlike a URL, the
> > LSID does not have to return an entire document in response to a
> > getMetaData call.  Thus, if an LSID were used as the identifier for an
> > ontology node, the behaviour of the getMetadata call could be, by
> > convention or by standard, to return only the relevant ontology
> > fragment, where that fragment was generated by e.g. the Rector
> > Segmentation generator in the background.
> 
> Determining such a fragment depends heavily on relationships between
> terms as well as decidability / complexity issues (some ontologies
> specifically partition out parts that would cause the ontologies to be
> OWL-full).  Issues such as these are best addressed by the author of an
> ontology directly and there are existing tools for doing so - just a lack
> of any protocol to guide agents.
> 
> I'm not familiar with the full mechanics of LSID resolution, but it sounds
> to me like what you suggest could be the behavior for calling getMetadata
> on terms in an ontology can be addressed by distributing fragments of an
> ontology (grouped logically or by levels of complexity -
> OWL-DL/OWL-Lite, etc.), ontology linking terms, and a set of protocols
> for 'guided' web closure that agents can follow.
> 
> Chimezie Ogbuji
> Lead Systems Analyst
> Thoracic and Cardiovascular Surgery
> Cleveland Clinic Foundation
> 9500 Euclid Avenue/ W26
> Cleveland, Ohio 44195
> Office: (216)444-8593
> ogbujic@ccf.org
> 
> 
> >
> >
> > [1] Good, B, Wilkinson, M. (in press). The Life Sciences Semantic Web is
> > Full of Creeps!  Briefings in Bioinformatics.
> > [2] Noy, N, Musen, M. Specifying Ontology Views by Traversal. 2004.
> > [3] Alani, H, Harris, S, O'Neil, B. Ontology Winnowing: A Case Study on
> > the AKT Reference Ontology. 2005.
> > [4] Seidenberg, J, Rector, A (2006), 'Web Ontology Segmentation:
> > Analysis, Classification and Use', World Wide Web, ACM, Edinburgh,
> > Scotland.
> > [5] Stuckenschmidt, H, Klein, M. Structure-Based Partitioning of Large
> > Concept Hierarchies. 2004.
> >
> >
> >
> >
> > --
> > Mark Wilkinson
> > Asst. Professor, Dept. of Medical Genetics
> > University of British Columbia
> > PI in Bioinformatics, iCAPTURE Centre
> > St. Paul's Hospital, Rm. 166, 1081 Burrard St.
> > Vancouver, BC, V6Z 1Y6
> > tel: 604 682 2344 x62129
> > fax: 604 806 9274
> >
> > "Since the point of a definition is to explain the meaning of a term to
> >   someone who is unfamiliar with its proper application, the use of
> > language that doesn't help such a person learn how to apply the term is
> > pointless. Thus, "happiness is a warm puppy" may be a lovely thought,
> >                     but it is a lousy definition."
> >                                                    Köhler et al, 2006
> >
> >
> >
>
Received on Thursday, 13 July 2006 19:55:48 UTC