- From: Chimezie Ogbuji <ogbujic@bio.ri.ccf.org>
- Date: Thu, 13 Jul 2006 14:43:40 -0400 (EDT)
- To: Mark Wilkinson <markw@illuminae.com>
- cc: public-semweb-lifesci@w3.org
- Message-ID: <Pine.GSO.4.60.0607131359220.11590@joplin.bio.ri.ccf.org>
> In a publication that will be available soon [1] we (briefly) discuss
> the problem of actually *using* the currently available ontologies in a
> "real" Semantic Web setting - i.e. dynamically downloading whatever
> ontologies are necessary given the predicates that you find in some
> discovered RDF instance document.
> The OWL representation of GO is over
> 10 Meg... for heavens sake!... and GO is a small ontology compared to
> things like the NCI Metathesaurus.
>
> The problem with using document#fragment URLs to identify ontology nodes
> is that the defined behaviour for resolving such an identifier is to
> drop the fragment (since that isn't available server-side anyway) and to
> return the entire document... all 10Meg's of GO... each time... We
> would argue, therefore, that the URL (if you adopt its default
> behaviour) is not only a bit of a nuisance, it is a blocker in some/many
> cases.

I don't think this particular case has much to do with URLs themselves,
but rather with how an ontology author wishes to distribute his/her
ontology. The behavior you mention is only the case if the ontology
terms are URLs - i.e., they are locators as well as identifiers. Even
for ontologies of small size, I would consider this a bad practice for
ontology distribution. Resolving terms from an ontology out of context
has many consequences, the primary one being that you may not have
enough closure to facilitate reasoning.

Automatically attempting to dereference vocabulary terms in an instance
graph in order to tie them in with their defining ontology is one of
many options. In an earlier thread, it was pointed out that more
'controlled' mechanisms can be used to do this. For one thing,
interpreting a Semantic Web in this way assumes that the terms are URLs
specifically - which is not practical (for the reasons you've pointed
out as well as the issues with reasoning).
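
As a small illustration of the default behaviour discussed above, the
following sketch uses Python's standard urllib.parse.urldefrag to show
that a client strips the fragment before requesting anything, so two
different terms in one document lead to the same (whole) document. The
example.org URLs and the document_to_fetch helper are made up for this
example and are not the real GO term URLs.

```python
from urllib.parse import urldefrag

# Hypothetical term URLs; the real GO identifiers may look different.
TERM_A = "http://example.org/ontologies/go.owl#GO_0008150"
TERM_B = "http://example.org/ontologies/go.owl#GO_0003674"


def document_to_fetch(term_url: str) -> str:
    """Return the URL a client would actually request for a term.

    The fragment identifier is not sent to the server, so every term
    defined in the same document maps to that one document.
    """
    url, _fragment = urldefrag(term_url)
    return url


doc_a = document_to_fetch(TERM_A)
doc_b = document_to_fetch(TERM_B)
print(doc_a)
print(doc_a == doc_b)
```

Both terms resolve to the same document URL, which is why dereferencing
each term individually would pull down the entire ontology every time.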
I would think that an author of an ontology of this size would want to
consider fragmenting the ontology (perhaps by sub-domains) and linking
the fragments with owl:imports. In such a scenario, the terms could
simply be identifiers asserted within each ontology fragment, and only
the ontology fragments would need URLs for dynamic resolution.

>
> There's been some exciting work in the domain of ontology segmentation
> [2,3,4,5] that, we believe, is perhaps a more rational way of working
> with these massive ontologies when you need to get on-the-fly access to
> only the portions of the ontology that are relevant to your Blackberry's
> agent at that moment.

I think the combination of fragmenting ontologies (using terms that
were meant to suit this purpose) and more controlled mechanisms for
calculating web closure addresses this issue.

> I know that others (e.g. Damian Gessler and
> collaborators at NCGR, but I don't have the reference to his submitted
> manuscript at hand right now... sorry Damian!) are also working on the
> problem of segmentation by passing a self-inflating "flattened" ontology
> fragment. The problem is that there is no Semantic Web-style protocol
> available to specify that this is the behaviour you want, or for the
> agent to know that this is the behaviour to expect.

I'm curious about your thoughts on:

http://esw.w3.org/topic/HCLS/WebClosureSocialConvention

> Here is where I think the LSID could really shine! Unlike a URL, the
> LSID does not have to return an entire document in response to a
> getMetaData call. Thus, if an LSID were used as the identifier for an
> ontology node, the behaviour of the getMetadata call could be, by
> convention or by standard, to return only the relevant ontology
> fragment, where that fragment was generated by e.g. the Rector
> Segmentation generator in the background.
Determining such a fragment depends heavily on the relationships between
terms as well as on decidability / complexity issues (some ontologies
specifically partition out the parts that would cause the ontology to be
OWL-Full). Issues such as these are best addressed directly by the
author of an ontology, and there are existing tools for doing so; what
is lacking is a protocol to guide agents.

I'm not familiar with the full mechanics of LSID resolution, but it
sounds to me like the behavior you suggest for calling getMetadata on
terms in an ontology can also be achieved by distributing fragments of
an ontology (grouped logically or by levels of complexity - OWL-DL /
OWL-Lite, etc.), ontology linking terms, and a set of protocols for
'guided' web closure that agents can follow.

Chimezie Ogbuji
Lead Systems Analyst
Thoracic and Cardiovascular Surgery
Cleveland Clinic Foundation
9500 Euclid Avenue/ W26
Cleveland, Ohio 44195
Office: (216)444-8593
ogbujic@ccf.org

>
> [1] Good, B, Wilkinson, M. (in press). The Life Sciences Semantic Web is
> Full of Creeps! Briefings in Bioinformatics.
> [2] Noy, N, Musen, M. Specifying Ontology Views by Traversal. 2004.
> [3] Alani, H, Harris, S, O'Neil, B. Ontology Winnowing: A Case Study on
> the AKT Reference Ontology. 2005.
> [4] Seidenberg, J, Rector, A (2006), 'Web Ontology Segmentation:
> Analysis, Classification and Use', World Wide Web, ACM, Edinburgh,
> Scotland.
> [5] Stuckenschmidt, H, Klein, M. Structure-Based Partitioning of Large
> Concept Hierarchies. 2004.
>
> --
> Mark Wilkinson
> Asst. Professor, Dept. of Medical Genetics
> University of British Columbia
> PI in Bioinformatics, iCAPTURE Centre
> St. Paul's Hospital, Rm. 166, 1081 Burrard St.
> Vancouver, BC, V6Z 1Y6
> tel: 604 682 2344 x62129
> fax: 604 806 9274
>
> "Since the point of a definition is to explain the meaning of a term to
> someone who is unfamiliar with its proper application, the use of
> language that doesn't help such a person learn how to apply the term is
> pointless. Thus, "happiness is a warm puppy" may be a lovely thought,
> but it is a lousy definition."
> Köhler et al, 2006
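
P.S. As a rough sketch of the fragment-plus-owl:imports idea in my reply
above, an agent could compute a 'guided' closure by following only the
import links between fragments instead of dereferencing every term. The
fragment URLs, the IMPORTS table, and the import_closure function are
all hypothetical; in practice the links would be read from the
owl:imports statements in each fragment, and this is only one possible
way to do it.

```python
from collections import deque

# Hypothetical ontology fragments and their owl:imports links.
# These URLs are invented for illustration only.
IMPORTS = {
    "http://example.org/onto/core": set(),
    "http://example.org/onto/process": {"http://example.org/onto/core"},
    "http://example.org/onto/function": {"http://example.org/onto/core"},
    "http://example.org/onto/annotation": {
        "http://example.org/onto/process",
        "http://example.org/onto/function",
    },
}


def import_closure(start, imports):
    """Return the set of fragments reachable from `start` via imports.

    A breadth-first walk; the `seen` set also guards against cycles,
    since ontologies may import each other.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for target in imports.get(current, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


closure = import_closure("http://example.org/onto/process", IMPORTS)
print(sorted(closure))
```

An agent that only needs the 'process' fragment would fetch that
fragment and the shared core, and nothing else.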
Received on Thursday, 13 July 2006 18:43:56 UTC