LSIDs and ontology segmentation from Mark Wilkinson on 2006-07-13 (public-semweb-lifesci@w3.org from July 2006)

From: Mark Wilkinson <markw@illuminae.com>
Date: Thu, 13 Jul 2006 10:01:33 -0700
To: public-semweb-lifesci@w3.org
Message-Id: <1152810093.16226.109.camel@bioinfo.icapture.ubc.ca>
Hi all, 

I chatted with Sean Martin yesterday and he indicated that, on the last
SWLS teleconference, he mentioned one of the ideas that my lab and his
group have been tossing around for the past few weeks regarding LSIDs
representing ontology nodes.  He asked me to fill in some more detail in
a message to this mailing list.

In a publication that will be available soon [1] we (briefly) discuss
the problem of actually *using* the currently available ontologies in a
"real" Semantic Web setting - i.e. dynamically downloading whatever
ontologies are necessary given the predicates that you find in some
discovered RDF instance document.  The OWL representation of GO is over
10 MB... for heaven's sake!... and GO is a small ontology compared to
things like the NCI Metathesaurus.

The problem with using document#fragment URLs to identify ontology nodes
is that the defined behaviour for resolving such an identifier is to
drop the fragment (since that isn't available server-side anyway) and to
return the entire document... all 10 MB of GO... each time...  We
would argue, therefore, that the URL (if you adopt its default
behaviour) is not only a bit of a nuisance, it is a blocker in some/many
cases.
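
To make the fragment-dropping behaviour concrete, here is a minimal
Python sketch. The URL is a made-up placeholder (example.org and
"Term0001" are assumptions for illustration, not a real GO location or
identifier); the only point is that the part after "#" is split off on
the client side and never reaches the server.

```python
from urllib.parse import urldefrag

# Hypothetical ontology-node URL; host, path and term name are placeholders.
term_url = "http://example.org/ontologies/go.owl#Term0001"

# Split the identifier into the part that is requested and the fragment.
document_url, fragment = urldefrag(term_url)

# An HTTP client requests only document_url, so the server cannot know
# which node was wanted and has to return the whole document.
print("requested from server:", document_url)
print("kept on the client:   ", fragment)
```

Whatever node the agent is interested in, the request that goes out is
the same one for the full ontology file.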

There's been some exciting work in the domain of ontology segmentation
[2,3,4,5] that, we believe, is perhaps a more rational way of working
with these massive ontologies when you need to get on-the-fly access to
only the portions of the ontology that are relevant to your Blackberry's
agent at that moment.  I know that others (e.g. Damian Gessler and
collaborators at NCGR, but I don't have the reference to his submitted
manuscript at hand right now... sorry Damian!) are also working on the
problem of segmentation by passing a self-inflating "flattened" ontology
fragment.  The problem is that there is no Semantic Web-style protocol
available to specify that this is the behaviour you want, or for the
agent to know that this is the behaviour to expect.  Some of these
projects are setting up the ontology fragment-generator as a Web Service
(if I recall correctly, Rector's group does this [4]), however this
doesn't solve the SW problem either because we can't (easily) model a
Web Service invocation as a single URI (at least, not by any existing
standard or convention... I guess some long REST-style URLs could do
this...)

Here is where I think the LSID could really shine!  Unlike a URL, the
LSID does not have to return an entire document in response to a
getMetadata call.  Thus, if an LSID were used as the identifier for an
ontology node, the behaviour of the getMetadata call could be, by
convention or by standard, to return only the relevant ontology
fragment, where that fragment was generated by e.g. the Rector
Segmentation generator in the background.
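
A rough Python sketch of the idea, under loud assumptions: the
get_metadata function below is an in-process stand-in for a real LSID
getMetadata service call, the toy hierarchy and the authority/namespace
names are invented, and the "segment" is just the node plus its
ancestors, which is NOT the Rector segmentation algorithm. The only
thing taken as given is the LSID layout
urn:lsid:authority:namespace:object[:revision].

```python
from typing import Dict, List, NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> LSID:
    """Split urn:lsid:authority:namespace:object[:revision] into its parts."""
    parts = text.split(":")
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)


# Toy ontology (invented): each term maps to its direct parents.
PARENTS: Dict[str, List[str]] = {
    "Root": [],
    "A": ["Root"],
    "B": ["A"],
    "C": ["A"],
    "D": ["Root"],
    "E": ["D"],
}


def get_metadata(lsid_text: str) -> Dict[str, List[str]]:
    """Hypothetical stand-in for getMetadata: return only the requested
    node and its ancestors instead of the whole ontology."""
    lsid = parse_lsid(lsid_text)
    if lsid.object_id not in PARENTS:
        raise KeyError(f"unknown term: {lsid.object_id}")
    segment: Dict[str, List[str]] = {}
    stack = [lsid.object_id]
    while stack:
        term = stack.pop()
        if term in segment:
            continue  # already collected via another path
        segment[term] = PARENTS[term]
        stack.extend(PARENTS[term])
    return segment


segment = get_metadata("urn:lsid:example.org:toy-ontology:B")
print(sorted(segment))
```

Here the agent asking about node B receives three terms rather than all
six; a real resolver would plug a proper segmentation service in where
the ancestor walk is.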

These were just early thoughts we've been having, but Sean asked me to
share them with the group in hopes of fanning the flames of discussion
and debate.  It seems to me to be a "blocker" issue when it comes to
deploying SW applications in the wild, and I know that projects like
Damian's Semantic MOBY have hit this problem early and hard, as have I
in my own sandbox.  It's all well and good when we play SW on our own
local machine, but as soon as we try to play SW in the wi(l)der world
this problem cripples us almost instantly.  We think the LSID is (a/the)
solution to this problem, but no solution will be useful if it doesn't
have wider adoption, so... 

opinions?

Cheers all!

Mark



[1] Good, B, Wilkinson, M. (in press). The Life Sciences Semantic Web is
Full of Creeps!  Briefings in Bioinformatics.
[2] Noy, N, Musen, M. Specifying Ontology Views by Traversal. 2004.
[3] Alani, H, Harris, S, O'Neil, B. Ontology Winnowing: A Case Study on
the AKT Reference Ontology. 2005.
[4] Seidenberg, J, Rector, A. Web Ontology Segmentation: Analysis,
Classification and Use. World Wide Web, ACM, Edinburgh, Scotland. 2006.
[5] Stuckenschmidt, H, Klein, M. Structure-Based Partitioning of Large
Concept Hierarchies. 2004.




-- 
Mark Wilkinson
Asst. Professor, Dept. of Medical Genetics
University of British Columbia
PI in Bioinformatics, iCAPTURE Centre
St. Paul's Hospital, Rm. 166, 1081 Burrard St.
Vancouver, BC, V6Z 1Y6
tel: 604 682 2344 x62129
fax: 604 806 9274

"Since the point of a definition is to explain the meaning of a term to
   someone who is unfamiliar with its proper application, the use of
language that doesn't help such a person learn how to apply the term is
 pointless. Thus, "happiness is a warm puppy" may be a lovely thought,
                     but it is a lousy definition."
                                                                Köhler et al, 2006
Received on Thursday, 13 July 2006 17:01:39 UTC