RE: URLs/LSID/RDF etc. from Chris Wroe on 2004-04-22 (public-semweb-lifesci@w3.org from April 2004)

From: Chris Wroe <cwroe@cs.man.ac.uk>
Date: Thu, 22 Apr 2004 12:16:02 +0100
To: <public-semweb-lifesci@w3.org>
Message-Id: <E1BGc5j-0000N4-0i@mailhost>

>Some examples and further discussion on identifiers (LSID) for life
>sciences as they pertain to the semantic web would be great.
>
>_greg

I may be able to help with a concrete example. I'm working within the myGrid
(http://www.mygrid.org.uk) project to build semantic webs of provenance and
we have adopted LSID. 

...(I must disclose here: communication between LSID developers and myGrid
developers has been aided by some individuals having both roles. Carole
Goble, PI of myGrid, is also on the science and technology board of I3C.) ...

We have biologists and bioinformaticians running web-service-based workflows
that automate the detection of relevant sequences newly submitted to genome
databases, and the subsequent annotation pipelines. We end up with a large
collection of interrelated results, which causes a significant results
management problem.

To address the issue, we use RDF-based metadata to represent the
relationships between final data, intermediate results, logs of the service
calls, ontological descriptions of the data, etc.
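
As a rough illustration of the kind of relationships described above, here is
a minimal sketch using plain Python tuples as triples. The identifiers and
the "ex:" predicate names are invented for this example and are not the
actual myGrid schema; a real system would use an RDF library and a proper
vocabulary.

```python
# Each triple is (subject, predicate, object).
# All identifiers and predicates below are hypothetical.
FINAL = "urn:lsid:example.org:data:final1"
INTERMEDIATE = "urn:lsid:example.org:data:intermediate1"
INPUT = "urn:lsid:example.org:data:input1"

triples = [
    (FINAL, "ex:derivedFrom", INTERMEDIATE),
    (INTERMEDIATE, "ex:derivedFrom", INPUT),
    (FINAL, "ex:producedBy", "urn:lsid:example.org:log:call7"),
    (FINAL, "ex:hasType", "ex:AnnotatedSequence"),
]


def lineage(resource, graph):
    """Follow ex:derivedFrom links back from a resource.

    Simplification: only the first parent found at each step is followed,
    and the walk stops if it would revisit a resource.
    """
    chain = []
    seen = {resource}
    current = resource
    while True:
        parents = [o for s, p, o in graph
                   if s == current and p == "ex:derivedFrom"]
        if not parents or parents[0] in seen:
            return chain
        current = parents[0]
        seen.add(current)
        chain.append(current)
```

Calling lineage(FINAL, triples) walks from the final result back through the
intermediate result to the original input.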

We are implementing a distributed system (hence the grid in the title of the
project). Therefore we must have a mechanism for retrieving both data and
metadata for a distant resource from its identifier. 

We could have used URLs. However, LSID has made our life easier.

1) There is no convention (that I know about, and my knowledge is limited)
that specifies exactly what you get when you resolve a URL. You get a
document that can represent metadata, data or, more likely, a combination of
both. The most used model on the web is to incorporate some RDF metadata
inline into a document. We cannot use that model and need a clean separation
of data and metadata:

a) We are dealing with raw data rather than published documents. We need to
pipe this data to other services unadulterated by metadata annotations.

b) Data and metadata can reside in different locations.

c) We want to attach metadata to resources over which we have no control
(3rd party metadata).

d) Provenance becomes worthless if we allow the data resolved by a URL to
change. However, provenance becomes more valuable the more metadata is
attached and updated.

2) Use of LSIDs is also an explicit social commitment to maintaining
immutable and permanent data. 
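
The points above can be sketched roughly as follows. This is a toy in-memory
model, not the LSID resolution protocol or the client/server code mentioned
in this message: the ToyAuthority class, its method names and the example
identifier are all assumptions made for illustration. It only shows the
shape of the idea: an identifier of the form
urn:lsid:authority:namespace:object[:revision], data that cannot change once
published, and metadata that is fetched separately and can keep growing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LSID:
    """Parsed form of urn:lsid:authority:namespace:object[:revision]."""
    authority: str
    namespace: str
    object_id: str
    revision: str = ""

    @classmethod
    def parse(cls, text):
        parts = text.split(":")
        if (len(parts) not in (5, 6)
                or parts[0].lower() != "urn"
                or parts[1].lower() != "lsid"):
            raise ValueError(f"not an LSID: {text!r}")
        revision = parts[5] if len(parts) == 6 else ""
        return cls(parts[2], parts[3], parts[4], revision)

    def __str__(self):
        base = f"urn:lsid:{self.authority}:{self.namespace}:{self.object_id}"
        return f"{base}:{self.revision}" if self.revision else base


class ToyAuthority:
    """Hypothetical in-memory stand-in for an LSID authority."""

    def __init__(self):
        self._data = {}      # LSID -> bytes, written once
        self._metadata = {}  # LSID -> list of triples, append-only

    def publish(self, lsid, data):
        # Data behind an identifier is never overwritten (point 1d, point 2).
        if lsid in self._data:
            raise ValueError("data for an LSID is immutable")
        self._data[lsid] = data

    def annotate(self, lsid, predicate, value):
        # Metadata may be attached to any identifier, including ones whose
        # data this authority does not hold (point 1c).
        self._metadata.setdefault(lsid, []).append(
            (str(lsid), predicate, value))

    def get_data(self, lsid):
        # Raw bytes only, with no metadata mixed in (point 1a).
        return self._data[lsid]

    def get_metadata(self, lsid):
        return list(self._metadata.get(lsid, []))
```

A caller would parse an identifier such as "urn:lsid:example.org:results:42"
(a made-up value), publish the raw bytes once, and then attach or retrieve
metadata independently of the data.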

It is not technically challenging to implement a convention for how to
retrieve metadata and data from a URL, but that would have been myGrid
specific. LSIDs provided us with a ready-built protocol undergoing a
standardisation procedure. Sean and his colleagues have also backed that up
with client and server code which has proven fairly straightforward to
integrate. We now have a myGrid-specific LSID authority to which the
workflow enactor publishes intermediate and final result data together with
the RDF-based provenance metadata.

Over time this is producing a local semantic web of who's done what in the
group. To be honest, adoption of LSID has not been a big issue. Our biggest
headache is how to present this "semantic web" back to the user. Generic RDF
graph views just don't cut it. We are looking at Haystack
(http://haystack.lcs.mit.edu) as one mechanism to deal with the complexity.

Chris

 
Dr Chris Wroe
Clinical Research Fellow
Information Management Group
Dept of Computer Science
Manchester University 
UK
http://www.cs.man.ac.uk/~wroec
Received on Thursday, 22 April 2004 07:20:21 UTC