- From: Chris Wroe <cwroe@cs.man.ac.uk>
- Date: Thu, 22 Apr 2004 12:16:02 +0100
- To: <public-semweb-lifesci@w3.org>
- Message-Id: <E1BGc5j-0000N4-0i@mailhost>
>Some examples and further discussion on identifiers (LSID) for life
>sciences as they pertain to the semantic web would be great.
>
>_greg

I may be able to help with a concrete example. I'm working within the myGrid (http://www.mygrid.org.uk) project to build semantic webs of provenance, and we have adopted LSID.

(I must disclose here: communication between LSID developers and myGrid developers has been aided by some individuals having both roles. Carole Goble, PI of myGrid, is also on the science and technology board of I3C.)

We have biologists / bioinformaticians running web-service-based workflows to automate the detection of newly submitted and relevant sequences in genome databases, and the subsequent annotation pipelines. We end up with a large collection of interrelated results, which causes a significant results management problem. To address the issue, we use RDF-based metadata to represent the relationships between final data, intermediate results, logs of the service calls, ontological descriptions of the data, etc.

We are implementing a distributed system (hence the grid in the title of the project). Therefore we must have a mechanism for retrieving both data and metadata for a distant resource from its identifier. We could have used URLs. However, LSID has made our life easier.

1) There is no convention (that I know about, and my knowledge is limited) that specifies exactly what you get when you resolve a URL. You get a document that can represent metadata, data or, more likely, a combination of both. The most used model on the web is to incorporate some RDF metadata inline into a document. We cannot use that model and need a clean separation of data and metadata:

a) We are dealing with raw data rather than published documents. We need to pipe this data to other services unadulterated by metadata annotations.

b) Data and metadata can reside in different locations.

c) We want to attach metadata to resources over which we have no control (3rd party metadata).

d) Provenance becomes worthless if we allow the data resolved by a URL to change. However, provenance becomes more valuable the more metadata is attached and updated.

2) Use of LSIDs is also an explicit social commitment to maintaining immutable and permanent data.

It is not technically challenging to implement a convention for how to retrieve metadata and data from a URL, but that would have been myGrid specific. LSIDs provided us with a ready-built protocol undergoing a standardisation procedure. Sean and his colleagues have also backed that up with client and server code, which has proven fairly straightforward to integrate.

We now have a myGrid-specific LSID authority to which the workflow enactor publishes intermediate and final result data together with the RDF-based provenance metadata. Over time this is producing a local semantic web of who's done what in the group.

To be honest, adoption of LSID has not been a big issue. Our biggest headache is how to present this "semantic web" back to the user. Generic RDF graph views just don't cut it. We are looking at Haystack (http://haystack.lcs.mit.edu) as one mechanism to deal with the complexity.

Chris

Dr Chris Wroe
Clinical Research Fellow
Information Management Group
Dept of Computer Science
Manchester University
UK
http://www.cs.man.ac.uk/~wroec
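
For readers who want something concrete, the separation described in point 1 might be sketched roughly as follows. This is only an illustration, not myGrid's code or the LSID client/server toolkit: it assumes the LSID form urn:lsid:authority:namespace:object[:revision], and the ToyAuthority class, its method names, the predicate strings and the example identifiers are all hypothetical. The idea is that data published under an identifier never changes, while metadata about that identifier is held separately and can keep growing, even in an authority that does not hold the data.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class LSID:
    """Parsed form of urn:lsid:authority:namespace:object[:revision]."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None

    @classmethod
    def parse(cls, text: str) -> "LSID":
        parts = text.split(":")
        if len(parts) not in (5, 6):
            raise ValueError(f"not an LSID: {text!r}")
        if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
            raise ValueError(f"not an LSID: {text!r}")
        if any(not p for p in parts[2:]):
            raise ValueError(f"empty component in LSID: {text!r}")
        revision = parts[5] if len(parts) == 6 else None
        return cls(parts[2], parts[3], parts[4], revision)

    def __str__(self) -> str:
        base = f"urn:lsid:{self.authority}:{self.namespace}:{self.object_id}"
        return base if self.revision is None else f"{base}:{self.revision}"


Triple = Tuple[str, str, str]  # (subject, predicate, object) as plain strings


class ToyAuthority:
    """Hypothetical in-memory store keeping data and metadata apart."""

    def __init__(self) -> None:
        self._data: Dict[LSID, bytes] = {}
        self._metadata: Dict[LSID, List[Triple]] = {}

    def publish(self, lsid: LSID, data: bytes) -> None:
        # Data is immutable: republishing identical bytes is harmless,
        # but different bytes under the same LSID are refused.
        existing = self._data.get(lsid)
        if existing is not None and existing != data:
            raise ValueError(f"{lsid} already names different data")
        self._data[lsid] = data

    def add_metadata(self, lsid: LSID, predicate: str, obj: str) -> None:
        # Metadata may be attached to any LSID, including one whose data
        # lives elsewhere (third-party metadata), and may keep growing.
        self._metadata.setdefault(lsid, []).append((str(lsid), predicate, obj))

    def get_data(self, lsid: LSID) -> bytes:
        # Raw bytes only, with no metadata mixed in.
        return self._data[lsid]

    def get_metadata(self, lsid: LSID) -> List[Triple]:
        return list(self._metadata.get(lsid, []))


if __name__ == "__main__":
    result = LSID.parse("urn:lsid:example.org:blast_result:42:1")
    authority = ToyAuthority()
    authority.publish(result, b">seq1\nACGT\n")
    authority.add_metadata(result, "producedBy", "workflow-run-7")
    print(authority.get_data(result))
    print(authority.get_metadata(result))
```

Because get_data returns only the raw bytes, a result can be piped straight into another service, and the provenance triples can be updated later without touching what the identifier resolves to.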
Received on Thursday, 22 April 2004 07:20:21 UTC