Re: A precedent suggesting a compromise for the SWHCLS IG Best Practices (ARK) from Sean Martin on 2006-07-31 (public-semweb-lifesci@w3.org from July 2006)

From: Sean Martin <sjmm@us.ibm.com>
Date: Mon, 31 Jul 2006 05:49:40 -0400
To: public-semweb-lifesci@w3.org
Message-ID: <OF97D22876.6A4F660A-ON852571BC.002FD88D-852571BC.0035FCE3@us.ibm.com>

Mark wrote exactly what I would have (thanks :)  Note the version
information in an LSID is optional but encouraged when appropriate. Mark 
describes one common scenario, where one uses an LSID (or http URI) as a 
"concept" (meta-data only) that in turn has LSID URIs that name actual 
bytes representing that concept in different contexts, formats and of 
course versions. These 'concrete' style LSIDs can also have meta-data 
associated with them. To get the latest version of something one might 
first go to the concept URI for that thing and look at its meta-data to 
find out which LSIDs to dereference to get the appropriate format and 
version. At this time there is nothing written in the standard about when 
to apply versioning or how different versions relate to one another, 
although it would be good if we could provide more guidance on this sort 
of thing when we address meta-data expectations.
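A minimal sketch of the concept/concrete lookup described above, under
loudly stated assumptions. The LSID syntax is
urn:lsid:<authority>:<namespace>:<object>[:<revision>], with the revision
optional. The {concrete LSID: format} map stands in for whatever a client
extracts from the concept URI's RDF meta-data; it is hypothetical, as are
the example.org identifiers. Because the standard says nothing about how
versions relate, the "larger integer revision is newer" rule is purely an
assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class LSID:
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None  # the version part is optional


def parse_lsid(uri: str) -> LSID:
    """Split urn:lsid:<authority>:<namespace>:<object>[:<revision>]."""
    parts = uri.split(":")
    if len(parts) not in (5, 6) or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {uri!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)


def latest_concrete(concept_metadata: Dict[str, str], wanted_format: str) -> str:
    """Pick the highest-revision concrete LSID in the wanted format.

    concept_metadata is a HYPOTHETICAL {concrete LSID: format} map taken
    from the concept URI's meta-data. ASSUMPTION: revisions are integers
    and a larger number means newer; the LSID standard does not say so.
    """
    candidates = []
    for uri, fmt in concept_metadata.items():
        if fmt != wanted_format:
            continue
        rev = parse_lsid(uri).revision
        if rev is not None and rev.isdecimal():
            candidates.append((int(rev), uri))
    if not candidates:
        raise LookupError(f"no versioned LSID in format {wanted_format!r}")
    return max(candidates)[1]  # tuples compare by revision number first


if __name__ == "__main__":
    meta = {  # hypothetical meta-data for one concept
        "urn:lsid:example.org:proteins:P123:1": "text/xml",
        "urn:lsid:example.org:proteins:P123:2": "text/xml",
        "urn:lsid:example.org:proteins:P123-fasta:1": "text/plain",
    }
    print(latest_concrete(meta, "text/xml"))
    # urn:lsid:example.org:proteins:P123:2
```

Any real deployment would replace the integer rule with whatever
version relationships the meta-data guidance eventually specifies.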

> 
> LSID's contract seems more to do with persistence, mutability, 
> cacheability, and discoverability of byte sequences  - not around 
> issues of the identifiers and their relations making ontological sense.
>

Certainly these were requirements that were addressed. I am not sure I
would say it has 'more to do with' them, though, as different groups have
found different aspects of LSID more useful than others depending on what
they want to name. This is not surprising: there were lots of requirements
from lots of groups, and that was what made reaching any kind of agreement
so difficult.

> 
> While I understand that in some contexts the issues around data 
> management are central, they aren't in all contexts. Because I think 
> that optimization of the data management issues, while in some ways 
> elegantly handled by the LSID protocol, aren't central to the issue 
> of representation in the life sciences, and because I don't see LSID 
> addressing the representation issues, I worry that  imposing the use 

I really don't believe that anyone is suggesting the general imposition of 
LSIDs or that this would be a good idea!

> of the LSID protocol puts a burden on all, for the benefit of 
> relatively few.  And for those relatively few who are going to go out 
> of their way to have internal copies of data and the like, I don't 
> see why a custom system that circumvents http for efficiency
> reasons is too much of a burden.
> 

If it is not sensible to do in your own case, it is quite simple, don't 
use LSIDs to name that data. Use something else that suits your purposes 
better - perhaps plain old http URIs. Those that find value in what LSID 
provides will sensibly use them for that purpose and those that don't need 
them will not. I don't see any serious issues with this and feel no need
for religion on it. As I mentioned before, in our own developing systems 
we find LSID very useful for certain purposes (for example naming blobs 
intended for wide area distribution) but also use common http URIs for 
lots of other things. Yes indeed, both types of URIs happily co-mingle in 
our RDF named-graphs without strife! BTW, I can see us happily using DOIs 
as URIs when we come to naming publications down the line (with
appropriate transforms applied to their meta-data during dereference).

What I would like to see out of this HCLS forum is agreement with others 
on: 

1] What one can expect [best practices and/or standards] in the way of RDF 
meta-data when one dereferences LSIDs, which would allow us to do far more 
with them jointly than we can do today. We should of course do the same
for regular http URIs too, and it would probably be best if the
expectations were in fact common to both. Alan, I think this is possibly
the area where your concerns about representations need to be addressed.

2] It would be very useful to add to the LSID standard an http URL style 
dereferencing scheme (e.g.  http://lsid.info/lsid:xxx) for data consumers 
along the lines of what the ARK identifier does, as we can see multiple
benefits including immediate web accessibility, performance improvements 
and a simpler access stack for data consuming applications and devices 
like phones, PDAs and other thin clients. It would also make the provision 
of a JavaScript library for Web 2.0 style applications simple. It is my 
belief that technically this can be added in a simple, non-disruptive 
manner.
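A rough sketch of what such an http URL style mapping could look like
from the consumer side. The host comes from the e.g. above; the exact
path layout (drop "urn:", append "lsid:..." to the base) is an assumption
read off that example, not anything in the LSID standard, so the proxy
base is left as a parameter. The point it illustrates is that a thin
client needs only a string transform plus an ordinary HTTP GET.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Host taken from the e.g. in the text; the "/lsid:..." path layout is ASSUMED.
DEFAULT_PROXY = "http://lsid.info/"


def http_url_for_lsid(lsid: str, proxy_base: str = DEFAULT_PROXY) -> str:
    """Map urn:lsid:... onto a plain http URL served by an LSID proxy."""
    prefix = "urn:lsid:"
    if not lsid.lower().startswith(prefix):
        raise ValueError(f"not an LSID: {lsid!r}")
    tail = "lsid:" + lsid[len(prefix):]
    base = proxy_base if proxy_base.endswith("/") else proxy_base + "/"
    # Keep ':' literal; percent-encode anything else unsafe in a URL path.
    return base + quote(tail, safe=":")


def fetch_via_proxy(lsid: str, proxy_base: str = DEFAULT_PROXY,
                    timeout: float = 10.0) -> bytes:
    """Dereference with nothing more than a standard HTTP GET."""
    with urlopen(http_url_for_lsid(lsid, proxy_base), timeout=timeout) as resp:
        return resp.read()


if __name__ == "__main__":
    print(http_url_for_lsid("urn:lsid:example.org:proteins:P123:2"))
    # http://lsid.info/lsid:example.org:proteins:P123:2
```

Keeping the mapping a pure string function is what would make it cheap
to reproduce on phones or in browser JavaScript; only the proxy itself
would need to speak the full LSID resolution protocol.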


Kindest regards, Sean

--
Sean Martin
IBM Corp.
Received on Monday, 31 July 2006 09:49:53 UTC