- From: Brian Gilman <gilmanb@jforge.net>
- Date: Tue, 20 Apr 2004 18:57:19 -0400
- To: public-semweb-lifesci@w3.org
- Message-Id: <13A7D311-931E-11D8-A5BB-000A95CA3D68@jforge.net>
Hello Everyone,

I'm not an expert on URIs, but I am an author on the LSID specification and would like to clarify some issues.

1) URIs are a nightmare in the life sciences, particularly when they are used to encode semantic information about a particular entity that exists on the web. For example (from the DAS 1.0 specification):

'/wormbase/das/elegans/features?segment=CHROMOSOME_I:1000,2000'

This leads the programmer and the biologist to certain conclusions about query semantics, i.e., what this URI encodes and (perhaps) what the programmer meant when choosing a certain encoding scheme. People infer meaning from a URI and learn these semantics very quickly. Some would argue that this is a good thing; however, once biologists have trained themselves on this type of system, the developers of these systems are locked into this identification scheme for good. It becomes the permanent identifier for this entity.

In the case noted above, this is particularly cumbersome: if a researcher has started to annotate this region of the chromosome with metadata and the underlying data then changes, what happens to those annotations? As with any scientific data, there must be a way to reasonably reproduce the evidence that led to a particular result or hypothesis. By encoding things with URIs we do not guard against the fact that the underlying data may change. This leads me to a question about "persistent" URIs and URLs (PURLs): how do you ensure that two URIs are pointing at the same object (bytes)? If we can collectively answer this question, we can encode an LSID any way we please, as long as we keep in mind that this information must persist as long as a journal or any other well-vetted scientific medium.

2) (Sorry to be repetitive.) Scientists typically perform research on the web as a supplemental exercise. By this, I mean that most researchers use data gathered from the web to enhance their knowledge about a certain gene, protein, transcript, chemical, etc. This data is not typically referenced in a journal article.
If we want to allow for the incorporation and dissemination of scientific information and knowledge across the internet as a common means of communication, we need to ensure two things:

a) Persistence
b) Provenance

Science requires that an experiment be reproducible by other researchers and that the discoverer or institution get credit for the discovery made or the technique used to make it. We must pay particular attention to this as we craft the LSID specification.

3) Browsers, HTTP query semantics, RESTful interfaces, etc. are secondary to how data is used in the industry. Having to use a resolver to get at a particular piece of information should not be a major barrier to the LSID specification's adoption. Case in point: IBM's implementation of LSID uses a COM plug-in that lets users perform LSID queries from a web browser, i.e.:

lsid://<authority>:<namespace>:<identifier>:<version>

I hope this helps. I'll be posting specific examples of LSID in RDF in the next few weeks, which I hope will help clarify this issue further.

Best,

-B

--
Brian Gilman
President Panther Informatics Inc.
9 Acadia Park #2
Somerville, MA 02143
Phone 617-335-8276
E-Mail: gilmanb@pantherinformatics.com
        gilmanb@jforge.net
AIM: gilmanb1
01000010 01101001 01101111 01001001 01101110 01100110 01101111 01110010 01101101 01100001 01110100 01101001 01100011 01101001 01100001 01101110

On Apr 15, 2004, at 2:27 AM, Greg Tyrelle wrote:

> *** Marja-Riitta Koivunen wrote:
> |> I am not sure how to answer this ultimate question.
> |> Perhaps I need to understand more about HTTP URIs in order to give comparison with the URN used in LSID spec. To be honest I have tried to find more and I gave up after reading very nice article about HTTP URIs by Tim Berners-Lee (http://www.w3.org/DesignIssues/HTTP-URI.html) that gave me feeling that I am out of the league :-(
>
> I am by no means an expert on this either :)
>
> How URIs are used in the web architecture and the semantic web architecture are contentious issues to say the least. Given the importance of standardisation for the life sciences e.g. MAGE, I am simply trying to understand how identifier schemes such as LSID fit into the current thinking about the semantic web and URIs.
>
> |I think the question is mainly why reinvent a wheel that already exists.
>
> Precisely.
>
> |Using persistent HTTP URIs is a good goal because it is standard and there exists a lot of HTTP based applications e.g. browsers that understand HTTP URIs and can provide information of the resource on the Web without anything extra.
>
> This is exactly the context in which I was trying to raise the issue of using LSIDs in RDF. Technically speaking there is nothing wrong with the current LSID specification IMO. However if I want to allow other users to dereference LSIDs that my authority mints, it requires me to maintain an LSID resolver. Clients must also have the necessary libraries to dereference the LSID.
>
> Having this infrastructure in place adds a practical burden to using LSIDs which would go away if LSIDs were to be specified in terms of HTTP URIs. Which, if we assume persistence is an organisational issue, then HTTP URIs are just as good as URNs. While the RDF spec says nothing about URIs being dereferenced to provide representations, my practical expectation is that they should i.e.
> it makes that resource more useful if I can get either a representation of that resource or metadata about that resource.
>
> The LSID does specify interfaces for retrieving metadata about a LSID which is a good thing. However I'll leave the "how to get metadata about a resource" question for later...
>
> This leads of course to the thorny issue of whether HTTP URIs are names or locations or both. My simple view on this is that a HTTP URI is a name in the same sense that the LSIDs are names, however it *may* also be dereferenced to provide a representation of the resource that it is naming (using the existing web infrastructure).
>
> |When somebody defines a URN they can usually as well reserve a HTTP URI e.g. http://www.lsid.org/ and define URIs under it e.g. http://www.lsid.org/path.../enzyme1.
>
> This may be one possible solution, however the LSID spec says nothing about this.
>
> |Why we are now ALSO discussing of using UUID URNs is because we have problems with local file: type URIs (file: URIs e.g. file//marja/myfile123) and we want to make them unambiguous when a user cannot do HTTP URIs maybe because a user does not own a http:// domain name where to publish it or for some other reason.
>
> I can see this as an issue for individuals creating local stores of annotated bookmarks. In the case of the life sciences it would be easier to control as any authority using HTTP URI based LSIDs would need to have a http:// domain name to participate.
>
> |After publishing we mostly want to use the HTTP URIs to be able to benefit from the common Web standards. But it is also possible for some application to benefit from the URN bit of the information when so wished.
>
> Maybe there is a best of both worlds approach?
>
> _greg
>
> --
> Greg Tyrelle
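The `lsid://<authority>:<namespace>:<identifier>:<version>` form in point 3 is simple enough that a client can split it into its parts without any resolver; a resolver is only needed to actually fetch data or metadata. Below is a minimal sketch in Python. The `LSID` class and `parse_lsid` function are hypothetical names, not part of any LSID toolkit. It assumes the colon-separated layout shown in the message, treats the version as optional (an assumption), and also accepts the `urn:lsid:` prefix used by the published LSID URN syntax. It does no validation beyond checking the prefix and counting fields.

```python
from dataclasses import dataclass
from typing import Optional

# Prefixes this sketch accepts: the URN form and the browser-style form
# quoted in the message. Only the prefix is matched case-insensitively.
_PREFIXES = ("urn:lsid:", "lsid://")


@dataclass(frozen=True)
class LSID:
    """Hypothetical container for the parts of an LSID."""
    authority: str
    namespace: str
    identifier: str
    version: Optional[str] = None  # assumed optional in this sketch


def parse_lsid(text: str) -> LSID:
    """Split an LSID into authority, namespace, identifier and optional version."""
    lowered = text.lower()
    for prefix in _PREFIXES:
        if lowered.startswith(prefix):
            body = text[len(prefix):]
            break
    else:  # no prefix matched
        raise ValueError(f"not an LSID: {text!r}")

    parts = body.split(":")
    # Expect authority:namespace:identifier, optionally followed by :version,
    # with no empty fields.
    if len(parts) not in (3, 4) or not all(parts):
        raise ValueError(f"malformed LSID body: {body!r}")
    version = parts[3] if len(parts) == 4 else None
    return LSID(parts[0], parts[1], parts[2], version)


if __name__ == "__main__":
    # Hypothetical identifiers; example.org is a placeholder authority.
    v1 = parse_lsid("lsid://example.org:genes:gene42:1")
    v2 = parse_lsid("urn:lsid:example.org:genes:gene42:2")
    print(v1)  # LSID(authority='example.org', namespace='genes', identifier='gene42', version='1')
    print(v1 == v2)  # False: same gene, different version
```

Keeping the version as its own field ties into the persistence concern in point 1: two identifiers that differ only in version compare as different values, so an annotation recorded against one is not silently carried over to the other.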
Received on Tuesday, 20 April 2004 18:58:12 UTC