- From: Sean Martin <sjmm@us.ibm.com>
- Date: Wed, 21 Apr 2004 22:08:09 -0400
- To: public-semweb-lifesci@w3.org
- Cc: marja@annotea.org, greg@tyrelle.net, gilmanb@pantherinformatics.com, senger@ebi.ac.uk
- Message-ID: <OFE812A896.390AB7FE-ON85256E7E.0002772A-85256E7E.000BBBA4@us.ibm.com>
Hi everyone,

As yet another person who worked on the LSID spec (the dereferencing scheme and the addition of RDF metadata discovery and retrieval), I have a supplement to Brian's question:

BG> This leads me to a question about "persistent" URIs and URLs
BG> (PURLs): How do you ensure that two URIs are pointing at the same
BG> object (bytes)?

My question is: how does one programmatically identify a persistent HTTP URI, as opposed to one that will retrieve tomorrow's weather, retrieve a file from a P2P network, or return dynamically changing content? Apologies in advance if there is an obvious answer to this question.

As to the question Greg originally asked, namely why invent anything new when we already have HTTP URIs: my short answer is that they did not seem sufficient on their own to address the problems the LSID scheme was designed for. The scheme does, of course, lean heavily on HTTP URIs as probably the primary method of retrieving the data object or metadata about that object; after all, much of the public life sciences (LS) data is already out there on the web, retrievable by HTTP URI. If HTTP URIs were sufficient today, we would have no need of the LSID. So perhaps the question to ask yourself is: why are people not already widely using URLs for LS naming?

For me the main points are:

- Location independence of the named object. The extra layer of indirection makes this flexibility possible:
  - there is a starting assumption that users will make and exchange local copies of the objects;
  - authority entities will at some point want to transfer authority over an LSID to another authority entity, while potentially keeping control of their domain name;
  - sometimes the same data is served from more than one "official" place on the web (e.g. Swiss-Prot; Marja, how does Annotea deal with this situation?);
  - there is the option of not using domain names in the identifier at all.
- Providing or using LSIDs for one's data establishes a "contract" in which certain properties of an LSID-named object can be assumed, beyond those of the HTTP URI "contract":
  - it defines what can safely be assumed about multiple copies of objects that have the same LSID, i.e. that they are identical;
  - it gives a clear definition of what persistence means (both availability and never modifying a named object);
  - it provides a formal mechanism for retrieving data (which never, ever changes) over multiple protocols, and for discovering and retrieving metadata (which can change) about that object and its relationship to other objects, either from the original source of the object or from a third party with something of its own to add, all using a single globally unique name.

One parting thought: widespread adoption of the LSID spec across the industry will at the same time create a very large semantic web.

Kindest regards,
Sean

--
Sean Martin
IBM Corp.

---------- Forwarded message ----------
Date: Mon, 19 Apr 2004 12:42:46 -0400
From: Brian Gilman <gilmanb@pantherinformatics.com>
To: Greg Tyrelle <greg@tyrelle.net>
Cc: Martin Senger <senger@ebi.ac.uk>, public-semweb-lifesci@w3.org, Marja-Riitta Koivunen <marja@annotea.org>
Subject: Re: Use of LSIDs in RDF

Hello everyone,

I'm not an expert on URIs, but I am an author of the LSID specification and would like to clarify some issues.

1) URIs are a nightmare in the life sciences, particularly when used to encode semantic information about a particular entity that exists on the web. For example (from the DAS 1.0 specification):

'/wormbase/das/elegans/features?segment=CHROMOSOME_I:1000,2000'

This leads the programmer and the biologist to certain conclusions about query semantics, i.e. what this URI encodes and (perhaps) what the programmer meant by using a certain encoding scheme.
People infer meaning from a URI and learn these semantics very quickly. Some would argue that this is a good thing; however, once biologists train themselves on this type of system, the developers of these systems are forever locked into this identification scheme, and it becomes the permanent identifier for the entity. In the case noted above, this is particularly cumbersome: if a researcher has started to annotate this region of the chromosome with metadata and the underlying data changes...

As with any scientific data, there must be a way to reasonably reproduce the evidence that led to a particular result or hypothesis. Encoding things with URIs does not guard against the underlying data changing. This leads me to a question about "persistent" URIs and URLs (PURLs): How do you ensure that two URIs are pointing at the same object (bytes)? If we can collectively answer this question, we can encode an LSID any way we please, as long as we keep in mind that this information must persist as long as a journal or other well-vetted scientific medium.

2) (Sorry to be repetitive.) Scientists typically perform research on the web as a supplemental exercise. By this I mean that most researchers use data gathered from the web to enhance their knowledge about a certain gene, protein, transcript, chemical, etc. This data is not typically referenced in a journal article. If we want to allow scientific information and knowledge to be incorporated and disseminated across the internet as a common means of communication, we need to ensure two things:

  a) Persistence
  b) Provenance

Science requires that an experiment be reproducible by other researchers and that the discoverer or institution get credit for the discovery made or the technique used to make it. We must pay particular attention to this as we craft the LSID specification.

3) Browsers, HTTP query semantics, RESTful interfaces, etc. are secondary to how data is used in the industry. Having a resolver to get at a particular piece of information should not be a huge barrier to the LSID specification's adoption. Case in point: IBM's implementation of LSID uses a COM plugin to let users perform LSID queries from a web browser, i.e.

lsid://<authority>:<namespace>:<identifier>:<version>

I hope this helps. I'll be posting specific examples of LSIDs in RDF in the next few weeks, which I hope will help clarify this issue further.

Best,
-B

--
Brian Gilman
President, Panther Informatics Inc.
9 Acadia Park #2
Somerville, MA 02143
Phone 617-335-8276
E-Mail: gilmanb@pantherinformatics.com
gilmanb@jforge.net
AIM: gilmanb1

On Apr 15, 2004, at 2:27 AM, Greg Tyrelle wrote:

> *** Marja-Riitta Koivunen wrote:
> |> I am not sure how to answer this ultimate question. Perhaps I need to
> |> understand more about HTTP URIs in order to give a comparison with the
> |> URN used in the LSID spec. To be honest, I have tried to find out more,
> |> and I gave up after reading a very nice article about HTTP URIs by Tim
> |> Berners-Lee (http://www.w3.org/DesignIssues/HTTP-URI.html) that gave
> |> me the feeling that I am out of my league :-(
>
> I am by no means an expert on this either :)
>
> How URIs are used in the web architecture and the semantic web
> architecture are contentious issues, to say the least.
> Given the importance of standardisation for the life sciences (e.g.
> MAGE), I am simply trying to understand how identifier schemes such as
> LSID fit into the current thinking about the semantic web and URIs.
>
> |I think the question is mainly why reinvent a wheel that already exists.
>
> Precisely.
>
> |Using persistent HTTP URIs is a good goal because it is standard, and
> |there are a lot of HTTP-based applications, e.g. browsers, that
> |understand HTTP URIs and can provide information about the resource on
> |the Web without anything extra.
>
> This is exactly the context in which I was trying to raise the
> issue of using LSIDs in RDF. Technically speaking, there is nothing
> wrong with the current LSID specification IMO. However, if I want to
> allow other users to dereference LSIDs that my authority mints, I
> must maintain an LSID resolver. Clients must also have the
> necessary libraries to dereference the LSID.
>
> Having this infrastructure in place adds a practical burden to using
> LSIDs, which would go away if LSIDs were specified in terms of
> HTTP URIs. If we assume persistence is an organisational issue,
> then HTTP URIs are just as good as URNs. While the RDF spec says
> nothing about URIs being dereferenced to provide representations, my
> practical expectation is that they should be, i.e. a resource is
> more useful if I can get either a representation of it or
> metadata about it.
>
> The LSID spec does specify interfaces for retrieving metadata about an
> LSID, which is a good thing. However, I'll leave the "how to get
> metadata about a resource" question for later...
>
> This leads, of course, to the thorny issue of whether HTTP URIs are
> names or locations or both.
> My simple view on this is that an HTTP URI
> is a name in the same sense that LSIDs are names; however, it *may*
> also be dereferenced to provide a representation of the resource it
> names (using the existing web infrastructure).
>
> |When somebody defines a URN, they can usually also reserve an HTTP URI,
> |e.g. http://www.lsid.org/, and define URIs under it, e.g.
> |http://www.lsid.org/path.../enzyme1.
>
> This may be one possible solution; however, the LSID spec says nothing
> about this.
>
> |We are now ALSO discussing using UUID URNs because we have problems
> |with local file: type URIs (e.g. file//marja/myfile123), and we want
> |to make them unambiguous when a user cannot use HTTP URIs, perhaps
> |because the user does not own an http:// domain name to publish
> |under, or for some other reason.
>
> I can see this as an issue for individuals creating local stores of
> annotated bookmarks. In the case of the life sciences it would be
> easier to control, as any authority using HTTP URI-based LSIDs would
> need an http:// domain name to participate.
>
> |After publishing, we mostly want to use HTTP URIs so that we can
> |benefit from the common Web standards. But it is also possible for
> |some application to benefit from the URN part of the information when
> |so wished.
>
> Maybe there is a best-of-both-worlds approach?
>
> _greg
>
> --
> Greg Tyrelle
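To make the naming discussion more concrete, here is a minimal Python sketch, not taken from the LSID specification or from IBM's implementation, that parses the identifier form Brian quotes, `lsid://<authority>:<namespace>:<identifier>:<version>`, and also accepts the `urn:lsid:` spelling of the same components. It treats the version as optional. It also illustrates one assumed way of checking the property Sean describes, that all copies named by one LSID are identical: compare SHA-256 digests of the copies. The class and function names (`LSID`, `parse_lsid`, `copies_are_identical`) and the sample identifier `example.org:proteins:P12345` are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class LSID:
    """Components of an LSID (hypothetical class; field names are illustrative)."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None

    def urn(self) -> str:
        """Render in urn:lsid: form, appending the revision only if present."""
        parts = ["urn", "lsid", self.authority, self.namespace, self.object_id]
        if self.revision:
            parts.append(self.revision)
        return ":".join(parts)


def parse_lsid(text: str) -> LSID:
    """Parse 'lsid://auth:ns:id[:rev]' or 'urn:lsid:auth:ns:id[:rev]'."""
    lowered = text.lower()
    for prefix in ("lsid://", "urn:lsid:"):
        if lowered.startswith(prefix):
            body = text[len(prefix):]
            break
    else:
        raise ValueError(f"not an LSID: {text!r}")
    fields = body.split(":")
    # Three mandatory components plus an optional revision; none may be empty.
    if len(fields) not in (3, 4) or not all(fields):
        raise ValueError(f"expected 3 or 4 non-empty components: {text!r}")
    return LSID(*fields)


def copies_are_identical(copies: Iterable[bytes]) -> bool:
    """True if every copy has the same SHA-256 digest (vacuously true if none)."""
    digests = {hashlib.sha256(copy).hexdigest() for copy in copies}
    return len(digests) <= 1


if __name__ == "__main__":
    # Hypothetical identifier, written in the lsid:// form from Brian's message.
    lsid = parse_lsid("lsid://example.org:proteins:P12345:1")
    print(lsid.urn())  # urn:lsid:example.org:proteins:P12345:1
    print(copies_are_identical([b"MKTAYIAK", b"MKTAYIAK"]))  # True
```

A digest comparison can confirm that two copies a client already holds match. It cannot, by itself, tell the client that a given HTTP URI will keep returning the same bytes, which is the open question Sean raises.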
Received on Wednesday, 21 April 2004 22:12:26 UTC