- From: Sean Martin <sjmm@us.ibm.com>
- Date: Thu, 29 Apr 2004 14:38:30 -0400
- To: Greg Tyrelle <greg@tyrelle.net>
- Cc: public-semweb-lifesci@w3.org
- Message-ID: <OF17605AC4.AADF6173-ON85256E85.00412BD7-85256E85.00666777@us.ibm.com>
Hi Greg,

GT> I am not aware of a way to programmatically identify a persistent HTTP
GT> URI. Making URIs persistent is largely a function of who is responsible
GT> for maintaining that URI's authority.

and

GT> This "contract" is a social contract, persistence based on a social
GT> contract can also be true of HTTP URIs.

That is one of the main difficulties with the HTTP URI used on its own for our purpose - it has to mix up unique naming with current location and access [and access to only a single copy of the object]. If this were not problem enough, it has historically provided access both to unique objects and to objects that change. Given any random URL, you have no idea how you can actually treat it in your program - there is simply no way to tell that it is actually the name of something, rather than just the network location of an object or concept that perhaps has a dynamic expression. There is also no way to determine which particular "social contract" applies to a given HTTP URI. This seems to me to be partly because URLs have been used (abused?) in so many different ways in the past, and partly because they try to do at least two or three things at once. I am not sure how one could either undo this past or provide the extra facilities required without introducing something new.

In contrast, any LSID starts out unambiguously with both social and technical contracts that provide certainty both about what sort of thing you are getting and about the multiple ways in which you might get it. You can actually code a program around its use and recognize it for what it is - a unique name for something - at first sight, without accessing it. In most cases it maps down to HTTP for actual access, but this need not always be the case. It is important that, first and foremost, the LSID is unambiguously a unique name for something that may have many copies stored around the network. I believe this is why a URN spec was chosen. Resolution was secondary.
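To make the "recognize it at first sight" point concrete, here is a minimal Python sketch (my own illustration, not part of the original exchange) that classifies a string as an LSID purely from its syntax, with no network access. The regex is a simplification of the LSID URN grammar, not a full validator:

```python
import re

# An LSID has the shape: urn:lsid:<authority>:<namespace>:<object>[:<revision>]
# This pattern is a simplified sketch of that grammar.
LSID_PATTERN = re.compile(
    r"^urn:lsid:(?P<authority>[^:]+):(?P<namespace>[^:]+)"
    r":(?P<object>[^:]+)(?::(?P<revision>[^:]+))?$",
    re.IGNORECASE,
)

def parse_lsid(uri):
    """Return the LSID components as a dict, or None if the string is not an LSID."""
    m = LSID_PATTERN.match(uri)
    return m.groupdict() if m else None

# A program can tell a name from a mere location without dereferencing either:
parse_lsid("urn:lsid:ncbi.org:pubmed:12225585:1")        # a unique name
parse_lsid("http://www.biomedcentral.com/pubmed/12225585")  # not a name -> None
```

The point of the sketch is exactly the one made above: the decision needs no network round trip, no guessing about redirects, and no knowledge of the publisher's conventions.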
The technical contract of the LSID primarily gives you a standard method to ask where exact copies of this particular object can be obtained (and there could be many places, including somewhere local to your organization), and secondly a standard method to ask widely (at the original authority, at other trusted authorities, at your organization, or at an organization you collaborate with) where information about this object and its relationships to other objects can be retrieved.

If there ever was any reason at all to create a URN, I believe that uniquely identifying life science information is a reasonable one. The earliest web standards define how URNs should be named, and the RFCs also provide guidance on creating methods for dereferencing them. There is also a whole set of more recent standards, like SOAP and WSDL, that seem to have gained wide acceptance. Why not use them? The use cases fit. The fact that this LS URN and its specification are backward compatible with the web - using its name resolution and access protocols - as well as with a future semantic web seems all the better to me. The alternative is to attempt to shoehorn the problem into a protocol that was not designed to meet these needs.

GT> People are using URLs (HTTP URIs for naming), for example:
GT> http://www.biomedcentral.com/pubmed/12225585

Is this really a name for something, or just a convenient link to something? In the context you give, it appears to me to be more like a link. NCBI has an entirely different name (or names) for this thing. A third party providing a similar convenient link would have created a third name. If all of these places had used the LSID (urn:lsid:ncbi.org:pubmed:12225585:1), we (and our software) would know they were all talking about the same thing without having to do anything. Then, if we actually want a copy of it, we dereference the LSID to fetch one via any of those three HTTP URIs.
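A hedged sketch of the "three names for one thing" point: if each party published the canonical LSID alongside its own URL, software could test identity without fetching anything. The alias table below is purely illustrative - the mappings are my assumption, built from the two URLs discussed in this thread, not a published registry:

```python
# Illustrative alias table mapping known URL forms to one canonical LSID.
# These entries are assumptions for demonstration purposes only.
ALIASES = {
    "http://www.biomedcentral.com/pubmed/12225585":
        "urn:lsid:ncbi.org:pubmed:12225585:1",
    "http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=12225585&dopt=Abstract&holding=f1000":
        "urn:lsid:ncbi.org:pubmed:12225585:1",
}

def canonical_name(uri):
    """Map a URI to its canonical LSID if one is known; otherwise return the URI."""
    return ALIASES.get(uri, uri)

def same_object(a, b):
    """Two references denote the same object when they share a canonical name."""
    return canonical_name(a) == canonical_name(b)
```

Without some shared canonical name, the only way to learn that the two URLs denote the same record is to dereference one and observe the redirect - which is exactly the dependence on location and availability the LSID is meant to remove.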
Similarly, if we want to know more about it, we ask the places that may have metadata for it. It seems to me that the LSID getAvailable method might be updated to make use of URIQA protocol URLs as one possible way of implementing the getMetadata method (URIQA-style HTTP URI links could be provided as the port types in the returned WSDL).

GT> However HTTP has the 3XX error codes to provide redirection etc.

Which HTTP URI is the unique name now? The original or the new location? Furthermore, how can I tell these two names are equivalent and reference the same object, especially as some folks discover and link to the newer name?

GT> One aspect of this that bothers me though, is
GT> partitioning of the semantic web into domains based on their metadata
GT> access interfaces. Access to metadata based on URIs alone only makes
GT> sense to me if the mechanism to get the metadata is general for the web.

Good point. The metadata access mechanisms for LSIDs are mapped down onto exactly the same HTTP URIs everyone uses today. No point in reinventing the wheel.

GT> It is true that only the widespread adoption of LSID will make it
GT> useful to the semantic web. I am guessing by default (laziness ?)
GT> HTTP URIs will be used as resource identifiers if a LSID is not
GT> *easily* usable i.e. tools, tools, tools...

Yup :-) But don't forget that LS information on the web today is not generally part of any kind of semantic web right now - it is just the plain old web. Many people in the industry perceive the need for the LSID but have no particular interest in the semantic web (yet!). However, if adopting LSID means they become part of a semantic web [because that is what the LSID spec says to do], the semantic web folks might benefit. That is why it is important we get this right.

PDB is definitely down at the moment. We are working with them to bring the authority up again with both the latest code (their previous version was using an out-of-date spec) and extensive metadata - previously there was only a bare-bones amount. Unfortunately, they are snowed under right now with a major release of new code for their web site and database. I am not sure why the NCBI authority is down for you. For examples of NCBI data, try LSIDs like:

urn:lsid:ncbi.nlm.nih.gov.lsid.i3c.org:genbank_gi:30350027
urn:lsid:ncbi.nlm.nih.gov.lsid.i3c.org:pubmed:12225585
urn:lsid:ncbi.nlm.nih.gov.lsid.i3c.org:genbank:bm872070
urn:lsid:ncbi.nlm.nih.gov.lsid.i3c.org:omim:605956 (omimuser/omimpass)

Kindest regards, Sean
--
Sean Martin
IBM Corp.

Greg Tyrelle <greg@tyrelle.net>
04/29/2004 02:53 AM
To: Sean Martin/Cambridge/IBM@IBMUS
cc: public-semweb-lifesci@w3.org
Subject: Re: Fw: Use of LSIDs in RDF (fwd)

*** Sean Martin wrote:
|BG> This leads me to a question about "persistent" URI's and URL's
|BG> (PURLs): How do you ensure that two URI's are pointing at the same
|BG> object (bytes)?
|
|My question is how does one programmatically identify a persistent HTTP
|URI, as opposed to one that will retrieve tomorrow's weather or perhaps
|retrieve a file from a P2P network or one that returns dynamically
|changing content? Apologies in advance if there is an obvious answer to
|this question.

I am not aware of a way to programmatically identify a persistent HTTP URI. Making URIs persistent is largely a function of who is responsible for maintaining that URI's authority.

If I understand correctly, the question you are asking is "tell me something about the resource being identified by this URI". There are a number of approaches to this. In the case of LSID this would be the getMetaData interfaces. For HTTP URIs my current favourite is URIQA (the MGET HTTP method extension, i.e. metadata get) [1]. RDDL [2] is intended for this purpose, but mainly for namespaces.
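To illustrate what a URIQA-style request looks like on the wire, here is a small sketch that only builds the raw request text rather than sending it (MGET is a non-standard HTTP method, and the exact header set here is my assumption; real clients would use an HTTP library that permits custom methods):

```python
def build_mget_request(host, path):
    """Build the raw request line and headers for a URIQA-style MGET.

    MGET is shaped exactly like GET, but asks the server for RDF metadata
    *about* the resource instead of a representation of the resource itself.
    """
    return (
        f"MGET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Accept: application/rdf+xml\r\n"  # ask for RDF metadata (assumed media type)
        "\r\n"
    )

# Example: request metadata about the URIQA page itself.
request = build_mget_request("sw.nokia.com", "/uriqa/URIQA.html")
```

The appeal, as noted above, is that the resource's own HTTP URI doubles as the key for its metadata; the trade-off is that every server along the way must understand the extra method.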
|HTTP URI's as probably the primary method of retrieval of the data object
|or meta-data about that object - after all much of the public LS data is
|actually out there on the web already retrievable by HTTP URI. If HTTP
|URI's were sufficient today, we would not have need of the LSID. So
|perhaps the question you should ask yourself is why are people not
|already widely using URL's for LS naming?

People are using URLs (HTTP URIs for naming), for example:

http://www.biomedcentral.com/pubmed/12225585

is a 302 redirect to the NCBI URL

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=12225585&dopt=Abstract&holding=f1000

Again, I believe it is how HTTP URIs are used or managed which is the problem, not that they are broken or insufficient technology for the purpose of naming.

|For me the main points are:
|Location independence of the object named - the extra layer of indirection
|makes this flexibility possible - there is a starting assumption that
|users will make/exchange local copies of the objects and also that
|authority entities will at some point want to transfer the authority over
|a LSID to another authority entity - while potentially maintaining control
|of their domain name; sometimes the same data is served from more than one
|"official" place on the web (e.g. Swiss-Prot - Marja, how does Annotea deal
|with this situation?); having the option of not using domain names in the
|identifier at all;

Good points. However, HTTP has the 3XX status codes to provide redirection etc. - why invent a new protocol when these already exist?

|Providing/using LSID's for one's data establishes a "contract" in which
|certain properties can be assumed (beyond those of the HTTP URI
|"contract") of an LSID named object:
|defines what can safely be assumed about multiple copies of objects which
|have the same LSID name - i.e.
|that they are identical; a clear definition
|of what persistence means [both availability and never modifying a named
|object];

This "contract" is a social contract; persistence based on a social contract can also be true of HTTP URIs.

|a formal mechanism for retrieving data [which never ever changes] over multiple
|protocols, and for discovering and retrieving meta-data [which can change]
|about that object and its relationship to other objects [from the original
|source of the object or from a third party who has something to add of
|their own], all using a single globally unique name.

I think the selling point of LSID (for me) is a standard interface for life sciences metadata. One aspect of this that bothers me, though, is the partitioning of the semantic web into domains based on their metadata access interfaces. Access to metadata based on URIs alone only makes sense to me if the mechanism to get the metadata is general for the web.

|One parting thought.. widespread adoption of the LSID spec. across the
|industry will at the same time create a very large semantic web.

It is true that only the widespread adoption of LSID will make it useful to the semantic web. I am guessing that by default (laziness?) HTTP URIs will be used as resource identifiers if an LSID is not *easily* usable, i.e. tools, tools, tools...

In my limited testing of the perl LSID client implementation, the only LSIDs I was able to resolve were from the North Temperate Lakes [3] authority. Both the PDB and NCBI authority URLs were not working (or I couldn't get them to work with the perl client).

_greg

[1] http://sw.nokia.com/uriqa/URIQA.html
[2] http://www.rddl.org/
[3] http://lsid.limnology.wisc.edu/
--
Greg Tyrelle
Received on Thursday, 29 April 2004 14:39:13 UTC