Re: 303 +1, WSDL -1


We will certainly want our implementation to reflect best practices 
on URIs that may be emerging from the community, and we would be 
interested in participating in that process as well (BTW--what time 
is the call today and what number?)

Are you summarizing the issues and any consensus statements somewhere 
on the HCLS wiki?

In terms of ontology content, our initial focus would be providing
URIs for ontologies that do not already have them. We believe the
issue of multiple URIs for the same entity is closely related to the
problem of recognizing the same (or similar) entities in different
ontologies (a problem we already face, since the NCBO ontologies
overlap in content). We will be approaching this problem by hosting
mappings among ontologies.



Daniel Rubin, MD, MS
Clinical Asst. Professor, Radiology
Research Scientist, Stanford Medical Informatics
Scientific Director, National Center of Biomedical Ontology
MSOB X-215
Stanford, CA 94305

At 09:08 AM 7/16/2007, Alan Ruttenberg wrote:

>Hi Daniel,
>I hope you'll postpone your implementation decisions until the HCLS
>URI recommendations are published, and that the NCBO would follow
>those recommendations when the time comes to implement your system.
>As you can see, there is still interesting debate, and the
>possibility of new insights. If there are things that the NCBO feels
>strongly about, or requirements that you have that Jonathan has not
>incorporated, then I'd urge you and other interested parties to join
>the call today, and to participate in the document drafting that
>Jonathan is leading.
>I'll note a minor concern in your statement - a number of ontologies
>that you host are not the product of NCBO work - are you suggesting
>that you will be creating new URIs for all the entities in those
>ontologies, even if they already have URIs? If so, it would seem that
>this could exacerbate the problems we are having, rather than helping
>- we've noted that the proliferation of different URIs that
>identify the same thing is problematic from a SW point of view.
>On Jul 16, 2007, at 11:54 AM, wrote:
>>Just to remind everyone--NCBO is planning on providing URIs for
>>entities in the breadth of biomedical ontologies it hosts at 
>>This group has previously given us a good set of functional
>>requirements, and over the next few months we will be implementing
>>Daniel Rubin, MD, MS
>>Clinical Asst. Professor, Radiology
>>Research Scientist, Stanford Medical Informatics
>>Scientific Director, National Center of Biomedical Ontology
>>MSOB X-215
>>Stanford, CA 94305
>>Quoting "Balaji S. Srinivasan" <>:
>>>>WSDL is a widely accepted W3C spec that is becoming increasingly
>>>accepted worldwide (and is, generally, automatically generated
>>>based on
>>>your interface, so requires little or no manual construction), and
>>>which solves a problem that we *know without any doubt* URLs cannot
>>>solve.
>>>I may be mistaken, but isn't WSDL just an XML format? I don't see how
>>>it solves a problem that URLs "cannot solve"...wouldn't the
>>>location of
>>>"foo.wsdl" be best specified as a URL?
>>>>in fact, they [WSDL] are currently MORE POPULAR than RDF itself,
>>>according to Google Trends
>>>But the appropriate comparison is to URLs, not RDF...and the
>>>of a URL is that there's tons of widely deployed, lightweight
>>>technology for requesting data from a given URL (e.g. w/ a browser as
>>>well as Perl/Python/etc. libraries) and for setting up web servers
>>>(e.g. Apache).
>>>I don't understand why it should be necessary to develop a
>>>parallel set
>>>of technologies (e.g. the Firefox LSID plugin, or HTTP proxies) for
>>>resolving LSIDs, particularly when most (all?) of these tools seem to
>>>be built on top of tools (such as Firefox) which can already do URL
>>>resolution without downloading anything.
>>>It would seem to me that the best way to get a reliable set of
>>>canonical URIs is to get NCBI involved. As soon as NCBI published
>>>a set
>>>of canonical URIs (e.g. for genes in Entrez Gene, compounds in
>>>etc.) then everyone could use them with confidence. Reasons:
>>>1) NCBI identifiers (even more so than EBI) are the de facto standard
>>>and can be mapped to anything.
>>>2) NCBI is well funded, has serious bandwidth, etc.
>>>3) NCBI can be trusted to stick around for a long time and to
>>>maintain/redirect old URLs, unlike a research lab or most companies.
>>>4) In terms of registering new URIs, NCBI is already a standard
>>>location for data submissions (w/ NCBI GEO, GAIN, etc.).
>>>5) People already use NCBI to get other kinds of data, so getting RDF
>>>data from them is not a serious paradigm shift.
>>>Perhaps there's someone from NCBI on the list; if not, it would be
>>>worthwhile to contact them. If NCBI adopted the standard that
>>> is using, with different suffixes for different
>>>formats (as per Eric Jain's email):
>>>....then I think people would adopt it immediately, especially if
>>>kept it on their front page for a month (like they do with other new
>>>services). Regarding the way UniProt is doing things, I think it
>>>was a
>>>particularly good design decision to have the de-facto suffix be
>>>so that you can get a sense of what the URI represents by looking
>>>at it
>>>in a browser.
>>>Also, from Matthias' recent email:
>>>>You should not try to pack ANY information about the 'resolution' of
>>>a Semantic Web resource into its URI, quite to the contrary. Make
>>>it as
>>>meaningless and generic as possible, in the best case it should
>>>just be
>>>a large random alphanumeric string, e.g. tag:uri:a938fjhsdcHSDu39. If
>>>all URIs look like this, nobody will be deterred from re-using a URI
>>>just because of how it looks.
>>>I don't know if this is such a good idea -- when debugging, you
>>>want to
>>>have some information about what the URIs represent (e.g. the
>>>"" prefix tells you that you're
>>>at a UniProt protein with the given ID number). If URIs are just
>>>alphanumeric strings, you need to constantly be doing lookups to
>>>yourself of what a particular object means.
>>>Balaji S. Srinivasan, Ph.D.
>>>Stanford University
>>>Lecturer, Depts. of Statistics and Computer Science
>>>318 Campus Drive, Clark Center S251
>>>(650) 380-0695
>>>On Jul 14, 2007, at 10:30 PM, Mark Wilkinson wrote:
>>>>Well... I apologize in advance, but I'm going to be
>>>>*insultingly*  blunt because I'm quite honestly losing interest
>>>>in this seemingly  pre-destined discussion...
>>>>"blinkers, are a piece of equipment used on a horse's face that
>>>>restrict the horse's vision. They usually compose of leather or
>>>>plastic cups that are places on either side of the eye, so that
>>>>the  horse can not see to his sides. Many racehorse trainers
>>>>believe  this keeps the horse focused on what is in front of
>>>>him,  encouraging him to pay attention to the race rather than
>>>>other  distractions, such as crowds" ( 
>>>>WSDL is a widely accepted W3C spec that is becoming increasingly
>>>>accepted worldwide (and is, generally, automatically generated
>>>>based on your interface, so requires little or no manual
>>>>construction), and which solves a problem that we *know without
>>>>any  doubt* URLs cannot solve.  I really don't see an advantage
>>>>in  trying to ignore them, circumvent them, or otherwise relegate
>>>>them  to a secondary lookup, in the base spec for the Semantic
>>>>Web, when  we know that we are going to have to deal with them at
>>>>some point  (and in fact, they are currently MORE POPULAR than
>>>>RDF itself,  according to Google Trends: 
>>>>I really don't see the point in trying to build the Semantic Web
>>>>by  specifically avoiding acknowledgement of one of the most
>>>>popular  trends on the Web, when we already know that the vast
>>>>majority of  information we need to access as bioinformaticians
>>>>is available  through web forms or web services!
>>>>I'm sorry for being rude and disrespectful - I'm honestly quite
>>>>embarrassed to be saying these things so harshly -  but I think
>>>>this discussion has started to become a singularity around a
>>>>pre-contrived end-point, rather than a discussion of what the Web
>>>>(and the Semantic Web) really is/can be!
>>>>WSDL -1 if you wish, but that puts you in opposition to the
>>>>majority of the world, where WSDL (thanks to Ajax) is finally
>>>>starting to make its mark!
>>>>Again, I apologize for being disrespectful and rude... it really
>>>>isn't personal and I feel truly awful about writing this so
>>>>harshly!  I'm just losing patience with a discussion that
>>>>doesn't  seem to be a discussion, but rather a shoe-horn into a
>>>>pre-destined  end point.
>>>>You are all free to crucify me the next time one of my grants
>>>>comes  to you for review ;-)
>>>>On Fri, 13 Jul 2007 20:19:41 -0700, Alan Ruttenberg
>>>><> wrote:
>>>>>On Jul 13, 2007, at 12:20 AM, Mark Wilkinson wrote:
>>>>>>>>What worries me about the 303 solution (other than that we
>>>>>>>>are  not using it for
>>>>>>>>its primary purpose [1]) is that the redirection can only
>>>>>>>>be  to a *single* resource, specified in the Location header.
>>>>>>>On Thu, 12 Jul 2007 03:57:34 -0700, Jonathan Rees
>>>>>>><> wrote:
>>>>>>>If this is an important functionality then it can be provided
>>>>>>>in a
>>>>>>>variety of ways - a mere matter of programming. LSID resolver
>>>>>>>to be the only way that comes ready made. But the functionality
>>>>>>>doesn't need to be tied to the use of LSIDs.
>>>>>>If there is an alternative solution that provides the same
>>>>>>functionality, and that can be applied universally to all
>>>>>>existing URIs (URLs), then I'm all for it!  To be honest, this
>>>>>>is  my *primary* objection to moving to a URL solution vs an
>>>>>>LSID  solution... if you can solve that problem, then I am
>>>>>>*almost* in  the URL camp.
>>>>>Here is an alternative:
>>>>>Problem statement:
>>>>>Enable third parties to register the fact that they have
>>>>>additional statements to provide about something that a URI
>>>>>denotes, in such a way as to make it easy for anyone to
>>>>>discover  this fact. Do this in a way which requires minimal
>>>>>coordination  (ideally none) between the minter of the original
>>>>>URI, the  provider of the additional statements, and the
>>>>>consumer of all the  statements.
>>>>>For a given URI http://a.b/c/d/e, construct a new URI   http:// 
>>>>>Configure the purl server so 
>>>>>that a.b/c/d/e redirects to 
>>>>>something  akin to a structured wiki page
>>>>>or a REST service (let us assume  for the moment that whoever
>>>>>currently provides the LSID WSDL that  contains this information
>>>>>currently is the provider of this  service).
>>>>>This page may be edited (manually or programmatically) to
>>>>>include  a description (suitable for a machine to understand) of
>>>>>how to  access the resource and what sort of resource it is, and
>>>>>perhaps  some additional useful information (what predicates
>>>>>does the  resource provide). This information is rendered as RDF
>>>>>using a  standard vocabulary and saved.
>>>>>Configure the purl server so that 
>>>>>e  retrieves the RDF that was constructed (or a 404 if there is
>>>>>none). Semantic web agents then interpret this RDF and go fetch
>>>>>what they want or need.
>>>>>We all agree that 303s redirect to a human readable html
>>>>>document,  that this document uses a REL link to an RDF document
>>>>>that says  what the provider wishes to say and that the RDF also
>>>>>states that may have more
>>>>>information.  (suitable shortcuts are provided to make bulk
>>>>>retrievals more  efficient - we've already discussed such
>>>>>This can be done now, with effort analogous to what is being
>>>>>done  with LSIDs. Let me point out some obvious advantages: 1)
>>>>>No  requirement to use web services (though web services *could*
>>>>>be  described as ways of accessing further statements using
>>>>>this  scheme) 2) Requires *less* manual intervention than is
>>>>>currently  required to maintain the WSDL. 3) Re-uses purl, which
>>>>>is based on  HTTP, which everyone knows how to use already 4)
>>>>>Makes clear that  the description of these additional resources
>>>>>for statements are  to be in RDF, and requires that one
>>>>>advertises what to expect if  you go to the resource (will you
>>>>>get an RDF document, a SPARQL  endpoint, a Web service set of
>>>>>With a bit more effort expended on extending the purl server
>>>>>code  we can get some more leverage - we enhance it so that
>>>>>retrieving actually merges the
>>>>>RDF result of  retrieving each of*/a.b/
>>>>>Where the about* top level domain indicates that the
>>>>>information  about covers all URIs that start with the indicated
>>>>>In this way different providers can note that they have
>>>>>additional  statements about URIs located in varying amounts of
>>>>>With some coordination among us, we could even decide to
>>>>>dedicate  a server to hosting the whole mess of this information
>>>>>(I don't  expect that it needs too large a resource) so as to
>>>>>make the  service more efficient in answering queries, and
>>>>>making it easy to  provide, to whoever wishes, a snapshot that
>>>>>they can host  themselves.
>>>>>May I now count you among those *almost* in the URL camp? ;-)
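
As a rough illustration of the URI mapping in the proposal quoted above,
here is a minimal Python sketch. It assumes a hypothetical "about"
service base (http://about.example.org/ is a placeholder; the actual
host is not given in this thread), and the function names about_uri and
prefix_chain are invented for illustration. It only builds the lookup
URIs; it does not fetch or merge any RDF.

```python
from urllib.parse import urlsplit

# Hypothetical base for the "about" service; the real host is not
# stated in the thread, so this is a placeholder.
ABOUT_BASE = "http://about.example.org/"


def about_uri(uri, base=ABOUT_BASE):
    """Map http://a.b/c/d/e to <base>a.b/c/d/e by dropping the scheme."""
    parts = urlsplit(uri)
    return base + parts.netloc + parts.path


def prefix_chain(uri):
    """List the host-and-path prefixes of a URI, shortest first.

    A server that merges statements registered against URI prefixes
    could look up each of these in turn.
    """
    parts = urlsplit(uri)
    segments = [s for s in parts.path.split("/") if s]
    prefixes = [parts.netloc]
    for seg in segments:
        prefixes.append(prefixes[-1] + "/" + seg)
    return prefixes


if __name__ == "__main__":
    print(about_uri("http://a.b/c/d/e"))
    for prefix in prefix_chain("http://a.b/c/d/e"):
        print(ABOUT_BASE + prefix)
```

Keeping the mapping a pure string operation means neither the minter of
the original URI nor the provider of the extra statements has to
coordinate with the other; both only need to agree on the base.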

Received on Monday, 16 July 2007 16:36:46 UTC