Re: 303 +1, WSDL -1 from dlrubin@stanford.edu on 2007-07-16 (public-semweb-lifesci@w3.org from July 2007)

From: <dlrubin@stanford.edu>
Date: Mon, 16 Jul 2007 08:54:42 -0700
To: "Balaji S. Srinivasan" <balajis@stanford.edu>
Cc: public-semweb-lifesci <public-semweb-lifesci@w3.org>
Message-ID: <20070716085442.1dex4k0g2olc084c@webmail.stanford.edu>
Just to remind everyone--NCBO is planning on providing URIs for  
entities in the breadth of biomedical ontologies it hosts at  
http://bioportal.bioontology.org.
This group previously gave us a good set of functional  
requirements, and over the next few months we will be implementing them.

Daniel

___

Daniel Rubin, MD, MS
Clinical Asst. Professor, Radiology
Research Scientist, Stanford Medical Informatics
Scientific Director, National Center for Biomedical Ontology
MSOB X-215
Stanford, CA 94305
650-725-5693


Quoting "Balaji S. Srinivasan" <balajis@stanford.edu>:

>
> Hi,
>
>> WSDL is a widely accepted W3C spec that is becoming increasingly
>> accepted worldwide (and is, generally, automatically generated based on
>> your interface, so requires little or no manual construction), and
>> which solves a problem that we *know without any doubt* URLs cannot
>> solve.
>
> I may be mistaken, but isn't WSDL just an XML format? I don't see how
> it solves a problem that URLs "cannot solve"...wouldn't the location of
> "foo.wsdl" be best specified as a URL?
>
>> in fact, they [WSDL] are currently MORE POPULAR than RDF itself,
>> according to Google Trends
>
> But the appropriate comparison is to URLs, not RDF...and the advantage
> of a URL is that there's tons of widely deployed, lightweight
> technology for requesting data from a given URL (e.g. w/ a browser as
> well as Perl/Python/etc. libraries) and for setting up web servers
> (e.g. Apache).
>
> I don't understand why it should be necessary to develop a parallel set
> of technologies (e.g. the Firefox LSID plugin, or HTTP proxies) for
> resolving LSIDs, particularly when most (all?) of these tools seem to
> be built on top of tools (such as Firefox) which can already do URL
> resolution without downloading anything.
>
> It would seem to me that the best way to get a reliable set of
> canonical URIs is to get NCBI involved. As soon as NCBI published a set
> of canonical URIs (e.g. for genes in Entrez Gene, compounds in Pubchem,
> etc.) then everyone could use them with confidence. Reasons:
>
> 1) NCBI identifiers (even more so than EBI) are the de facto standard
> and can be mapped to anything.
> 2) NCBI is well funded, has serious bandwidth, etc.
> 3) NCBI can be trusted to stick around for a long time and to
> maintain/redirect old URLs, unlike a research lab or most companies.
> 4) In terms of registering new URIs, NCBI is already a standard
> location for data submissions (w/ NCBI GEO, GAIN, etc.).
> 5) People already use NCBI to get other kinds of data, so getting RDF
> data from them is not a serious paradigm shift.
>
> Perhaps there's someone from NCBI on the list; if not, it would be
> worthwhile to contact them. If NCBI adopted the standard that
> beta.uniprot.org is using, with different suffixes for different
> formats (as per Eric Jain's email):
>
>> http://beta.uniprot.org/uniprot/P12345
>> http://beta.uniprot.org/uniprot/P12345.xml
>> http://beta.uniprot.org/uniprot/P12345.rdf
>> http://beta.uniprot.org/uniprot/P12345.fasta
>
> ...then I think people would adopt it immediately, especially if they
> kept it on their front page for a month (like they do with other new
> services). Regarding the way UniProt is doing things, I think it was a
> particularly good design decision to have the de-facto suffix be HTML,
> so that you can get a sense of what the URI represents by looking at it
> in a browser.
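As a rough illustration of the suffix convention quoted above, here is a
minimal Python sketch. The base URL and the four suffixes are taken from the
example URLs in this message; the function name representation_urls and the
format labels are hypothetical, and nothing here reflects how UniProt
actually implements the scheme.

```python
# Base URL and suffixes are copied from the example URLs in the email.
# The function name and the dict layout are assumptions for illustration.
BASE = "http://beta.uniprot.org/uniprot/"
SUFFIXES = {"html": "", "xml": ".xml", "rdf": ".rdf", "fasta": ".fasta"}


def representation_urls(accession):
    """Map each format label to the URL of that representation."""
    return {fmt: BASE + accession + suffix for fmt, suffix in SUFFIXES.items()}


urls = representation_urls("P12345")
for fmt, url in urls.items():
    print(fmt, url)
```

The suffix-free form doubles as the identifier and as the browser-viewable
page, which is the design point made above.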
>
> Also, from Matthias' recent email:
>
>> You should not try to pack ANY information about the 'resolution' of
>> a Semantic Web resource into its URI, quite to the contrary. Make it as
>> meaningless and generic as possible, in the best case it should just be
>> a large random alphanumeric string, e.g. tag:uri:a938fjhsdcHSDu39. If
>> all URIs look like this, nobody will be deterred from re-using a URI
>> just because of how it looks.
>
> I don't know if this is such a good idea -- when debugging, you want to
> have some information about what the URIs represent (e.g. the
> "http://beta.uniprot.org/uniprot/" prefix tells you that you're looking
> at a UniProt protein with the given ID number). If URIs are just
> alphanumeric strings, you need to constantly be doing lookups to remind
> yourself of what a particular object means.
>
> --B
>
> --
> Balaji S. Srinivasan, Ph.D.
> Stanford University
> Lecturer, Depts. of Statistics and Computer Science
> 318 Campus Drive, Clark Center S251
> (650) 380-0695
> balajis@stanford.edu
> http://jinome.stanford.edu
>
>
> On Jul 14, 2007, at 10:30 PM, Mark Wilkinson wrote:
>
>>
>> Well... I apologize in advance, but I'm going to be *insultingly*   
>> blunt because I'm quite honestly losing interest in this seemingly   
>> pre-destined discussion...
>>
>> "Blinkers are a piece of equipment used on a horse's face that
>> restricts the horse's vision. They are usually composed of leather or
>> plastic cups that are placed on either side of the eye, so that the
>> horse cannot see to his sides. Many racehorse trainers believe
>> this keeps the horse focused on what is in front of him,
>> encouraging him to pay attention to the race rather than other
>> distractions, such as crowds" (http://en.wikipedia.org/wiki/Blinders)
>>
>> WSDL is a widely accepted W3C spec that is becoming increasingly
>> accepted worldwide (and is, generally, automatically generated
>> based on your interface, so requires little or no manual
>> construction), and which solves a problem that we *know without any
>> doubt* URLs cannot solve.  I really don't see an advantage in
>> trying to ignore them, circumvent them, or otherwise relegate them
>> to a secondary lookup, in the base spec for the Semantic Web, when
>> we know that we are going to have to deal with them at some point
>> (and in fact, they are currently MORE POPULAR than RDF itself,
>> according to Google Trends:
>> http://www.google.com/trends?q=WSDL%2C+RDF&ctab=0&geo=all&date=all&sort=0).
>>
>> I really don't see the point in trying to build the Semantic Web by  
>>  specifically avoiding acknowledgement of one of the most popular   
>> trends on the Web, when we already know that the vast majority of   
>> information we need to access as bioinformaticians is available   
>> through web forms or web services!
>>
>> I'm sorry for being rude and disrespectful - I'm honestly quite   
>> embarrassed to be saying these things so harshly -  but I think   
>> this discussion has started to become a singularity around a   
>> pre-contrived end-point, rather than a discussion of what the Web   
>> (and the Semantic Web) really is/can be!
>>
>> WSDL -1 if you wish, but that puts you in opposition to the
>> majority of the world, where WSDL (thanks to Ajax) is finally
>> starting to make its mark!
>>
>> Again, I apologize for being disrespectful and rude... it really   
>> isn't personal and I feel truly awful about writing this so   
>> harshly!  I'm just losing patience with a discussion that doesn't   
>> seem to be a discussion, but rather a shoe-horn into a pre-destined  
>>  end point.
>>
>> You are all free to crucify me the next time one of my grants comes  
>>  to you for review ;-)
>>
>> M
>>
>>
>>
>>
>> On Fri, 13 Jul 2007 20:19:41 -0700, Alan Ruttenberg   
>> <alanruttenberg@gmail.com> wrote:
>>
>>>
>>>
>>> On Jul 13, 2007, at 12:20 AM, Mark Wilkinson wrote:
>>>
>>>
>>>>>> What worries me about the 303 solution (other than that we are
>>>>>> not using it for its primary purpose [1]) is that the redirection
>>>>>> can only be to a *single* resource, specified in the Location header.
>>>>
>>>>> On Thu, 12 Jul 2007 03:57:34 -0700, Jonathan Rees   
>>>>> <jonathan.rees@gmail.com> wrote:
>>>>> If this is an important functionality then it can be provided in a
>>>>> variety of ways - a mere matter of programming. LSID resolver happens
>>>>> to be the only way that comes ready made. But the functionality
>>>>> doesn't need to be tied to the use of LSIDs.
>>>>
>>>> If there is an alternative solution that provides the same   
>>>> functionality, and that can be applied universally to all   
>>>> existing URIs (URLs), then I'm all for it!  To be honest, this is  
>>>>  my *primary* objection to moving to a URL solution vs an LSID   
>>>> solution... if you can solve that problem, then I am *almost* in   
>>>> the URL camp.
>>>
>>> Here is an alternative:
>>>
>>> Problem statement:
>>>
>>> Enable third parties to register the fact that they have   
>>> additional statements to provide about something that a URI   
>>> denotes, in such a way as to make it easy for anyone to discover   
>>> this fact. Do this in a way which requires minimal coordination   
>>> (ideally none) between the minter of the original URI, the   
>>> provider of the additional statements, and the consumer of all the  
>>>  statements.
>>>
>>> Solution:
>>>
>>> For a given URI http://a.b/c/d/e, construct a new URI    
>>> http://purl.org/about/a.b/c/d/e
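The URI construction step just described can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the
http://purl.org/about/ prefix comes from the proposal, while the function
name about_uri is hypothetical, and dropping the scheme, query, and fragment
of the original URI is my reading of the single example given.

```python
from urllib.parse import urlsplit

# Prefix taken from the proposal; everything else is an assumption.
ABOUT_BASE = "http://purl.org/about/"


def about_uri(uri):
    """Build the 'about' URI for a given HTTP URI.

    Keeps host and path, and drops scheme, query, and fragment
    (an assumption inferred from the single example in the email).
    """
    parts = urlsplit(uri)
    return ABOUT_BASE + parts.netloc + parts.path


result = about_uri("http://a.b/c/d/e")
print(result)
```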
>>>
>>> Configure the purl server so that
>>> http://purl.org/provide-about/a.b/c/d/e redirects to something
>>> akin to a structured wiki page or a REST service (let us assume
>>> for the moment that whoever currently provides the LSID WSDL that
>>> contains this information is the provider of this service).
>>>
>>> This page may be edited (manually or programmatically) to include   
>>> a description (suitable for a machine to understand) of how to   
>>> access the resource and what sort of resource it is, and perhaps   
>>> some additional useful information (what predicates the
>>> resource provides). This information is rendered as RDF using a
>>> standard vocabulary and saved.
>>>
>>> Configure the purl server so that http://purl.org/about/a.b/c/d/e   
>>> retrieves the RDF that was constructed (or a 404 if there is   
>>> none). Semantic web agents then interpret this RDF and go fetch   
>>> what they want or need.
>>>
>>> We all agree that 303s redirect to a human-readable HTML document,
>>> that this document uses a REL link to an RDF document that says
>>> what the provider wishes to say, and that the RDF also states that
>>> http://purl.org/about/a.b/c/d/e may have more information.
>>> (Suitable shortcuts are provided to make bulk retrievals more
>>> efficient - we've already discussed such mechanisms.)
>>>
>>> This can be done now, with effort analogous to what is being done
>>> with LSIDs. Let me point out some obvious advantages:
>>>
>>> 1) No requirement to use web services (though web services *could*
>>>    be described as ways of accessing further statements using this
>>>    scheme).
>>> 2) Requires *less* manual intervention than is currently required to
>>>    maintain the WSDL.
>>> 3) Re-uses purl, which is based on HTTP, which everyone knows how to
>>>    use already.
>>> 4) Makes clear that the descriptions of these additional resources
>>>    for statements are to be in RDF, and requires that one advertises
>>>    what to expect if you go to the resource (will you get an RDF
>>>    document, a SPARQL endpoint, a Web service set of methods?).
>>>
>>> ---
>>>
>>> With a bit more effort expended on extending the purl server code
>>> we can get some more leverage - we enhance it so that retrieving
>>> http://purl.org/about/a.b/c/d/e actually merges the RDF result of
>>> retrieving each of:
>>> http://purl.org/about*/a.b/
>>> http://purl.org/about*/a.b/c
>>> http://purl.org/about*/a.b/c/d
>>> http://purl.org/about/a.b/c/d/e
>>>
>>> Where the about* top-level path segment indicates that the
>>> information covers all URIs that start with the indicated path.
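To make the prefix-merging idea concrete, here is a small Python sketch that
enumerates the lookup URIs for a given URI, mirroring the four-line list in
the proposal. The function name about_lookup_uris is hypothetical, the
trailing slash on the host-level entry simply copies the example, and the
actual merging of the retrieved RDF is left out because the proposal does
not specify it.

```python
from urllib.parse import urlsplit

# Both prefixes come from the proposal; the rest is an assumption.
ABOUT = "http://purl.org/about/"
ABOUT_PREFIX = "http://purl.org/about*/"


def about_lookup_uris(uri):
    """List the about*/about URIs whose RDF would be merged for `uri`."""
    parts = urlsplit(uri)
    segments = [s for s in parts.path.split("/") if s]
    # Host-level entry, written with a trailing slash as in the email.
    uris = [ABOUT_PREFIX + parts.netloc + "/"]
    # One about* entry per proper path prefix (all but the last segment).
    for i in range(1, len(segments)):
        uris.append(ABOUT_PREFIX + parts.netloc + "/" + "/".join(segments[:i]))
    # Exact-match entry for the URI itself.
    uris.append(ABOUT + parts.netloc + parts.path)
    return uris


lookups = about_lookup_uris("http://a.b/c/d/e")
for u in lookups:
    print(u)
```

A server following this scheme would fetch each listed URI and combine
whatever RDF it finds, so providers can register statements at any level of
the namespace.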
>>>
>>> In this way different providers can note that they have additional  
>>>  statements about URIs located in varying amounts of namespace.
>>>
>>> With some coordination among us, we could even decide to dedicate
>>> a server to hosting the whole mess of this information (I don't
>>> expect that it needs too large a resource) so as to make the
>>> service more efficient in answering queries, and to make it easy to
>>> provide, to whoever wishes, a snapshot that they can host
>>> themselves.
>>>
>>> ---
>>>
>>> May I now count you among those *almost* in the URL camp? ;-)
>>>
>>> -Alan
>>>
>>>
>>>
>>>
>>
>>
>>
Received on Monday, 16 July 2007 15:54:52 UTC