Re: 303 +1, WSDL -1 from Balaji S. Srinivasan on 2007-07-16 (public-semweb-lifesci@w3.org from July 2007)

From: Balaji S. Srinivasan <balajis@stanford.edu>
Date: Sun, 15 Jul 2007 21:11:36 -0700
To: public-semweb-lifesci <public-semweb-lifesci@w3.org>
Message-Id: <7E19E481-DAF7-4CE1-9C19-3ACDB1519929@stanford.edu>
Hi,

 > WSDL is a widely accepted W3C spec that is becoming increasingly  
accepted worldwide (and is, generally, automatically generated based  
on your interface, so requires little or no manual construction), and  
which solves a problem that we *know without any doubt* URLs cannot  
solve.

I may be mistaken, but isn't WSDL just an XML format? I don't see how  
it solves a problem that URLs "cannot solve"...wouldn't the location  
of "foo.wsdl" be best specified as a URL?

 > in fact, they [WSDL] are currently MORE POPULAR than RDF itself,  
according to Google Trends

But the appropriate comparison is to URLs, not RDF...and the  
advantage of a URL is that there's tons of widely deployed,  
lightweight technology for requesting data from a given URL (e.g. w/  
a browser as well as Perl/Python/etc. libraries) and for setting up  
web servers (e.g. Apache).
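
To illustrate how little is needed on the client side, here is a minimal sketch using only Python's standard library. The function names and the User-Agent string are made up for illustration; nothing is sent over the network until fetch() is called.

```python
from urllib.request import Request, urlopen


def build_request(url: str) -> Request:
    """Build a plain HTTP GET request for the URL (nothing is sent yet)."""
    # The User-Agent value is an arbitrary placeholder, not a required name.
    return Request(url, headers={"User-Agent": "example-fetcher/0.1"})


def fetch(url: str, timeout: float = 10.0) -> bytes:
    """Send the request and return the raw response body."""
    with urlopen(build_request(url), timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    request = build_request("http://beta.uniprot.org/uniprot/P12345.rdf")
    # Show what would be requested without actually contacting the server.
    print(request.get_method(), request.full_url)
```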

I don't understand why it should be necessary to develop a parallel  
set of technologies (e.g. the Firefox LSID plugin, or HTTP proxies)  
for resolving LSIDs, particularly when most (all?) of these tools  
seem to be built on top of tools (such as Firefox) which can already  
do URL resolution without downloading anything.

It would seem to me that the best way to get a reliable set of  
canonical URIs is to get NCBI involved. As soon as NCBI published a  
set of canonical URIs (e.g. for genes in Entrez Gene, compounds in  
Pubchem, etc.) then everyone could use them with confidence. Reasons:

1) NCBI identifiers (even more so than EBI) are the de facto standard  
and can be mapped to anything.
2) NCBI is well funded, has serious bandwidth, etc.
3) NCBI can be trusted to stick around for a long time and to  
maintain/redirect old URLs, unlike a research lab or most companies.
4) In terms of registering new URIs, NCBI is already a standard  
location for data submissions (w/ NCBI GEO, GAIN, etc.).
5) People already use NCBI to get other kinds of data, so getting RDF  
data from them is not a serious paradigm shift.

Perhaps there's someone from NCBI on the list; if not, it would be  
worthwhile to contact them. If NCBI adopted the standard that  
beta.uniprot.org is using, with different suffixes for different  
formats (as per Eric Jain's email):

> http://beta.uniprot.org/uniprot/P12345
> http://beta.uniprot.org/uniprot/P12345.xml
> http://beta.uniprot.org/uniprot/P12345.rdf
> http://beta.uniprot.org/uniprot/P12345.fasta

...then I think people would adopt it immediately, especially if they
kept it on their front page for a month (like they do with other new
services). Regarding the way UniProt is doing things, I think it was
a particularly good design decision to have the default (suffix-less)
form be HTML, so that you can get a sense of what the URI represents by
looking at it in a browser.
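
A small sketch of the suffix convention shown above, assuming the four formats in the listed URLs are the only ones available. The function name and the format keys are hypothetical, chosen just for this example.

```python
BASE = "http://beta.uniprot.org/uniprot/"

# Format name -> URL suffix, following the four example URLs above.
# The bare URI (empty suffix) is the browser-friendly HTML view.
SUFFIXES = {"html": "", "xml": ".xml", "rdf": ".rdf", "fasta": ".fasta"}


def record_url(accession: str, fmt: str = "html") -> str:
    """Return the URL for one record in the requested format."""
    if fmt not in SUFFIXES:
        raise ValueError(f"unsupported format: {fmt!r}")
    return f"{BASE}{accession}{SUFFIXES[fmt]}"


if __name__ == "__main__":
    for fmt in SUFFIXES:
        print(record_url("P12345", fmt))
```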

Also, from Matthias' recent email:

 > You should not try to pack ANY information about the 'resolution'  
of a Semantic Web resource into its URI, quite to the contrary. Make  
it as meaningless and generic as possible, in the best case it should  
just be a large random alphanumeric string, e.g.  
tag:uri:a938fjhsdcHSDu39. If all URIs look like this, nobody will be  
deterred from re-using a URI just because of how it looks.

I don't know if this is such a good idea -- when debugging, you want  
to have some information about what the URIs represent (e.g. the  
"http://beta.uniprot.org/uniprot/" prefix tells you that you're  
looking at a UniProt protein with the given ID number). If URIs are  
just alphanumeric strings, you need to constantly be doing lookups to  
remind yourself of what a particular object means.
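
As a rough illustration of the debugging point, a readable prefix can be decoded locally, whereas an opaque string cannot. The prefix table below is a hypothetical example with a single entry; a real one would list whatever namespaces you work with.

```python
# Hypothetical table of recognizable URI prefixes and what they denote.
KNOWN_PREFIXES = {
    "http://beta.uniprot.org/uniprot/": "UniProt protein",
}


def describe(uri: str) -> str:
    """Give a human-readable hint about a URI without any network lookup."""
    for prefix, label in KNOWN_PREFIXES.items():
        if uri.startswith(prefix):
            return f"{label} {uri[len(prefix):]}"
    # Opaque identifiers carry no hint; a separate lookup would be needed.
    return "unknown resource (lookup required)"


if __name__ == "__main__":
    print(describe("http://beta.uniprot.org/uniprot/P12345"))
    print(describe("tag:uri:a938fjhsdcHSDu39"))
```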

--B

--
Balaji S. Srinivasan, Ph.D.
Stanford University
Lecturer, Depts. of Statistics and Computer Science
318 Campus Drive, Clark Center S251
(650) 380-0695
balajis@stanford.edu
http://jinome.stanford.edu


On Jul 14, 2007, at 10:30 PM, Mark Wilkinson wrote:

>
> Well... I apologize in advance, but I'm going to be *insultingly*  
> blunt because I'm quite honestly losing interest in this seemingly  
> pre-destined discussion...
>
> "blinkers, are a piece of equipment used on a horse's face that  
> restrict the horse's vision. They usually consist of leather or
> plastic cups that are placed on either side of the eye, so that the
> horse cannot see to his sides. Many racehorse trainers believe
> this keeps the horse focused on what is in front of him,  
> encouraging him to pay attention to the race rather than other  
> distractions, such as crowds" (http://en.wikipedia.org/wiki/Blinders)
>
> WSDL is a widely accepted W3C spec that is becoming increasingly  
> accepted worldwide (and is, generally, automatically generated  
> based on your interface, so requires little or no manual  
> construction), and which solves a problem that we *know without any  
> doubt* URLs cannot solve.  I really don't see an advantage in  
> trying to ignore them, circumvent them, or otherwise relegate them  
> to a secondary lookup, in the base spec for the Semantic Web, when  
> we know that we are going to have to deal with them at some point  
> (and in fact, they are currently MORE POPULAR than RDF itself,
> according to Google Trends:
> http://www.google.com/trends?q=WSDL%2C+RDF&ctab=0&geo=all&date=all&sort=0).
>
> I really don't see the point in trying to build the Semantic Web by  
> specifically avoiding acknowledgement of one of the most popular  
> trends on the Web, when we already know that the vast majority of  
> information we need to access as bioinformaticians is available  
> through web forms or web services!
>
> I'm sorry for being rude and disrespectful - I'm honestly quite  
> embarrassed to be saying these things so harshly -  but I think  
> this discussion has started to become a singularity around a
> pre-contrived end-point, rather than a discussion of what the Web (and
> the Semantic Web) really is/can be!
>
> WSDL -1 if you wish, but that puts you in opposition to the  
> majority of the world, where WSDL (thanks to Ajax) is finally  
> starting to make its mark!
>
> Again, I apologize for being disrespectful and rude... it really  
> isn't personal and I feel truly awful about writing this so  
> harshly!  I'm just losing patience with a discussion that doesn't  
> seem to be a discussion, but rather a shoe-horn into a pre-destined  
> end point.
>
> You are all free to crucify me the next time one of my grants comes  
> to you for review ;-)
>
> M
>
>
>
>
> On Fri, 13 Jul 2007 20:19:41 -0700, Alan Ruttenberg  
> <alanruttenberg@gmail.com> wrote:
>
>>
>>
>> On Jul 13, 2007, at 12:20 AM, Mark Wilkinson wrote:
>>
>>
>>>>> What worries me about the 303 solution (other than that we are  
>>>>> not using it for
>>>>> its primary purpose [1]) is that the redirection can only be
>>>>> to a *single* resource, specified in the Location header.
>>>
>>>> On Thu, 12 Jul 2007 03:57:34 -0700, Jonathan Rees  
>>>> <jonathan.rees@gmail.com> wrote:
>>>> If this is an important functionality then it can be provided in a
>>>> variety of ways - a mere matter of programming. LSID resolver  
>>>> happens
>>>> to be the only way that comes ready made. But the functionality
>>>> doesn't need to be tied to the use of LSIDs.
>>>
>>> If there is an alternative solution that provides the same  
>>> functionality, and that can be applied universally to all  
>>> existing URIs (URLs), then I'm all for it!  To be honest, this is  
>>> my *primary* objection to moving to a URL solution vs an LSID  
>>> solution... if you can solve that problem, then I am *almost* in  
>>> the URL camp.
>>
>> Here is an alternative:
>>
>> Problem statement:
>>
>> Enable third parties to register the fact that they have  
>> additional statements to provide about something that a URI  
>> denotes, in such a way as to make it easy for anyone to discover  
>> this fact. Do this in a way which requires minimal coordination  
>> (ideally none) between the minter of the original URI, the  
>> provider of the additional statements, and the consumer of all the  
>> statements.
>>
>> Solution:
>>
>> For a given URI http://a.b/c/d/e, construct a new URI
>> http://purl.org/about/a.b/c/d/e
>>
>> Configure the purl server so that
>> http://purl.org/provide-about/a.b/c/d/e redirects to something akin
>> to a structured wiki page or a REST service (let us assume for the
>> moment that whoever currently provides the LSID WSDL that contains
>> this information is the provider of this service).
>>
>> This page may be edited (manually or programmatically) to include  
>> a description (suitable for a machine to understand) of how to  
>> access the resource and what sort of resource it is, and perhaps  
>> some additional useful information (what predicates does the  
>> resource provide). This information is rendered as RDF using a
>> standard vocabulary and saved.
>>
>> Configure the purl server so that http://purl.org/about/a.b/c/d/e  
>> retrieves the RDF that was constructed (or a 404 if there is  
>> none). Semantic web agents then interpret this RDF and go fetch  
>> what they want or need.
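
The URI construction described above can be sketched as follows. This is only an illustration: the function name is hypothetical, and dropping any query string or fragment is an assumption, since the proposal only shows plain paths.

```python
from urllib.parse import urlsplit


def about_uri(uri: str, service: str = "about") -> str:
    """Map http://a.b/c/d/e to http://purl.org/<service>/a.b/c/d/e.

    service is "about" for retrieval or "provide-about" for the editable
    page, following the two purl.org paths named in the proposal.
    """
    parts = urlsplit(uri)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"expected an absolute HTTP URI, got {uri!r}")
    # Assumption: query strings and fragments are ignored.
    return f"http://purl.org/{service}/{parts.netloc}{parts.path}"


if __name__ == "__main__":
    print(about_uri("http://a.b/c/d/e"))
    print(about_uri("http://a.b/c/d/e", service="provide-about"))
```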
>>
>> We all agree that 303s redirect to a human-readable HTML document,
>> that this document uses a REL link to an RDF document that says
>> what the provider wishes to say, and that the RDF also states that
>> http://purl.org/about/a.b/c/d/e may have more information
>> (suitable shortcuts are provided to make bulk retrievals more
>> efficient - we've already discussed such mechanisms).
>>
>> This can be done now, with effort analogous to what is being done
>> with LSIDs. Let me point out some obvious advantages:
>>
>> 1) No requirement to use web services (though web services *could*
>>    be described as ways of accessing further statements using this
>>    scheme).
>> 2) Requires *less* manual intervention than is currently required
>>    to maintain the WSDL.
>> 3) Re-uses purl, which is based on HTTP, which everyone knows how
>>    to use already.
>> 4) Makes clear that the descriptions of these additional resources
>>    for statements are to be in RDF, and requires that one advertises
>>    what to expect if you go to the resource (will you get an RDF
>>    document, a SPARQL endpoint, a Web service set of methods?).
>>
>> ---
>>
>> With a bit more effort expended on extending the purl server code  
>> we can get some more leverage - we enhance it so that retrieving
>> http://purl.org/about/a.b/c/d/e actually merges the RDF result of
>> retrieving each of:
>> http://purl.org/about*/a.b/
>> http://purl.org/about*/a.b/c
>> http://purl.org/about*/a.b/c/d
>> http://purl.org/about/a.b/c/d/e
>>
>> Where the about* top-level path segment indicates that the
>> information covers all URIs that start with the indicated path.
>>
>> In this way different providers can note that they have additional  
>> statements about URIs located in varying amounts of namespace.
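
A minimal sketch of which URIs such an enhanced server would consult for one target, reproducing the four-entry list above. The function name is hypothetical, and the merging of the RDF results themselves is not shown.

```python
from urllib.parse import urlsplit


def about_lookup_uris(uri: str) -> list[str]:
    """List the about*/ prefix URIs, then the exact about/ URI, for a target."""
    parts = urlsplit(uri)
    segments = [s for s in parts.path.split("/") if s]
    uris = []
    # One about* entry per proper prefix: host only, host/c, host/c/d, ...
    for depth in range(len(segments)):
        prefix = "/".join([parts.netloc] + segments[:depth])
        # The host-only entry is written with a trailing slash in the list above.
        trailing = "/" if depth == 0 else ""
        uris.append(f"http://purl.org/about*/{prefix}{trailing}")
    # The exact-match entry uses the plain about/ path.
    uris.append("http://purl.org/about/" + "/".join([parts.netloc] + segments))
    return uris


if __name__ == "__main__":
    for lookup in about_lookup_uris("http://a.b/c/d/e"):
        print(lookup)
```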
>>
>> With some coordination among us, we could even decide to dedicate  
>> a server to hosting the whole mess of this information (I don't  
>> expect that it needs too large a resource) so as to make the  
>> service more efficient in answering queries, and making it easy to
>> provide, to whoever wishes, a snapshot that they can host themselves.
>>
>> ---
>>
>> May I now count you among those *almost* in the URL camp? ;-)
>>
>> -Alan
>>
>>
>>
>>
>
>
>
Received on Monday, 16 July 2007 14:04:55 UTC