Re: Issue 10 -- Hash vs. Slash from Ivan Herman on 2011-02-23 (public-rdb2rdf-wg@w3.org from February 2011)

From: Ivan Herman <ivan@w3.org>
Date: Wed, 23 Feb 2011 11:53:12 +0100
To: Juan Sequeda <juanfederico@gmail.com>
Cc: ashok.malhotra@oracle.com, RDB2RDF WG <public-rdb2rdf-wg@w3.org>
Message-Id: <09152EA6-0220-4D33-8DAD-DE7AECC6B870@w3.org>

On Feb 23, 2011, at 07:10 , Juan Sequeda wrote:

> I believe the issue is the following:
> 
> 1) hash - http://foo.example/DB/People/ID=7#_
> 
> vs
> 
> 2) slash - http://foo.example/DB/People/ID=7
> 
> For option 1) you would actually have to retrieve the whole graph while for option 2) you would do a 303 ( a la linked data) and just get the particular triples needed.
> 
> Eric, is this right?

I am not sure it is that clear-cut.

Juan, I know that much of what I write below is known to you, so it is not meant as a direct answer to you. However, I thought that writing down the issue in more detail makes sense for others, mainly for those of us who are more on the 'database' side than on the Linked Data/RDF side. Sorry if it is a bit longish (and I am sure that Richard, who is much more picky on these things than I am, will correct me if needed :-).

We are talking about the URI identifying the subject for a row. For the sake of discussion, let us say this URI is <lala>. Note that <lala> does not represent a 'real' thing _in_ the database; rather, it refers to a kind of abstract, conceptual thing.

The question for a linked data person is: what do I expect to receive when I do an HTTP GET request on <lala>? Note that we have _not_ yet defined that, but I think a fairly safe answer is that the triples that have <lala> as subject, i.e., the triples for the row, are returned in some form. It also depends on the type of information I expect, i.e., on the preferred media types I add to my HTTP GET request. If I expect HTML, then I may get an HTML page with a one-row table with headers and values. If I expect RDF in some serialization, I may get all the triples that are generated for that row and which have <lala> as subject. This is the magic of content negotiation performed on the server side. In both cases, we have to realize that the returned information is NOT <lala>; it is a _representation_ thereof. I.e., the information that is returned should have a different URI, say, <lala-r>. The question of course is then what <lala-r> is if I know <lala>. (One can go a step further and have a <lala-r-html> and a <lala-r-rdf>, a bit like the way DBpedia URIs are used, but we may not want to go that far.)
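
To make the content negotiation step concrete, here is a rough sketch in Python. It is only an illustration under assumptions: the row values, the column-to-predicate naming and the function names are all made up, and the Accept handling is deliberately naive (no q-values, no wildcards).

```python
# Illustrative sketch only: the row data and predicate URIs are assumptions,
# not something the WG has defined.
SUBJECT = "http://foo.example/DB/People/ID=7#_"   # <lala>
ROW = {"ID": "7", "fname": "Bob"}                 # made-up row values


def as_triples(subject, row):
    """Serialize the row as one triple per column, all sharing the subject."""
    base = "http://foo.example/DB/People#"        # hypothetical predicate base
    return "\n".join(
        f'<{subject}> <{base}{col}> "{val}" .' for col, val in row.items()
    )


def as_html(row):
    """Serialize the row as a one-row HTML table with headers."""
    head = "".join(f"<th>{col}</th>" for col in row)
    body = "".join(f"<td>{val}</td>" for val in row.values())
    return f"<table><tr>{head}</tr><tr>{body}</tr></table>"


def negotiate(accept):
    """Pick a representation from the Accept header (naive substring match)."""
    if "text/html" in accept:
        return "text/html", as_html(ROW)
    return "text/turtle", as_triples(SUBJECT, ROW)
```

Either way, what comes back is a representation (<lala-r>), not <lala> itself; only the media type differs.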

Let us say <lala> is </People/ID=7#_>. Per the HTTP spec, what goes to the server as the GET URI is _not_ </People/ID=7#_>; it is </People/ID=7>. This is how fragment identifiers work in URI+HTTP. (There is no need for any client-side trick; if I copy and paste that URI into my browser's address bar, this is what should happen in a well-behaved browser.) In other terms, we can safely assume/define <lala-r> = </People/ID=7>: this is what is sent to the server, the content negotiation occurs, and I get back information under the URI </People/ID=7>, which I can consider to be the representation of the (abstract) URI </People/ID=7#_>. I am done. (There is of course a trick here: there is nothing _meaningful_ after the hash, it is just a device to automatically differentiate between <lala> and <lala-r>. In more general cases that might be a burden, because the information being returned may not have the right granularity, and this is what we referred to on the call as 'the whole graph has to be downloaded'. But this is not relevant here.)
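
The fragment stripping in the hash case can be checked with Python's standard library. This is just a small illustration using the example URI from this thread; the variable names are mine.

```python
from urllib.parse import urldefrag

lala = "http://foo.example/DB/People/ID=7#_"

# urldefrag splits off the fragment, which an HTTP client never sends.
lala_r, fragment = urldefrag(lala)

print(lala_r)    # the URI that actually goes to the server
print(fragment)  # the part the client keeps to itself
```

The same call on a URI ending in a bare '#' yields the same <lala-r> with an empty fragment.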

Let us say <lala> is </People/ID=7>. In this case the client has no other choice than to send </People/ID=7>, i.e., <lala>, to the server. So the server has to have its own setup for what <lala-r> is. This URI has to be communicated back to the client which, in a second HTTP round trip, will ask for <lala-r>. This is the dreaded 303 response mechanism: the server sends back a message saying: "<lala> is not something I can return, but you may want to look at <lala-r>, which is a good representation of <lala>", so the client will then issue a second GET on <lala-r>.

I.e., JavaScript is not an issue here. How and where SQL comes into the picture is also immaterial in this sense. I believe that, at the end of the day, we can safely assume that the graph being returned is the same in both cases. But setting up the slash case seems to be more demanding on the server side and increases the number of HTTP requests needed. So, personally, I think that the hash version has a lower load.
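
The two-round exchange in the slash case can be simulated without any network. This is a toy sketch under assumptions: the '.rdf' suffix for <lala-r> and the function names are hypothetical, and nothing of this has been decided by the WG.

```python
def serve(path):
    """Toy server: return (status, headers, body) for a request path."""
    if path == "/DB/People/ID=7":
        # <lala>: not a document, so point the client at a representation.
        return 303, {"Location": "/DB/People/ID=7.rdf"}, ""
    if path == "/DB/People/ID=7.rdf":
        # <lala-r>: hypothetical URI of the document describing the row.
        return 200, {"Content-Type": "text/turtle"}, "# triples for the row"
    return 404, {}, ""


def dereference(path):
    """Follow 303 responses; return (status, final path, request count)."""
    requests = 0
    while True:
        status, headers, _body = serve(path)
        requests += 1
        if status == 303:
            path = headers["Location"]
            continue
        return status, path, requests
```

Calling dereference on <lala> costs two requests, while asking for <lala-r> directly costs one; that extra round trip is the additional load of the slash case.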

Actually:

- It is a mini-minor issue, but I am not even sure that the '_' character is necessary at the end of the URI. AFAIK, everything works equally well if we use </People/ID=7#>. But I may be missing something...
- I wonder whether we should describe somewhere (maybe not as a required feature but as an advised one) that the server, when getting a GET request on <lala>, is expected to return a graph containing all triples in a row in the requested media type.


> 
> Ashok, do you mean that it doesn't really matter? Are you saying that when that URI is dereferenced, let it be hash or slash, that it would always get translated into a SQL query and just get the triples that are needed?
> 
> The issue that I see with this URI is the following... consider the prefix
> 
> PREFIX ex: <http://foo.example/DB/People/ID=>
> 
> for slash you would have
> 
> ex:7
> 
> for hash it would be
> 
> ex:7#_
> 
> For ex:7, that works, right? But ex:7#_ is not allowed.

I find this argument compelling indeed. And maybe this is something that we will have to live with.
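
Juan's prefix problem can be illustrated with a small check. This is only a sketch under assumptions: the regular expression is an ASCII-only approximation of the local part of a SPARQL prefixed name, not the real grammar, and the helper name is hypothetical.

```python
import re

# ASCII-only approximation (an assumption): letters, digits and '_' may start
# a local name; '.' and '-' may appear inside; '#' is not allowed at all.
LOCAL_NAME = re.compile(r"[A-Za-z0-9_]([A-Za-z0-9_.-]*[A-Za-z0-9_-])?")


def is_plain_local_name(local):
    """True if `local` could follow 'ex:' without any escaping."""
    return LOCAL_NAME.fullmatch(local) is not None


print(is_plain_local_name("7"))    # slash case: ex:7
print(is_plain_local_name("7#_"))  # hash case: ex:7#_
```

In the hash case one has to fall back to writing the full URI in angle brackets, or to choose the prefix differently.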

Sorry for the long email...

Ivan


> If this is true, I would rather have the slash uri.
> 
> Thoughts?
> 
> Juan Sequeda
> +1-575-SEQ-UEDA
> www.juansequeda.com
> 
> 
> On Tue, Feb 22, 2011 at 6:05 PM, ashok malhotra <ashok.malhotra@oracle.com> wrote:
> I'm trying to understand the issue.
> 
> Are we discussing how to identify the RDF node that corresponds to a row in a table
> with a primary key?  If so, we should create a URI that, when dereferenced, performs a
> SQL query and gets the row.  A bit of JavaScript on the client or server can turn the
> row into an RDF node with properties.   Not sure why we need the 303
> 
> Is this the right question?
> 
> -- 
> All the best, Ashok
> 
> 


----
Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
PGP Key: http://www.ivan-herman.net/pgpkey.html
FOAF: http://www.ivan-herman.net/foaf.rdf
Received on Wednesday, 23 February 2011 10:52:03 UTC