Data loading cases from Kjetil Kjernsmo on 2009-05-25 (public-rdf-dawg@w3.org from April to June 2009)

From: Kjetil Kjernsmo <Kjetil.Kjernsmo@computas.com>
Date: Mon, 25 May 2009 13:18:32 +0200
To: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-Id: <200905251318.32816.Kjetil.Kjernsmo@computas.com>

All,

I just sat down and had a bit of thinking about how we load data into the quad 
store, and figured I would articulate four cases of data loading. In these 
cases, no WHERE clause is required; it is just plain data loading. It is 
quite clear that whenever a WHERE clause is required, we need the 
SPARQL/Update language, but in the following cases, it may be relevant to 
discuss whether they can and should be supported on the protocol level.

The cases are quite straightforward:

1) New data are added to the graph, the graph URI is dereferenceable, and 
the user is authorized to use the service.
2) New data replace the data contained in the graph, the graph URI is 
dereferenceable, and the user is authorized to use the service.
3) New data are added to the graph, and the graph URI is not dereferenceable.
4) New data replace the data contained in the graph, and the graph URI is not 
dereferenceable. 

Obviously, if the user is not authorized to use a service, it will receive a 
401 (or 403) response. 

The first two cases can be supported in the protocol by POST and PUT, 
respectively, which is very pretty and RESTful. 
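A minimal sketch of what cases 1 and 2 could look like on the wire, assuming the graph URI itself is served by a store that accepts writes. The graph URI, the Turtle payload and the `text/turtle` content type are illustrative assumptions; `load_request` is a hypothetical helper that only builds the requests and does not send them.

```python
import urllib.request

# Hypothetical graph URI, assumed dereferenceable and writable (cases 1 and 2).
GRAPH_URI = "http://example.org/graphs/people"

# Illustrative payload; Turtle is an assumed serialization.
PAYLOAD = b'<http://example.org/alice> <http://xmlns.com/foaf/0.1/name> "Alice" .\n'


def load_request(graph_uri: str, data: bytes, replace: bool) -> urllib.request.Request:
    """Build (but do not send) a POST for case 1 (add) or a PUT for case 2 (replace)."""
    method = "PUT" if replace else "POST"
    return urllib.request.Request(
        graph_uri,
        data=data,
        method=method,
        headers={"Content-Type": "text/turtle"},
    )


add_req = load_request(GRAPH_URI, PAYLOAD, replace=False)     # case 1: POST adds
replace_req = load_request(GRAPH_URI, PAYLOAD, replace=True)  # case 2: PUT replaces
print(add_req.get_method(), add_req.full_url)
print(replace_req.get_method(), replace_req.full_url)
# Sending would be urllib.request.urlopen(add_req); a 401 or 403 answer
# surfaces as urllib.error.HTTPError.
```

The only difference between the two cases is the HTTP method, which is what makes this mapping attractive when the graph URI has a server behind it.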

In our current system, the graph URIs were never designed to be 
dereferenceable; there is no server there now that could accept a PUT or 
POST, thus motivating cases 3 and 4. One could of course argue this is simply 
bad practise, but with current SPARQL I can't see any reason why graph names 
should be dereferenceable. Also, we found it kinda nice to not have to set 
the base URI all the time to the box we were developing on at the time. If 
important update methods relied on graph names being dereferenceable, that 
would come close to making dereferenceable graph names a requirement for 
applications, I think.

Currently, all my use cases could be satisfied with a requirement that graph 
names should be dereferenceable, but perhaps there are cases where that 
shouldn't be a requirement? If we decide it is just bad practise to not have 
dereferenceable graph names, I think we should articulate why.

I would like to see all four cases supported in some form. It is also 
important that the implementation can be done in such a way that any of the 
methods can be used to insert large datasets.

It seems that all four cases can be supported by essentially the same feature 
in the SPARQL/Update language: INSERT DATA INTO doesn't care if the graph 
name is dereferenceable, and if one wants to replace the data in the graph, 
one can do a CLEAR GRAPH before the INSERT. 
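The language-level route can be sketched as follows. The helper names, the URN graph name (chosen because a URN has no server behind it, as in cases 3 and 4) and the triple are hypothetical. The `INSERT DATA INTO <g> { ... }` form follows the draft SPARQL/Update syntax referred to above; the later SPARQL 1.1 Update recommendation writes this as `INSERT DATA { GRAPH <g> { ... } }` and separates operations with `;`.

```python
# Hypothetical, non-dereferenceable graph name, as in cases 3 and 4.
GRAPH = "urn:example:graph:people"
TRIPLES = '<http://example.org/alice> <http://xmlns.com/foaf/0.1/name> "Alice" .'


def insert_data(graph: str, triples: str) -> str:
    """Add triples to a named graph, whether or not the name is dereferenceable."""
    return f"INSERT DATA INTO <{graph}> {{ {triples} }}"


def clear_graph(graph: str) -> str:
    """Remove all triples from the named graph."""
    return f"CLEAR GRAPH <{graph}>"


def load_update(graph: str, triples: str, replace: bool) -> str:
    """Build the update for case 3 (INSERT only) or case 4 (CLEAR, then INSERT)."""
    operations = [clear_graph(graph)] if replace else []
    operations.append(insert_data(graph, triples))
    # Assumption (check against the grammar in use): operations are simply
    # sequenced, one per line; SPARQL 1.1 Update would need ';' between them.
    return "\n".join(operations)


print(load_update(GRAPH, TRIPLES, replace=True))
```

Whether a store runs the CLEAR and the INSERT atomically is not settled by this sketch; that is a property of the implementation, not of the syntax.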

I would certainly like to see 1 and 2 supported by a simple POST and PUT, but 
should 3 and 4 also be supported on the protocol level when they are already 
supported by the language?
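If cases 3 and 4 were supported on the protocol level, one possible shape is to send the POST or PUT to the store's own endpoint and name the target graph in a query parameter, so the graph URI never needs a server of its own. The endpoint URL, the `graph` parameter name and the helper are hypothetical here (the SPARQL 1.1 Graph Store HTTP Protocol later adopted a similar `?graph=` parameter for indirect graph identification).

```python
import urllib.parse
import urllib.request

# Hypothetical store endpoint that accepts writes for the graphs it holds.
SERVICE = "http://store.example.org/graphs"
PAYLOAD = b'<http://example.org/alice> <http://xmlns.com/foaf/0.1/name> "Alice" .\n'


def indirect_load_request(graph: str, data: bytes, replace: bool) -> urllib.request.Request:
    """Build (not send) a POST (case 3) or PUT (case 4) that names the graph
    in a query parameter instead of addressing the graph URI directly."""
    query = urllib.parse.urlencode({"graph": graph})  # 'graph' is a hypothetical name
    return urllib.request.Request(
        f"{SERVICE}?{query}",
        data=data,
        method="PUT" if replace else "POST",
        headers={"Content-Type": "text/turtle"},
    )


req = indirect_load_request("urn:example:graph:people", PAYLOAD, replace=True)
print(req.get_method(), req.full_url)
```

The graph name is URL-encoded into the query string, so any URI, dereferenceable or not, can be targeted with the same POST/PUT semantics as cases 1 and 2.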

Kind regards 

Kjetil Kjernsmo
-- 
Senior Knowledge Engineer
Mobile: +47 986 48 234
Email: kjetil.kjernsmo@computas.com   
Web: http://www.computas.com/

|  SHARE YOUR KNOWLEDGE  |

Computas AS  PO Box 482, N-1327 Lysaker | Phone:+47 6783 1000 | Fax:+47 6783 1001

Received on Monday, 25 May 2009 11:20:33 UTC