Re: Data loading cases

Kjetil Kjernsmo wrote:
> All,
>
> I just sat down and had a bit of thinking about how we load data into 
> the quad store, and figured I would articulate four cases of data 
> loading. In these cases, no WHERE clause is required, it is just plain 
> data loading. It is quite clear that whenever a WHERE-clause is 
> required, we need the SPARQL/Update language, but in the following
> cases, it may be relevant to discuss whether they can and should be 
> supported on the protocol level.
>
> The cases are quite straightforward:
>
> 1) New data are added to the graph and the graph URI is 
> dereferenceable and the user is authorized to use the service.
> 2) New data replaces the data contained in the graph, and the graph 
> URI is dereferenceable and the user is authorized to use the service.
> 3) New data are added to the graph, and the graph URI is not
> dereferenceable.
> 4) New data replaces the data contained in the graph and the graph URI
> is not dereferenceable.
We have similar use cases for QDOS and foaf.qdos.com.

In the case of foaf.qdos.com, we PUT (I don't think we support POST) 
data to an endpoint which is on a different domain to the graph URI.  We 
do this by passing the graph URI to the endpoint in the following (I 
suppose rather ugly) way:

http://example.com/sparql/http://plugin.org.uk/swh.xrdf

Where example.com is the endpoint and plugin.org.uk/swh.xrdf is the 
graph URI.  It would be nice if there was an HTTP friendly way of doing 
this.
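
For concreteness, here is a minimal Python sketch of how a client might build such a request. The helper name graph_put_request, the RDF/XML content type and the empty payload are assumptions for illustration only, not a description of how foaf.qdos.com is implemented. The request is built but not sent.

```python
from urllib.request import Request


def graph_put_request(endpoint: str, graph_uri: str, rdf_xml: bytes) -> Request:
    """Build (but do not send) a PUT that names the target graph by
    appending the graph URI to the endpoint URL, as described above."""
    url = endpoint.rstrip("/") + "/" + graph_uri
    return Request(
        url,
        data=rdf_xml,
        method="PUT",
        # Assumption: the payload is RDF/XML.
        headers={"Content-Type": "application/rdf+xml"},
    )


# Hypothetical payload: an empty RDF/XML document.
payload = b'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>'
req = graph_put_request(
    "http://example.com/sparql/", "http://plugin.org.uk/swh.xrdf", payload
)
print(req.get_method(), req.full_url)
```

Passing req to urllib.request.urlopen would actually send it. Note that the graph URI is appended unescaped, which is part of what makes this approach ugly.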
> Obviously, if the user is not authorized to use a service, it will 
> receive a 401 (or 403) response.
> The two first cases can be supported in the protocol by POST and PUT 
> respectively, which is very pretty and RESTful.
> In our current system, the graph URIs were never designed to be 
> dereferenceable, there is no server there now that could accept a PUT 
> or POST, thus motivating cases 3 and 4. One could of course argue this 
> is simply bad practice, but with current SPARQL I can't see any reason
> why graph names should be dereferenceable. Also, we found it kinda 
> nice to not have to set base all the time to the box we were 
> developing on at the time. If important update methods relied on graph 
> names being dereferenceable, it would be close to a requirement for an 
> application to have that, I think.
>   Currently, all my use cases could be satisfied with a requirement 
> that graph names should be dereferenceable, but perhaps there are 
> cases where that shouldn't be a requirement? If we decide it is just 
> bad practice to not have dereferenceable graph names, I think we
> should articulate why. 
foaf.qdos.com contains harvested FOAF files whose graph names are on all 
sorts of domains which aren't usually SPARQL endpoints with 
dereferenceable URIs.

Even if they were, we want to PUT to update the copy of a given graph in 
our store, not the store that we harvested it from.  In either case, we 
need some way of PUTing - or POSTing, using an as yet undefined protocol 
- to an endpoint on a different domain to the graph URI.
> I would like to see all four cases supported in some form. It is also 
> important that the implementation can be done in such a way that any 
> of the methods can be used to insert large datasets.
>
> It seems that all four cases can be supported by essentially the same 
> feature in the SPARQL/Update language: INSERT DATA INTO doesn't care 
> if the graph name is dereferenceable, and if one wants to replace the 
> data in the graph, one can do a CLEAR GRAPH before the INSERT.
> I would certainly like to see 1 and 2 supported by a simple POST and 
> PUT, but I don't know if 3 and 4 should be supported on the protocol 
> level if it is supported by the language?
>
> Kind regards
> Kjetil Kjernsmo
>   
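
As a rough sketch of the language-level route Kjetil describes: the "add" cases (1 and 3) map to a single INSERT DATA INTO, and the "replace" cases (2 and 4) to a CLEAR GRAPH followed by the same INSERT. The helper names and the example triple below are made up for illustration. The operations are returned as separate strings to be submitted in order, because how several operations are combined in one request is not something this thread settles.

```python
def insert_data(graph_uri: str, triples: str) -> str:
    """SPARQL/Update operation that adds triples to the named graph."""
    return f"INSERT DATA INTO <{graph_uri}> {{ {triples} }}"


def add_to_graph(graph_uri: str, triples: str) -> list[str]:
    """Cases 1 and 3: add new data, keeping what is already in the graph."""
    return [insert_data(graph_uri, triples)]


def replace_graph(graph_uri: str, triples: str) -> list[str]:
    """Cases 2 and 4: empty the graph first, then insert the new data."""
    return [f"CLEAR GRAPH <{graph_uri}>", insert_data(graph_uri, triples)]


# Hypothetical triple; the graph URI is the one used earlier in this message.
triple = '<http://example.org/s> <http://example.org/p> "o" .'
for op in replace_graph("http://plugin.org.uk/swh.xrdf", triple):
    print(op)
```

Neither helper cares whether the graph URI is dereferenceable, which is the point made above.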

Received on Monday, 1 June 2009 16:41:26 UTC