Re: Endpoint connected to Web update then query from Arthur Keen on 2012-08-24 (public-rdf-dawg@w3.org from July to September 2012)

From: Arthur Keen <AKeen@algebraixdata.com>
Date: Fri, 24 Aug 2012 17:33:12 +0000
To: Steve Harris <steve.harris@garlik.com>
CC: W3C SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <DA7AAFD4-0006-4310-BCD6-1A681C559175@algebraixdata.com>

Steve,

Thanks for explaining the nuances of the issue of interacting with data that is beyond the span of control of your own data system. We are focused on implementing SPARQL query at the moment and this is going to be very helpful when we implement update.

On Aug 24, 2012, at 7:58 AM, Steve Harris <steve.harris@garlik.com> wrote:

On 23 Aug 2012, at 23:39, Arthur Keen wrote:

I promised feedback to the WG on the issue that Andy brought up, where an endpoint is serving up RDF from the web and receives an update followed by a query. Current implementations have three different ways of handling this:

1) Allow the update and execute the query on the updated data,
2) Allow the update and execute the query on the original data from the web,
3) Refuse the update since the data is from the web and therefore read only.
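To make the three behaviors concrete, here is a minimal sketch in Python. It is only an illustration and not any implementation's actual design: `UpdatePolicy`, `ToyEndpoint`, `insert`, and `query_all` are hypothetical names, and triples are modeled as plain tuples held in memory.

```python
from enum import Enum


class UpdatePolicy(Enum):
    QUERY_UPDATED = 1   # behavior 1: accept the update, query sees it
    QUERY_ORIGINAL = 2  # behavior 2: accept the update, query still sees web data
    REFUSE = 3          # behavior 3: reject the update, web data is read-only


class ToyEndpoint:
    """Hypothetical endpoint holding triples that were fetched from the web."""

    def __init__(self, web_triples, policy):
        self.original = frozenset(web_triples)  # data as retrieved from the web
        self.local = set(web_triples)           # copy that accepted updates modify
        self.policy = policy

    def insert(self, triple):
        # Behavior 3 refuses the update outright.
        if self.policy is UpdatePolicy.REFUSE:
            raise PermissionError("web-sourced data is read-only")
        self.local.add(triple)

    def query_all(self):
        # Behavior 2 answers from the untouched web data; behavior 1 from the copy.
        if self.policy is UpdatePolicy.QUERY_ORIGINAL:
            return set(self.original)
        return set(self.local)
```

With the same update followed by the same query, the three policies give three different observable results, which is the interoperability concern raised here.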

We have not started implementing update yet and here is my take on this:

If an endpoint is configured to retrieve data from the web, then that data should be treated as read-only and the endpoint should not allow local updates to it. The user should make their own local copy of the data and update that local copy. That is, unless I am missing some important use cases, for example one where the endpoint owns the web data and is staging updates for the web.

One use case is that the data "on the web"* has changed, and the local copy needs updating.

Good point: it is not read-only; it is under the control of another system.

Rather than just replacing the whole thing you might choose to apply a patch to minimise the writes… just for example :)

You are right: the server will have a local copy of the external data and will need to keep it in sync. In our SQL implementation we do nondestructive recording inside our data algebra implementation: the new sets created by an update are expressed by intension (as a set expression in terms of the original set) and are only materialized (extension) if a query needs them. This minimizes writes to disk. We will use the same approach when we implement SPARQL Update later in the year. It is going to be interesting to see how well this works for SPARQL Update.
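A rough sketch of the idea of recording an update by intension and materializing only on demand follows. It is a guess at the general technique, not the data algebra implementation described above; the class names `BaseSet`, `Union`, and `Difference` and the `materialize` method are hypothetical.

```python
class BaseSet:
    """The original set of triples, stored by extension."""

    def __init__(self, triples):
        self._triples = frozenset(triples)

    def materialize(self):
        return self._triples


class Union:
    """An insert recorded as an expression over the parent set."""

    def __init__(self, parent, added):
        self.parent = parent
        self.added = frozenset(added)

    def materialize(self):
        # Computed only when a query asks for the contents.
        return self.parent.materialize() | self.added


class Difference:
    """A delete recorded as an expression over the parent set."""

    def __init__(self, parent, removed):
        self.parent = parent
        self.removed = frozenset(removed)

    def materialize(self):
        return self.parent.materialize() - self.removed


# Each update creates a new expression; earlier versions stay untouched.
v0 = BaseSet({("a", "p", "1")})
v1 = Union(v0, {("a", "p", "2")})
v2 = Difference(v1, {("a", "p", "1")})
```

Because `v0` is never modified, a query can be answered against either the original data or the newest version, depending on which expression is materialized.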

If, however, the SPARQL WG decides to allow local updates to that read-only web data, and the administrator of the endpoint configures this behavior, then our endpoint will process the update (nondestructively) and the subsequent query will be executed on the newest version of the data.

If the standard allows these 3 different behaviors, then a user should be able to discover which behavior is being followed.

Ah, it's never that simple… one approach would be that some users, or some endpoints, or some combination of both are allowed to update, but others are not.
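One way to picture the "some users, some endpoints, or some combination" case is a permission table keyed on (user, endpoint) pairs. This is purely illustrative: `UpdateAcl`, `grant`, and `may_update` are hypothetical names, and nothing here is defined by SPARQL.

```python
from dataclasses import dataclass, field


@dataclass
class UpdateAcl:
    # Set of (user, endpoint) pairs that are allowed to update (hypothetical model).
    allowed: set = field(default_factory=set)

    def grant(self, user, endpoint):
        self.allowed.add((user, endpoint))

    def may_update(self, user, endpoint):
        # Anyone not explicitly granted is treated as read-only.
        return (user, endpoint) in self.allowed


acl = UpdateAcl()
acl.grant("alice", "http://example.org/sparql")
```

Under this model the answer to "is this data updatable?" depends on who is asking and where, so a single endpoint-wide flag would not be enough to describe the behavior.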

This really is a can of worms! :)

Regards,
Steve

* noting that "on the web" is really a very vague term, as you're beholden to caches, bugs, outages, CDNs, client detection, round robin DNS, and all manner of craziness that makes it hard to pin down.

--
Steve Harris, CTO
Garlik, a part of Experian
1-3 Halford Road, Richmond, TW10 6AW, UK
+44 20 8439 8203 http://www.garlik.com/
Registered in England and Wales 653331 VAT # 887 1335 93
Registered office: Landmark House, Experian Way, NG2 Business Park, Nottingham, Nottinghamshire, England NG80 1ZZ

Received on Friday, 24 August 2012 17:35:04 UTC