Re: Reflections on Update

On Fri, May 15, 2009 at 5:01 AM, Seaborne, Andy <andy.seaborne@hp.com> wrote:
<snip/>
> Servers should be allowed to (and encouraged?) to reject "file:".  Technically, much like "FROM <file:...>" but with worse consequences.

I agree, but this is still a useful feature if security is configured
appropriately. We've regularly been in a position of loading multi-GB
files, and this is handled much better from the local file system.
Also, if you have permission to write to graphs on the server then
that will often correspond to permission to write to the host's file
system.

>> So that's file: URLs. What about http: ?
>>
>> If we are using the http protocol in the URL  to be loaded then
>> everything is being transferred by http protocol anyway, so why
>> confuse the issue by including a "command" in the transfer? It also
>> makes it awkward for the client that wants to send a file to the
>> server, but doesn't have an HTTP server on hand to respond to an HTTP
>> request for the file to be loaded. (This particular scenario also
>> requires two connections, when one would suffice)
>
> The number of web hops the data takes is important.  With LOAD <url> the data flows from the URL to the server, and does not flow via the client.

Sorry, I wasn't clear.

We never want the data to move more than once. The server should
receive an http: URL and retrieve that on its own.

My first point was really a matter of style. Since I was already
thinking of protocol-based load operations, I was just suggesting that
maybe a text-based command would be overkill. Looking back at it, it's
a strange objection, and I withdraw the suggestion. I was only playing
devil's advocate anyway. :-)

My second point was the case where a user has a file that they want to
upload. If we only support "load <http://....>" then this means that
the user must have access to write the required file to a web server
somewhere. If that's a separate server, then they have to move the
file there first before it can be loaded, meaning 2 transfers of the
data. Otherwise, the web server must be on the client's host, which
creates the bizarre case of the SPARQL server connecting back to the
client that just issued the request in order to get the file.

Of course, this has the obvious solution of allowing the client to
send the data up with a POST, as was discussed later.
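
A minimal sketch of the client side of such a POST, in Python. The
endpoint URL, the "graph" query parameter, and the function name are
assumptions made up for illustration; they are not taken from any
standard or from a particular server's interface. The request is only
built here, not sent.

```python
import urllib.parse
import urllib.request


def build_graph_upload(endpoint, graph_uri, data,
                       content_type="application/rdf+xml"):
    """Build (but do not send) a POST request carrying RDF data.

    The 'graph' query parameter is a hypothetical convention for
    naming the target graph; a real server may expect something else.
    """
    query = urllib.parse.urlencode({"graph": graph_uri})
    return urllib.request.Request(
        url=f"{endpoint}?{query}",
        data=data,
        headers={"Content-Type": content_type},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical endpoint and graph name, for illustration only.
    body = (b"<rdf:RDF xmlns:rdf="
            b"'http://www.w3.org/1999/02/22-rdf-syntax-ns#'/>")
    req = build_graph_upload("http://localhost:8080/data",
                             "http://example.org/g1", body)
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```

The data travels once, from client to server, over a single
connection, which is the property argued for above.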

> A use case I have in mind is the ability to collect data from a number of places with an update script of
>
> LOAD <url1>
> LOAD <url2>
> LOAD <url3>
> ...

Certainly, and I support this. The difficulty I'm pointing out is the
case where the data to be loaded is being held by the client (this
forms the majority of our use cases).

Even if these LOAD commands were not available, they could be
simulated with (for the first URL):

INSERT { ?s ?p ?o } WHERE { GRAPH <url1> {?s ?p ?o} }

Incidentally, I'd also like to see the LOAD command updated to have an
optional [INTO <uri>] at the end of the command. So the following
would be equivalent:

LOAD <url1> INTO <uri2>

INSERT INTO <uri2> { ?s ?p ?o } WHERE { GRAPH <url1> {?s ?p ?o} }
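
The rewriting described above can be sketched as a small Python
function. It uses only the LOAD and INSERT forms written in this
message, which are proposals rather than settled syntax, and the
function name is hypothetical. The equivalence also assumes that the
server resolves GRAPH <url1> by retrieving that URL.

```python
import re

# Matches: LOAD <source> [INTO <target>]
_LOAD_RE = re.compile(
    r"^\s*LOAD\s+<([^>]+)>(?:\s+INTO\s+<([^>]+)>)?\s*$",
    re.IGNORECASE,
)


def load_to_insert(command):
    """Rewrite a LOAD command as the INSERT form used in this message."""
    match = _LOAD_RE.match(command)
    if match is None:
        raise ValueError(f"not a LOAD command: {command!r}")
    source, target = match.groups()
    # The INTO clause is optional, mirroring the proposed [INTO <uri>].
    into = f"INTO <{target}> " if target else ""
    return (f"INSERT {into}{{ ?s ?p ?o }} "
            f"WHERE {{ GRAPH <{source}> {{?s ?p ?o}} }}")


if __name__ == "__main__":
    print(load_to_insert("LOAD <url1>"))
    print(load_to_insert("LOAD <url1> INTO <uri2>"))
```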


>> I'd like to see a standard for POSTing a file to a graph on a server,
>> as this can be done easily with code or even a web form. Personally, I
>> also like having a command that does a load (we have one in Mulgara)
>> but the issues that I described make it seem difficult to standardize
>> in a way that will be suitable for any type of configuration.
>>
>> Please feel free to correct me on any of the above points. :-)
>>
>> Regards,
>> Paul Gearon
>
> Good point about the use case for a simple POST-data and POST-from-form which look compelling so we're beginning to tease out the requirements for update.

Thanks. I implemented this before having a use case, just because it
seemed to make sense. But since we've had it we're finding that it's
become one of the most popular ways to load data (I use it exclusively
now).

Regards,
Paul Gearon

Received on Friday, 15 May 2009 15:54:23 UTC