RE: Default and named datasets in federated queries

Hello Carlos

Thank you for your reply, and sorry for the delayed response.

Even though the GRAPH clause can be used to select a single graph from the set of graphs available to the remote endpoint (and, by extension, a sequence of GRAPH clauses can select a sequence of data sources), it does not offer the same flexibility as allowing multiple default and named datasets to be given in the SERVICE clause. It does not allow for GRAPH ?x queries on the remote endpoint either.

We use SPARQL 1.0 to bridge data from different types of data sets (object databases, event logs, real-time sensor data, XML files, etc.) on a device (PLC or computer system). Since we build grids of such devices in large networks, we hope to use SPARQL 1.1 to query the entire grid with a single query, including joins between data from different devices in the grid. To do this efficiently, we need to be able to specify datasets in federated queries. So, in our implementation we allow optional FROM and FROM NAMED clauses after the SERVICE <endpoint> statement to specify them. However, it would be good if the WG could standardize the way datasets are specified in federated queries, for the cases where the caller wants to specify them.
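
To make the idea concrete, here is a rough sketch of what such a query might look like. This is not valid SPARQL 1.1; it only illustrates the implementation-specific extension described above. The endpoint, graph IRIs, and the ex: vocabulary are hypothetical placeholders, and the exact placement of the clauses is an assumption based on the description "after the SERVICE <endpoint> statement".

```sparql
# NOT standard SPARQL 1.1: FROM / FROM NAMED after SERVICE <endpoint>
# is a non-standard extension. All IRIs below are hypothetical.
PREFIX ex: <http://example.org/ns#>
SELECT ?sensor ?value ?event
WHERE {
  SERVICE <http://device1.example.org/sparql>
    FROM <http://device1.example.org/sensors>         # remote default graph
    FROM NAMED <http://device1.example.org/eventlog>  # remote named graph
  {
    # Matched against the remote default graph chosen by FROM
    ?sensor ex:value ?value .
    # Matched against the remote named graphs chosen by FROM NAMED
    GRAPH ?g { ?event ex:source ?sensor }
  }
}
```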

Sincerely,
Peter Waher



From: Carlos Buil Aranda [mailto:cbuil@fi.upm.es]
Sent: den 11 juli 2012 15:40
To: Peter Waher
Cc: public-rdf-dawg-comments@w3.org
Subject: Re: Default and named datasets in federated queries


2012/6/3 Peter Waher <Peter.Waher@clayster.com>
Hello

When reading the grammar rules for federated queries [60], I cannot see a way to specify default and/or named datasets in the federated query:
http://www.w3.org/TR/sparql11-query/#sparqlGrammar

Cannot see it here either:
http://www.w3.org/TR/sparql11-federated-query/

Nor in this document (where the rule is now [59]):
http://www.w3.org/2009/sparql/docs/query-1.1/rq25.xml#sparqlGrammar

As it is today, the only reasonable interpretation I can see is to make the federated query against any default datasets defined by the remote server. Is this correctly understood?

The other possibility, which I see as not particularly suitable, is to send the default and named datasets defined by the main query. This could however cause a lot of problems. Say the main query uses a series of datasets which it fetches (or holds internally), and which are only used by the main query. Sending these to the remote server could cause that server to load the datasets up front, since it might not know whether they are useful or not.

Has this issue been discussed or solved already? That is, is there a coming solution where you can specify default and named datasets in federated queries?

Sincerely,
Peter Waher


Dear Peter,

The point of SERVICE is to direct a specific part of the SPARQL query to a remote dataset, because that dataset will most probably contain the data that the user needs. The dataset description is used to select graphs from a local pool, not from a remote one. In this sense, the use of GRAPH may be enough to allow the sender to select the graphs of interest. Below you can find an example of this GRAPH usage.

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name
WHERE {
 SERVICE <http://example.org/sparql> {
   GRAPH <http://example.org/graph>
    { ?person foaf:name ?name ; foaf:mbox ?mbox }
 }
}

Using GRAPH, the only case missing is a merge of graphs to be the default graph. If an application wants to access a general-purpose SPARQL processor, then it uses the SPARQL protocol directly, not as part of a query, because, after all, SERVICE happens inside a local query execution.
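
As a rough illustration of how close standard SPARQL 1.1 can get to that missing case, a UNION of GRAPH patterns inside SERVICE can approximate a merged default graph. This is only a sketch: the two graph IRIs are hypothetical, and the approximation holds only when each pattern can be matched entirely within a single graph. A pattern whose triples are spread across several graphs would not match, whereas it would against a true merge.

```sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name
WHERE {
  SERVICE <http://example.org/sparql> {
    # Each branch is evaluated against one named graph on the remote
    # endpoint; UNION combines the solutions from both branches.
    { GRAPH <http://example.org/graph1> { ?person foaf:name ?name } }
    UNION
    { GRAPH <http://example.org/graph2> { ?person foaf:name ?name } }
  }
}
```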


I'd appreciate it if you could briefly confirm that this addresses your comment,

best regards,

Carlos

Received on Monday, 16 July 2012 09:16:08 UTC