- From: Arjohn Kampman <arjohn.kampman@aduna.biz>
- Date: Wed, 30 Mar 2005 18:10:21 +0200
- To: public-rdf-dawg-comments@w3.org
- Cc: Jeen Broekstra <jeen@aduna.biz>, "Seaborne, Andy" <andy.seaborne@hp.com>
(follow-up from an off-list discussion between Andy, Jeen and myself)

The current SPARQL document defines the concept of "RDF Datasets". An RDF Dataset is defined as a union of RDF graphs: a single, unnamed background graph and zero or more named graphs. The SPARQL document explicitly states that the relationship between named and background graphs is not defined, and mentions two useful arrangements (section 7.1 in the 2005/03/24 version of the editor's draft):

1. place provenance information about the named graphs in the background graph
2. include the RDF merge of the named graphs in the background graph.

The first point refers to an arrangement where the background graph is separate from the named graphs.

SPARQL further allows one to query the names of graphs by using the GRAPH keyword. Using the GRAPH keyword results in the querying of triples from named graphs; omitting it results in the querying of the background graph. If the first arrangement of named and background graphs is considered, then this query mechanism essentially is a mechanism for querying quads, not triples! The graph name is no longer just an ignorable attribute of a triple, but is now an essential part of it. It appears to me that there is a mismatch between RDF and SPARQL here.

The second arrangement is an entirely different story. Here, graph names are "merely" a triple grouping mechanism; the set of triples that is queried does not depend on whether a query asks for the name of the graph.

My main concern with the current spec is that it leaves the choice of the arrangement for RDF Datasets up to the implementer of the query engine. These two arrangements seem to be largely incompatible with each other and as such have the potential to split the RDF community into two camps: an "RDF is quads" camp and an "RDF is triples plus context" camp.
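The difference between the two arrangements can be made concrete with a small model. What follows is only a sketch under stated assumptions: graphs are modelled as Python sets of (subject, predicate, object) tuples, a plain set union stands in for the RDF merge, and every name (the `ex:` identifiers and the two helper functions) is invented for illustration and is not taken from the SPARQL draft.

```python
# Triples are (subject, predicate, object) tuples; all names here are made up.
named = {
    "ex:g1": {("ex:alice", "ex:knows", "ex:bob")},
    "ex:g2": {("ex:bob", "ex:knows", "ex:carol")},
}

# Arrangement 1: the background graph holds only provenance about the named graphs.
background_provenance = {
    ("ex:g1", "ex:source", "ex:siteA"),
    ("ex:g2", "ex:source", "ex:siteB"),
}

# Arrangement 2: the background graph contains the merge of the named graphs.
# (A plain set union stands in for the RDF merge; no blank nodes are involved.)
background_merge = set().union(*named.values())


def query_without_graph(background):
    """{?s ?p ?o} with no GRAPH keyword: matches the background graph only."""
    return set(background)


def query_with_graph(named_graphs):
    """GRAPH ?g {?s ?p ?o}: matches named graphs and binds ?g to the graph name."""
    return {
        (g, s, p, o)
        for g, triples in named_graphs.items()
        for (s, p, o) in triples
    }


data_triple = ("ex:alice", "ex:knows", "ex:bob")

# Arrangement 1: the data triple is reachable only through GRAPH, i.e. as a quad.
seen_1 = query_without_graph(background_provenance)
# Arrangement 2: the same triple is visible whether or not GRAPH is used.
seen_2 = query_without_graph(background_merge)
quads = query_with_graph(named)
```

In this model the same triple-only query returns different data depending on how the engine arranged the dataset, which is the interoperability risk described above.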

I see two possible ways to solve this issue:

1/ standardize on a single arrangement (preferably the latter), or
2/ move the choice of arrangement into the query language.

Option 2 is probably the best solution, considering the many use cases requiring different arrangements (trust issues and all...). Also, it only requires small modifications to the current SPARQL spec in order to allow the query writer to specify whether the query involves a specific named graph, all named graphs, only the background graph, or the union of it all. These required modifications are:

- Make the GRAPH attribute truly optional so that it no longer influences which (set of) graph(s) is queried.
- Allow variables in GRAPH attributes to be left unbound for triples that are in the background graph.

With these modifications, SPARQL would again be usable as a true triple query language, while still offering the choice of graph arrangement but without risking the interoperability of SPARQL-aware tools.

The following queries illustrate the effect of the proposed modifications:

Q1 will evaluate against the union of all graphs:

  Q1: SELECT * WHERE {?s ?p ?o}

Q2 will evaluate against the graph with name <URI>:

  Q2: SELECT * WHERE GRAPH <URI> {?s ?p ?o}

Q3 will again evaluate against the union of all graphs, leaving ?g unbound for triples in the background graph and including multiple solutions for triples that are in more than one graph:

  Q3: SELECT * WHERE GRAPH ?g {?s ?p ?o}

Q4 will evaluate against all named graphs:

  Q4: SELECT * WHERE GRAPH ?g {?s ?p ?o} FILTER bound(?g)

Q5 will evaluate against the background graph:

  Q5: SELECT * WHERE GRAPH ?g {?s ?p ?o} FILTER !bound(?g)

Optionally, a syntactical shortcut could be introduced for restricting queries to the background graph (cf Q5), e.g.:

  Q6: SELECT * WHERE GRAPH BACKGROUND {?s ?p ?o}

I would like the DAWG to seriously consider this proposal.
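
The semantics proposed for Q1 to Q5 can be sketched in a few lines of Python. This is a minimal model and not an implementation of any SPARQL engine: it assumes graphs are sets of (subject, predicate, object) tuples, it uses `None` to represent an unbound ?g, and the dataset contents and function names (`q1` to `q5`, `all_quads`) are hypothetical.

```python
# Hypothetical dataset: one background graph plus named graphs (names invented).
background = {("ex:doc", "ex:title", "Report")}
named = {
    "ex:g1": {("ex:alice", "ex:knows", "ex:bob")},
    "ex:g2": {
        ("ex:alice", "ex:knows", "ex:bob"),
        ("ex:bob", "ex:knows", "ex:carol"),
    },
}


def all_quads():
    """Every triple paired with its graph name; None marks the background graph."""
    rows = [(None, t) for t in background]
    rows += [(g, t) for g, triples in named.items() for t in triples]
    return rows


def q1():
    """Q1: no GRAPH clause, evaluated against the union of all graphs."""
    return {t for _, t in all_quads()}


def q2(uri):
    """Q2: GRAPH <URI>, evaluated against that single named graph."""
    return set(named.get(uri, set()))


def q3():
    """Q3: GRAPH ?g over everything; ?g is None (unbound) for background triples."""
    return all_quads()


def q4():
    """Q4: Q3 plus FILTER bound(?g), i.e. named graphs only."""
    return [(g, t) for g, t in q3() if g is not None]


def q5():
    """Q5: Q3 plus FILTER !bound(?g), i.e. the background graph only."""
    return [(g, t) for g, t in q3() if g is None]
```

Note that in this model a triple present in two named graphs appears once in the result of `q1` but twice in the result of `q3`, matching the "multiple solutions" remark for Q3.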

Keeping the SPARQL spec as it is today can have disastrous effects on the interoperability of SPARQL-aware tools.

Regards,

Arjohn Kampman

--
arjohn.kampman@aduna.biz
Aduna BV - http://aduna.biz/
Prinses Julianaplein 14-b, 3817 CS Amersfoort, The Netherlands
tel. +31-(0)33-4659987 fax. +31-(0)33-4659987

Received on Wednesday, 30 March 2005 16:10:33 UTC