- From: Lee Feigenbaum <lee@thefigtrees.net>
- Date: Fri, 08 May 2009 00:40:38 -0400
- To: SPARQL Working Group <public-rdf-dawg@w3.org>
(Not sure if the email tag identifying the issue is useful. What do you think?)

This email discharges my action http://www.w3.org/2009/sparql/track/actions/21

SPARQL queries are executed against an RDF dataset, which contains a default graph (possibly constructed from the RDF merge of multiple graphs) and zero or more named graphs.

In SPARQL/Query 1.0 (nee SPARQL?), a query's RDF dataset is determined by the first of these that applies:

1) The dataset specified in the protocol (via default-graph-uri and named-graph-uri)
2) The dataset specified in the query (via FROM and FROM NAMED)
3) implementation defined

At the F2F, we heard 2 different designs for determining what RDF dataset a subquery should be executed against.

ARQ - subqueries always use the same dataset used for executing the parent (container) query.

Virtuoso - subqueries can specify their own dataset. I wasn't totally clear on what the priority of that is with respect to the protocol.

Ignoring the protocol for a second, I think there are two possibilities.

# Select details of all recent posts, given a graph which enumerates
# which posts those are. (This is a poor example, it could be done with
# regular GRAPH clauses and doesn't need a subquery.)
SELECT * FROM ex:all_posts {
  ?post dc:title ?title .
  {
    SELECT ?post FROM ex:recent_posts { ?post a ex:Post }
  }
}

Option 1: The subquery executes against ex:recent_posts
  - The rule would be that a subquery can specify its own dataset which trumps the parent's dataset

Option 2: subqueries can't specify their own dataset
  - in this case, I'd suggest this should be an explicit error

With the protocol, it's a little murkier. The reason the protocol trumps the query is so that queries can be easily re-targeted against other graphs, without having to parse out any dataset given in the query. If we go with Option 2 above then this is still easy, since the subquery can't specify a dataset.
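To make the rules above concrete, here is a minimal Python sketch of the SPARQL 1.0 "first that applies" precedence and of Options 1 and 2 for subqueries. It is only an illustration under stated assumptions: the Dataset class, both function names, and the allow_subquery_dataset flag are hypothetical and come from no spec or implementation, and the sketch deliberately leaves out how a protocol-supplied dataset should interact with a subquery's dataset.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Dataset:
    """Hypothetical stand-in for an RDF dataset description."""
    default_graphs: Tuple[str, ...] = ()
    named_graphs: Tuple[str, ...] = ()


def resolve_dataset(protocol: Optional[Dataset],
                    query: Optional[Dataset],
                    implementation_default: Dataset) -> Dataset:
    """Top-level rule as summarized above: the first source that applies wins."""
    if protocol is not None:      # 1) default-graph-uri / named-graph-uri
        return protocol
    if query is not None:         # 2) FROM / FROM NAMED
        return query
    return implementation_default  # 3) implementation defined


def resolve_subquery_dataset(parent: Dataset,
                             subquery: Optional[Dataset],
                             allow_subquery_dataset: bool) -> Dataset:
    """Option 1 (flag True): the subquery's own dataset trumps the parent's.
    Option 2 (flag False): a dataset on a subquery is an explicit error."""
    if subquery is None:
        # No FROM on the subquery: inherit the parent's dataset either way.
        return parent
    if not allow_subquery_dataset:
        raise ValueError("subquery may not specify its own dataset")
    return subquery
```

With the example query, Option 1 would resolve the inner SELECT to ex:recent_posts, while Option 2 would reject the query outright.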
If we allow subqueries to specify a dataset, and the protocol also specifies a dataset, it's unclear what should happen:

Option A: Protocol trumps dataset. This seems inconsistent since we're allowing subqueries to have different datasets than their parent, but now all of a sudden the protocol forces both parent & child to share the same dataset. It's hard to imagine a situation where

# something useful by sub-querying from g2 instead of g1
SELECT * FROM :g1 { ... { SELECT * FROM :g2 { ... } } }

all of a sudden makes sense when the protocol forces both parts to be issued against g3. That is, there's no way (right now) for the protocol to say, override g1 with g3 and g2 with g4.

Option B: Protocol trumps dataset for main query, but datasets explicitly in subqueries trump all. This is inconsistent because now the protocol can partially retarget a query but can't touch subqueries. That's weird.

Option C: Explicitly prohibit the case where the protocol supplies a dataset for a query that contains a subquery that explicitly specifies its dataset. This works around the problem but is sort of a strange prohibition.

It seems to me that ARQ's behavior is simple and avoids this problem, but I'm not sure at what cost. My natural inclination is that it's valuable for queries & their subqueries to be able to target different graphs.

Current recommendation? Unsure.

Suggested next steps? Determine whether we have reasonable use cases to require that subqueries can target different datasets from parent queries.

Lee
Received on Friday, 8 May 2009 04:41:25 UTC