- From: Lee Feigenbaum <lee@thefigtrees.net>
- Date: Fri, 08 May 2009 00:40:38 -0400
- To: SPARQL Working Group <public-rdf-dawg@w3.org>
(Not sure if the email tag identifying the issue is useful. What do you
think?)
This email discharges my action
http://www.w3.org/2009/sparql/track/actions/21
SPARQL queries are executed against an RDF dataset, which contains a
default graph (possibly constructed from the RDF merge of multiple
graphs) and zero or more named graphs.
In SPARQL/Query 1.0 (nee SPARQL?), a query's RDF dataset is determined
by the first of these that applies:
1) The dataset specified in the protocol (via default-graph-uri and
named-graph-uri)
2) The dataset specified in the query (via FROM and FROM NAMED)
3) An implementation-defined dataset
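The precedence above can be sketched in Python. This is only an illustration under stated assumptions, not anything taken from the spec or from an implementation: the `Dataset` type and the `resolve_dataset` function are hypothetical names, the graph IRIs are made up, and a dataset is modeled as nothing more than two tuples of graph IRIs.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Dataset:
    """Hypothetical model: graphs merged into the default graph, plus named graphs."""
    default_graphs: Tuple[str, ...] = ()
    named_graphs: Tuple[str, ...] = ()


def resolve_dataset(protocol: Optional[Dataset],
                    query: Optional[Dataset],
                    implementation_default: Dataset) -> Dataset:
    """Return the first dataset that applies, in the order listed above."""
    if protocol is not None:        # 1) default-graph-uri / named-graph-uri
        return protocol
    if query is not None:           # 2) FROM / FROM NAMED
        return query
    return implementation_default   # 3) implementation defined


# Illustrative (made-up) graph IRIs.
store_default = Dataset(default_graphs=("urn:example:store-default",))
from_clause = Dataset(default_graphs=("http://example.org/all_posts",))
protocol_param = Dataset(default_graphs=("http://example.org/other",))

# With no protocol dataset, the query's FROM clause is used.
print(resolve_dataset(None, from_clause, store_default).default_graphs)
# With a protocol dataset, the FROM clause is ignored.
print(resolve_dataset(protocol_param, from_clause, store_default).default_graphs)
```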
At the F2F, we heard two different designs for determining what RDF
dataset a subquery should be executed against.
ARQ - subqueries always use the same dataset used for executing the
parent (container) query.
Virtuoso - subqueries can specify their own dataset. I wasn't totally
clear on what the priority of that is with respect to the protocol.
Ignoring the protocol for a second, I think there are two possibilities.
# Select details of all recent posts, given a graph which enumerates
# which posts those are. (This is a poor example, it could be done with
# regular GRAPH clauses and doesn't need a subquery.)
SELECT *
FROM ex:all_posts
{
  ?post dc:title ?title .
  {
    SELECT ?post FROM ex:recent_posts { ?post a ex:Post }
  }
}
Option 1: The subquery executes against ex:recent_posts - The rule would
be that a subquery can specify its own dataset which trumps the parent's
dataset
Option 2: subqueries can't specify their own dataset - in this case, I'd
suggest this should be an explicit error
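To make the difference between the two options concrete, here is a small Python sketch. It is hypothetical throughout: `subquery_dataset` and `SubqueryDatasetError` are names invented for illustration, each dataset is reduced to a single graph name, and the protocol is ignored as in the discussion above.

```python
from typing import Optional


class SubqueryDatasetError(ValueError):
    """Hypothetical error for Option 2: a subquery carried its own FROM."""


def subquery_dataset(parent: str, sub_from: Optional[str], option: int) -> str:
    """Pick the graph a subquery runs against (protocol ignored).

    parent   -- graph named by the outer query's FROM
    sub_from -- graph named by the subquery's FROM, or None if absent
    option   -- 1 or 2, matching the options above
    """
    if option not in (1, 2):
        raise ValueError(f"unknown option: {option}")
    if sub_from is None:
        return parent      # nothing to decide: inherit the parent's dataset
    if option == 1:
        return sub_from    # the subquery's own dataset trumps the parent's
    # Option 2: specifying a dataset in a subquery is an explicit error.
    raise SubqueryDatasetError("subquery may not specify a dataset")


# The example query above, evaluated under Option 1.
print(subquery_dataset("ex:all_posts", "ex:recent_posts", option=1))
```

Under Option 1 the call returns the subquery's graph, ex:recent_posts; the same call with option=2 raises the error instead.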
With the protocol, it's a little murkier. The reason the protocol trumps
the query is so that queries can be easily re-targeted against other
graphs, without having to parse out any dataset given in the query.
If we go with Option 2 above then this is still easy, since the subquery
can't specify a dataset.
If we allow subqueries to specify a dataset, and the protocol also
specifies a dataset, it's unclear what should happen:
Option A: Protocol trumps dataset. This seems inconsistent since we're
allowing subqueries to have different datasets than their parent, but
now all of a sudden the protocol forces both parent & child to share the
same dataset. It's hard to imagine a situation where
# something useful by sub-querying from g2 instead of g1
SELECT * FROM :g1 { ... { SELECT * FROM :g2 { ... } } }
all of a sudden makes sense when the protocol forces both parts to be
issued against g3. That is, there's no way (right now) for the protocol
to say, override g1 with g3 and g2 with g4.
Option B: Protocol trumps dataset for main query, but datasets
explicitly in subqueries trump all. This is inconsistent because now the
protocol can partially retarget a query but can't touch subqueries.
That's weird.
Option C: Explicitly prohibit the case where the protocol supplies a
dataset for a query that contains a subquery that explicitly specifies
its dataset. This works around the problem but is sort of a strange
prohibition.
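The three options can be compared side by side with a small Python sketch. Everything here is an assumption made for illustration: `resolve` is a hypothetical function, each dataset is reduced to a single graph name, and the behavior when the protocol is absent (the subquery's own FROM wins) is just one reading of "subqueries may specify a dataset".

```python
from typing import Optional, Tuple


def resolve(protocol: Optional[str], parent_from: str,
            sub_from: Optional[str], option: str) -> Tuple[str, str]:
    """Return (parent graph, subquery graph) under Option A, B or C."""
    if option not in ("A", "B", "C"):
        raise ValueError(f"unknown option: {option}")
    if protocol is None:
        # No protocol dataset: the subquery's FROM, if any, wins over the parent's.
        return parent_from, sub_from or parent_from
    if sub_from is None:
        # Only the parent names a dataset; the protocol retargets everything
        # and all three options agree.
        return protocol, protocol
    if option == "A":
        return protocol, protocol   # protocol trumps both parent and subquery
    if option == "B":
        return protocol, sub_from   # protocol retargets the parent only
    # Option C: this combination is explicitly prohibited.
    raise ValueError("protocol dataset plus subquery dataset is prohibited")


# The g1/g2/g3 case from above: query says g1, subquery says g2, protocol says g3.
for opt in ("A", "B"):
    print(opt, resolve(":g3", ":g1", ":g2", opt))
```

For the g1/g2/g3 case, Option A sends both parts to :g3, Option B leaves the subquery on :g2, and Option C rejects the request.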
It seems to me that ARQ's behavior is simple and avoids this problem,
but I'm not sure at what cost. My natural inclination is that it's
valuable for queries & their subqueries to be able to target different
graphs.
Current recommendation? Unsure.
Suggested next steps? Determine whether we have reasonable use cases to
require that subqueries can target different datasets from parent queries.
Lee
Received on Friday, 8 May 2009 04:41:25 UTC