- From: Bijan Parsia <bparsia@cs.man.ac.uk>
- Date: Tue, 10 Oct 2006 09:08:25 +0100
- To: RDF Data Access Working Group <public-rdf-dawg@w3.org>
I'm following: http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2006Apr/ 0000.html http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2006Jun/ 0008.html And section 4 of the Semantics and Complexity of SPARQL (SCS). This may correspond with some of Fred's issues and I apologize for not reconciling with his texts and email. Sorry about length. I'm trying to be clear but still cutting out a lot. ================================= Some high level points: Point 1: SCS (and the two email above) assert that it is strongly desirable that the semantics of the SPARQL algebra have normal features one expects, e.g., compositional semantics, commutativity of conjunction, etc. I agree with these. They are very important for optimization, for example, and for end users. Point 2: There are possible *ways of evaluating a SPARQL parse tree (or abstract syntax tree)* that do not support these features (as shown in SCS). The way to evaluate a SPARQL parse tree (oh, let's call it a "query plan" ok?) is not clear, from what I can tell, in the current spec. 0000.html suggests this. We should definitely *nail down* a semantics for the evaluation of query plans that is *unambiguous*. I would prefer that it met the criteria in point 1 as well. Point 3: There are related issues with the scope of variables. (I think.) ===== Here's a little explication of proposition of SCS. None of the content is my work. In SCS, there is defined what I shall call UNION-normal form. That is, query plans may be normalized to a UNION of UNION free query plans. Obviously this relies on the distributivity of UNION over all the other operators. Drawing from SCS, proposition 1, we see, for example, for query plans P1, P2, and P3: (P1 OPT (P2 UNION P3)) <==> ((P1 OPT P2) UNION (P1 OPT P3)) The rest of proposition 1 (aside from (1) which makes AND and UNION associative and communtative) states similar equivalences. One nice thing about this is it allows us to focus on *UNION-free* query plans (UFQ) ...since we can always push the UNIONs "outward". (The commutativity of UNION means we can evaluate the component query plans in any order, as one should expect) I think that that proposition 1 should hold. We should want these equivalences and other properties. ====== This is an explication of section 4 of SCS, specifically 4.1. Still not my work :) SCS claims that there are two *semantically* distinct ways of evaluating UFQs: One that obeys Proposition 1, and one which does not (the latter they observe as the behavior of ARQ). I'm going to call these E1 (for the compositional evaluation) and E2 (for the ARQ/ operational/navigational/depth-first way). (I'm *so* good at naming.) For each case, let's ignore FILTER and focus on OPT (and obviously AND :)). For E1, I refer to the formal semantics given in SCS. Bascially, AND is a join, and OPT is a left-outer-join. The paper even defines these for ya! (Terminology point: think of "mapping" as being an answer (i.e., set of bindings). "compatibility" just means that two answers/rows that you are, e.g., joining have the same binding for shared variables.) For E2, things are a bit more complicated to state. E2 is a *depth first* evaluation of a plan. So, let's consider the following plan: ((BGP1 AND BGP2) OPT BGP3) Parse tree: OPT AND BGP1 BGP2 BGP3 So, we go deep, the first directly evaluable node is BGP1, then BGP2, then the AND, then BGP3 (sibling of the AND) then the OPT (the OPT is tricky). Let E2 be the evaluation operator with an implicit Dataset argument and two arguments: A pattern to be evaluated, and a current set of results): So, BGP1 is evaluated against the empty set, E2 (BGP1, {}). BGP2 is evaluated against those results, etc., i.e.: E2(BGP2, E2(BGP1, {}) (Let's call this A1). Ok, so far so good. But how to evaluate the optional? I shall use J for join, and LOJ for left outer join. The intution behind E2 is that for OPT(P1, P2), *first* you get the answers for P1, then check which of the answers may be extended with binding for P2. Thus, for our OPT(R1, BGP3), we perform LOJ(A1, E2(BGP3, A1)) Sigh. not so clear, but let's compare it to a direct LOJ reading of OPT/J reading of And: LOJ(J(BGP1, BGP2), BGP3) Their examples are a bit different, but let's consider example 4, which shows that on E2 AND is not (always) communitative: P1 = ((?X :name :paul) AND ((?Y :name :george) OPT (?X :email ?Z))) P2 = (((?Y :name :george) OPT (?X :email ?Z)) AND (?X :name :paul)) Data: _:B1 :name :paul. _:B3 :name :george. So, let T1 = (?X :name :paul) T2 = (?Y :name :george) T3 = (?X :email ?Z) we have: P1 = (T1 AND (T2 OPT T3)) P2 = ((T2 OPT T3) and T1) By E1 evaluation, we have: E1(P1) = J(T1, LOJ(T2, T3)) E1(P2) = J(LOJ(T2, T3), T1) E2 goes like (and this is killing me): E2(P1) = E2(LOJ( E2(T2, E2(T1, {}), E2(T3, E2(T2, E2(T1,{}))), E2(T1, {})) E2(P2) = E2(T1, LOJ(E2(T2, {}), E2(T3, E2(T2, {}))) Ok, so, what are the results. For the E1 cases, we get no results because LOJ(T1, T2) yields ?Y ?X ?Z _:B3 UB UB (I.e., ?x, and ?z are unbound) But then J(T1, LOJ(T1, T2)) is empty since _:B1 and UB don't match. But E2(P1) yields: ?X ?y ?Z _:B1 _:B3 UB Whereas E2(P2) give no answers. Sigh. Now, there's a quite natural *mis*reading where E2(P1) gives us the right answers. If we made UB match anything, I think we'd get the right answers all around for this query (but spooooooky). However, losing commutativity of AND is a high price to pay. I say misreading because SCS points out that there's something fishy about the way the variables are arranged. It seems like T3, positionwise, is giving optional info about george, but, variablewise it's about paul. They forbid such variable jumping in one restriction, and when you do that E1 and E2 coincide. I *think* that's my preference. The above is a stupid query, very counterintuitive. I personally would still like to just the compositional semantics explicitly, but I think forbiding such "ill- defined" queries is also a very good idea. Whew. Only 00:40 here. I want to re-re-reemphasize again and again that I'm totally regurgitating the published work of Jorge Perez, Marcelo Arenas, and Claudio Gutierrez. This example conclusively, to my mind, shows that we need a clear and formal specification of the algebra. As an implementor, I find the normal form, and the various properties extremely useful. The direct mapping to Join and Left Outer Join is very nice (even if you don't use SQL for BGP matching, it's nice to be able to use a query planner with what one might call "virtual tables"). The current formal definition of OPTIONAL isn't very formal or very definitional. Let's take the first clause: """An optional graph pattern is a combination of a pair of graph patterns. The second pattern modifies pattern solutions of the first pattern but does not fail matching of the overall optional graph pattern.""" The first sentence is a bit vague (I would prefer something like "an ordered pair of graph patterns", if we continue with the non-operator oriented approach we see in UNION). The second is, on the one hand, written like informative text, but, on the other, suggests that E2 is the mode of evaluation. """If Opt(A, B) is an optional graph pattern, where A and B are graph patterns, then S is a solution of optional graph pattern if S is a pattern solution of A and of B otherwise if S is a solution to A, but not to A and B.""" This seems quite close to the definition of left outer join. However, I don't see any mention of unboundedness, so it's a little weird. (Thanks to Jorge for reviewing this message and helping in general. There are other issues (sigh), but other posts, eh?) Cheers,. Biijan.
Received on Tuesday, 10 October 2006 08:08:43 UTC