- From: Bijan Parsia <bparsia@cs.man.ac.uk>
- Date: Sat, 19 Aug 2006 18:52:30 +0100
- To: RDF Data Access Working Group <public-rdf-dawg@w3.org>

I believe that the qualms Andy had about what happens in intermediate steps was do the different interpretations of what the minimized final answer set would look like. That is, his understanding was close to 1 (like pat) and mine were entirely different. However, algorithms for computing 2 and 3 are easy enough, so I'm doing the action item anyway. So, first qualification: These algorithms are only minimizing with respect to BNodes. You have to plug in your own account of literals. I use "row" for "answer" avoid confusion: DEFINITION 1: Answer graph template Let A be an answer set and Avar be the set of column headings of A. The answer graph template of A is the set of triple patterns, such that: {tp | _:row ('http://var.org/#" ++ var) var. & var \in Avar} DEFINITION 2: Answer set graph. An answer set graph ASG for answer set A applying the CONSTRUCT operation on A with the answer graph template of A. DEFINITION 3: REALLYREALLYDISTINCT An answer set is REALLYREALLYDISTINCT just in case all its corresponding answer set graphs are lean. DEFINITION4: PAIRWISE-DISTINCT An answer set, A, is PAIRWISE-DISTINCT just in case for every pair of rows, Ri, Rj, in A, the answer set graph for Ri does not simply entail the answer set graph for Rj. In other words, that their union is lean. I trust algorithms for computing REALLYREALLYDISTINCT and PAIRWISE- DISTINCT answer sets is obvious. I don't like PAIRWISE-DISTINCTness, but it's cheaper to check, in general. A REALLYREALLYDISTINCT answer set need not be PAIRWISE-DISTINCT. For example: ?x ?y _:a "1" _:a "2" b "1" This is not PAIRWISE-DISTINCT ({_:row var:x b. _:row var:y "1"} simply entails {_:row var:x _:a. _:row var:y "1".}), but it is REALLYREALLYDISTINCT ( {_:row1 var:x _:a. "_:row1 var:y "1" . _:row2 var:x _:a. _:row2 var:y "2" . _:row3 var:x b. _:row var:y "1".} is lean). This shows that coreference of BNodes between answer sets is significant to REALLYREALLYDISTINCT but not to PAIRWISE-DISTINCT. In general, either will be generally cheaper than Pat's proposal, assuming you have to lean the data graph first (or maybe even check leanness). Answer sets tend to be smaller than graphs and you only have to check relevant data (though, as we've seen, there are opposite cases; e.g., the graph is lean and the answer set not either of the distincts defined above). Also, the data is more structured. Implementation burden, well, for the raw functionality, it's roughly the same since you have to implement leaning. I think leaning answers will be easier to implement for the scale and kind of queries people do. There is also a close correspondence between a REALLYREALLYDISTINCT answer set and a CONSTRUCT of that answer set. I conjecture that if the answer set is REALLYREALLYDISTINCT, that any CONSTRUCTion of it on a lean template will be lean. Oh, hmm. There are additional conditions. You have to use all the variables (or it's like a projection) and maybe only once? Cheers, Bijan.

Received on Saturday, 19 August 2006 17:57:41 UTC