- From: Seaborne, Andy <andy.seaborne@hp.com>
- Date: Mon, 21 Aug 2006 10:53:34 +0100
- To: Bijan Parsia <bparsia@cs.man.ac.uk>
- CC: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Bijan Parsia wrote: > I believe that the qualms Andy had about what happens in intermediate > steps was do the different interpretations of what the minimized > final answer set would look like. That is, his understanding was > close to 1 (like pat) and mine were entirely different. However, > algorithms for computing 2 and 3 are easy enough, so I'm doing the > action item anyway. My reservations are: 0/ BNodes over the graph matter - not answer sets 1/ Lack of streaming 2/ Additional processing cost [*] 3/ Non-standard processing (i.e. can't be, say SQL DISTINCT). [*] Even assuming you want to do it at all, graphs can be leaned once, then queried many times. Answer sets have to be leaned on every query result. > > So, first qualification: These algorithms are only minimizing with > respect to BNodes. You have to plug in your own account of literals. > I use "row" for "answer" avoid confusion: > > DEFINITION 1: Answer graph template > Let A be an answer set and Avar be the set of column headings of A. > > The answer graph template of A is the set of triple patterns, such > that: > {tp | _:row ('http://var.org/#" ++ var) var. & var \in Avar} ^^^ value? (Just to show I'm reading this all!) Andy > > > DEFINITION 2: Answer set graph. > An answer set graph ASG for answer set A applying the CONSTRUCT > operation on A with the answer graph template of A. > > DEFINITION 3: REALLYREALLYDISTINCT > An answer set is REALLYREALLYDISTINCT just in case all its > corresponding answer set graphs are lean. > > DEFINITION4: PAIRWISE-DISTINCT > An answer set, A, is PAIRWISE-DISTINCT just in case for every pair > of rows, Ri, Rj, in A, the answer set graph for Ri does not simply > entail the answer set graph for Rj. In other words, that their union > is lean. > > I trust algorithms for computing REALLYREALLYDISTINCT and PAIRWISE- > DISTINCT answer sets is obvious. I don't like PAIRWISE-DISTINCTness, > but it's cheaper to check, in general. > > A REALLYREALLYDISTINCT answer set need not be PAIRWISE-DISTINCT. For > example: > > ?x ?y > _:a "1" > _:a "2" > b "1" > > This is not PAIRWISE-DISTINCT ({_:row var:x b. _:row var:y "1"} > simply entails {_:row var:x _:a. _:row var:y "1".}), but it is > REALLYREALLYDISTINCT ( > {_:row1 var:x _:a. "_:row1 var:y "1" . > _:row2 var:x _:a. _:row2 var:y "2" . > _:row3 var:x b. _:row var:y "1".} > is lean). This shows that coreference of BNodes between answer sets > is significant to REALLYREALLYDISTINCT but not to PAIRWISE-DISTINCT. > > In general, either will be generally cheaper than Pat's proposal, > assuming you have to lean the data graph first (or maybe even check > leanness). Answer sets tend to be smaller than graphs and you only > have to check relevant data (though, as we've seen, there are > opposite cases; e.g., the graph is lean and the answer set not either > of the distincts defined above). Also, the data is more structured. > > Implementation burden, well, for the raw functionality, it's roughly > the same since you have to implement leaning. I think leaning answers > will be easier to implement for the scale and kind of queries people do. > > There is also a close correspondence between a REALLYREALLYDISTINCT > answer set and a CONSTRUCT of that answer set. I conjecture that if > the answer set is REALLYREALLYDISTINCT, that any CONSTRUCTion of it > on a lean template will be lean. Oh, hmm. There are additional > conditions. You have to use all the variables (or it's like a > projection) and maybe only once? > > Cheers, > Bijan. >
Received on Monday, 21 August 2006 09:54:35 UTC