W3C home > Mailing lists > Public > public-rdf-dawg@w3.org > July to September 2006

Re: ACTION item(s): definition(s) and an algorithm for postprocessing minimization

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Mon, 21 Aug 2006 10:53:34 +0100
Message-ID: <44E9829E.1040906@hp.com>
To: Bijan Parsia <bparsia@cs.man.ac.uk>
CC: RDF Data Access Working Group <public-rdf-dawg@w3.org>



Bijan Parsia wrote:
> I believe that the qualms Andy had about what happens in intermediate  
> steps was do the different interpretations of what the minimized  
> final answer set would look like. That is, his understanding was  
> close to 1 (like pat) and mine were entirely different. However,  
> algorithms for computing 2 and 3 are easy enough, so I'm doing the  
> action item anyway.

My reservations are:
0/ BNodes over the graph matter - not answer sets
1/ Lack of streaming
2/ Additional processing cost [*]
3/ Non-standard processing (i.e. can't be, say SQL DISTINCT).

[*] Even assuming you want to do it at all, graphs can be leaned once, then 
queried many times.  Answer sets have to be leaned on every query result.

> 
> So, first qualification: These algorithms are only minimizing with  
> respect to BNodes. You have to plug in your own account of literals.  
> I use "row" for "answer" avoid confusion:
> 
> DEFINITION 1: Answer graph template
> 	Let A be an answer set and Avar be the set of column headings of A.
> 
> 	The answer graph template of A is the set of triple patterns, such  
> that:
> 		{tp | _:row ('http://var.org/#" ++ var) var. & var \in Avar}
                                                         ^^^
value?

(Just to show I'm reading this all!)

	Andy

> 	
> 
> DEFINITION 2: Answer set graph.
> 	An answer set graph ASG for answer set A applying the CONSTRUCT  
> operation on A with the answer graph template of A.
> 
> DEFINITION 3: REALLYREALLYDISTINCT
> 	An answer set is REALLYREALLYDISTINCT just in case all its  
> corresponding answer set graphs are lean.
> 
> DEFINITION4: PAIRWISE-DISTINCT
> 	An answer set, A,  is PAIRWISE-DISTINCT just in case for every pair  
> of rows, Ri, Rj, in A, the answer set graph for Ri does not simply  
> entail the answer set graph for Rj. In other words, that their union  
> is lean.
> 
> I trust algorithms for computing REALLYREALLYDISTINCT and PAIRWISE- 
> DISTINCT answer sets is obvious. I don't like PAIRWISE-DISTINCTness,  
> but it's cheaper to check, in general.
> 
> A REALLYREALLYDISTINCT answer set need not be PAIRWISE-DISTINCT. For  
> example:
> 
> 	?x	?y
> 	_:a	"1"
> 	_:a	"2"
> 	b	"1"
> 
> This is not PAIRWISE-DISTINCT ({_:row var:x b. _:row var:y "1"}  
> simply entails {_:row var:x _:a. _:row var:y "1".}), but it is   
> REALLYREALLYDISTINCT (
> 	{_:row1 var:x _:a. "_:row1 var:y "1" .
> 	_:row2 var:x _:a. _:row2 var:y "2" .
> 	_:row3 var:x b. _:row var:y "1".}
> is lean). This shows that coreference of BNodes between answer sets  
> is significant to REALLYREALLYDISTINCT but not to PAIRWISE-DISTINCT.
> 
> In general, either will be generally cheaper than Pat's proposal,  
> assuming you have to lean the data graph first (or maybe even check  
> leanness). Answer sets tend to be smaller than graphs and you only  
> have to check relevant data (though, as we've seen, there are  
> opposite cases; e.g., the graph is lean and the answer set not either  
> of the distincts defined above). Also, the data is more structured.
 >
> Implementation burden, well, for the raw functionality, it's roughly  
> the same since you have to implement leaning. I think leaning answers  
> will be easier to implement for the scale and kind of queries people do.
> 
> There is also a close correspondence between a REALLYREALLYDISTINCT  
> answer set and a CONSTRUCT of that answer set. I conjecture that if  
> the answer set is REALLYREALLYDISTINCT, that any CONSTRUCTion of it  
> on a lean template will be lean. Oh, hmm. There are additional  
> conditions. You have to use all the variables (or it's like a  
> projection) and maybe only once?
> 
> Cheers,
> Bijan.
> 
Received on Monday, 21 August 2006 09:54:35 GMT

This archive was generated by hypermail 2.3.1 : Tuesday, 26 March 2013 16:15:27 GMT