# Re: ACTION item(s): definition(s) and an algorithm for postprocessing minimization

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Mon, 21 Aug 2006 10:53:34 +0100
Message-ID: <44E9829E.1040906@hp.com>
To: Bijan Parsia <bparsia@cs.man.ac.uk>
CC: RDF Data Access Working Group <public-rdf-dawg@w3.org>


Bijan Parsia wrote:
> I believe that the qualms Andy had about what happens in intermediate
> steps was do the different interpretations of what the minimized
> final answer set would look like. That is, his understanding was
> close to 1 (like pat) and mine were entirely different. However,
> algorithms for computing 2 and 3 are easy enough, so I'm doing the
> action item anyway.

My reservations are:
0/ BNodes over the graph matter - not answer sets
1/ Lack of streaming
3/ Non-standard processing (i.e. can't be, say SQL DISTINCT).

[*] Even assuming you want to do it at all, graphs can be leaned once, then
queried many times.  Answer sets have to be leaned on every query result.

>
> So, first qualification: These algorithms are only minimizing with
> respect to BNodes. You have to plug in your own account of literals.
> I use "row" for "answer" avoid confusion:
>
> DEFINITION 1: Answer graph template
> 	Let A be an answer set and Avar be the set of column headings of A.
>
> 	The answer graph template of A is the set of triple patterns, such
> that:
> 		{tp | _:row ('http://var.org/#" ++ var) var. & var \in Avar}
^^^
value?

(Just to show I'm reading this all!)

Andy

>
>
> DEFINITION 2: Answer set graph.
> 	An answer set graph ASG for answer set A applying the CONSTRUCT
> operation on A with the answer graph template of A.
>
> DEFINITION 3: REALLYREALLYDISTINCT
> 	An answer set is REALLYREALLYDISTINCT just in case all its
> corresponding answer set graphs are lean.
>
> DEFINITION4: PAIRWISE-DISTINCT
> 	An answer set, A,  is PAIRWISE-DISTINCT just in case for every pair
> of rows, Ri, Rj, in A, the answer set graph for Ri does not simply
> entail the answer set graph for Rj. In other words, that their union
> is lean.
>
> I trust algorithms for computing REALLYREALLYDISTINCT and PAIRWISE-
> DISTINCT answer sets is obvious. I don't like PAIRWISE-DISTINCTness,
> but it's cheaper to check, in general.
>
> A REALLYREALLYDISTINCT answer set need not be PAIRWISE-DISTINCT. For
> example:
>
> 	?x	?y
> 	_:a	"1"
> 	_:a	"2"
> 	b	"1"
>
> This is not PAIRWISE-DISTINCT ({_:row var:x b. _:row var:y "1"}
> simply entails {_:row var:x _:a. _:row var:y "1".}), but it is
> REALLYREALLYDISTINCT (
> 	{_:row1 var:x _:a. "_:row1 var:y "1" .
> 	_:row2 var:x _:a. _:row2 var:y "2" .
> 	_:row3 var:x b. _:row var:y "1".}
> is lean). This shows that coreference of BNodes between answer sets
> is significant to REALLYREALLYDISTINCT but not to PAIRWISE-DISTINCT.
>
> In general, either will be generally cheaper than Pat's proposal,
> assuming you have to lean the data graph first (or maybe even check
> leanness). Answer sets tend to be smaller than graphs and you only
> have to check relevant data (though, as we've seen, there are
> opposite cases; e.g., the graph is lean and the answer set not either
> of the distincts defined above). Also, the data is more structured.
>
> Implementation burden, well, for the raw functionality, it's roughly
> the same since you have to implement leaning. I think leaning answers
> will be easier to implement for the scale and kind of queries people do.
>
> There is also a close correspondence between a REALLYREALLYDISTINCT
> answer set and a CONSTRUCT of that answer set. I conjecture that if
> the answer set is REALLYREALLYDISTINCT, that any CONSTRUCTion of it
> on a lean template will be lean. Oh, hmm. There are additional
> conditions. You have to use all the variables (or it's like a
> projection) and maybe only once?
>
> Cheers,
> Bijan.
>

Received on Monday, 21 August 2006 09:54:35 GMT

This archive was generated by hypermail 2.3.1 : Tuesday, 26 March 2013 16:15:27 GMT