# ACTION item(s): definition(s) and an algorithm for postprocessing minimization

From: Bijan Parsia <bparsia@cs.man.ac.uk>
Date: Sat, 19 Aug 2006 18:52:30 +0100
Message-Id: <816A1377-D8DD-4857-B443-F93509A02C32@cs.man.ac.uk>
To: RDF Data Access Working Group <public-rdf-dawg@w3.org>

I believe that the qualms Andy had about what happens in intermediate
steps was do the different interpretations of what the minimized
final answer set would look like. That is, his understanding was
close to 1 (like pat) and mine were entirely different. However,
algorithms for computing 2 and 3 are easy enough, so I'm doing the
action item anyway.

So, first qualification: These algorithms are only minimizing with
respect to BNodes. You have to plug in your own account of literals.
I use "row" for "answer" avoid confusion:

Let A be an answer set and Avar be the set of column headings of A.

The answer graph template of A is the set of triple patterns, such
that:
{tp | _:row ('http://var.org/#" ++ var) var. & var \in Avar}

An answer set graph ASG for answer set A applying the CONSTRUCT
operation on A with the answer graph template of A.

DEFINITION 3: REALLYREALLYDISTINCT
An answer set is REALLYREALLYDISTINCT just in case all its
corresponding answer set graphs are lean.

DEFINITION4: PAIRWISE-DISTINCT
An answer set, A,  is PAIRWISE-DISTINCT just in case for every pair
of rows, Ri, Rj, in A, the answer set graph for Ri does not simply
entail the answer set graph for Rj. In other words, that their union
is lean.

I trust algorithms for computing REALLYREALLYDISTINCT and PAIRWISE-
DISTINCT answer sets is obvious. I don't like PAIRWISE-DISTINCTness,
but it's cheaper to check, in general.

A REALLYREALLYDISTINCT answer set need not be PAIRWISE-DISTINCT. For
example:

?x	?y
_:a	"1"
_:a	"2"
b	"1"

This is not PAIRWISE-DISTINCT ({_:row var:x b. _:row var:y "1"}
simply entails {_:row var:x _:a. _:row var:y "1".}), but it is
REALLYREALLYDISTINCT (
{_:row1 var:x _:a. "_:row1 var:y "1" .
_:row2 var:x _:a. _:row2 var:y "2" .
_:row3 var:x b. _:row var:y "1".}
is lean). This shows that coreference of BNodes between answer sets
is significant to REALLYREALLYDISTINCT but not to PAIRWISE-DISTINCT.

In general, either will be generally cheaper than Pat's proposal,
assuming you have to lean the data graph first (or maybe even check
leanness). Answer sets tend to be smaller than graphs and you only
have to check relevant data (though, as we've seen, there are
opposite cases; e.g., the graph is lean and the answer set not either
of the distincts defined above). Also, the data is more structured.

Implementation burden, well, for the raw functionality, it's roughly
the same since you have to implement leaning. I think leaning answers
will be easier to implement for the scale and kind of queries people do.

There is also a close correspondence between a REALLYREALLYDISTINCT
answer set and a CONSTRUCT of that answer set. I conjecture that if
the answer set is REALLYREALLYDISTINCT, that any CONSTRUCTion of it
on a lean template will be lean. Oh, hmm. There are additional
conditions. You have to use all the variables (or it's like a
projection) and maybe only once?

Cheers,
Bijan.

Received on Saturday, 19 August 2006 17:57:41 UTC

This archive was generated by hypermail 2.3.1 : Wednesday, 7 January 2015 15:00:51 UTC