- From: Bijan Parsia <bparsia@cs.man.ac.uk>
- Date: Wed, 16 Aug 2006 12:57:40 +0100
- To: RDF Data Access Working Group <public-rdf-dawg@w3.org>
(As I understand them.)
This is so y'all don't have to plow through the debate. In general, I
presume that an answer set is not correct as an answer to a DISTINCT
query if it is redundant with respect to the notion of redundancy the
particular DISTINCT key answers to.
==================
BNODE REDUNDANCY:
1) An answer set is redundant (with respect to BNodes) iff the
original graph is not lean.
That is, algebraic operations never introduce BNode redundancy, thus
no minimization after operations needs to attend to BNodes.
This is roughly the position that I understand Andy and Pat to hold.
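Definition 1 depends entirely on whether the queried graph is lean. For anyone who wants to poke at the examples, here is a brute-force leanness check in Python. It is only a sketch under assumptions: triples are encoded as tuples of strings, blank nodes are marked by a "_:" prefix, and the helper names are hypothetical, not part of any SPARQL engine. The search is exponential in the number of blank nodes, so it suits toy graphs only.

```python
from itertools import product

def is_bnode(term):
    # Toy convention assumed by this sketch: blank nodes start with "_:".
    return term.startswith("_:")

def instance_of(graph, mapping):
    """Apply a blank-node-to-term mapping to every triple of the graph."""
    return {tuple(mapping.get(t, t) for t in triple) for triple in graph}

def is_lean(graph):
    """A graph is lean iff no instance of it is a proper subgraph of it.
    Candidate images are limited to terms already in the graph, since an
    instance that is a subgraph can only mention those terms."""
    bnodes = sorted({t for triple in graph for t in triple if is_bnode(t)})
    terms = sorted({t for triple in graph for t in triple})
    for image in product(terms, repeat=len(bnodes)):
        if instance_of(graph, dict(zip(bnodes, image))) < graph:  # proper subset
            return False
    return True

lean1 = {
    (":bijan", ":loves", ":mochi"),
    (":bijan", ":eats", ":mochi"),
    ("_:x", ":hates", ":mochi"),
    ("_:x", ":scorns", ":mochi"),
}
print(is_lean(lean1))  # True
print(is_lean({(":bijan", ":loves", ":mochi"), ("_:x", ":loves", ":mochi")}))  # False
```

In the second graph, mapping `_:x` to `:bijan` collapses the two triples into one, which is exactly the kind of redundancy Definition 1 cares about.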
There is a different formulation:
1') An answer set is redundant iff an answer doesn't tell us
something distinct about the graph.
They may want to restrict this to BNodes only, but I'm hard pressed
to see why. On this reading, actually, there is no difference between
lean and non-lean. If the graph has a redundancy, then the "redundant"
answers still tell us something about the graph.
======
2) An answer set is redundant (with respect to BNodes) iff some answer
entails another answer.
3) An answer set is redundant (with respect to BNodes and
coreference) iff some "co-referenced" set of answers entails another
"co-referenced" set of answers.
So there are two possibilities: We treat each answer independently
and look for subsumptions (one answer subsumes another iff the first
entails the second; tricky issues with empty bindings to consider)
pairwise. Or, we have to consider *sets* of answers, because
coreference allows us to see *in the answer set* that an answer isn't
redundant. The first is obviously easier to compute, but the answers
*might* be a bit hard to interpret. The latter is more difficult to
compute (and to specify), but the answers might be more "pleasing".
Obviously, the notion of "co-referenced" set of answers (a horrid
term) must be defined clearly, but in this message I want to just
give some examples and use cases to give folks the gist.
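The pairwise reading (definition 2) is easy to make operational. The Python sketch below is an illustration under assumptions, not a proposed spec: each answer row is a dict from variable name to term, blank nodes carry a "_:" prefix, each row's blank nodes are scoped to that row alone, and the function names are hypothetical. Rows that bind different variables are simply treated as incomparable, which sidesteps the empty-binding issue. It is applied to the rows of example ii.a.

```python
def is_bnode(term):
    # Toy convention assumed by this sketch: blank nodes start with "_:".
    return term.startswith("_:")

def row_entails(a, b):
    """Definition 2: row `a` entails row `b` iff some mapping of b's blank
    nodes (scoped to row b alone) turns b into a. Rows binding different
    variables are treated as incomparable (empty bindings not handled)."""
    if a.keys() != b.keys():
        return False
    mapping = {}
    for var, b_val in b.items():
        a_val = a[var]
        if is_bnode(b_val):
            # A blank node repeated within row b must map consistently.
            if mapping.setdefault(b_val, a_val) != a_val:
                return False
        elif b_val != a_val:
            return False
    return True

def pairwise_minimize(rows):
    """Drop every row entailed by some other row; among rows that entail
    each other, keep the first occurrence."""
    kept = []
    for i, r in enumerate(rows):
        redundant = any(
            j != i and row_entails(s, r) and (j < i or not row_entails(r, s))
            for j, s in enumerate(rows)
        )
        if not redundant:
            kept.append(r)
    return kept

# Rows of example ii.a.
ii_a = [
    {"x": ":bijan", "y": ":mochi"},
    {"x": ":bijan", "y": ":mochi"},
    {"x": "_:x3", "y": ":mochi"},
    {"x": "_:x3", "y": ":mochi"},
]
print(pairwise_minimize(ii_a))  # [{'x': ':bijan', 'y': ':mochi'}]
```

This yields answer ii.d. Run on the rows of iv.a, the same function keeps only the first two rows, i.e. iv.b.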
=====================================
Lean1:
:bijan :loves :mochi.
:bijan :eats :mochi.
_:x :hates :mochi.
_:x :scorns :mochi.
i) SELECT DISTINCT ?x ?p ?y {?x ?p ?y}
?x ?p ?y
:bijan :loves :mochi
:bijan :eats :mochi
_:x3 :hates :mochi
_:x3 :scorns :mochi
This is the same answer under definitions 1, 2 and 3. The second
variable (?p) is unique in every row, so each answer is distinct.
ii) SELECT DISTINCT ?x ?y {?x ?p ?y}
a. ?x ?y
:bijan :mochi
:bijan :mochi
_:x3 :mochi
_:x3 :mochi
This is what we get if we take definition 1' as holding uniformly.
There are distinct reasons for each line.
b. ?x ?y
:bijan :mochi
_:x3 :mochi
_:x3 :mochi
This is what we get if we take definition 1. There are distinct reasons
for each BNode (but the coreference is significant). If you change
_:x in the last line of Lean1 to _:y, you get the less distasteful:
c. ?x ?y
:bijan :mochi
_:x3 :mochi
_:x4 :mochi
So is ii.b redundant or not? It's clearly *lexically* redundant. If
we allow that, why not ii.a? Won't this be horribly confusing?
Remember we have no *access* to the *reason* an answer exists, though
we could always ask a *different* query (like i).
d. ?x ?y
:bijan :mochi
This is the answer I expect from 2 and 3.
=========
Lean2:
:bijan :loves :zoe.
:zoe :eats :mochi.
_:x :hates _:y.
_:y :scorns :mochi.
iii) SELECT DISTINCT ?x ?p ?y {?x ?p ?y}
(Same as i, but my numbering is getting confusing enough :))
?x ?p ?y
:bijan :loves :zoe
:zoe :eats :mochi
_:x3 :hates _:x4
_:x4 :scorns :mochi
Same answer under all the definitions, for the same reasons as before.
iv) SELECT DISTINCT ?x ?y {?x ?p ?y}
a. ?x ?y
:bijan :zoe
:zoe :mochi
_:x3 _:x4
_:x4 :mochi
I think on 1 and 1' this is the correct answer.
b. ?x ?y
:bijan :zoe
:zoe :mochi
I believe that on 2 or 3 this is the correct answer. Consider pairwise
subsumption. Line 1 of iv.a simply entails line 3, and line 2 entails
line 4, so lines 3 and 4 are redundant. If we take the coreference
into account, lines 3 and 4 *together* are subsumed by lines 1 and 2.
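The coreference-aware reading (definition 3) differs in one respect: blank nodes keep their identity across rows, so a single mapping has to work for a whole group of answers at once. A backtracking sketch in Python, assuming rows are dicts from variable name to term and blank nodes carry a "_:" prefix (both assumptions of this sketch; the function names are hypothetical), checks that lines 3 and 4 of iv.a are jointly subsumed by lines 1 and 2.

```python
def is_bnode(term):
    # Toy convention assumed by this sketch: blank nodes start with "_:".
    return term.startswith("_:")

def extend(mapping, src_row, dst_row):
    """Try to extend a blank-node mapping so that src_row maps onto dst_row.
    The mapping is shared across rows, so coreference is respected."""
    if src_row.keys() != dst_row.keys():
        return None
    m = dict(mapping)
    for var, val in src_row.items():
        target = dst_row[var]
        if is_bnode(val):
            if m.setdefault(val, target) != target:
                return None
        elif val != target:
            return None
    return m

def set_entails(s_rows, t_rows, mapping=None):
    """Definition 3 sketch: s_rows jointly entail t_rows iff ONE blank-node
    mapping sends every row of t_rows onto some row of s_rows.
    Returns that mapping, or None. Exponential; toy sizes only."""
    mapping = {} if mapping is None else mapping
    if not t_rows:
        return mapping
    first, rest = t_rows[0], t_rows[1:]
    for candidate in s_rows:
        m = extend(mapping, first, candidate)
        if m is not None:
            result = set_entails(s_rows, rest, m)
            if result is not None:
                return result
    return None

iv_a = [
    {"x": ":bijan", "y": ":zoe"},
    {"x": ":zoe", "y": ":mochi"},
    {"x": "_:x3", "y": "_:x4"},
    {"x": "_:x4", "y": ":mochi"},
]
print(set_entails(iv_a[:2], iv_a[2:]))  # {'_:x3': ':bijan', '_:x4': ':zoe'}
```

Line 1 alone does not suffice: once `_:x4` is sent to `:zoe` for line 3, line 4 can no longer land on line 1, so `set_entails(iv_a[:1], iv_a[2:])` returns `None`. The mapping found is the same `_:x3`/`_:x4` substitution used in the instance-lemma argument.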
So, I'm introducing a notion of entailment here to define
subsumption, in some sense. Let's turn an answer set into a graph
using the following template:
Each row gets a fresh bnode.
Each column header gets a URI formed by concatenating a base URI,
"#variable", and the variable name.
Each value is itself.
Each row is translated as follows:
rowbnode columnuri value.
So,
iv.a'
_:row1 :variable?x :bijan.
_:row1 :variable?y :zoe.
_:row2 :variable?x :zoe.
_:row2 :variable?y :mochi.
_:row3 :variable?x _:x3.
_:row3 :variable?y _:x4.
_:row4 :variable?x _:x4.
_:row4 :variable?y :mochi.
and
iv.b'
_:row1 :variable?x :bijan.
_:row1 :variable?y :zoe.
_:row2 :variable?x :zoe.
_:row2 :variable?y :mochi.
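The row-to-graph template is mechanical enough to sketch in a few lines of Python. This assumes the abbreviated `:variable?name` predicate form used in the iv.a' and iv.b' listings rather than a full base URI, and assumes the fresh `_:rowN` labels never clash with blank nodes appearing as values; `answer_set_to_graph` is a hypothetical name. Fed the two rows of iv.b, it prints iv.b'.

```python
def answer_set_to_graph(rows, prefix=":"):
    """Turn an answer set (a list of dicts, variable -> term) into triples:
    one fresh blank node per row, one predicate per column header, and
    each value used as-is. The prefix + "variable?" + name predicate form
    mirrors the abbreviated listings; it is an assumption of this sketch."""
    triples = []
    for i, row in enumerate(rows, start=1):
        row_node = f"_:row{i}"  # assumed not to clash with blank nodes in values
        for var, value in row.items():
            triples.append((row_node, f"{prefix}variable?{var}", value))
    return triples

# Rows of iv.b.
iv_b = [{"x": ":bijan", "y": ":zoe"}, {"x": ":zoe", "y": ":mochi"}]
for s, p, o in answer_set_to_graph(iv_b):
    print(s, p, o + ".")
# _:row1 :variable?x :bijan.
# _:row1 :variable?y :zoe.
# _:row2 :variable?x :zoe.
# _:row2 :variable?y :mochi.
```

Using one blank node per row is what lets coreference between rows show up as shared blank nodes in the resulting graph, so graph leanness can stand in for answer-set redundancy.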
Is iv.a' lean? No. Is iv.b'? Yes. They are also simply equivalent, in
that each simply entails the other.
iv.a' entails iv.b' because iv.b' is a subgraph of iv.a' (Subgraph
lemma). This is obvious, since the first four lines are the same.
iv.b' entails iv.a' because iv.b' is an instance of iv.a' (Instance
lemma) with the following substitution:
_:row1 for _:row3
_:row2 for _:row4
:zoe for _:x4
:bijan for _:x3 .
Which is what we'd expect.
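Both lemma applications can be checked mechanically. The Python sketch below encodes iv.a' and iv.b' as sets of string triples (a toy encoding assumed here, with "_:" marking blank nodes; `instantiate` is a hypothetical helper) and applies the substitution just listed. The subgraph test compares blank-node labels directly, which works only because iv.b' reuses the labels `_:row1` and `_:row2` from iv.a'.

```python
# Toy encoding (an assumption of this sketch): triples are string tuples,
# blank nodes are strings starting with "_:".
iv_a_graph = {
    ("_:row1", ":variable?x", ":bijan"),
    ("_:row1", ":variable?y", ":zoe"),
    ("_:row2", ":variable?x", ":zoe"),
    ("_:row2", ":variable?y", ":mochi"),
    ("_:row3", ":variable?x", "_:x3"),
    ("_:row3", ":variable?y", "_:x4"),
    ("_:row4", ":variable?x", "_:x4"),
    ("_:row4", ":variable?y", ":mochi"),
}
iv_b_graph = {
    ("_:row1", ":variable?x", ":bijan"),
    ("_:row1", ":variable?y", ":zoe"),
    ("_:row2", ":variable?x", ":zoe"),
    ("_:row2", ":variable?y", ":mochi"),
}
# The substitution from the text: blank node -> replacement term.
substitution = {"_:row3": "_:row1", "_:row4": "_:row2",
                "_:x4": ":zoe", "_:x3": ":bijan"}

def instantiate(graph, sub):
    """Replace blank nodes according to `sub`, leaving other terms alone."""
    return {tuple(sub.get(term, term) for term in triple) for triple in graph}

# Subgraph lemma: iv.b' is a subgraph of iv.a', so iv.a' entails iv.b'.
print(iv_b_graph <= iv_a_graph)  # True
# Instance lemma: iv.b' is an instance of iv.a', so iv.b' entails iv.a'.
print(instantiate(iv_a_graph, substitution) == iv_b_graph)  # True
```

Because the instance is also a proper subgraph of iv.a', the same substitution shows directly that iv.a' is not lean.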
I hope I've made clear the various positions and provided enough test
cases to help people understand the various distinctions.
Questions, corrections, and clarifications welcome.
I don't believe this fully discharges my action items, but it may
handle """*ACTION:* bijan to show that the "strong" version of
DISTNCT doesn't interfere with intermittent algebraic operations
[recorded in
http://www.w3.org/2006/08/15-dawg-minutes.html#action01]"""
I mean, it's not a proof, but I can't see how any of my definitions,
partial though they may be, would ever require touching intermittent
operations. A counterexample would be helpful! (Or an example where
the postprocessing approach has unexpected results.) Blanks have to
be taken into account, but I don't know that they are on any of the
above views thus far.
Cheers,
Bijan.
Received on Wednesday, 16 August 2006 11:57:58 UTC