Re: [ENT] Review comments on the SPARQL 1.1. Entailment regime document from Birte Glimm on 2010-05-23 (public-rdf-dawg@w3.org from April to June 2010)

From: Birte Glimm <birte.glimm@comlab.ox.ac.uk>
Date: Sun, 23 May 2010 14:23:41 +0100
To: Chimezie Ogbuji <ogbujic@ccf.org>
Cc: Axel Polleres <axel.polleres@deri.org>, Sandro Hawke <sandro@w3.org>, Ivan Herman <ivan@w3.org>, W3C SPARQL WG <public-rdf-dawg@w3.org>
Message-ID: <AANLkTimdkMM6i2Ccl-G9XJd3_Xizy2J6UoPqYBFaOzxe@mail.gmail.com>
Hi Chime,
somehow the mail got lost in all the other SPARQL mails, so I am a bit
late with answering it. I only comment on what needs clarification.
For all the comments that are already addressed by changes in the
document I am happy with the outcome.

2010/5/20 Chimezie Ogbuji <ogbujic@ccf.org>:
> Birte, thanks for the comments.  I intend to address them (hopefully by the
> end of the night) but there are some principled concerns that I wanted to
> raise ASAP (because I believe they warrant an editorial note) so there is
> enough time to work through the changes we resolved to make with your
> approval prior to publication.

[snip]
>> There are still several places where we seem to enforce RIF strongly
>> safe core documents, which is no longer necessary given the relaxed
>> conditions on extensions to BGP matching.
>
> This is the part that is problematic for me.  For instance, RDF entailment
> *enforces* finite answers and appeals to the relaxation of the restrictions
> by (essentially) categorizing entailment from BNodes and axiomatic triples
> as 'trivial' (I put trivial in quotes because - as Andy mentioned - the word
> is problematic and often used subjectively as is the case here).
>
> It seems to me like the same standard is being applied differently and
> justified by a subjective determination of what kind of infiniteness is
> appropriate to rule out.

I consider them trivial, because all answers that are omitted from the
actual solutions can be reconstructed from the solutions
without even knowing the queried graph (at least under RDF/RDFS
entailments). E.g., if a query
SELECT ?x WHERE { ?x ex:b ex:c }
gives the answers ?x/ex:a1 and ?x/ex:a2, then I can get all the
infinitely many possible solutions by replacing ex:a1 and ex:a2 with
all (infinitely many) blank nodes. Each so obtained answer, e.g.,
_:xxx ex:b ex:c is entailed by the queried graph, although you don't
know which triples were in the graph. Thus, computing also all
possible answers can be done by a post-procssor that only has to know
the query and the returned answers. Giving you all the (infinite)
answers straight away is really not necessary and I could even start
returning binding of the for ?x/_:bnode1, ?x/_:bnode2, etc which means
you never get to see the important 2 answers in a finite amount of
time.

I believe a similar argument can be given for the axiomatic triples.
With rules, however, this is not the case, i.e., there are rule sets
that cause infinite answers, but I cannot construct all the answers
from a finite subset of the answers and the infinite answer set has no
redundancies. I am still not sure about how to writ RIF rules
properly, but imagine a rule that generates odd numbers (I believe
that can be done given Odd(1) as a fact and then a rule that uses
built-ns to increment a given x by two like Odd(x) implies Odd(x+2)).
Now if you query for odd numbers and cut of the solutions at any
point, you cannot say without knowing the rue set whether the returned
solutions come from asserted fact or from inferences and whether
somehow the inferences stopped because a rule was no longer applicable
or whether the solutions have just been omitted because of some
cut-off that is used to guarantee finiteness.

Maybe we should put in something stronger in condition 4 that really
properly defines some notion of redundancy in solutions, but that
requires than proof that the regimes meet the defined redundancy
condition and it will take some time to work this out properly.

BTW, another problem the you easily get, when you don't filter out
bnode redundancies is that condition 3 is violated. E.g., if you have
SG:
ex:a ex:b ex:c .
ex:d ex:e ex:f .
and the query
SELECT ?x WHERE { ex:a ex:b ?x . ?y ex:e ex:f . }
Then without filtering bnodes in any way, you get among other answers
mu1: ?x/_:bnode1, ?y/_:bnode2
but also
mu2: ?x/_:bnode2, ?y/_:bnode1
Condition 3 requires that
SG entails SG union P1(BGP) union P2(BGP)
Note that since the BGP of the query has no bnodes no renaming into
BGP1 and BGP2 is really required and even the empty RDF instance
mapping sigma can be used both for P1 and P2 to give:
SG entails ex:a ex:b ex:c . ex:d ex:e ex:f . ex:a ex:b _:bnode1 .
_:bnode2 ex:e ex:f . ex:a ex:b _:bnode2 . _:bnode1 ex:e ex:f .
most interesting here is
SG entails ex:a ex:b _:bnode1 . _:bnode1 ex:e ex:f .
and
SG entails ex:a ex:b _:bnode2 . _:bnode2 ex:e ex:f .
These two entailments do NOT hold. Reusing the same bnode in different
query answers, where the bnodes refer to different elements in the
graph causes these problems. Each query answer by itself is entailed,
but putting them all together as condition 3 requires is not possibe.
Thus, filtering out bnode redundancies is not only required for
finiteness.

This is what C1 does and although you define the Skolemisation as I
do, you don't have C1 any more, which requires in particular that
Skolemizing P(BGP) results in ground triples, i.e., all bnodes in
P(BGP) are from the known bnodes in SG.


>> ..snip.. Ideally, I would just
>> declare this as solutions without messing around, but unfortunately
>> blank nodes and axiomatic triples can introduce infinitely many
>> redundant answers that nobody really wants to see. This infiniteness
>> is quite different than those that you get from recursive rules say.
>
> I don't think this form of infiniteness is different from those you get from
> recursive rules.  An infinite solution set is an infinite solution set
> regardless of how you come about it, and if the general goal was to allow
> infinite solution sets then we should just remove the restriction altogether
> rather than introducing ambiguity in order to justify removing some forms of
> infiniteness.  Unfortunately, I wasn't able to participate in the F2F to
> argue this point.

see above

>> ..snip..constant in question cannot have been in the queried graph. Anyway,
>> long answer with mostly theoretical concerns ;-), but I would prefer
>> to use sk as in the other regimes or, if you don't want that, at least
>> say that sk is defined for any bnode in any BGP.
>
> I appreciate the long answer, it does help me understand skolemization as it
> relates to how we reduce infiniteness via Bnode.  However, I'm not sure
> exactly where my use of sk differs from the other regimes since I
> essentially copied the text from that of the other regimes.  Can you show me
> exactly where the difference is?

as pointed out above the keyword in C1 (which you don't have) is that
sk(P(BGP)) are ground, i.e., bnode free triples, which means all
bnodes are from the known ones in the scoping graph. For other bnodes
sk is not defined and maps them to themselves.


>> 7.3
>> ... This entailment regime restricts the legal graphs to only those
>> that refer to strongly safe RIF core documents. ...
>> That is no longer required. We can now allow also not strongly safe
>> RIF core documents,
>
> Yes, but as I've mentioned before, by that same logic RDF entailment (for
> example) should not include provisions to rule out infiniteness by relying
> only on a subjective interpretation of what is trivial.  If we want to apply
> the standard in the same way, then RDF entailment (and the others) should be
> defined such that infiniteness is *not* ruled and an informative section
> should be added where the conditions that rule them out are placed.
> Otherwise, it is a double standard.

Not having C1 means that any answer set is infinite and full of
redundancy. This would render the regimes pretty much useless and
condition 3 cannot be satisfied either. One could argue that
infiniteness from axiomatic triples is slightly different, but all
these triples are know and do not really depend on the queried graph,
so adding them to the answer does again not really give you anything
that you didn't know before, so for me there is a difference.

>> but in that case, finiteness is not guaranteed.
>> What the changed condition on extensions to BGP matching requires is
>> that trivial sources of infinite answers should be defined and
>> excluded by the regime. This is deliberately a bit vague,
>
> Yes, very vague.  So much so, that I wonder about the value of using
> 'trivial' at all.

For me the previous condition was pretty ok and if you have anything
that is close to the original one, but works for RIF, I am happy to
strengthen it. Loosen it by even dropping trivial is not an option for
me.
Birte
Received on Sunday, 23 May 2010 13:24:15 UTC