[TF-ENT] Condition C2 modifications from Birte Glimm on 2010-02-24 (public-rdf-dawg@w3.org from January to March 2010)

From: Birte Glimm <birte.glimm@comlab.ox.ac.uk>
Date: Wed, 24 Feb 2010 20:02:28 +0000
To: Axel Polleres <axel.polleres@deri.org>
Cc: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <492f2b0b1002241202n21d6454du2884518dbad07746@mail.gmail.com>
Hi Axel,
here a few arguments for why I prefer the proposed/modified C2 version
and how that relates to data values.

Current C2:
(C2) Each variable x that occurs in the subject position of a triple in
       BGP is such that sk(μ(x)) occurs in sk(SG).

This C2 is not enough for OWL RDF-Based Semantics since as soon as I
have an individual in my queried graph, e.g., just by having a triple
such as
ex:a ex:b ex:c .
this entails
ex:a owl:topDataProperty "1"^^xsd:integer .
ex:a owl:topDataProperty "2"^^xsd:integer .
ex:a owl:topDataProperty "3"^^xsd:integer .
etc. And a query with BGP
ex:a ?x ?y
has infinite answers. Similar effects happen with number restrictions
(owl:minCardinality etc). You could strengthen the condition such that
it applies to variable in both the subject and object position. Even
with conditions on subject and object positions, that C2 is not
sufficient if we assume that even for an inconsistent graph SG is
defined as graph equivalent to the active graph. E.g., if I have
ex:a ex:b "<"^^rdf:XMLLiteral .
ex:b rdfs:range rdfs:Literal .
then I should explicitly say that SG is not graph equivalent and the
answers are not defined by the spec, but they are system dependant and
systems have to guarantee finiteness. Obviously systems will just work
as if there was no inconsistency and you'll only notice the difference
once the inconsistency is encountered.

Since systems don't necessarily check consistency, it is, however,
natural to assume that they just use the definition of answers and the
conditions indiscriminately. As soon as I allow SG to be graph
equivalent to AG also for inconsistent AGs, I can, however, also
imagine an evil system that just says "No matter what, SG=AG" and
starts generating all kind of shit:
ex:a ex:aa ex:a .
ex:a ex:aaa ex:a .
ex:a ex:aaaa ex:a .
and both subject and object occur in the input and that's all
entailed. My worry is just that I have to prove finiteness in any case
and that's not so easy in this situation. Even if subject and object
bindings are limited, you still have to be sure that under no
circumstances you can have infinite predicates. I also don't like that
in the presence of inconsistencies everything is left to the system. I
had one variant, where I defined SG for an inconsistent AG as
"repaired" AG, where malformed XML literals are replaced by
well-formed ones, but that is also not nice since we don't want to
return corrected values as answers and it is not really possible to
repair inconsistencies in OWL Full.

I then used the condition that you proposed, where C2 applies to
variables in all positions. That gives you an easy proof of finiteness
and works also for AG graph equivalent to SG in any case, but it has
some not nice side effects in particular for OWL RDF Based
Semantics/OWL RL. E.g., you have
ex:a ex:b ex:c .
and a query with BGP:
ex:a ?pred ex:a
which has no answer. Then you add
ex:somethingelse owl:sameAs ex:somethingelse .
and for the same query you suddenly get
?pred/owl:sameAs
which is not nice since the added triple had nothing to do with ex:a.

This is what lead to the proposed C2:
(C2) For each variable x in V(BGP), sk(μ(x)) occurs in sk(SG) or in Vocab.
where vocab is defined as the reserved vocabulary for the entailment
regime (e.g., the RDF vocabulary for RDF entailment) minus terms of
the form rdf:_n with n in {1, 2, …}.

Here you can safely say that SG is graph equivalent to AG in any case
and the condition works for all RDF based regimes. The proof that
answers will be finite is easy in any case because the vocabulary from
which answers are taken is finite, because for integers and other data
values you still get a restriction of infinite answers because data
values are not in vocab.

So the question is whether there are objections to the changed C2 and
if so, why and what would be a better alternative.

Birte

-- 
Dr. Birte Glimm, Room 306
Computing Laboratory
Parks Road
Oxford
OX1 3QD
United Kingdom
+44 (0)1865 283529
Received on Wednesday, 24 February 2010 20:03:01 UTC