Re: [TF-ENT] Condition C2 modifications from Axel Polleres on 2010-03-04 (public-rdf-dawg@w3.org from January to March 2010)

From: Axel Polleres <axel.polleres@deri.org>
Date: Thu, 4 Mar 2010 23:36:55 +0000
To: Birte Glimm <birte.glimm@comlab.ox.ac.uk>
Cc: "SPARQL Working Group" <public-rdf-dawg@w3.org>
Message-Id: <5EFFE5F5-FC12-4D2B-8B09-2987F682F8A0@deri.org>
I had promised to follow up on this in the last TF-ENT telecon...
Bottom line: thanks for the explanation and no further objections to the changed C2!

... but some more questions inline below.

cheers,
Axel


On 24 Feb 2010, at 20:02, Birte Glimm wrote:

> Hi Axel,
> here a few arguments for why I prefer the proposed/modified C2 version
> and how that relates to data values.
> 
> Current C2:
> (C2) Each variable x that occurs in the subject position of a triple in
>        BGP is such that sk(μ(x)) occurs in sk(SG).
> 
> This C2 is not enough for OWL RDF-Based Semantics since as soon as I
> have an individual in my queried graph, e.g., just by having a triple
> such as
> ex:a ex:b ex:c .
> this entails
> ex:a owl:topDataProperty "1"^^xsd:integer .
> ex:a owl:topDataProperty "2"^^xsd:integer .
> ex:a owl:topDataProperty "3"^^xsd:integer .

I don't really understand the motivation for owl:topDataProperty yet, but ok,
I see that this is obviously another source of infinitely many triples, yes.

> etc. And a query with BGP
> ex:a ?x ?y
> has infinite answers. Similar effects happen with number restrictions
> (owl:minCardinality etc). You could strengthen the condition such that
> it applies to variables in both the subject and object positions. Even
> with conditions on subject and object positions, that C2 is not
> sufficient if we assume that even for an inconsistent graph SG is
> defined as graph equivalent to the active graph. E.g., if I have
> ex:a ex:b "<"^^rdf:XMLLiteral .
> ex:b rdfs:range rdfs:Literal .
> then I should explicitly say that SG is not graph equivalent and the
> answers are not defined by the spec, but they are system-dependent and
> systems have to guarantee finiteness.

Hmmm, re-reading condition 4

"4 -- Each SPARQL extension must provide conditions on answer sets which guarantee that every BGP and AG has a finite set of answers which is unique up to RDF graph equivalence."

I just realised that, read strictly, this condition is not restricted to consistent AGs, so it seems to say that a unique set of answers also has to be guaranteed for inconsistent graphs. Does that rule out leaving the behavior on inconsistent graphs implementation-dependent? If so, I'd strongly prefer to reformulate it as:

"4 -- Each SPARQL extension must provide conditions on answer sets which guarantee that every BGP and *consistent* AG has a finite set of answers which is unique up to RDF graph equivalence."


> Obviously systems will just work
> as if there was no inconsistency and you'll only notice the difference
> once the inconsistency is encountered.
> 
> Since systems don't necessarily check consistency, it is, however,
> natural to assume that they just use the definition of answers and the
> conditions indiscriminately. As soon as I allow SG to be graph
> equivalent to AG also for inconsistent AGs, I can, however, also
> imagine an evil system that just says "No matter what, SG=AG" and
> starts generating all kinds of shit:
> ex:a ex:aa ex:a .
> ex:a ex:aaa ex:a .
> ex:a ex:aaaa ex:a .
> and both subject and object occur in the input and that's all
> entailed. My worry is just that I have to prove finiteness in any case
> and that's not so easy in this situation. Even if subject and object
> bindings are limited, you still have to be sure that under no
> circumstances you can have infinite predicates. I also don't like that
> in the presence of inconsistencies everything is left to the system. I
> had one variant, where I defined SG for an inconsistent AG as
> "repaired" AG, where malformed XML literals are replaced by
> well-formed ones, but that is also not nice since we don't want to
> return corrected values as answers and it is not really possible to
> repair inconsistencies in OWL Full.
> 
> I then used the condition that you proposed, where C2 applies to
> variables in all positions. That gives you an easy proof of finiteness
> and works also for AG graph equivalent to SG in any case, but it has
> some unpleasant side effects, in particular for OWL RDF-Based
> Semantics/OWL RL. E.g., you have
> ex:a ex:b ex:c .
> and a query with BGP:
> ex:a ?pred ex:a
> which has no answer. Then you add
> ex:somethingelse owl:sameAs ex:somethingelse .
> and for the same query you suddenly get
> ?pred/owl:sameAs
> which is not nice since the added triple had nothing to do with ex:a.

I see.
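
The side effect described above can be reproduced in a few lines. This is only a sketch under stated assumptions: graphs are plain sets of string triples, the only entailment modelled is reflexivity of owl:sameAs, and the helper names (graph_terms, entailed_predicates_between, answers_all_positions) are made up for illustration.

```python
def graph_terms(graph):
    """All terms that occur in any position of any triple."""
    return {term for triple in graph for term in triple}


def entailed_predicates_between(graph, s, o):
    """Predicates p such that (s, p, o) is entailed in this toy model."""
    preds = {p for (s2, p, o2) in graph if s2 == s and o2 == o}
    # Assumption: the only inference modelled is that owl:sameAs is
    # reflexive for every term occurring in the graph.
    if s == o and s in graph_terms(graph):
        preds.add("owl:sameAs")
    return preds


def answers_all_positions(graph, s, o):
    """Bindings for ?pred in `s ?pred o`, keeping only values that occur
    in the graph (the variant of C2 applied to variables in all positions)."""
    allowed = graph_terms(graph)
    return sorted(p for p in entailed_predicates_between(graph, s, o)
                  if p in allowed)


g1 = {("ex:a", "ex:b", "ex:c")}
g2 = g1 | {("ex:somethingelse", "owl:sameAs", "ex:somethingelse")}

print(answers_all_positions(g1, "ex:a", "ex:a"))  # no binding for ?pred
print(answers_all_positions(g2, "ex:a", "ex:a"))  # owl:sameAs now qualifies
```

The entailed triple ex:a owl:sameAs ex:a is the same in both cases; only the filter changes, because the unrelated triple happens to mention owl:sameAs.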

> This is what led to the proposed C2:
> (C2) For each variable x in V(BGP), sk(μ(x)) occurs in sk(SG) or in Vocab.
> where Vocab is defined as the reserved vocabulary for the entailment
> regime (e.g., the RDF vocabulary for RDF entailment) minus terms of
> the form rdf:_n with n in {1, 2, …}.
> 
> Here you can safely say that SG is graph equivalent to AG in any case
> and the condition works for all RDF-based regimes. The proof that
> answers are finite is easy in any case because the vocabulary from
> which answers are taken is finite: integers and other data values are
> not in Vocab, so they can only be returned if they occur in SG, which
> still rules out infinitely many answers.

Sounds good then, thanks for all the explanations.
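
For the record, here is how I read the proposed C2, as a rough sketch rather than anything normative. Assumptions: terms are plain strings, there are no blank nodes (so sk() is the identity), the reserved vocabulary passed in is just a tiny illustrative subset, and the function names are hypothetical.

```python
import re

# rdf:_n with n in {1, 2, ...} is excluded from Vocab.
CONTAINER_MEMBERSHIP = re.compile(r"^rdf:_[1-9][0-9]*$")


def build_vocab(reserved_terms):
    """Reserved vocabulary of the regime minus the rdf:_n terms."""
    return {t for t in reserved_terms if not CONTAINER_MEMBERSHIP.match(t)}


def graph_terms(graph):
    """All terms that occur in any position of any triple."""
    return {term for triple in graph for term in triple}


def satisfies_c2(solution, sg, vocab):
    """True if every variable binding occurs in SG or in Vocab."""
    allowed = graph_terms(sg) | vocab
    return all(value in allowed for value in solution.values())


sg = {("ex:a", "ex:b", "ex:c")}
# Illustrative subset only, not the full RDF vocabulary.
vocab = build_vocab({"rdf:type", "rdf:Property", "rdf:_1", "rdf:_2"})

# Reserved vocabulary is allowed even if it does not occur in SG.
print(satisfies_c2({"x": "rdf:type", "y": "rdf:Property"}, sg, vocab))
# Data values that do not occur in SG are rejected.
print(satisfies_c2({"x": "ex:b", "y": '"1"^^xsd:integer'}, sg, vocab))
# rdf:_n terms are rejected unless they occur in SG.
print(satisfies_c2({"x": "rdf:_1", "y": "ex:c"}, sg, vocab))
```

Since both graph_terms(sg) and vocab are finite sets, the number of solutions that pass the check is finite for any finite set of variables.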
> 
> So the question is whether there are objections to the changed C2 and
> if so, why and what would be a better alternative.

No objections, but one additional side question: 

Do we have an issue with systems that use canonical forms of datatype literals internally?

Say you have:

 :s :p "1.000"^^xsd:decimal

Is a datatype-aware system really supposed to return

 "1.000"^^xsd:decimal

on { :s :p ?O} 

but not its internal representation?
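
To make the question concrete, a small sketch. It assumes the canonical lexical form of xsd:decimal as I understand it from XML Schema 1.0 Part 2 (decimal point required, no redundant leading or trailing zeros); the function names are hypothetical, and the code does not answer which behaviour the spec should require.

```python
from decimal import Decimal


def canonical_decimal(lexical):
    """Canonical xsd:decimal lexical form (XML Schema 1.0 reading, assumed)."""
    value = Decimal(lexical)
    if value == 0:
        return "0.0"  # also normalises "-0.0" and "+0"
    text = format(value, "f")  # fixed-point notation, never an exponent
    if "." not in text:
        text += ".0"
    integer_part, fraction = text.split(".")
    fraction = fraction.rstrip("0") or "0"  # keep one digit after the point
    return integer_part + "." + fraction


def same_lexical_form(a, b):
    """Purely syntactic comparison, as a literal 'occurs in SG' check would do."""
    return a == b


def same_value(a, b):
    """Comparison in the value space of xsd:decimal."""
    return Decimal(a) == Decimal(b)


stored = "1.000"                      # lexical form in the data
internal = canonical_decimal(stored)  # what a canonicalising store might keep

print(internal)
print(same_lexical_form(stored, internal))
print(same_value(stored, internal))
```

A system that only keeps the canonical form can still return the same value, but no longer the term as written in the graph, which is exactly where a syntactic reading of "occurs in sk(SG)" could bite.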


best, 
Axel
> 
> Birte
> 
> --
> Dr. Birte Glimm, Room 306
> Computing Laboratory
> Parks Road
> Oxford
> OX1 3QD
> United Kingdom
> +44 (0)1865 283529
>
Received on Thursday, 4 March 2010 23:38:14 UTC