Re: [TF-ENT] Entailment regimes doc update from Birte Glimm on 2010-02-18 (public-rdf-dawg@w3.org from January to March 2010)

From: Birte Glimm <birte.glimm@comlab.ox.ac.uk>
Date: Thu, 18 Feb 2010 12:18:13 +0000
To: Ivan Herman <ivan@w3.org>
Cc: SPARQL Working Group <public-rdf-dawg@w3.org>, Chimezie Ogbuji <ogbujic@ccf.org>
Message-ID: <492f2b0b1002180418l9c7157cle851544917f4d333@mail.gmail.com>
[snip]
>> I am not 100% happy. I think it is better because I can imagine tools
>> that allow you to choose between getting all triples and getting only
>> those that use the input vocabulary. Thinking even more about this,
>> here is another refinement that might capture even better what most
>> users would expect.
>> (C2) For each variable x in V(BGP), sk(μ(x)) occurs in sk(SG) or in Vocab.
>> (C3) For each triple s p o in P(BGP), either s or o occurs in sk(SG).
>>
> So... I wonder whether we are not going down the wrong route here. The
> _only_ requirement we have is to ensure finiteness. (C2) ensures that,
> but we could, maybe, make it even more lax by concentrating only on the
> rdf:_i properties and disallowing any of those in the conclusion that do
> not appear in the original graph. (This is what ter Horst does.) That
> might ensure finiteness.

C2 is already that lax: we said Vocab is the reserved vocabulary of
the regime you are using minus the rdf:_i properties, and variable
bindings come either from Vocab or from the queried graph. So if the
graph talks about rdf:_3, then rdf:_3 is allowed. Ter Horst goes even
a little further, if I remember correctly: he takes the largest n such
that rdf:_n occurs in the input and allows all rdf:_m with m<=n.
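
To make the difference between the two readings concrete, here is a
minimal Python sketch. It is not taken from the spec: the function
names and the representation of graph terms as plain IRI strings are
assumptions made purely for illustration, and the ter Horst variant
is coded as recalled above rather than checked against his paper.

```python
import re

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Matches the container membership properties rdf:_1, rdf:_2, ...
MEMBERSHIP = re.compile(re.escape(RDF_NS) + r"_([1-9][0-9]*)$")


def membership_indices(terms):
    """Return the set of i such that rdf:_i occurs among the given terms."""
    found = set()
    for term in terms:
        match = MEMBERSHIP.match(term)
        if match:
            found.add(int(match.group(1)))
    return found


def allowed_by_c2(terms):
    """Current C2 reading: only the rdf:_i that occur in the queried graph."""
    return membership_indices(terms)


def allowed_by_ter_horst(terms):
    """Variant as recalled above: all rdf:_m with m <= the largest n that occurs."""
    indices = membership_indices(terms)
    return set(range(1, max(indices) + 1)) if indices else set()


# Terms of a hypothetical graph that mentions rdf:_3 only.
graph_terms = {"http://example.org/a", RDF_NS + "_3", RDF_NS + "type"}
```

For this graph the first function admits only rdf:_3, while the second
admits rdf:_1, rdf:_2 and rdf:_3. Either way the set is finite, which
is the only thing we really need.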

> The result set will be large? Sure it will. But this _is_ what the
> RDF/OWL semantics dictates, isn't it, so why would we want to second
> guess the user? Does he/she want to control the size of the output?
> Well, that is why, in their infinite wisdom, the authors of SPARQL
> invented FILTER-s...:-)

That's a good point. It might be a bit slower to first add all the
results and then remove them again, but in the end there are not that
many. (More than a user might have expected, but in total numbers not
many).

> If I look at the particular example that you gave, here is a modified
> version:
>
> SELECT ?ind ?class
> WHERE {
>  ?ind a ?class .
>  FILTER(
>     !regex( str(?ind), "^http://www.w3.org/2000/01/rdf-schema#" ) &&
>     !regex( str(?ind), "^http://www.w3.org/1999/02/22-rdf-syntax-ns#")
>  )
> }
>
> with the data:
>
> ex:a a ex:C .
>
> and, I believe, with RDFS entailment we would get what we want.

True.

> So I believe we may want to examine an alternative route:
>
> - define (C2) to be the absolute strict minimum to ensure a finite solution
> - look at the existing FILTER possibilities to see if we can handle
> those common use cases. We may want to propose some shortcuts, like the
> one above, ie, some sort of an operation which says
>
> inNamespace( ?x, URI )
>
> which means that ?x as a term starts with the URI.
>
> What do you (and others!) think?

I think that could work out very nicely :-)
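
Just to pin down the intended meaning, here is a small Python sketch of
what such a test could do. The names in_namespace and keep_solution,
and the representation of solutions as dictionaries of IRI strings,
are hypothetical and only serve the illustration; nothing here is a
proposal for the actual SPARQL syntax.

```python
RDFS_NS = "http://www.w3.org/2000/01/rdf-schema#"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def in_namespace(term, namespace):
    """True if the string form of the term starts with the namespace URI."""
    return str(term).startswith(namespace)


def keep_solution(solution):
    """Mirror the FILTER in the quoted query: drop RDF/RDFS terms bound to ?ind."""
    ind = solution["ind"]
    return not in_namespace(ind, RDFS_NS) and not in_namespace(ind, RDF_NS)


# Hypothetical solutions; the second stands for a binding that comes
# from an axiomatic triple rather than from the user's data.
solutions = [
    {"ind": "http://example.org/a", "class": "http://example.org/C"},
    {"ind": RDF_NS + "type", "class": RDF_NS + "Property"},
]
kept = [s for s in solutions if keep_solution(s)]
```

A plain prefix comparison also sidesteps a small wrinkle of the regex
version: in a regular expression the "." in the namespace URI matches
any character, so the pattern is slightly more permissive than
intended.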

Birte

> I have the gut feeling it will also help in defining the RIF
> alternative, b.t.w.
>
> Ivan
>
>
>> Here, C2 is very lax and it allows almost all axiomatic triples, but
>> still guarantees finiteness of the answers in all cases. Now C3
>> basically says that if the instantiated BGP has nothing to do with
>> your data, then omit it. I omit the predicate as an option in C3
>> since predicates such as rdf:type occur in most graphs (possibly by
>> implicit triples as in OWL) and will then let through most axiomatic
>> triples. Answers that are then filtered out are basically axiomatic
>> triples that you didn't mention at all. This could be relaxed to (not
>> sure which one makes more sense, needs more thinking)
>> (C3) There is at least one triple s p o in P(BGP) such that either s
>> or o occurs in sk(SG).
>>
>> Going back to the examples, if we have the query:
>> SELECT ?ind ?class WHERE { ?ind a ?class }
>> and data:
>> [[
>> ex:a a ex:b .
>> ]]
>> as in your generated output, then you would get under OWL RL
>> ?ind/ex:a, ?class/ex:b
>>
>> For the query
>> SELECT ?r WHERE { ex:a ?r ex:a }
>> over
>> [[
>> ex:a ex:b ex:c .
>> ]]
>> you would get
>> ?r/owl:sameAs
>> no matter whether there is ex:something owl:sameAs ex:something or not.
>>
>> It is still possible to generate artificial examples, but less so. E.g.,
>> SELECT ?type WHERE { rdf:type a ?type }
>> over the empty graph gives you nothing since C3 cannot be satisfied in
>> an empty graph. If we query over
>> [[
>> ex:b a ex:c.
>> ]]
>> we get
>> ?type/rdf:Property
>> C2 holds because rdf:Property is in vocab (assuming we do RDF(S) entailment)
>> C3 holds because the instantiated BGP is rdf:type a rdf:Property and
>> the subject rdf:type occurs in its abbreviated form in your data. That
>> seems a nicer compromise. If we don't exclude axiomatic triples, we
>> get infinite answers, and if we exclude some, we get some non-local
>> side effects, but that cannot be totally avoided.
>>
>>> Well... this is clearly not our decision. To be formal, we should
>>> definitely flag that as an issue to be discussed. In some ways, the
>>> question is: is it better to have many potential responses (ie, the user
>>> will have to filter things out) or a small number though some expected
>>> results will not be returned (only via ugly tricks).
>>
>> Yes.
>> I would rather have too many than too few. Tools will hopefully
>> provide a way to be configured such that they hide what I don't want
>> to see, while still giving me the chance to see it if I want to.
>>
>> [snip]
>>>> Another would be to define
>>>> something like "extensible entailment regimes", where the entailment
>>>> regime is a combination of one of the defined semantics plus some
>>>> rules/axioms that the endpoint will always apply, e.g., the SKOS
>>>> axioms are always assumed to be present.
>>>
>>> I think the answer is: this is where RIF comes in. That gives me the
>>> necessary flexibility (and I can always simulate OWL 2 RL level with a
>>> RIF rule set).
>>
>> Yes. That would be nice.
>>
>>> B.t.w.: a point here for the future RIF discussion:
>>>
>>>  - we have something for OWL 2 RL
>>>  - we will have something for RIF (hopefully)
>>>  - there is a RIF document that, essentially, defines a RIF rule set for
>>> OWL 2 RL
>>>
>>> Surely the result of the entailment should be the same whether I use the
>>> OWL 2 RL entailment definition or the RIF one and use the official rule
>>> set...
>>
>> Yes. It would not be nice at all if that were not the case.
>>
>> Cheers,
>> Birte
>>
>>> Cheers
>>>
>>> Ivan
>>>
>>>> Birte
>>>>
>>>>>> We might still want to exclude axiomatic triples unless they occur in
>>>>>> the input because they potentially add lots of answers that you don't
>>>>>> really want.
>>>>>
>>>>> See above. But if I am wrong, I do not mind that either.
>>>>>
>>>>>> Another possibility would be to apply C2 only to variables in subject
>>>>>> and object position and allow anything in the predicate position. That
>>>>>> still can cause counterintuitive side effects, but many more cases are
>>>>>> covered. In that case, inconsistencies need extra care because in
>>>>>> principle this would still allow infinite answers if the given graph
>>>>>> is inconsistent and we just assume the scoping graph to be equivalent
>>>>>> to the queried graph no matter what.
>>>>>
>>>>> I guess you had that in one of the first drafts and we did not really
>>>>> like it:-(
>>>>> [snip]
>>>>>>>
>>>>>>> Sigh.
>>>>>>
>>>>>> Sigh too.
>>>>>
>>>>> :-)
>>>>>
>>>>> Ivan
>>>>>
>>>>>> Birte
>>>>>>
>>>>>>>
>>>>>>> Ivan
>>>>>>>
>>>>>>> [1]
>>>>>>> http://www.w3.org/TR/2009/REC-owl2-rdf-based-semantics-20091027/#Appendix:_Axiomatic_Triples_.28Informative.29
>>>>>>> [2]
>>>>>>> http://www.w3.org/TR/2009/REC-owl2-syntax-20091027/#Entity_Declarations_and_Typing
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 2010-2-16 18:42 , Birte Glimm wrote:
>>>>>>>> Hi all,
>>>>>>>> I have committed a new version of the entailment regimes document:
>>>>>>>> http://www.w3.org/2009/sparql/docs/entailment/xmlspec.xml
>>>>>>>>
>>>>>>>> There is now a description of the OWL RDF-Based Semantics incl. the
>>>>>>>> OWL 2 RL profile. The OWL 2 RL profile can also be used with Direct
>>>>>>>> Semantics, so I have added that there too. Further I have added a
>>>>>>>> section about aggregates with RDF(S) entailment, addressing at least
>>>>>>>> parts of Axel's comments (no owl:sameAs discussion yet for
>>>>>>>> aggregation). I also defined the behaviour for inconsistent graphs
>>>>>>>> more clearly because the previous spec didn't define the scoping graph
>>>>>>>> in the case of inconsistencies. It was rather assumed that the scoping
>>>>>>>> graph is still equivalent to the active graph, so that systems can
>>>>>>>> just use the graph as is modulo bnode renaming, but that allowed
>>>>>>>> infinite answers for inconsistent graphs. I now use Axel's suggestion
>>>>>>>> for condition C2 and require not only bindings for variables in
>>>>>>>> subject position to occur in the input, but require this for all
>>>>>>>> variables. This also solves the OWL RDF-Based semantics problem where
>>>>>>>> you can have infinite answers from owl:topDataProperty, which relates
>>>>>>>> an individual to all data values. Now all RDF-Based regimes (RDF,
>>>>>>>> RDFS, OWL 2 RDF-Based (for OWL Full and OWL RL)) use the same
>>>>>>>> definitions, which is nice IMO.
>>>>>>>>
>>>>>>>> Birte
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> Ivan Herman, W3C Semantic Web Activity Lead
>>>>>>> Home: http://www.w3.org/People/Ivan/
>>>>>>> mobile: +31-641044153
>>>>>>> PGP Key: http://www.ivan-herman.net/pgpkey.html
>>>>>>> FOAF   : http://www.ivan-herman.net/foaf.rdf
>>>>>>> vCard  : http://www.ivan-herman.net/HermanIvan.vcf
>>>>>>>


--
Dr. Birte Glimm, Room 306
Computing Laboratory
Parks Road
Oxford
OX1 3QD
United Kingdom
+44 (0)1865 283529
Received on Thursday, 18 February 2010 12:18:46 UTC