Re: [TF-ENT] Entailment regimes doc update from Birte Glimm on 2010-02-19 (public-rdf-dawg@w3.org from January to March 2010)

From: Birte Glimm <birte.glimm@comlab.ox.ac.uk>
Date: Fri, 19 Feb 2010 10:57:18 +0000
To: Ivan Herman <ivan@w3.org>
Cc: SPARQL Working Group <public-rdf-dawg@w3.org>, Chimezie Ogbuji <ogbujic@ccf.org>, Axel Polleres <axel.polleres@deri.org>
Message-ID: <492f2b0b1002190257r36916e7dy7a5884f29359a24b@mail.gmail.com>
On 19 February 2010 08:26, Ivan Herman <ivan@w3.org> wrote:
> I am not sure, procedurally, what the next step is. Would we need a nod
> from the WG to make these changes on your draft, or would you just go
> ahead and do it? It affects the (C2) condition on previously published
> entailment regimes, too.

You know the official process much better than I do, I guess ;-)

> Personally, I would prefer some other 'go-ahead' reactions from all
> parties interested... You can also raise this shortly on the next telco
> if there is time enough (noting that this document still has the
> 'time-permitting' label attached to it...)

How about putting that on the agenda of our (RIF) entailment regimes
teleconf? At least there we'll have a bigger group and everybody cares
about entailments. If we all agree, then I would just go ahead and
change it unless there are official requirements that require a thumbs
up from the whole group.

Birte

> Ivan
>
> On 2010-2-18 13:18 , Birte Glimm wrote:
>> [snip]
>>>> I am not 100% happy. I think it is better because I can imagine tools
>>>> that allow you to choose between getting all triples and getting only
>>>> those that use the input vocabulary. Thinking even more about this,
>>>> here is another refinement that might capture even better what most
>>>> users would expect.
>>>> (C2) For each variable x in V(BGP), sk(μ(x)) occurs in sk(SG) or in Vocab.
>>>> (C3) For each triple s p o in P(BGP), either s or o occurs in sk(SG).
>>>>
>>> So... I wonder whether we are not going down the wrong route here. The
>>> _only_ requirement we have is to ensure finiteness. (C2) ensures that,
>>> but we could, maybe, make it even more lax by concentrating only on the
>>> rdf:_i properties and disallow any of those in the conclusion that do
>>> not appear in the original graph. (This is what ter Horst does.) That
>>> might ensure finiteness.
>>
>> C2 is already that lax at the moment: we said Vocab is the reserved
>> vocabulary of the regime you are using minus the rdf:_i, and variable
>> bindings come either from there or from the queried graph. So if the
>> graph talks about rdf:_3, then rdf:_3 is allowed. Ter Horst goes even
>> a little bit further, if I remember correctly, by taking the largest n
>> such that rdf:_n occurs in the input and allowing all rdf:_m with m<=n.
>>
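The two readings of the rdf:_i restriction discussed above can be sketched
as follows. This is only an illustration in Python, not text from the draft:
the function names are made up for this note, terms are assumed to be plain
IRI strings, and the second function follows the ter Horst variant only as
it is recalled in the message above.

```python
import re

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Matches the container membership properties rdf:_1, rdf:_2, ...
_MEMBER = re.compile(re.escape(RDF_NS) + r"_([1-9][0-9]*)")


def member_index(iri):
    """Return n if iri is rdf:_n, otherwise None."""
    m = _MEMBER.fullmatch(iri)
    return int(m.group(1)) if m else None


def allowed_exact(graph_terms):
    """Reading 1: rdf:_i is allowed only if it occurs in the queried graph."""
    return {t for t in graph_terms if member_index(t) is not None}


def allowed_up_to_max(graph_terms):
    """Reading 2 (as recalled above): allow every rdf:_m with m <= n,
    where n is the largest index such that rdf:_n occurs in the input."""
    indices = [i for i in map(member_index, graph_terms) if i is not None]
    if not indices:
        return set()
    return {f"{RDF_NS}_{m}" for m in range(1, max(indices) + 1)}
```

Either way the allowed set is finite for a finite input graph, which is the
only property (C2) has to guarantee.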
>>> The result set will be large? Sure it will. But this _is_ what the
>>> RDF/OWL semantics dictates, isn't it, so why would we want to second
>>> guess the user? Does he/she want to control the size of the output?
>>> Well, that is why, in their infinite wisdom, the authors of SPARQL
>>> invented FILTER-s...:-)
>>
>> That's a good point. It might be a bit slower to first add all the
>> results and then remove them again, but in the end there are not that
>> many. (More than a user might have expected, but in total numbers not
>> many).
>>
>>> If I look at the particular example that you gave, here is a modified
>>> version:
>>>
>>> SELECT ?ind ?class
>>> WHERE {
>>>  ?ind a ?class .
>>>  FILTER(
>>>     !regex( str(?ind), "^http://www.w3.org/2000/01/rdf-schema#" ) &&
>>>     !regex( str(?ind), "^http://www.w3.org/1999/02/22-rdf-syntax-ns#")
>>>  )
>>> }
>>>
>>> with the data:
>>>
>>> ex:a a ex:C .
>>>
>>> and, I believe, with RDFS entailment we would get what we want.
>>
>> True.
>>
>>> So I believe we may want to examine an alternative route:
>>>
>>> - define (C2) to be the absolute strict minimum to ensure a finite solution
>>> - look at the existing FILTER possibilities to see if we can handle
>>> those common use cases. We may want to propose some shortcuts, like the
>>> one above, i.e., some sort of an operation which says
>>>
>>> inNamespace( ?x, URI )
>>>
>>> which means that ?x as a term starts with the URI.
>>>
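As a rough illustration of the proposed shortcut, inNamespace( ?x, URI ) can
be read as a plain string-prefix test on the term. The Python below is a
sketch under that assumption only; in_namespace and keep_solution are
hypothetical names, and solutions are modelled as dictionaries from variable
names to IRI strings.

```python
RDFS_NS = "http://www.w3.org/2000/01/rdf-schema#"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def in_namespace(term, namespace_uri):
    """True if the term, taken as a string, starts with namespace_uri."""
    return term.startswith(namespace_uri)


def keep_solution(binding):
    """Mirror the FILTER in the query above: drop solutions whose ?ind
    lies in the RDFS or RDF namespace."""
    ind = binding["ind"]
    return not in_namespace(ind, RDFS_NS) and not in_namespace(ind, RDF_NS)


solutions = [
    {"ind": "http://example.org/a", "class": "http://example.org/C"},
    {"ind": RDFS_NS + "Class", "class": RDFS_NS + "Class"},
]
# Only the solution about the user's own data survives the filter.
filtered = [s for s in solutions if keep_solution(s)]
```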
>>> What do you (and others!) think?
>>
>> I think that could work out very nicely :-)
>>
>> Birte
>>
>>> I have the gut feeling it will also help in defining the RIF
>>> alternative, b.t.w.
>>>
>>> Ivan
>>>
>>>
>>>> Here, C2 is very lax and it allows almost all axiomatic triples, but
>>>> still guarantees finiteness of the answers in all cases. Now C3
>>>> basically says that if the instantiated BGP has nothing to do with
>>>> your data, then omit it. I omit the predicate as an option in C3
>>>> since predicates such as rdf:type occur in most graphs (possibly by
>>>> implicit triples as in OWL) and will then let through most axiomatic
>>>> triples. Answers that are then filtered out are basically axiomatic
>>>> triples that you didn't mention at all. This could be relaxed to (not
>>>> sure which one makes more sense, needs more thinking)
>>>> (C3) There is at least one triple s p o in P(BGP) such that either s
>>>> or o occurs in sk(SG).
>>>>
>>>> Going back to the examples, if we have the query:
>>>> SELECT ?ind ?class WHERE { ?ind a ?class }
>>>> and data:
>>>> [[
>>>> ex:a a ex:b .
>>>> ]]
>>>> as in your generated output, then you would get under OWL RL
>>>> ?ind/ex:a, ?class/ex:b
>>>>
>>>> For the query
>>>> SELECT ?r WHERE { ex:a ?r ex:a }
>>>> over
>>>> [[
>>>> ex:a ex:b ex:c .
>>>> ]]
>>>> you would get
>>>> ?r/owl:sameAs
>>>> no matter whether there is ex:something owl:sameAs ex:something or not.
>>>>
>>>> It is still possible to generate artificial examples, but less so. E.g.,
>>>> SELECT ?type WHERE { rdf:type a ?type }
>>>> over the empty graph gives you nothing since C3 cannot be satisfied in
>>>> an empty graph. If we query over
>>>> [[
>>>> ex:b a ex:c.
>>>> ]]
>>>> we get
>>>> ?type/rdf:Property
>>>> C2 holds because rdf:Property is in vocab (assuming we do RDF(S) entailment)
>>>> C3 holds because the instantiated BGP is rdf:type a rdf:Property and
>>>> the subject rdf:type occurs in its abbreviated form in your data. That
>>>> seems a nicer compromise. If we don't exclude axiomatic triples, we
>>>> get infinite answers, and if we exclude some, we get some non-local
>>>> side effects, but that cannot be totally avoided.
>>>>
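The (C2) and (C3) checks from this message, including the relaxed form of
(C3), can be sketched roughly as below. This is an illustrative Python
sketch, not the wording of the draft: it assumes a graph without blank nodes
(so the skolemization sk(.) is the identity), writes terms as prefixed-name
strings, marks variables with a leading "?", and treats "occurs in sk(SG)"
as occurring in any position of any triple. All function names are
hypothetical. It only checks a candidate solution; it does not compute
entailment.

```python
def is_var(term):
    return term.startswith("?")


def terms_of(graph):
    """All terms occurring anywhere (subject, predicate, object) in the graph."""
    return {t for triple in graph for t in triple}


def instantiate(bgp, mu):
    """Apply the solution mapping mu to every triple pattern of the BGP."""
    return [tuple(mu[t] if is_var(t) else t for t in triple) for triple in bgp]


def satisfies_c2(bgp, mu, graph, vocab):
    """(C2): every variable binding occurs in the graph or in Vocab."""
    allowed = terms_of(graph) | vocab
    variables = {t for triple in bgp for t in triple if is_var(t)}
    return all(mu[x] in allowed for x in variables)


def satisfies_c3(bgp, mu, graph, relaxed=False):
    """(C3): in each instantiated triple, subject or object occurs in the graph.
    With relaxed=True, one such triple is enough."""
    present = terms_of(graph)
    hits = [s in present or o in present for (s, _p, o) in instantiate(bgp, mu)]
    return any(hits) if relaxed else all(hits)


# SELECT ?type WHERE { rdf:type a ?type } with the candidate ?type/rdf:Property
bgp = [("rdf:type", "rdf:type", "?type")]
mu = {"?type": "rdf:Property"}
vocab = {"rdf:type", "rdf:Property"}  # tiny stand-in for the RDF(S) vocabulary
data = [("ex:b", "rdf:type", "ex:c")]

over_empty = satisfies_c2(bgp, mu, [], vocab) and satisfies_c3(bgp, mu, [])
over_data = satisfies_c2(bgp, mu, data, vocab) and satisfies_c3(bgp, mu, data)
```

Under these assumptions over_empty is False and over_data is True, which
matches the two cases described above.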
>>>>> Well... this is clearly not our decision. To be formal, we should
>>>>> definitely flag that as an issue to be discussed. In some ways, the
>>>>> question is: is it better to have many potential responses (i.e., the user
>>>>> will have to filter things out) or a small number though some expected
>>>>> results will not be returned (only via ugly tricks).
>>>>
>>>> Yes.
>>>> I would rather have too many than too few. Tools will hopefully
>>>> provide a way to be configured such that they hide what I don't want
>>>> to see, while still giving me the chance to see it if I want to.
>>>>
>>>> [snip]
>>>>>>                                         Another would be to define
>>>>>> something like "extensible entailment regimes", where the entailment
>>>>>> regime is a combination of one of the defined semantics plus some
>>>>>> rules/axioms that the endpoint will always apply, e.g., the SKOS
>>>>>> axioms are always assumed to be present.
>>>>>
>>>>> I think the answer is: this is where RIF comes in. That gives me the
>>>>> necessary flexibility (and I can always simulate OWL 2 RL level with a
>>>>> RIF rule set).
>>>>
>>>> Yes. That would be nice.
>>>>
>>>>> B.t.w.: a point here for the future RIF discussion:
>>>>>
>>>>>  - we have something for OWL 2 RL
>>>>>  - we will have something for RIF (hopefully)
>>>>>  - there is a RIF document that, essentially, defines a RIF rule set for
>>>>> OWL 2 RL
>>>>>
>>>>> Surely the result of the entailment should be the same whether I use the
>>>>> OWL 2 RL entailment definition or the RIF one and use the official rule
>>>>> set...
>>>>
>>>> Yes. It would not be nice at all if that were not the case.
>>>>
>>>> Cheers,
>>>> Birte
>>>>
>>>>> Cheers
>>>>>
>>>>> Ivan
>>>>>
>>>>>> Birte
>>>>>>
>>>>>>>> We might still want to exclude axiomatic triples unless they occur in
>>>>>>>> the input because they potentially add lots of answers that you don't
>>>>>>>> really want.
>>>>>>>
>>>>>>> See above. But if I am wrong, I do not mind that either.
>>>>>>>
>>>>>>>> Another possibility would be to apply C2 only to variables in subject
>>>>>>>> and object position and allow anything in the predicate position. That
>>>>>>>> still can cause counterintuitive side effects, but many more cases are
>>>>>>>> covered. In that case, inconsistencies need extra care because in
>>>>>>>> principle this would still allow infinite answers if the given graph
>>>>>>>> is inconsistent and we just assume the scoping graph to be equivalent
>>>>>>>> to the queried graph no matter what.
>>>>>>>
>>>>>>> I guess you had that in one of the first drafts and we did not really
>>>>>>> like it:-(
>>>>>>> [snip]
>>>>>>>>>
>>>>>>>>> Sigh.
>>>>>>>>
>>>>>>>> Sigh too.
>>>>>>>
>>>>>>> :-)
>>>>>>>
>>>>>>> Ivan
>>>>>>>
>>>>>>>> Birte
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Ivan
>>>>>>>>>
>>>>>>>>> [1]
>>>>>>>>> http://www.w3.org/TR/2009/REC-owl2-rdf-based-semantics-20091027/#Appendix:_Axiomatic_Triples_.28Informative.29
>>>>>>>>> [2]
>>>>>>>>> http://www.w3.org/TR/2009/REC-owl2-syntax-20091027/#Entity_Declarations_and_Typing
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 2010-2-16 18:42 , Birte Glimm wrote:
>>>>>>>>>> Hi all,
>>>>>>>>>> I have committed a new version of the entailment regimes document:
>>>>>>>>>> http://www.w3.org/2009/sparql/docs/entailment/xmlspec.xml
>>>>>>>>>>
>>>>>>>>>> There is now a description of the OWL RDF-Based Semantics incl. the
>>>>>>>>>> OWL 2 RL profile. The OWL 2 RL profile can also be used with Direct
>>>>>>>>>> Semantics, so I have added that there too. Further I have added a
>>>>>>>>>> section about aggregates with RDF(S) entailment, addressing at least
>>>>>>>>>> parts of Axel's comments (no owl:sameAs discussion yet for
>>>>>>>>>> aggregation). I also defined the behaviour for inconsistent graphs
>>>>>>>>>> more clearly because the previous spec didn't define the scoping graph
>>>>>>>>>> in the case of inconsistencies. It was rather assumed that the scoping
>>>>>>>>>> graph is still equivalent to the active graph, so that systems can
>>>>>>>>>> just use the graph as is modulo bnode renaming, but that allowed
>>>>>>>>>> infinite answers for inconsistent graphs. I now use Axel's suggestion
>>>>>>>>>> for condition C2 and require not only bindings for variables in
>>>>>>>>>> subject position to occur in the input, but require this for all
>>>>>>>>>> variables. This also solves the OWL RDF-Based semantics problem where
>>>>>>>>>> you can have infinite answers from owl:topDataProperty, which relates
>>>>>>>>>> an individual to all data values. Now all RDF-Based regimes (RDF,
>>>>>>>>>> RDFS, OWL 2 RDF-Based (for OWL Full and OWL RL)) use the same
>>>>>>>>>> definitions, which is nice IMO.
>>>>>>>>>>
>>>>>>>>>> Birte
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>>
>>>>>>>>> Ivan Herman, W3C Semantic Web Activity Lead
>>>>>>>>> Home: http://www.w3.org/People/Ivan/
>>>>>>>>> mobile: +31-641044153
>>>>>>>>> PGP Key: http://www.ivan-herman.net/pgpkey.html
>>>>>>>>> FOAF   : http://www.ivan-herman.net/foaf.rdf
>>>>>>>>> vCard  : http://www.ivan-herman.net/HermanIvan.vcf
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>



-- 
Dr. Birte Glimm, Room 306
Computing Laboratory
Parks Road
Oxford
OX1 3QD
United Kingdom
+44 (0)1865 283529
Received on Friday, 19 February 2010 10:57:52 UTC