Re: [TF-ENT] Entailment regimes doc update from Ivan Herman on 2010-02-19 (public-rdf-dawg@w3.org from January to March 2010)

From: Ivan Herman <ivan@w3.org>
Date: Fri, 19 Feb 2010 09:26:34 +0100
To: Birte Glimm <birte.glimm@comlab.ox.ac.uk>
CC: SPARQL Working Group <public-rdf-dawg@w3.org>, Chimezie Ogbuji <ogbujic@ccf.org>, Axel Polleres <axel.polleres@deri.org>
Message-ID: <4B7E4B3A.9010209@w3.org>

I am not sure, procedurally, what the next step is. Would we need a nod
from the WG to make these changes on your draft, or would you just go
ahead and do it? It affects the (C2) condition on previously published
entailment regimes, too.

Personally, I would prefer some other 'go-ahead' reactions from all
interested parties... You can also raise this briefly on the next telco
if there is enough time (noting that this document still has the
'time-permitting' label attached to it...)

Ivan

On 2010-2-18 13:18 , Birte Glimm wrote:
> [snip]
>>> I am not 100% happy. I think it is better because I can imagine tools
>>> that allow you to choose between getting all triples and getting only
>>> those that use the input vocabulary. Thinking even more about this,
>>> here is another refinement that might capture even better what most
>>> users would expect.
>>> (C2) For each variable x in V(BGP), sk(μ(x)) occurs in sk(SG) or in Vocab.
>>> (C3) For each triple s p o in P(BGP), either s or o occurs in sk(SG).
>>>
>> So... I wonder whether we are not going down the wrong route here. The
>> _only_ requirement we have is to ensure finiteness. (C2) ensures that,
>> but we could, maybe, make it even more lax by concentrating only on the
>> rdf:_i properties and disallowing any of those in the conclusion that do
>> not appear in the original graph. (This is what ter Horst does.) That
>> might ensure finiteness.
> 
> C2 is already that lax at the moment, since we said Vocab is the reserved
> vocabulary for the regime you are using minus the rdf:_i, and variable
> bindings come either from there or from the queried graph. So if the
> graph talks about rdf:_3, then rdf:_3 is allowed. Ter Horst goes even
> a little bit further, if I remember correctly, by taking the largest n
> such that rdf:_n occurs in the input and allowing all rdf:_m with m<=n.
> 
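A minimal sketch of the two policies compared above, in Python. The function names and the representation of a graph as a plain set of IRI strings are assumptions made for illustration only, and the second policy follows the bound as it is recalled in this thread rather than a checked reading of ter Horst.

```python
import re

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Matches the container membership properties rdf:_1, rdf:_2, ...
MEMBERSHIP = re.compile(re.escape(RDF) + r"_([1-9][0-9]*)$")


def membership_indices(graph_terms):
    """Return the indices i of every rdf:_i that occurs among the graph's terms."""
    indices = set()
    for term in graph_terms:
        match = MEMBERSHIP.match(term)
        if match:
            indices.add(int(match.group(1)))
    return indices


def allowed_by_draft_c2(graph_terms):
    """Draft (C2) reading: only the rdf:_i that actually occur in the graph."""
    return membership_indices(graph_terms)


def allowed_by_largest_index(graph_terms):
    """Bound as recalled in the thread: every rdf:_m with m <= the largest n seen."""
    indices = membership_indices(graph_terms)
    return set(range(1, max(indices) + 1)) if indices else set()


terms = {RDF + "_3", RDF + "type", "http://example.org/a"}
print(sorted(allowed_by_draft_c2(terms)))       # [3]
print(sorted(allowed_by_largest_index(terms)))  # [1, 2, 3]
```

Either policy leaves only finitely many rdf:_i available as bindings, which is the one property (C2) has to guarantee.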
>> The result set will be large? Sure it will. But this _is_ what the
>> RDF/OWL semantics dictates, isn't it, so why would we want to second
>> guess the user? Does he/she want to control the size of the output?
>> Well, that is why, in their infinite wisdom, the authors of SPARQL
>> invented FILTER-s...:-)
> 
> That's a good point. It might be a bit slower to first add all the
> results and then remove them again, but in the end there are not that
> many. (More than a user might have expected, but in total numbers not
> many).
> 
>> If I look at the particular example that you gave, here is a modified
>> version:
>>
>> SELECT ?ind ?class
>> WHERE {
>>  ?ind a ?class .
>>  FILTER(
>>     !regex( str(?ind), "^http://www.w3.org/2000/01/rdf-schema#" ) &&
>>     !regex( str(?ind), "^http://www.w3.org/1999/02/22-rdf-syntax-ns#")
>>  )
>> }
>>
>> with the data:
>>
>> ex:a a ex:C .
>>
>> and, I believe, with RDFS entailment we would get what we want.
> 
> True.
> 
>> So I believe we may want to examine an alternative route:
>>
>> - define (C2) to be the absolute strict minimum to ensure a finite solution
>> - look at the existing FILTER possibilities to see if we can handle
>> those common use cases. We may want to propose some shortcuts, like the
>> one above, ie, some sort of an operation which says
>>
>> inNamespace( ?x, URI )
>>
>> which means that ?x as a term starts with the URI.
>>
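To make the proposed shortcut concrete, here is a small Python sketch of what inNamespace( ?x, URI ) would compute and how it mirrors the FILTER in the query above. Note that inNamespace is only a proposal in this thread; the Python names, the expansion of the ex: prefix, and the list of candidate answers are assumptions chosen for illustration, not the output of an RDFS reasoner.

```python
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/"  # assumed expansion of the ex: prefix


def in_namespace(term, namespace):
    """True if the IRI string `term` starts with `namespace`."""
    return term.startswith(namespace)


def keep(binding):
    """Mirror of the FILTER above: drop ?ind bindings from the RDF(S) namespaces."""
    ind = binding["ind"]
    return not in_namespace(ind, RDFS) and not in_namespace(ind, RDF)


# Hand-picked candidate answers for { ?ind a ?class }; not a computed RDFS closure.
candidates = [
    {"ind": EX + "a", "class": EX + "C"},
    {"ind": RDF + "type", "class": RDF + "Property"},
    {"ind": RDFS + "Class", "class": RDFS + "Class"},
]

answers = [b for b in candidates if keep(b)]
print(answers)
```

Compared with the regex form, a plain prefix test also avoids having to escape characters such as "." in the namespace URI, since an unescaped "." in a regular expression matches any character.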
>> What do you (and others!) think?
> 
> I think that could work out very nicely :-)
> 
> Birte
> 
>> I have the gut feeling it will also help in defining the RIF
>> alternative, b.t.w.
>>
>> Ivan
>>
>>
>>> Here, C2 is very lax and it allows almost all axiomatic triples, but
>>> still guarantees finiteness of the answers in all cases. Now C3
>>> basically says that if the instantiated BGP has nothing to do with
>>> your data, then omit it. I omit the predicate as an option in C3
>>> since predicates such as rdf:type occur in most graphs (possibly by
>>> implicit triples as in OWL) and will then let through most axiomatic
>>> triples. Answers that are then filtered out are basically axiomatic
>>> triples that you didn't mention at all. This could be relaxed to (not
>>> sure which one makes more sense, needs more thinking)
>>> (C3) There is at least one triple s p o in P(BGP) such that either s
>>> or o occurs in sk(SG).
>>>
>>> Going back to the examples, if we have the query:
>>> SELECT ?ind ?class WHERE { ?ind a ?class }
>>> and data:
>>> [[
>>> ex:a a ex:b .
>>> ]]
>>> as in your generated output, then you would get under OWL RL
>>> ?ind/ex:a, ?class/ex:b
>>>
>>> For the query
>>> SELECT ?r WHERE { ex:a ?r ex:a }
>>> over
>>> [[
>>> ex:a ex:b ex:c .
>>> ]]
>>> you would get
>>> ?r/owl:sameAs
>>> no matter whether there is ex:something owl:sameAs ex:something or not.
>>>
>>> It is still possible to generate artificial examples, but less so. E.g.,
>>> SELECT ?type WHERE { rdf:type a ?type }
>>> over the empty graph gives you nothing since C3 cannot be satisfied in
>>> an empty graph. If we query over
>>> [[
>>> ex:b a ex:c.
>>> ]]
>>> we get
>>> ?type/rdf:Property
>>> C2 holds because rdf:Property is in vocab (assuming we do RDF(S) entailment)
>>> C3 holds because the instantiated BGP is rdf:type a rdf:Property and
>>> the subject rdf:type occurs in its abbreviated form in your data. That
>>> seems a nicer compromise. If we don't exclude axiomatic triples, we
>>> get infinite answers, and if we exclude some, we get some non-local
>>> side effects, but that cannot be totally avoided.
>>>
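The (C2)/(C3) checks walked through in the examples above can be sketched in Python as follows. This is only an illustration under simplifying assumptions: graphs are sets of string triples, there are no blank nodes (so the skolemization sk(.) is the identity), variables are strings starting with "?", and `vocab` is a tiny stand-in for the full reserved vocabulary of the regime. All function names are hypothetical.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/"  # assumed expansion of the ex: prefix


def terms_of(graph):
    """All terms occurring in any position of any triple of the graph."""
    return {term for triple in graph for term in triple}


def is_var(term):
    return term.startswith("?")


def instantiate(bgp, mu):
    """Apply the solution mapping mu to every triple pattern of the BGP."""
    return [tuple(mu.get(t, t) for t in pattern) for pattern in bgp]


def c2(bgp, mu, graph, vocab):
    """(C2): every variable of the BGP is bound to a graph term or a Vocab term."""
    allowed = terms_of(graph) | vocab
    variables = {t for pattern in bgp for t in pattern if is_var(t)}
    return all(mu[v] in allowed for v in variables)


def c3(bgp, mu, graph, every_triple=True):
    """(C3): subject or object of the instantiated triples occurs in the graph.

    every_triple=True is the first formulation, False the relaxed one.
    """
    occurring = terms_of(graph)
    hits = [s in occurring or o in occurring for s, _, o in instantiate(bgp, mu)]
    return all(hits) if every_triple else any(hits)


# SELECT ?type WHERE { rdf:type a ?type } with the solution ?type/rdf:Property
bgp = [(RDF + "type", RDF + "type", "?type")]
mu = {"?type": RDF + "Property"}
vocab = {RDF + "Property"}  # stand-in for the full RDF(S) Vocab
data = {(EX + "b", RDF + "type", EX + "c")}

print(c2(bgp, mu, data, vocab) and c3(bgp, mu, data))    # True
print(c2(bgp, mu, set(), vocab) and c3(bgp, mu, set()))  # False
```

For a BGP with a single triple pattern, as in all the examples above, the strict and the relaxed formulation of (C3) coincide; they only differ once the BGP has several patterns.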
>>>> Well... this is clearly not our decision. To be formal, we should
>>>> definitely flag that as an issue to be discussed. In some ways, the
>>>> question is: is it better to have many potential responses (ie, the user
>>>> will have to filter things out) or a small number though some expected
>>>> results will not be returned (only via ugly tricks).
>>>
>>> Yes.
>>> I would rather have too many than too few. Tools will hopefully
>>> provide a way to be configured such that they hide what I don't want
>>> to see, while still giving me the chance to see it if I want to.
>>>
>>> [snip]
>>>>>                                         Another would be to define
>>>>> something like "extensible entailment regimes", where the entailment
>>>>> regime is a combination of one of the defined semantics plus some
>>>>> rules/axioms that the endpoint will always apply, e.g., the SKOS
>>>>> axioms are always assumed to be present.
>>>>
>>>> I think the answer is: this is where RIF comes in. That gives me the
>>>> necessary flexibility (and I can always simulate OWL 2 RL level with a
>>>> RIF rule set).
>>>
>>> Yes. That would be nice.
>>>
>>>> B.t.w.: a point here for the future RIF discussion:
>>>>
>>>>  - we have something for OWL 2 RL
>>>>  - we will have something for RIF (hopefully)
>>>>  - there is a RIF document that, essentially, defines a RIF rule set for
>>>> OWL 2 RL
>>>>
>>>> Surely the result of the entailment should be the same whether I use the
>>>> OWL 2 RL entailment definition or the RIF one and use the official rule
>>>> set...
>>>
>>> Yes. It would not be nice at all if that were not the case.
>>>
>>> Cheers,
>>> Birte
>>>
>>>> Cheers
>>>>
>>>> Ivan
>>>>
>>>>> Birte
>>>>>
>>>>>>> We might still want to exclude axiomatic triples unless they occur in
>>>>>>> the input because they potentially add lots of answers that you don't
>>>>>>> really want.
>>>>>>
>>>>>> See above. But if I am wrong, I do not mind that either.
>>>>>>
>>>>>>> Another possibility would be to apply C2 only to variables in subject
>>>>>>> and object position and allow anything in the predicate position. That
>>>>>>> still can cause counterintuitive side effects, but many more cases are
>>>>>>> covered. In that case, inconsistencies need extra care because in
>>>>>>> principle this would still allow infinite answers if the given graph
>>>>>>> is inconsistent and we just assume the scoping graph to be equivalent
>>>>>>> to the queried graph no matter what.
>>>>>>
>>>>>> I guess you had that in one of the first drafts and we did not really
>>>>>> like it:-(
>>>>>> [snip]
>>>>>>>>
>>>>>>>> Sigh.
>>>>>>>
>>>>>>> Sigh too.
>>>>>>
>>>>>> :-)
>>>>>>
>>>>>> Ivan
>>>>>>
>>>>>>> Birte
>>>>>>>
>>>>>>>>
>>>>>>>> Ivan
>>>>>>>>
>>>>>>>> [1]
>>>>>>>> http://www.w3.org/TR/2009/REC-owl2-rdf-based-semantics-20091027/#Appendix:_Axiomatic_Triples_.28Informative.29
>>>>>>>> [2]
>>>>>>>> http://www.w3.org/TR/2009/REC-owl2-syntax-20091027/#Entity_Declarations_and_Typing
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2010-2-16 18:42 , Birte Glimm wrote:
>>>>>>>>> Hi all,
>>>>>>>>> I have committed a new version of the entailment regimes document:
>>>>>>>>> http://www.w3.org/2009/sparql/docs/entailment/xmlspec.xml
>>>>>>>>>
>>>>>>>>> There is now a description of the OWL RDF-Based Semantics incl. the
>>>>>>>>> OWL 2 RL profile. The OWL 2 RL profile can also be used with Direct
>>>>>>>>> Semantics, so I have added that there too. Further I have added a
>>>>>>>>> section about aggregates with RDF(S) entailment, addressing at least
>>>>>>>>> parts of Axel's comments (no owl:sameAs discussion yet for
>>>>>>>>> aggregation). I also defined the behaviour for inconsistent graphs
>>>>>>>>> more clearly because the previous spec didn't define the scoping graph
>>>>>>>>> in the case of inconsistencies. It was rather assumed that the scoping
>>>>>>>>> graph is still equivalent to the active graph, so that systems can
>>>>>>>>> just use the graph as is modulo bnode renaming, but that allowed
>>>>>>>>> infinite answers for inconsistent graphs. I now use Axel's suggestion
>>>>>>>>> for condition C2 and require not only bindings for variables in
>>>>>>>>> subject position to occur in the input, but require this for all
>>>>>>>>> variables. This also solves the OWL RDF-Based semantics problem where
>>>>>>>>> you can have infinite answers from owl:topDataProperty, which relates
>>>>>>>>> an individual to all data values. Now all RDF-Based regimes (RDF,
>>>>>>>>> RDFS, OWL 2 RDF-Based (for OWL Full and OWL RL)) use the same
>>>>>>>>> definitions, which is nice IMO.
>>>>>>>>>
>>>>>>>>> Birte
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>> Ivan Herman, W3C Semantic Web Activity Lead
>>>>>>>> Home: http://www.w3.org/People/Ivan/
>>>>>>>> mobile: +31-641044153
>>>>>>>> PGP Key: http://www.ivan-herman.net/pgpkey.html
>>>>>>>> FOAF   : http://www.ivan-herman.net/foaf.rdf
>>>>>>>> vCard  : http://www.ivan-herman.net/HermanIvan.vcf
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
> 
> 
> --
> Dr. Birte Glimm, Room 306
> Computing Laboratory
> Parks Road
> Oxford
> OX1 3QD
> United Kingdom
> +44 (0)1865 283529
> 

-- 

Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
PGP Key: http://www.ivan-herman.net/pgpkey.html
FOAF   : http://www.ivan-herman.net/foaf.rdf
vCard  : http://www.ivan-herman.net/HermanIvan.vcf
Received on Friday, 19 February 2010 08:24:03 UTC