Re: [TF-ENT] Entailment regimes doc update from Ivan Herman on 2010-02-18 (public-rdf-dawg@w3.org from January to March 2010)

From: Ivan Herman <ivan@w3.org>
Date: Thu, 18 Feb 2010 09:47:02 +0100
To: Birte Glimm <birte.glimm@comlab.ox.ac.uk>
CC: SPARQL Working Group <public-rdf-dawg@w3.org>, Chimezie Ogbuji <ogbujic@ccf.org>
Message-ID: <4B7CFE86.7070404@w3.org>
Good morning Birte,

On 2010-2-17 21:12 , Birte Glimm wrote:
> [snip]
>> But yes. Many extra triples are generated for this case. I used my own
>> OWL 2 RL stuff
>>
>> http://www.ivan-herman.net/Misc/2008/owlrl/
>>
>> to generate
>>
>> http://tinyurl.com/ygtuodk
> 
> That's a nice little tool. I knew you had it, but I hadn't seen the
> web interface before.
> 

Thanks. Actually, I wonder whether we should not make more systematic
use of it for our tests... we shall see.


>>>> Wouldn't that work? Would that bring in so many extra triples?
>>> Better than before I guess.
>>
>> You do not seem to be convinced:-( And you are probably right...
> 
> I am not 100% happy. I think it is better because I can imagine tools
> that allow you to choose between getting all triples and getting only
> those that use the input vocabulary. Thinking even more about this,
> here is another refinement that might capture even better what most
> users would expect.
> (C2) For each variable x in V(BGP), sk(μ(x)) occurs in sk(SG) or in Vocab.
> (C3) For each triple s p o in P(BGP), either s or o occurs in sk(SG).
> 
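
To make the two quoted conditions concrete, here is a minimal Python
sketch of how (C2) and (C3) could be checked for one candidate solution.
It rests on assumptions that are not in the draft: triples are plain
tuples of prefixed names, variables are strings starting with "?", there
are no blank nodes (so sk() is the identity), and P(BGP) is read as the
BGP instantiated with the solution. All function names are made up for
illustration.

```python
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


def terms(graph: Iterable[Triple]) -> Set[str]:
    """All terms occurring anywhere (subject, predicate, object) in the graph."""
    return {term for triple in graph for term in triple}


def is_variable(term: str) -> bool:
    # Assumption: variables are written "?name", as in SPARQL.
    return term.startswith("?")


def instantiate(bgp: List[Triple], mu: Dict[str, str]) -> List[Triple]:
    """Replace each variable in the BGP by its binding in mu."""
    return [tuple(mu.get(term, term) for term in triple) for triple in bgp]


def satisfies_c2(bgp: List[Triple], mu: Dict[str, str],
                 graph: List[Triple], vocab: Set[str]) -> bool:
    """(C2): every variable is bound to a term of the graph or of Vocab."""
    known = terms(graph) | vocab
    variables = {t for triple in bgp for t in triple if is_variable(t)}
    return all(mu.get(x) in known for x in variables)


def satisfies_c3(bgp: List[Triple], mu: Dict[str, str],
                 graph: List[Triple]) -> bool:
    """(C3): in every instantiated triple, the subject or the object
    occurs in the graph (the predicate is deliberately not considered)."""
    known = terms(graph)
    return all(s in known or o in known for s, _p, o in instantiate(bgp, mu))


# The rdf:type example from the quoted message further down.
vocab = {"rdf:type", "rdf:Property"}            # tiny excerpt, assumption
bgp = [("rdf:type", "rdf:type", "?type")]
mu = {"?type": "rdf:Property"}
data = [("ex:b", "rdf:type", "ex:c")]

accepted = satisfies_c2(bgp, mu, data, vocab) and satisfies_c3(bgp, mu, data)
rejected_on_empty_graph = not satisfies_c3(bgp, mu, [])
```

With these inputs the solution is accepted over the one-triple graph,
because rdf:type occurs there as a predicate, and rejected over the
empty graph, which matches the behaviour described for that example.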

So... I wonder whether we are not going down the wrong route here. The
_only_ requirement we have is to ensure finiteness. (C2) ensures that,
but we could, maybe, make it even more lax by concentrating only on the
rdf:_i properties and disallowing any of those in the conclusion that do
not appear in the original graph. (This is what ter Horst does.) That
might ensure finiteness.
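
A minimal Python sketch of that laxer restriction, only to fix ideas. It
assumes triples are tuples of full IRI strings; the function names are
hypothetical, and this is just the rdf:_i check described above, not ter
Horst's actual formulation.

```python
import re
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Container membership properties: rdf:_1, rdf:_2, ...
_MEMBERSHIP = re.compile(re.escape(RDF) + r"_[1-9][0-9]*")


def is_membership_property(term: str) -> bool:
    return _MEMBERSHIP.fullmatch(term) is not None


def allowed_conclusion(triple: Triple, input_graph: Iterable[Triple]) -> bool:
    """Accept a derived triple unless it mentions an rdf:_i property
    that does not already appear somewhere in the input graph."""
    used = {t for tr in input_graph for t in tr if is_membership_property(t)}
    return all(t in used for t in triple if is_membership_property(t))


# Example: the input graph uses rdf:_1 only.
graph = [("http://example.org/s", RDF + "_1", "http://example.org/o")]

ok = allowed_conclusion((RDF + "_1", RDF + "type", RDF + "Property"), graph)
blocked = not allowed_conclusion((RDF + "_2", RDF + "type", RDF + "Property"), graph)
```

Here the axiomatic triple for rdf:_1 is let through because rdf:_1 is
used in the data, while the one for rdf:_2 is dropped; every other
entailed triple is untouched by this check.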

The result set will be large? Sure it will. But this _is_ what the
RDF/OWL semantics dictates, isn't it, so why would we want to second
guess the user? Does he/she want to control the size of the output?
Well, that is why, in their infinite wisdom, the authors of SPARQL
invented FILTER-s...:-)

If I look at the particular example that you gave, here is a modified
version:

SELECT ?ind ?class
WHERE {
  ?ind a ?class .
  FILTER(
     !regex( str(?ind), "^http://www.w3.org/2000/01/rdf-schema#" ) &&
     !regex( str(?ind), "^http://www.w3.org/1999/02/22-rdf-syntax-ns#")
  )
}

with the data:

ex:a a ex:C .

and, I believe, with RDFS entailment we would get what we want.

So I believe we may want to examine an alternative route:

- define (C2) to be the absolute strict minimum to ensure a finite solution
- look at the existing FILTER possibilities to see if we can handle
those common use cases. We may want to propose some shortcuts, like the
one above, ie, some sort of an operation which says

inNamespace( ?x, URI )

which means that ?x as a term starts with the URI.
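
As a rough sketch of what such a shortcut would do, here is the same
prefix test in Python, applied the way the FILTER in the query above
applies it to ?ind. The name in_namespace, the http://example.org/
namespace standing in for ex:, and the three hand-picked candidate
bindings are all assumptions for illustration; they are not the full
RDFS closure of the data.

```python
from typing import Dict, List

RDFS_NS = "http://www.w3.org/2000/01/rdf-schema#"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX_NS = "http://example.org/"  # assumption: what the ex: prefix expands to


def in_namespace(term: str, namespace: str) -> bool:
    """True if the term, taken as a string, starts with the namespace URI."""
    return term.startswith(namespace)


def keep_binding(binding: Dict[str, str]) -> bool:
    """Mirror of the FILTER above: drop solutions whose ?ind is an
    RDF or RDFS vocabulary term. ?class is not constrained."""
    ind = binding["ind"]
    return not in_namespace(ind, RDFS_NS) and not in_namespace(ind, RDF_NS)


# A few candidate solutions, chosen by hand for illustration only.
candidates: List[Dict[str, str]] = [
    {"ind": EX_NS + "a", "class": EX_NS + "C"},
    {"ind": RDF_NS + "type", "class": RDF_NS + "Property"},
    {"ind": RDFS_NS + "Class", "class": RDFS_NS + "Class"},
]

kept = [b for b in candidates if keep_binding(b)]
```

Of these three candidates only the ex:a one survives. Note that the test
looks at ?ind alone, exactly as the FILTER does, so a solution with a
user-defined ?ind and an RDFS ?class would still be kept.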

What do you (and others!) think?

I have the gut feeling it will also help in defining the RIF
alternative, b.t.w.

Ivan


> Here, C2 is very lax and it allows almost all axiomatic triples, but
> still guarantees finiteness of the answers in all cases. Now C3
> basically says that if the instantiated BGP has nothing to do with
> your data, then omit it. I omit the predicate as an option in C3
> since predicates such as rdf:type occur in most graphs (possibly by
> implicit triples as in OWL) and will then let through most axiomatic
> triples. Answers that are then filtered out are basically axiomatic
> triples that you didn't mention at all. This could be relaxed to (not
> sure which one makes more sense, needs more thinking)
> (C3) There is at least one triple s p o in P(BGP) such that either s
> or o occurs in sk(SG).
> 
> Going back to the examples, if we have the query:
> SELECT ?ind ?class WHERE { ?ind a ?class }
> and data:
> [[
> ex:a a ex:b .
> ]]
> as in your generated output, then you would get under OWL RL
> ?ind/ex:a, ?class/ex:b
> 
> For the query
> SELECT ?r WHERE { ex:a ?r ex:a }
> over
> [[
> ex:a ex:b ex:c .
> ]]
> you would get
> ?r/owl:sameAs
> no matter whether there is ex:something owl:sameAs ex:something or not.
> 
> It is still possible to generate artificial examples, but less so. E.g.,
> SELECT ?type WHERE { rdf:type a ?type }
> over the empty graph gives you nothing since C3 cannot be satisfied in
> an empty graph. If we query over
> [[
> ex:b a ex:c.
> ]]
> we get
> ?type/rdf:Property
> C2 holds because rdf:Property is in vocab (assuming we do RDF(S) entailment)
> C3 holds because the instantiated BGP is rdf:type a rdf:Property and
> the subject rdf:type occurs in its abbreviated form in your data. That
> seems a nicer compromise. If we don't exclude axiomatic triples, we
> get infinite answers, and if we exclude some, we get some non-local
> side effects, but that cannot be totally avoided.
> 
>> Well... this is clearly not our decision. To be formal, we should
>> definitely flag that as an issue to be discussed. In some ways, the
>> question is: is it better to have many potential responses (ie, the user
>> will have to filter things out) or a small number though some expected
>> results will not be returned (only via ugly tricks).
> 
> Yes.
> I would rather have too many than too few. Tools will hopefully
> provide a way to be configured such that they hide what I don't want
> to see, while still giving me the chance to see it if I want to.
> 
> [snip]
>>>                                         Another would be to define
>>> something like "extensible entailment regimes", where the entailment
>>> regime is a combination of one of the defined semantics plus some
>>> rules/axioms that the endpoint will always apply, e.g., the SKOS
>>> axioms are always assumed to be present.
>>
>> I think the answer is: this is where RIF comes in. That gives me the
>> necessary flexibility (and I can always simulate OWL 2 RL level with a
>> RIF rule set).
> 
> Yes. That would be nice.
> 
>> B.t.w.: a point here for the future RIF discussion:
>>
>>  - we have something for OWL 2 RL
>>  - we will have something for RIF (hopefully)
>>  - there is a RIF document that, essentially, defines a RIF rule set for
>> OWL 2 RL
>>
>> Surely the result of the entailment should be the same whether I use the
>> OWL 2 RL entailment definition or the RIF one and use the official rule
>> set...
> 
> Yes. It would not be nice at all if that were not the case.
> 
> Cheers,
> Birte
> 
>> Cheers
>>
>> Ivan
>>
>>> Birte
>>>
>>>>> We might still want to exclude axiomatic triples unless they occur in
>>>>> the input because they potentially add lots of answers that you don't
>>>>> really want.
>>>>
>>>> See above. But if I am wrong, I do not mind that either.
>>>>
>>>>> Another possibility would be to apply C2 only to variables in subject
>>>>> and object position and allow anything in the predicate position. That
>>>>> still can cause counterintuitive side effects, but many more cases are
>>>>> covered. In that case, inconsistencies need extra care because in
>>>>> principle this would still allow infinite answers if the given graph
>>>>> is inconsistent and we just assume the scoping graph to be equivalent
>>>>> to the queried graph no matter what.
>>>>
>>>> I guess you had that in one of the first drafts and we did not really
>>>> like it:-(
>>>> [snip]
>>>>>>
>>>>>> Sigh.
>>>>>
>>>>> Sigh too.
>>>>
>>>> :-)
>>>>
>>>> Ivan
>>>>
>>>>> Birte
>>>>>
>>>>>>
>>>>>> Ivan
>>>>>>
>>>>>> [1]
>>>>>> http://www.w3.org/TR/2009/REC-owl2-rdf-based-semantics-20091027/#Appendix:_Axiomatic_Triples_.28Informative.29
>>>>>> [2]
>>>>>> http://www.w3.org/TR/2009/REC-owl2-syntax-20091027/#Entity_Declarations_and_Typing
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 2010-2-16 18:42 , Birte Glimm wrote:
>>>>>>> Hi all,
>>>>>>> I have committed a new version of the entailment regimes document:
>>>>>>> http://www.w3.org/2009/sparql/docs/entailment/xmlspec.xml
>>>>>>>
>>>>>>> There is now a description of the OWL RDF-Based Semantics incl. the
>>>>>>> OWL 2 RL profile. The OWL 2 RL profile can also be used with Direct
>>>>>>> Semantics, so I have added that there too. Further I have added a
>>>>>>> section about aggregates with RDF(S) entailment, addressing at least
>>>>>>> parts of Axel's comments (no owl:sameAs discussion yet for
>>>>>>> aggregation). I also defined the behaviour for inconsistent graphs
>>>>>>> more clearly because the previous spec didn't define the scoping graph
>>>>>>> in the case of inconsistencies. It was rather assumed that the scoping
>>>>>>> graph is still equivalent to the active graph, so that systems can
>>>>>>> just use the graph as is modulo bnode renaming, but that allowed
>>>>>>> infinite answers for inconsistent graphs. I now use Axel's suggestion
>>>>>>> for condition C2 and require not only bindings for variables in
>>>>>>> subject position to occur in the input, but require this for all
>>>>>>> variables. This also solves the OWL RDF-Based semantics problem where
>>>>>>> you can have infinite answers from owl:topDataProperty, which relates
>>>>>>> an individual to all data values. Now all RDF-Based regimes (RDF,
>>>>>>> RDFS, OWL 2 RDF-Based (for OWL Full and OWL RL)) use the same
>>>>>>> definitions, which is nice IMO.
>>>>>>>
>>>>>>> Birte
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Ivan Herman, W3C Semantic Web Activity Lead
>>>>>> Home: http://www.w3.org/People/Ivan/
>>>>>> mobile: +31-641044153
>>>>>> PGP Key: http://www.ivan-herman.net/pgpkey.html
>>>>>> FOAF   : http://www.ivan-herman.net/foaf.rdf
>>>>>> vCard  : http://www.ivan-herman.net/HermanIvan.vcf
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>>
> 
> 
> 

-- 

Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
PGP Key: http://www.ivan-herman.net/pgpkey.html
FOAF   : http://www.ivan-herman.net/foaf.rdf
vCard  : http://www.ivan-herman.net/HermanIvan.vcf
Received on Thursday, 18 February 2010 08:44:31 UTC