- From: Bijan Parsia <bparsia@cs.man.ac.uk>
- Date: Tue, 21 Dec 2010 15:04:08 +0000
- To: lenzerini@dis.uniroma1.it
- Cc: Enrico Franconi <franconi@inf.unibz.it>, SPARQL Working Group <public-rdf-dawg@w3.org>
On 21 Dec 2010, at 08:11, Maurizio Lenzerini wrote:

> Hi Bijan,
>
> thank you for you message.

Thanks for the reply.

> On 12/20/10 6:30 PM, Bijan Parsia wrote:
>> Hi Maurizio,
>>
>> Thanks for the message!
>>
>> On 20 Dec 2010, at 09:48, Enrico Franconi wrote:
>>
>>> I forward this message I received from Maurizio Lenzerini.
>>>
>>> Begin forwarded message:
>> [snip]
>>>> In all the applications mentioned above, there is a strong need of answering queries with non-distinguished variables. Just to name one interesting scenario where missing non-distinguished variables would be a real problem, consider checking quality/completeness of data.
>>>>
>>>> The query:
>>>>
>>>> { x,z | R1(x,y), R2(y,z) }
>>>>
>>>> tells me which x and z are connected through y, without necessarily knowing who is the y. On the other hand, the query
>>>>
>>>> { x,y,z | R1(x,y), R2(y,z) }
>>>>
>>>> tells me for which x,z I KNOW the y.
>>
>>
>> As an example, this isn't really very informative. It's a fake toy example which merely illustrates the difference between the two.
>
> It is not a fake toy example.

Sorry, that came out a bit wrong. In the form you presented it was merely demonstrating an abstract capability rather than an in situ use case. We have tons of abstract examples.

> It is one of the examples showing why the query language should allow pure existential variables in the query.

It can, in principle, show a difference in expressivity (though we need to see the data). What it doesn't really help is show the field use or necessity.

> Say that a customer C is said to be monitored by A (an authority) if it belongs to a group that is monitored by A. I want to know who are the customers monitored by A. The right query is:
>
> QUERY 1: { x,z | Belongs(x,y), GroupMonitoredBy(y,z) }
>
> Assume that the result of the query is (C,A): this means that I know that customer C is monitored by authority A, *even if I do NOT know which is the group monitored by A to which C belongs*.

I understand.
But for this to have an answer the KB must entail that answer. For this to be a true case of non-distinguished variables, it must do so without a binding for y (otherwise, it's just projection). For this to be compelling as a use case, it has to be reasonably prevalent, useful, usable, implementable, without good workarounds, etc. It's the latter stuff we're trying to determine. Is it possible for a standard OWL QL data set to have an answer to this query without a binding for y? Is it common?

> Now, suppose that the result of the following query:
>
> QUERY 2: { x,y,z | Belongs(x,y), GroupMonitoredBy(y,z) }
>
> does not include the pair (C,A). Now I KNOW that my data is incomplete! And I know it *only because I can compare the result of query 1 with the result of query 2*!

Ok, thanks. I understand. The example is still rather incomplete without the TBox at least (so you're measuring whether the data are complete wrt a schema). This is somewhat specialized, yes? I.e., it's not straightforward query answering but an analytical task?

> Indeed, without the answer to QUERY 1, why should I conclude that my data are incomplete? It is because I know that the pair (C,A) is in the result of QUERY 1 (meaning that there is SOME group G representing the "bridge" between C and A), and not in the result of QUERY 2 (meaning that I do not know such group G) that I have the proof that data are incomplete!

I understand, thanks!

Cheers,
Bijan.
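
For concreteness, the QUERY 1 / QUERY 2 comparison can be sketched in Python roughly as follows. This is only an illustration under loud assumptions: no TBox or data were given in the thread, so the facts below are invented, and the anonymous individual `_:g1` stands in for a group whose existence some hypothetical schema axiom would entail without naming it. The function names `query1`, `query2` and `is_named` are likewise made up for the sketch.

```python
# Assumption: anonymous (existentially entailed) individuals are written
# with a "_:" prefix; everything else is a named individual.
def is_named(term: str) -> bool:
    return not term.startswith("_:")

# Hypothetical facts. C belongs to a group that is only known to exist;
# D belongs to the named group G2. Both groups are monitored by A.
belongs = {("C", "_:g1"), ("D", "G2")}
group_monitored_by = {("_:g1", "A"), ("G2", "A")}

def query1(belongs, monitored):
    # { x,z | Belongs(x,y), GroupMonitoredBy(y,z) }
    # y is non-distinguished: it may match an anonymous individual.
    return {(x, z)
            for (x, y) in belongs
            for (y2, z) in monitored
            if y == y2 and is_named(x) and is_named(z)}

def query2(belongs, monitored):
    # { x,y,z | Belongs(x,y), GroupMonitoredBy(y,z) }
    # y is distinguished: it must be bound to a named individual.
    return {(x, y, z)
            for (x, y) in belongs
            for (y2, z) in monitored
            if y == y2 and all(is_named(t) for t in (x, y, z))}

q1 = query1(belongs, group_monitored_by)
q2 = query2(belongs, group_monitored_by)

# Pairs that QUERY 1 returns but for which QUERY 2 knows no group.
known_pairs = {(x, z) for (x, _, z) in q2}
missing = q1 - known_pairs

print(sorted(q1))       # [('C', 'A'), ('D', 'A')]
print(sorted(q2))       # [('D', 'G2', 'A')]
print(sorted(missing))  # [('C', 'A')]
```

Under these assumed facts the pair (C,A) shows up only in the QUERY 1 result, which is the incompleteness signal described in the quoted message. Whether a standard OWL QL data set actually produces such an unnamed group is exactly the open question above; the sketch does not answer it.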
Received on Tuesday, 21 December 2010 15:05:05 UTC