Re: [Fwd: Comments on SPARQL] (entailment, soundness, completeness) from Enrico Franconi on 2005-09-19 (public-rdf-dawg-comments@w3.org from September 2005)

From: Enrico Franconi <franconi@inf.unibz.it>
Date: Tue, 20 Sep 2005 01:12:44 +0200
To: Pat Hayes <phayes@ihmc.us>
Cc: Ian Horrocks <horrocks@cs.man.ac.uk>, public-rdf-dawg-comments@w3.org, RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-Id: <e44acda900752b9bd11a6a6cc9ab3cd3@inf.unibz.it>
On 19 Sep 2005, at 21:09, Pat Hayes wrote:
> Moreover, contrary to the implied claims in your note above, this 
> 'logical' point of view about querying is not by any means universal, 
> and has many disadvantages. (The most glaring disadvantage - alone 
> enough to render it unusable - is that it requires any conforming 
> SPARQL engine to also be a theorem-prover.) There is a different 
> tradition and view of querying, incorporated into SQL, which arises 
> from database technology, in which querying is seen as a process of 
> applying filters to a large set of what are essentially vectors of 
> items; the meaning of the query language, if formalized, would be 
> specified rather like that of an algebraic programming language. (I 
> think of it as the baleen-whale model of querying.) This perspective 
> hardly mentions (what we call) entailment at all, and instead focuses 
> on query operations, seen as information-extraction operations on a 
> databases.

As I explained in the long message on entailment, the main difference 
is that in the relational algebra (SQL) you *don't* allow existential 
variables (aka null values or bnodes) in the answer set. If you take 
the same limitation in SPARQL, I can assure you that we can implement 
it in SQL in half a day. Bnodes in the data model are the real new and 
interesting thing of SPARQL, that requires special attention.
Moreover, I also argued that SQL *is* logic/entailment based 
<http://lists.w3.org/Archives/Public/public-rdf-dawg/2005JulSep/0450>:

"""
Please note that in the case of the relational model (e.g., SQL) this 
problem does not arise, since there is no equivalent to the bnodes in 
the relational data model, and redundancy never appears in the data and 
the answer sets of arbitrary relational algebra queries are always non 
redundant. Also note that the notion of SQL queries is perfectly 
formalised as entailment (after the seminal works by Ray Reiter in the 
late 70ies), and as a matter of fact when a notion similar to bnodes 
sneaked in the relational model in the form of null values, the logic 
based definition of query answering proved to be extremely useful to 
characterise the various notions of incompleteness implied by the null 
values. So, the peculiarity and interestingness of SPARQL lies in the 
fact that it allows bnodes in the answers.
"""

> It became clear quite early in the SPARQL WG discussions that it was 
> neither practical not desirable to ignore this other tradition, so we 
> were trying to have the best of both worlds for a while. But is simply 
> isn't possible to both kinds of task at the same time. There are many 
> natural uses of a query language, giving rise to design constraints 
> for the language, which not only do not fit naturally into what might 
> be called the 'logical' picture of querying, but are sharply 
> incompatible with it,

Well, we are giving a logic based account to SPARQL that accommodates 
both views. I hope you like it.

>  in that the purpose of the query is to extract information that 
> cannot by any stretch of the imagination be viewed as an entailment by 
> the knowledge base.

In general you are right, but the idea we are proposing is based on the 
simple idea that entailment of ground atomic formulas (the query) from 
a ground set of atomic formulas (the original skolemised graph) is 
basically a syntactic operation, since we are working in a minimal 
model of the original graph. Simple theorems prove that this is exactly 
what you want.

>  (These include such things as the number of matching triples in the 
> graph, the fact that there is no subgraph matching a query, and the 
> size of the graph.)

I see these as logic-based aggregation queries on our approach. By the 
way, in this email I am always talking about RDF and RDFS, nothing 
more.

> RDF querying in SPARQL really is not about checking entailment.

This is only one view, and I accept it, and we have formalised it. 
However, there is also the other dual view, and we have worked to have 
both.

> A SPARQL engine is not required to perform any inferences, to check 
> semantic redundancy, etc. (Though it may, it is not required to.)

This should be decided by the user, I guess.

>  Querying is about trolling through RDF graphs looking for syntactic 
> patterns.

Again, only in the first legitimate view.

> In one very simple case these coincide: simple entailment is having an 
> instance which is a subgraph, the basic case for the SPARQL match, if 
> you think of query variables as a species of bnode. But only in the 
> very simple basic case do you find this coincidence: as soon as things 
> get at all complicated, querying and entailment-checking diverge.

Yea, they coincide only in absence of bnodes in the graph, and for 
simple RDF entailment.

> So, do not think of querying as entailment checking: if you do, it 
> will seem to be broken, and it will have all kinds of apparently 
> irrelevant barnacles attached to it, and it will seem ugly and ad-hoc. 
> Think of it instead as a kind of process of fishing for patterns in 
> RDF graphs. Entailment is managed by inference engines. Querying is a 
> process of trolling a pattern through an RDF graph to see what is in 
> it. Querying is fundamentally not a semantic business; it is 
> essentially syntactic, having to do with constraints on bindings which 
> match patterns.

I tend to disagree with this opinion, since I believe we are building 
the *semantic* web ;-)

cheers
--e.
Received on Monday, 19 September 2005 23:13:23 UTC