Re: [Fwd: Comments on SPARQL] (entailment, soundness, completeness) from Pat Hayes on 2005-09-19 (public-rdf-dawg-comments@w3.org from September 2005)

From: Pat Hayes <phayes@ihmc.us>
Date: Mon, 19 Sep 2005 14:09:20 -0500
To: Ian Horrocks <horrocks@cs.man.ac.uk>
Cc: "Peter F. Patel-Schneider" <pfps@research.bell-labs.com>, Enrico Franconi <franconi@inf.unibz.it>, Bijan Parsia <bparsia@isr.umd.edu>, public-rdf-dawg-comments@w3.org
Message-Id: <p0620070bbf4d4aaab5ae@[192.168.1.5]>
Ian Horrocks writes:
------
>Two comments/questions:
>
>Firstly, I strongly support the suggestion to define query answers in
>terms of entailment rather than subgraph. Using an entailment based
>definition has numerous advantages, and no disadvantages that I can
>see.
>
>Entailment based definition:
>
>- is widely used and very well understood
>- is concise, clear and unambiguous
>- builds on existing semantic definitions
>- can deal with a wide variety of languages, including RDF, RDFS, OWL,
>SWRL and FOL, simply by referring to the existing entailment semantics
>for the relevant language
>
>Subgraph based definition:
>
>- is not widely used, not well understood, and not *known* to work at
>all (at least not for more expressive languages)
>- is verbose, obfuscated (it seems capable of confusing even expert
>logicians), and may even be ambiguous
>- ignores existing semantic definitions
>- cannot easily deal with more expressive languages, and may even be
>incapable of doing so
>
>Can someone please explain to me why the subgraph based definition has
>been preferred?
-------

As one who argued for the subgraph-based definition, allow me to chime in.

As I'm sure you know, Ian, since we were co-authors of the DAML query 
proposal, I too tend to think naturally of a query as a goal to be 
proved, or as a sentence whose entailment by the knowledge base is 
being questioned, so that querying is simply a synonym for 
posing-as-a-goal-to-be-proved, and the entire story is given in terms 
of logical ideas like entailment. As you say, this has many 
advantages both for rigor and elegance, and it allows one to 
parameterize the query with respect to the semantics. It is really 
very nice. Unfortunately, it simply isn't adequate for the intended 
uses. This isn't how RDF knowledge bases are being used: or at any 
rate, it's not the only way.

Moreover, contrary to the implied claims in your note above, this 
'logical' point of view about querying is not by any means universal, 
and has many disadvantages. (The most glaring disadvantage - alone 
enough to render it unusable - is that it requires any conforming 
SPARQL engine to also be a theorem-prover.) There is a different 
tradition and view of querying, incorporated into SQL, which arises 
from database technology, in which querying is seen as a process of 
applying filters to a large set of what are essentially vectors of 
items; the meaning of the query language, if formalized, would be 
specified rather like that of an algebraic programming language. (I 
think of it as the baleen-whale model of querying.) This perspective 
hardly mentions (what we call) entailment at all, and instead focuses 
on query operations, seen as information-extraction operations on a 
databases. There are probably more people who think of querying and 
query languages in this way than there are who think of queries in a 
logical-entailment way, by at least one and probably several orders 
of magnitude, and a huge and mature technology (and body of theory) 
which backs it up. It became clear quite early in the SPARQL WG 
discussions that it was neither practical not desirable to ignore 
this other tradition, so we were trying to have the best of both 
worlds for a while. But is simply isn't possible to both kinds of 
task at the same time. There are many natural uses of a query 
language, giving rise to design constraints for the language, which 
not only do not fit naturally into what might be called the 'logical' 
picture of querying, but are sharply incompatible with it, in that 
the purpose of the query is to extract information that cannot by any 
stretch of the imagination be viewed as an entailment by the 
knowledge base. (These include such things as the number of matching 
triples in the graph, the fact that there is no subgraph matching a 
query, and the size of the graph.) Other applications require the 
ability to specify constraints on answers which, while they could in 
principle be viewed in semantic terms as embodying conditions 
involving entailments, would - if one were to attempt to do that - 
break any existing semantic framework for RDF, RDFS or OWL, since 
they involve for example complex arithmetical constraints on datatype 
values which cannot be axiomatized in OWL.

<autobio>
I began as a member of the DAWG, viewing myself as the defender of 
the logical-entailment view of what querying was all about. After a 
brief attempt to either suppress these clearly not-entailment-like 
constructions, or to modify them to make them more entailment-like, I 
realized that this was neither useful nor productive. I howled with 
protest when it was proposed to allow queries which could detect the 
non-presence of a triple in the graph: this would be non-monotonic 
inference sneaking in by the back door, aaargh!!  But when reasonable 
people, with real applications, asked me what possible harm could 
there be in users being allowed to discover such things about the 
triples in an RDF graph, I had no reasonable reply to make to them, 
other than a kind of generalized growl about the Semantic Perils of 
NonMon. One can only get so far with growling; and I quickly realized 
that the mistake was mine: only only gets the Perils if one thinks of 
the query answer as being entailed by the graph. But in this case it 
was perfectly clear to everyone that it wasn't entailed; and also 
perfectly clear that its was nevertheless useful information, and 
that people expected to be able to extract it using a query protocol; 
and that I had no reasonable argument why they should not be allowed 
to do so.
</autobio>

RDF querying in SPARQL really is not about checking entailment. A 
SPARQL engine is not required to perform any inferences, to check 
semantic redundancy, etc. (Though it may, it is not required to.) 
Querying is about trolling through RDF graphs looking for syntactic 
patterns. In one very simple case these coincide: simple entailment 
is having an instance which is a subgraph, the basic case for the 
SPARQL match, if you think of query variables as a species of bnode. 
But only in the very simple basic case do you find this coincidence: 
as soon as things get at all complicated, querying and 
entailment-checking diverge. SPARL queries allow the expression of a 
large number of conditions on matches which do not correspond 
directly to any kind of RDF/RDFS/OWL entailment; and anything beyond 
simple entailment is not automatically handled by SPAQRL.

So, do not think of querying as entailment checking: if you do, it 
will seem to be broken, and it will have all kinds of apparently 
irrelevant barnacles attached to it, and it will seem ugly and 
ad-hoc. Think of it instead as a kind of process of fishing for 
patterns in RDF graphs. Entailment is managed by inference engines. 
Querying is a process of trolling a pattern through an RDF graph to 
see what is in it. Querying is fundamentally not a semantic business; 
it is essentially syntactic, having to do with constraints on 
bindings which match patterns.

<autobio>
After learning to live with the chagrin of finding that my cherished 
view of querying was fundamentally inadequate for even quite simple 
RDF applications in the real world, I have become quite reconciled to 
this separation between inference/entailment and querying. I think in 
fact that it is potentially beneficial to all.
</autobio>

What it does mean, however, is that one has to learn to mentally 
separate querying from inference. Which, to return to the topic, is 
why I suggested that the SPARQL documents refer explicitly to 
matching subgraphs, rather than to simple entailment: precisely 
because the use of the 'entailment' word is potentially misleading. 
And I have to say, your message bears out my worries here: for even 
when we did not use the e-word, you are already misled, as it were 
pre-emptively. I was afraid this might happen... :-)

Hope this helps. I realize this will need more discussion and 
justification, and does not yet really answer your last question 
(why?) but at least you might now see that SPARQL is not simply a 
botched attempt at defining entailment. It has different aims in view.

Pat

PS. One quick response: you say the subgraph-based definition is

"obfuscated (it seems capable of confusing even expert logicians)".

What is confusing the expert logicians in this case, it seems to me, 
is that they are not actually reading the definition as given, but 
are instead hallucinating their own favorite view of what the 
definition ought to have been onto the words as actually written: and 
when they do not fit, they are then (characteristically for expert 
logicians, in my experience) concluding that the words as written 
have a silly mistake in them.
-- 
---------------------------------------------------------------------
IHMC		(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32502			(850)291 0667    cell
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Monday, 19 September 2005 19:12:34 UTC