- From: Pat Hayes <phayes@ihmc.us>
- Date: Mon, 19 Sep 2005 14:09:20 -0500
- To: Ian Horrocks <horrocks@cs.man.ac.uk>
- Cc: "Peter F. Patel-Schneider" <pfps@research.bell-labs.com>, Enrico Franconi <franconi@inf.unibz.it>, Bijan Parsia <bparsia@isr.umd.edu>, public-rdf-dawg-comments@w3.org
Ian Horrocks writes: ------ >Two comments/questions: > >Firstly, I strongly support the suggestion to define query answers in >terms of entailment rather than subgraph. Using an entailment based >definition has numerous advantages, and no disadvantages that I can >see. > >Entailment based definition: > >- is widely used and very well understood >- is concise, clear and unambiguous >- builds on existing semantic definitions >- can deal with a wide variety of languages, including RDF, RDFS, OWL, >SWRL and FOL, simply by referring to the existing entailment semantics >for the relevant language > >Subgraph based definition: > >- is not widely used, not well understood, and not *known* to work at >all (at least not for more expressive languages) >- is verbose, obfuscated (it seems capable of confusing even expert >logicians), and may even be ambiguous >- ignores existing semantic definitions >- cannot easily deal with more expressive languages, and may even be >incapable of doing so > >Can someone please explain to me why the subgraph based definition has >been preferred? ------- As one who argued for the subgraph-based definition, allow me to chime in. As I'm sure you know, Ian, since we were co-authors of the DAML query proposal, I too tend to think naturally of a query as a goal to be proved, or as a sentence whose entailment by the knowledge base is being questioned, so that querying is simply a synonym for posing-as-a-goal-to-be-proved, and the entire story is given in terms of logical ideas like entailment. As you say, this has many advantages both for rigor and elegance, and it allows one to parameterize the query with respect to the semantics. It is really very nice. Unfortunately, it simply isn't adequate for the intended uses. This isn't how RDF knowledge bases are being used: or at any rate, it's not the only way. Moreover, contrary to the implied claims in your note above, this 'logical' point of view about querying is not by any means universal, and has many disadvantages. (The most glaring disadvantage - alone enough to render it unusable - is that it requires any conforming SPARQL engine to also be a theorem-prover.) There is a different tradition and view of querying, incorporated into SQL, which arises from database technology, in which querying is seen as a process of applying filters to a large set of what are essentially vectors of items; the meaning of the query language, if formalized, would be specified rather like that of an algebraic programming language. (I think of it as the baleen-whale model of querying.) This perspective hardly mentions (what we call) entailment at all, and instead focuses on query operations, seen as information-extraction operations on a databases. There are probably more people who think of querying and query languages in this way than there are who think of queries in a logical-entailment way, by at least one and probably several orders of magnitude, and a huge and mature technology (and body of theory) which backs it up. It became clear quite early in the SPARQL WG discussions that it was neither practical not desirable to ignore this other tradition, so we were trying to have the best of both worlds for a while. But is simply isn't possible to both kinds of task at the same time. There are many natural uses of a query language, giving rise to design constraints for the language, which not only do not fit naturally into what might be called the 'logical' picture of querying, but are sharply incompatible with it, in that the purpose of the query is to extract information that cannot by any stretch of the imagination be viewed as an entailment by the knowledge base. (These include such things as the number of matching triples in the graph, the fact that there is no subgraph matching a query, and the size of the graph.) Other applications require the ability to specify constraints on answers which, while they could in principle be viewed in semantic terms as embodying conditions involving entailments, would - if one were to attempt to do that - break any existing semantic framework for RDF, RDFS or OWL, since they involve for example complex arithmetical constraints on datatype values which cannot be axiomatized in OWL. <autobio> I began as a member of the DAWG, viewing myself as the defender of the logical-entailment view of what querying was all about. After a brief attempt to either suppress these clearly not-entailment-like constructions, or to modify them to make them more entailment-like, I realized that this was neither useful nor productive. I howled with protest when it was proposed to allow queries which could detect the non-presence of a triple in the graph: this would be non-monotonic inference sneaking in by the back door, aaargh!! But when reasonable people, with real applications, asked me what possible harm could there be in users being allowed to discover such things about the triples in an RDF graph, I had no reasonable reply to make to them, other than a kind of generalized growl about the Semantic Perils of NonMon. One can only get so far with growling; and I quickly realized that the mistake was mine: only only gets the Perils if one thinks of the query answer as being entailed by the graph. But in this case it was perfectly clear to everyone that it wasn't entailed; and also perfectly clear that its was nevertheless useful information, and that people expected to be able to extract it using a query protocol; and that I had no reasonable argument why they should not be allowed to do so. </autobio> RDF querying in SPARQL really is not about checking entailment. A SPARQL engine is not required to perform any inferences, to check semantic redundancy, etc. (Though it may, it is not required to.) Querying is about trolling through RDF graphs looking for syntactic patterns. In one very simple case these coincide: simple entailment is having an instance which is a subgraph, the basic case for the SPARQL match, if you think of query variables as a species of bnode. But only in the very simple basic case do you find this coincidence: as soon as things get at all complicated, querying and entailment-checking diverge. SPARL queries allow the expression of a large number of conditions on matches which do not correspond directly to any kind of RDF/RDFS/OWL entailment; and anything beyond simple entailment is not automatically handled by SPAQRL. So, do not think of querying as entailment checking: if you do, it will seem to be broken, and it will have all kinds of apparently irrelevant barnacles attached to it, and it will seem ugly and ad-hoc. Think of it instead as a kind of process of fishing for patterns in RDF graphs. Entailment is managed by inference engines. Querying is a process of trolling a pattern through an RDF graph to see what is in it. Querying is fundamentally not a semantic business; it is essentially syntactic, having to do with constraints on bindings which match patterns. <autobio> After learning to live with the chagrin of finding that my cherished view of querying was fundamentally inadequate for even quite simple RDF applications in the real world, I have become quite reconciled to this separation between inference/entailment and querying. I think in fact that it is potentially beneficial to all. </autobio> What it does mean, however, is that one has to learn to mentally separate querying from inference. Which, to return to the topic, is why I suggested that the SPARQL documents refer explicitly to matching subgraphs, rather than to simple entailment: precisely because the use of the 'entailment' word is potentially misleading. And I have to say, your message bears out my worries here: for even when we did not use the e-word, you are already misled, as it were pre-emptively. I was afraid this might happen... :-) Hope this helps. I realize this will need more discussion and justification, and does not yet really answer your last question (why?) but at least you might now see that SPARQL is not simply a botched attempt at defining entailment. It has different aims in view. Pat PS. One quick response: you say the subgraph-based definition is "obfuscated (it seems capable of confusing even expert logicians)". What is confusing the expert logicians in this case, it seems to me, is that they are not actually reading the definition as given, but are instead hallucinating their own favorite view of what the definition ought to have been onto the words as actually written: and when they do not fit, they are then (characteristically for expert logicians, in my experience) concluding that the words as written have a silly mistake in them. -- --------------------------------------------------------------------- IHMC (850)434 8903 or (650)494 3973 home 40 South Alcaniz St. (850)202 4416 office Pensacola (850)202 4440 fax FL 32502 (850)291 0667 cell phayesAT-SIGNihmc.us http://www.ihmc.us/users/phayes
Received on Monday, 19 September 2005 19:12:34 UTC