- From: Bijan Parsia <bparsia@cs.man.ac.uk>
- Date: Tue, 12 Dec 2006 16:30:09 +0000
- To: "Bob MacGregor" <bmacgregor@siderean.com>
- Cc: "Eric Prud'hommeaux" <eric@w3.org>, "axel@polleres.net" <axel@polleres.net>, "public-rdf-dawg-comments@w3.org" <public-rdf-dawg-comments@w3.org>, "public-rdf-dawg@w3.org" <public-rdf-dawg@w3.org>
On Dec 7, 2006, at 5:07 PM, Bob MacGregor wrote:

> I realize that the UNSAID issue is closed. However, it shouldn't
> have been.

I agree with this, but not for the reasons given below. As I pointed out on behalf of Axel,

> Here's what Pat Hayes said about it:
>
> If SPARQL contains UNSAID then it will be inconsistent with any
> account of meaning which is based on the RDF/RDFS/OWL normative
> semantics. This will not render SPARQL unusable, but it will place it
> outside the 'semantic web layer cake' and probably lead to the
> eventual construction of a different, and rival, query language for
> use by Web reasoners.

Let me just interject that this argument does nothing for me. I totally fail to see why having *convenience syntax* for expressivity *already* in the language (via unbound) magically breaks the architecture. (Oh, and I'm unconvinceable on this point, Pat, so please don't bother :))

Generally, it's much easier and safer to shove funky expressivity into query languages than it is to do so in the representation formalism itself. And frankly, if I were going to bolt and try to set up a SPARQL rival, "UNSAID" wouldn't weigh in *at all*. Turtle syntax is way more likely to send *me* running :)

> Unfortunately, Pat has things exactly backwards. The omission of UNSAID
> INCREASES the odds that a rival query language will be constructed
> for use by Web reasoners, for the simple reason that UNSAID is useful
> for a great many things.

Expressively useful, yes.

> Almost all LARGE SCALE reasoning assumes both
> unique name assumption and closed world assumption.

Even with the qualifier "Almost" I don't believe this is true, and even if it happened to be true, it's not the case that UNA and CWA necessarily help the scale of reasoning. RDF and RDFS have neither, and Steve Harris has reported a store holding over 1.5 billion triples and responding in user-sufficient real time to various classes of SPARQL queries.
Pellet has a UNA mode which can actually slow down reasoning :) (It's funny: sometimes adding negation, e.g., disjoints, can radically speed up reasoning (hey! the clash is obvious!) or it can slow things down (damn, I have to try a bunch of node merges that fail).)

> The numbers simply make it impractical to assert owl:differentFrom or
> owl:maxCardinality over and over.

This is a different issue. If you want UNA and CWA it is certainly pragmatically a non-answer to say, "Hey, you can add it yourself!", at least, IMHO.

> The "semantic web layer cake" that Pat refers to is
> currently designed so that it only applies to relatively small
> knowledge sets (say, below 1 million triples).

I have no idea what the grounds for this are, but it certainly seems to me to be false on many points. I believe Pellet, Racer, FaCT++ and KAON2 can scale, given reasonable server hardware, to more than a "million triples" (but this is meaningless anyway; much depends on the *nature* of the axioms encoded by those triples). If they cannot, it's *really* hard to see how adding UNA or some sort of *non-mon* (non-mon makes things harder!!!) would magically help. Even restricting to a CWA, it's hard to see how it would help. (And I'd need to hear in considerably more detail exactly what sort of CWA you wish to add to SHOIN to make it "scale better".)

Finally, current SPARQL has LogSpace data complexity, which indicates that SPARQL query answering is in relational land. There are a number of fragments of OWL that have either LogSpace data complexity (e.g., DL-Lite, RDFS(DL)) or PTime data complexity (e.g., EL++, DLP, and Horn-SHIQ), at least for various key inference services. (E.g., EL++ is PTime-complete in the size of the data for consistency, subsumption, concept satisfiability, and instance checking (not surprisingly, given their interreducibility), but for conjunctive query it's only known that it's PTime-hard.) None of these impose UNA or CWA.
Of course, some of them are rather weak on equality, which helps.

This is, of course, ignoring new research. To pick just one newer one, we see from IBM:
<http://iswc2006.semanticweb.org/items/Kershenbaum2006qo.pdf>

I direct your attention to table 2. A 6 million role assertion ontology (with many type assertions as well) can be checked for consistency in 485 seconds. While they list the expressivity as SHIN, I think it's probably Horn-SHIQ, so if you were tuned for that, you might do much better. I'll also note that the times, as you scale up the LUBM in that chart, grow roughly linearly. This doesn't mean that you can do useful arbitrary conjunctive query on SHIN or Horn-SHIQ KBs, but it's highly suggestive. No UNA or CWA. I'll also point out that this is preliminary work, using a version of Pellet that I *know* is not nearly pushing the boundaries of what moderately good engineering, much less novel techniques, can do.

This is a complex topic with a lot of parameters left underspecified at the moment. But I believe the considerations I've raised are sufficient to block your reasoning. I'd be very interested in a survey of large scale, in your sense, reasoning that goes beyond database expressivity (and beyond datalog expressivity) and was shown to be hopeless without CWA and UNA, and where, when these were added in, it became all peanuts and good cheer. I would be interested in just one example, actually.

> That means that rival standards that
> CAN handle larger datasets are certain to emerge.

I fail to see how lack of an explicit construct for negation as failure will change the scalability of SPARQL one whit. I do think that the lack is unfortunate for users who could put it to good use, as I've argued before. I dislike having such a feature accessible in a contorted way. But eh. Let the vendors support an alternative. There aren't so many that we couldn't get agreement, especially with a simple rewriter adding compatibility.
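
For concreteness, the "contorted way" of getting negation as failure out of an optional match plus an unbound test can be sketched on a toy in-memory triple set. This is a minimal illustration only, not a SPARQL engine: the data and the helper names `match`, `optional`, and `people_without_email` are assumptions invented for the example.

```python
# Toy sketch: negation as failure via "optional match, then keep rows where the
# optional variable stayed unbound". All names and data are hypothetical.

TRIPLES = {
    ("alice", "type", "Person"),
    ("bob", "type", "Person"),
    ("alice", "email", "alice@example.org"),
}


def is_var(term):
    """Variables are written with a leading '?', as in SPARQL."""
    return term.startswith("?")


def match(pattern, binding, triples):
    """Yield every extension of `binding` under which `pattern` matches a triple."""
    for triple in triples:
        new = dict(binding)
        ok = True
        for p, t in zip(pattern, triple):
            if is_var(p):
                # Bind the variable, or check it against its existing value.
                if new.setdefault(p, t) != t:
                    ok = False
                    break
            elif p != t:
                ok = False
                break
        if ok:
            yield new


def optional(pattern, bindings, triples):
    """Left join: a binding with no match for `pattern` is kept unchanged."""
    for b in bindings:
        extended = list(match(pattern, b, triples))
        yield from (extended if extended else [b])


def people_without_email(triples):
    # Roughly the shape of:
    #   SELECT ?p WHERE { ?p type Person . OPTIONAL { ?p email ?e } FILTER(!bound(?e)) }
    people = match(("?p", "type", "Person"), {}, triples)
    joined = optional(("?p", "email", "?e"), people, triples)
    return sorted(b["?p"] for b in joined if "?e" not in b)


print(people_without_email(TRIPLES))  # ['bob']
```

Adding a triple such as `("bob", "email", "bob@example.org")` removes `bob` from the answers, so the non-monotonic behaviour is already available through this route; an explicit UNSAID would only spell it more directly.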
If we had a nice XML syntax, such a rewriter would be an easy XSLT.

My recommendation to the DAWG is that if you are not moved by the user considerations to reopen the issue (which is reasonable), then there is no "scalability" argument of any sort that counts as new information. And frankly, if I were you all, I'd dismiss scare tactic arguments *either way* out of hand. Though, aherm, perhaps with a bit of that thing they call "tact". :)

Cheers,
Bijan.
Received on Tuesday, 12 December 2006 16:32:34 UTC