- From: Bijan Parsia <bparsia@cs.man.ac.uk>
- Date: Tue, 12 Dec 2006 20:33:59 +0000
- To: "Bob MacGregor" <bmacgregor@siderean.com>
- Cc: "Eric Prud'hommeaux" <eric@w3.org>, "axel@polleres.net" <axel@polleres.net>, "public-rdf-dawg-comments@w3.org" <public-rdf-dawg-comments@w3.org>, "public-rdf-dawg@w3.org" <public-rdf-dawg@w3.org>
On 12 Dec 2006, at 19:06, Bob MacGregor wrote:

> Dear Bijan,
>
> Thanks for responding. Since I only *outlined*

Well, not by any understanding of "outlined" that I have, but I accept your point that your email contained an enthymeme.

> my arguments against SPARQL, it is easy to misinterpret them.

I interpreted your email as written. Better than that I cannot do.

> I will try to expand on them a bit.
>
> A good query language should have the following attributes (among others):
>
> (1) It should be expressive
> (2) Its operators should be composable, and minimal (except for sugar-coating)
> (3) It should be pleasantly readable
>
> For the uses we employ in a logic language, we add a couple more:
>
> (4) It should support aggregation
> (5) It should scale to very large datasets
>
> SQL is an example of a language that meets all of these criteria.

I guess 3 is in the eye of the beholder.

> Datalog satisfies all except that my impression is that it doesn't handle aggregation.

Depends on the query language. You could use SQL-like queries against a Datalog KB (depending on the exact flavor).

> So we have proof of concept.
>
> I would be happy to use either of these with RDF. Instead, we have SPARQL. SPARQL violates all but the last.

Unless you mean something very specific, it's hard to argue that SPARQL is not expressive. It is, of course, easy to argue that it lacks expressivity that is vital to you.

> Violating any of 1, 2, or 3 is grounds for divorce.
>
> Expressivity:
>
> In our RDF applications, we routinely use AND, OR and NOT (UNSAID) operators, with arbitrary nesting. Even excluding UNSAID, we can't express our queries in SPARQL, due to the artificial limitations it places on the OR (UNION) operator. We frequently nest AND inside of UNSAID, and less often use OR and UNSAID. So, SPARQL doesn't even come close to satisfying our expressivity requirements.

I don't see what this has to do with UNA and CWA and scaling.
But I accept that SPARQL does not have the expressivity you need. Sorry about that.

> Composable Operators:
>
> "Composable" means that operators can be combined to produce useful expressions. Only the AND is composable in SPARQL. The UNION operator has artificial syntactic restrictions that don't belong. It doesn't have NOT/UNSAID, so composability can't be discussed there. Instead of a few composable operators, SPARQL has a kitchen sink of non-composable ones.

Please see the new algebra based on the excellent work in:

http://iswc2006.semanticweb.org/items/Arenas2006bv.pdf

> Pleasantly Readable
>
> Turtle syntax. 'nuff said.

Not my favorite either. But it's workable enough for simple things. Ok, that's one hell of an enthymeme.

> Now I will address a few specific comments:
>
>>> I totally fail to see why having *convenience syntax* for expressivity ...
>
> I assume you are considering the UNSAID operator to be "convenience syntax".

Yes.

> If we ignore nested AND/OR/UNSAID within UNSAID, then it is indeed a convenience syntax. That assumes a SPARQL mind-set, which I am not assuming.

I don't even understand that. Hmm. Ok, I see. In this context, I believe we were discussing *even* UNSAID that was just an abbreviation for UNBOUND (plus some stuff). My point against Pat is that once UNBOUND is in there, it hardly seems worse to give it convenience syntax. Additional expressivity has to be evaluated on its own terms, e.g., complexity, implementation impact, use cases, etc. Since we are on a list discussing SPARQL, if you have other assumptions I think you need to make them clear.

> I believe it's not the case that OPTIONAL/UNBOUND can simulate UNSAID expressions with compound arguments.

That's possible. I'd need to know more about the particular expressions you have in mind.
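For what it's worth, the "pure sugar" reading of UNSAID is the usual OPTIONAL/!bound idiom. A sketch (the vocabulary and data here are illustrative, not from your application):

```sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

# "UNSAID { ?person foaf:mbox ?mb }" spelled out as an
# OPTIONAL match followed by a test for unboundness.
SELECT ?person
WHERE {
  ?person a foaf:Person .
  OPTIONAL { ?person foaf:mbox ?mb }
  FILTER ( !bound(?mb) )
}
```

The compound cases you mention (AND/OR nested under UNSAID) are exactly where this rewriting gets awkward, which is why I'd need to see your particular expressions.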
>>> Even with the qualifier "Almost" I don't believe this is true, and if it happened to be true, it's not the case that UNA and CWA necessarily help the scale of reasoning.
>
> They help if the alternative is using millions of owl:minCardinality and owl:disjointWith statements to accomplish the same thing.

Not clear. Actually, you seem to be rather randomly spraying constructs around. minCardinality? disjointWith? Don't you mean differentFrom? What I didn't get was that you conflated reasoning in general (and how it scales) with reasoning where the intent is to have UNA and CWA. These are different things.

> No one I know does that, instead they either assume UNA/CWA or they don't use aggregation.

Aggregation is a separate issue, though I see now that it was central to your thinking when you wrote your prior email. Aggregation is, of course, semantically tricky for the languages we might query with a SPARQL variant. For example, many people think that BNodes in results are countable (and we do have a way of counting them... but they cannot merely denote distinct objects, obviously, so they have to be counting something else, e.g., "answers"; of course, how users will interpret that is a different issue; see the archives for discussion of DISTINCT and REALLYREALLYDISTINCT and friends).

> Also, they either have millions of owl:maxCardinality statements or they aren't using owl:allValuesFrom.

Well, our experiences obviously differ. I cannot say how valid yours are, or how general, since there isn't any data for me to look at.

>>> RDF and RDFS have neither, and Steve Harris has reported a store holding over 1.5 billion triples and responding in user-sufficient real time to various classes of SPARQL queries.
>
> My guess is that they aren't computing any aggregate expressions.

Certainly true, but that's not because of the difficulty per se. He *is* treating the data more or less in a database fashion, I believe.
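To make the axiomatic encoding we're arguing about concrete: imposing the UNA by hand means asserting inequalities over the named individuals, and closing a class means a nominal (owl:oneOf). A sketch in Turtle, with made-up individuals:

```turtle
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix ex:  <http://example.org/> .

# UNA by hand: declare every named individual pairwise distinct.
# The assertion set grows with the data, which is the scaling complaint.
[] a owl:AllDifferent ;
   owl:distinctMembers ( ex:alice ex:bob ex:carol ) .

# CWA-flavored domain closure via a nominal: ex:Person has exactly
# these members and no unnamed ones.
ex:Person owl:equivalentClass
    [ a owl:Class ; owl:oneOf ( ex:alice ex:bob ex:carol ) ] .
```

(owl:differentFrom does pairwise what owl:AllDifferent does in one axiom; either way the number of assertions tracks the size of the ABox, which is why "just add it yourself" is painful at scale.)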
>>> This is a different issue. If you want UNA and CWA it is certainly pragmatically a non-answer to say, "Hey, you can add it yourself!", at least, IMHO.
>
> I'm not sure if you are saying "it's OK to add UNA and CWA yourself".

What? No. I'm saying that if you want your KB to be governed by the UNA and CWA, i.e., it's part of your modeling, it's a total non-answer to say "Oh, just add a bunch of inequalities and use nominals to close your domain, and the roles, and the concepts". Just as I think it's rather a non-answer to say, "Oh, no UNSAID of any sort for you! Just simulate it with UNBOUND", even for the cases where it's possible. However, I also want the group to finish :) And they do too. UNSAID can be added as an extension and standardized later. In any case, this (that it's bloody insane to have to manually impose the UNA when you intend the UNA) is a vastly different claim from "reasoning won't scale without UNA and CWA", which is, in fact, what you claimed.

> If you are, then you are abandoning basic precepts of SPARQL and the "semantic layer cake".

? Well, frankly, I don't care about the semantic layer cake a whit. I don't know what the "basic precepts" of SPARQL are. There is a problem if you want these things in RDF or OWL, as the specs don't support them. However, as I said, Pellet has a mode that allows the UNA, and we've extended it in a number of ways with nonmon constructs. I tried to force some of the issues to a head in this group, since I think we do have an implicit UNA that covers BNodes as well. (Note: I didn't plan to go down that route... I was working it out as we went along, as is too often the case.) And I think that's the right choice, given what implementations and users actually do. But I wish the W3C would face that fact and try to come up with a plan for dealing with it.

> Plus, you are inventing your own semantics. However, WE are adding UNA and CWA ourselves, because we have no choice.
> I observed Mike Dean using UNA in a large-scale application that we worked on together, and he was not apologetic, but he did acknowledge that doing so inherently contradicts OWL semantics (since it is non-monotonic). It would be much preferable if we could use UNA and CWA in combination with RDF and have Pat's blessing.

I fail to see what Pat's blessing has to do with it. However, if you want to interoperate, it would be a good idea to write down, in a somewhat spec-like form, what you are doing. Make it a W3C submission if you'd like, which would give it visibility.

> From the IBM paper you reference:
>
>>> We apply the filtering criteria described in Section 3.2 to a canonical summary Abox A'. For correctness with respect to cardinality restrictions, we need to augment the canonical summary Abox with role assertion statistics, since role assertions are merged by the summary Abox transformation. For each role R that is part of a cardinality restriction, we associated with R the maximum number of R-neighbors that any individual a has in A.
>
> I can't be sure,

Why not?

> but it sounds like they are asserting maxCardinality restrictions

No.

> (a few hundred thousand?) into the ABox.

No.

> If so, they have frozen the ABox.

Not really. (And this is the *summary*, not the actual ABox.)

> If their application has no updates, then that's valid.

This is "inside" a reasoner, roughly speaking. It's not done at the user level. The process would update if there were a change to the ontology, just the way a tableau reasoner would have to update its completion graph.

> However, every application we work with allows for introduction of new data (updates). In that case, the operation of adding maxCardinality assertions would have to be done and undone over and over. We find that assertion/deletion is an expensive operation.

Well, it depends.
They are trying to be very careful about making these internal operations relatively cheap, and a lot could be done with a maintenance approach (as we've applied to completion graphs).

> So, I would claim that IBM's results are valid for non-static applications only if the time to make and unmake the maxCardinality assertions is factored into their timings. They of course didn't do that, because they were looking for low numbers, rather than realistic ones.

Before you libel people, why not try to understand the work? At this point, I am continuing in the expectation that you will recognize that this was offensive and apologize.

>>> Finally, current SPARQL has LogSpace data complexity
>
> The only complexity that matters is average case complexity.

I suspect you don't actually mean "average case complexity" in the technical sense, e.g.:

http://www.uncg.edu/mat/avg.html

since it's perfectly possible to have a good average case complexity with respect to some distribution and yet have *all the real problems people want to solve* be in the worst case (or in a worse case).

> Anything with average-case complexity of N-squared or greater is non-scalable.

An interesting opinion. For what distribution?

> Logspace complexity doesn't meet that criterion, so is irrelevant.

So relational databases aren't scalable? Interesting. Datalog isn't scalable? Interesting.

> Later on you quote some empirical results, which reflect average case complexity,

I see, you mean "common user case" complexity, not average case complexity. (After all, the samples weren't randomly generated, so how can they reflect average case complexity!) I think you didn't really understand the data complexity argument. My point was that if you have LogSpace *data* complexity (worst case), then you are roughly in the same boat as relational databases. (Of course, you may have a different average case complexity, or "real case" complexity, etc., etc.)
My point was that there are clearly KR languages without CWA and UNA which plausibly scale. Now, they may give up other things that you want, like aggregation, but since your email didn't *mention* aggregation, I don't think I'm at fault for not getting this. And there's an issue about what goes into the query language vs. what goes into the representation language.

> so those are indeed relevant. However, I acknowledge that SPARQL can scale; it's OWL that can't scale,

That's what I've shown *not* to be the case, at least for OWL DL (well, minus nominals in the IBM case; but in any case, reasoners like Pellet, KAON2, FaCT++, and Racer do better than you seem to expect, and could do better still).

> unless you ignore its core operators,

I suggest you examine EL++, which is indeed a fragment of OWL DL, but a very expressive one (it does give up allValuesFrom, but includes nominals and GCIs of various sorts). I don't know if allValuesFrom is a "core" operator, since I don't know that there's a technical meaning for "core" (EL++ is generally accepted as a very expressive, if sub-Boolean, description logic). If you fit into EL++, great. If you don't, then a lot depends on your particular problem.

> or freeze your Abox (a bit of a contradiction with the notion of "large scale").

Well, since this was based on your misunderstanding, I don't see that it bolsters your case at all.

> Why is OWL relevant to SPARQL? Well, Pat was using the "semantic layer cake" as an excuse for requiring open world semantics. Given that the upper, open-world layers do not scale, we would prefer to see adoption of upper closed-world layers that do scale.

Well, again, your assertion is unsupported, to say the least, at least for the general case. Again, I totally concede that *if you want CWA and UNA in your KB* for modeling purposes, then it's really mean to say, "Oh, encode it yourself".
But if you say, "The reason I want CWA and UNA is so that it'll SCALE", then you have not made the case in the least. Even for aggregation (though there are surely issues with what aggregation *means* in the case of OWL), it's hardly sensible to claim that a little CWA and UNA sprinkled haphazardly will clarify everything. After all, nonmon generally makes reasoning *harder*, although <http://logcom.oxfordjournals.org/cgi/content/abstract/6/2/295> is interesting!

Before replying, I would suggest that you read my initial email a little more carefully, and check out some of the pointers too with a bit more care. (Note, of course, I don't mean to suggest that existing reasoners definitely scale to your problem! I don't know what your problem is! It would require some examination to determine what's going on.) And I would appreciate more precision in the problem statements. (I.e., if this is about aggregation, let's stick to aggregation.)

But this is not the appropriate list. public-owl-dev might be more appropriate, as we are discussing a possible next OWL working group. A UNA flag, at least, would be quite an easy and reasonable addition. At OWLED 2006 we discussed some variants of constraints (a la the K and A operators) which might help you. I doubt that these would get into the very next version of OWL, as things are still unclear at this stage, but they are certainly something we might be able to get de facto support for across the board in tools.

But coming back to *this* list, I don't see that the group has to do much in reply to your general problems with SPARQL, since it's unlikely that such changes will be made in this iteration. UNSAID is only reasonably considered since there *is* a version which is just syntactic sugar and thus, in a sense, doesn't require a large revision to the language. But even that probably won't get in now, so the sensible move is a W3C note and lobbying the implementors.
Aggregation is in a bit better shape for "no reasoning" SPARQL, as Jorge Perez et al. have done a very nice job of specifying how many answers to expect (for graph-matching SPARQL). Obviously, there are a host of issues when you pop this up to OWL, but for that, I suggest OWLED as the appropriate venue. I'll be trying to get together a "SPARQL/DL" document for it, so there's interest and attention.

Cheers,
Bijan
Received on Tuesday, 12 December 2006 20:35:08 UTC