- From: Pat Hayes <phayes@ihmc.us>
- Date: Fri, 4 Aug 2006 04:01:46 -0700
- To: Bijan Parsia <bparsia@cs.man.ac.uk>
- Cc: Eric Prud'hommeaux <eric@w3.org>, Enrico Franconi <franconi@inf.unibz.it>, RDF Data Access Working Group <public-rdf-dawg@w3.org>
>I think you're confused too, Pat. Perhaps, but let us get down to details. >>This would therefore require that the two kinds of pattern be given >>fundamentally different semantics: intuitively, one might say that >>a bnode means 'some thing exists', but a query variable means 'some >>named thing exists', where the presence of the name is what makes >>the answer binding possible; and a bnodeID does not count as a >>'name' in the required sense. > >I'll note that restricting the domain of bindings (and thus of >correct answers) is significant even if you don't return >"existential" bindings. That is, in general, queries with only >distinguished variables (or, as it's sometimes said, when variables >are ranging over the "active domain Can you give references for all this terminology that you cite? What exactly is the "active" domain? There is nothing in any semantic theory that I know of that distinguishes *things in the domain* on the basis of the kind of name that is used to refer to them with. The idea does not make sense, in any case: if bnodes were obliged to refer to a non-active domain while names refer to something else, then the troublesome redundancies would be eliminated. >) are generally easier to answer. Even in RDF, considering only >simple entailment, this is true (as is shown by the work we need to >do to control redundancy). Redundancy is a red herring here. Of course bnodes introduce new possibilities for semantic redundancy in answer bindings (and in RDF graphs themselves), but that does not in itself make answer computation harder. What it does make harder is computing *nonredundant* answer sets, of course: but then one might (and I would) argue that to require non-redundancy in answer sets when the original graphs may contain redundancies is the poor design decision which produces the difficulty here. And of course if the language admits true equality, even ground answers may contain semantic redundancies, and the computational burden of delivering nonredundant answer sets is overwhelming, quite independently from anything to do with bnodes. In all these cases, the task of computing nonredundant answer sets is on a par with the task of computing a nonredundant subset of a possibly redundant KB. IMO, none of this should be required by the specs defining query answering, which should be allowed to 'pass through' any redundancy which may be present in the source KB. This would then allow quite trivial extensions of the query strategies for the ground 'database' case to be used virtually unchanged with RDF, simply by treating source KB bnodes identically with URIrefs in the binding process. >>(3) This idea is given terminological flesh by the distinction, >>introduced in >>http://lists.w3.org/Archives/Public/public-rdf-dawg/2006AprJun/0192 >>between 'distinguished' and 'undistinguished' variables, and the >>observation that SPARQL currently uses 'semi-distinguished' >>variables (which can bind to both "names" and bnodeIDs). In >>http://lists.w3.org/Archives/Public/public-rdf-dawg/2006JulSep/0018 >>this distinction is justified on the grounds that bnodes are not >>"names", so that it would not be appropriate to supply a bnodeID as >>an answer binding to a query variable; or at any rate, as is >>well-known to the cognoscenti, this is simply not the way that >>things are done: > >Maybe *this* is the appropriate condescending comment? Its tone was admittedly an implied rebuke to that used in these messages. >>http://lists.w3.org/Archives/Public/public-rdf-dawg/2006JulSep/0031. > >Do you *deny* that semi-distinguished variables are novel? Yes. I have never previously heard of this terminology of "distinguished" vs. "nondistinguished". (You have everyone's permission at this point to roll your eyes in amusement at my profound ignorance, of course.) I would be interested to see where this terminology was first used, and what its history is. In a database context where there are no bnodes, the distinction would be vacuous. To me, the obvious generalization of the notion of binding a variable to a name in a DB, when we extend this to RDF, is to have variables which bind to URIrefs, literals and bnodes, since these are the three kinds of 'name' in RDF. The idea of a variable which binds only no non-bnodes strikes me as rather ridiculous, on the face of it, in an RDF context: it would mean that answer binding was not preserved under simple entailment, for example. > Note that I've never argued against them. I think that BNodes in >answers can be very useful for expressing coreference between >answers. But they are novel. Ah, wait. Bnodes are novel, I agree. That is, bnodes make RDF into something other than simply a simplified database format. In fact, Bnodes mean that one has to re-think quite a lot of DB-inspired ideas when trying to apply them to RDF, which is why for example SPARQL is not simply SQL. But that is a different issue. Where was the 'distinguished variable' contrast used before there were bnodes? Were all variables 'distinguished' ? >>(4) None of the reasons cited in the previous paragraph seem to me >>to be convincing objections to the obvious reply to the objection >>in (2), which is that bnodeIDs *can* be given as bindings to query >>variables; > >Since I don't see that they were given as objections to all that, I >fail to see why you might have *expected* them to be convincing. Then I apologize for my misunderstanding of your messages, as this seemed to be their main substantive point. >In that debate, the main point was 1) explicating the behavior of Pellet OK, let me just bow silently and pass on this, as the behavior of Pellet seems to me not a particularly interesting matter. > and 2) pointing out that whether the variables are >semi-distinguished or not is a function of the way that the semantic >framework of sparql is instantiated. If there are no BNodes in the >scoping set, perforce the query variables are distinguished. True, although this seems to me to be an odd way to put it, since this restriction doesn't seem to me to provide any change in the status of the *variables*, or even the definitions of binding, etc..; but whatever. >There *IS NO SPECIFIED SPARQL SEMANTIC FOR OWL DL*, and so Pellet >implements the behavior that is most common in the DL community OK, fine. No need to shout. I really don't care what Pellet does; and since, as you point out, there is no specification for OWL-DL-SPARQL, whether or not Pellet would conform to it if there were one seems not worth discussing. >, and those that are best understood. BNodes are always >nondistinguished in SPARQL. > >(Racer and KAON2 support only distinguished variables. So if we are >to describe the current STATE OF THE ART by looking at *existing >implementation*, then we have to at least *support* the notion of >distinguished variable. I don't mind supporting the notion in the sense of making it *possible* for a query engine to treat variables in this way, but I don't like writing this restriction into the semantics. These restrictions don't have a semantic justification, but a pragmatic one. It seems to me that we should just call such reasoners incomplete. There's no stigma associated with being incomplete. >Having only distinguished variables makes a query *MUCH* easier to answer.) Well, yes, but only in the sense that work is a lot easier if you simply refuse to do it. The answers obtained by binding variables to bnodes are after all entailed by the KB, so to simply refuse to deliver them just because it makes the implementors life easier seems to me to be a little like, in Russell's phrase, extolling the advantages of theft over honest toil. Which is fine, let me add, as long as we are being honest about it. I don't have any objection to someone advertising a query service with the caveat that it does not deliver existential answers. But I do object to the spec itself being written so as to make this the standard, just because it makes implementor's lives easier. And at least, if an engine is capable of answering an existential ASK query correctly, it seems to me that it should be capable of providing a bnode as an answer binding to the corresponding SELECT. >>and since this is so, any engine which can determine the entailment >>of a pattern with a bnode, could deliver such a binding as an >>answer to the similar query pattern with a variable in place of the >>bnode. That is, if ASK PATTERN[_:x] (with _:x scoped to the >>pattern) is entailed, then SELECT ?x PATTERN[?x] should succeed, >>possibly with ?x bound to some suitable bnodeID generated by the >>querying engine. Adopting this as a design principle for querying >>languages in the RDF family would, I suggest, bring some overall >>regularity and simplicity to the spectrum of SPARQL versions > >At some other costs. I believe them to be relatively small costs. If you think I am wrong, I would welcome detailed refutation of my argument, which admittedly is sketchy. >>(with different values for the entailment and scoping set >>parameters.) The fact that bnodes exist in the language means that >>purely existential answers can indeed be given by the same >>variable-binding mechanism as is used to deliver name bindings. In >>other words, there should be no difference between 'a thing exists >>such that...' and 'a named thing exists such that...', since the >>"name" can itself be a bnodeID, if necessary one generated for the >>sole purpose of being a binding for the query variable. > >No one denied that this is possible, and that having >semidistinguished variables can be both interesting and useful. >Indeed, as I said one of my posts you sneered at. I do not think I have sneered at anything you have written. >But forcing people to always check for unnamed bindings makes query >answering *much much* harder. Again, I am still unconvinced. Harder, I grant you, but not the *much much*. At the absolute worst case I can imagine, it would seem to be by a linear factor of N, where N is 2|(n-1) and n is the number of variables in the query pattern. But this is using a ridiculously simple method, and Im sure that a little analysis could make it into a close-to-trivial extra cost. And if we are willing to abandon the strict no-redundancy requirement on answer sets, which I think we should in any case, then the whole computation gets a lot easier. >It's not even known if it's decidable in full generality for OWL DL. > >>Let me call this the 'existential name principle', as it amounts to >>the claim that the RDF language family makes no distinction between >>existence and named existence; since if something exists, the >>presence of bnodes means that we can, in effect, validly generate a >>name for it. > >I was going to refute this, but I think I'll just laugh at it. I wish you would try to refute it, because it is in fact correct in logic and in RDF. If there is some peculiarity of description logics which makes it false there, I would very much like to have that pointed out explicitly. Think of this as an opportunity to educate, rather than laugh disdainfully at the ignorance of your correspondents. >>In the rest of this message I will defend this idea against the >>objections raised in the above messages, and try to distinguish >>this issue from others with which it might be confused. >> >>(5) One objection to this idea, cited earlier, is that it embodies >>a logical confusion: bnodes are existentially bound *variables*, >>and are therefore not logical names. Now, while it is of course >>true that bnodes and URIrefs/literals (which I will collectively >>call 'names' here) have subtly different truth conditions, they are >>not so different as this contrast suggests. First, calling one a >>'name' and the other a 'variable' is merely a matter of surface >>terminology. > >Simple entailment shows that this is false. Nonsense. >>Many logical syntaxes make a lexical distinction but this is not >>necessary, and many others, such as ISO Common Logic, do not: a >>'variable' then is simply a name that is bound by a quantifier. > >That's quite a difference! It is when you get to the truth-conditions for the quantifier: but while you are inside the scope of the quantifier, they are literally indistinguishable. And there are no quantifiers in RDF. Neither the syntax not the semantics makes any distinction between name and variable: all that matters is the free/bound distinction, which cannot be determined locally but must be provided from outside the immediate linguistic context. Which is my only point: the bnode/URIref distinction in RDF is essentially a scoping difference. >>Second, more to the point, they have essentially the same >>semantics: they both in effect assert that something exists, and >>use a lexical token to refer to it. > >Well, one is a singular term and the other isn't. Singular? Im not sure what you mean, but neither of them are singular terms in the linguistic sense, which implies uniqueness. >Big difference to me. > >>(It is worth noting that there are logical syntactic conventions >>which do not use lexical tokens *of any kind* to show existence, >>such as Peirce's existential graphs.) The only difference is that >>the scope of a name token might extend beyond the local syntactic >>context, allowing other assertions to be made about the thing >>denoted by the name; whereas an existential 'variable' is a lexical >>token only within the context defined by the scope of the >>quantifier. In natural deduction logical systems, logical names and >>existentials are almost interconvertible, using the rules EI and EG: >> >>PHI[name] >>----------------EG >>(exists (x)PHI[x]) >> >>(exists (x)PHI[x]) >>--------------------EI >>PHI[name*] >>when name* does not occur free in the derivation, ie is a 'new' name. > >And boy, does that make a HUGE DIFFERENCE. To what, exactly? Not to the semantics, not to the classical proof theory. The difference is so small that some textbook logics don't even bother to make the distinction. I don't see why it is such a very big deal for query answering, since the cost of generating a gensym bnodeID is trivial (use URIrefs if you get desperate.) So, if you could replace bluster with argument, maybe we could get something useful out of this conversation. Im sure you are right that it makes some difference to something, and maybe for DLs the distinction is more critical and important than it is in classical logics or RDF: if so, I'd be glad to be told why and how. (Later: your reference to the Baader work on defaults helps answer this, thanks for that pointer.) >Ok, I stop now. > >Aside from this logical sophistry on Pat's part, the brute fact is >that, even aside from the logical and computational differences, >BNodes are pragmatically quite different. Try reusing one from an >answer set in a fresh query. Try pasting one into the address bar of >a web browser. Er... both suggestions are meaningless. Being blank, they have no textual form to do such things with. If you mean bnodeIDs, then yes, you need to take careful heed of the scoping rules in place for these IDs, defined by the document conventions for the particular class of IDs. That is why you should not be able to do the first (although we did consider the possibility, as you know, under pressure from Maryland), since the new query sets up a new document scope. And of course Web browsers are not set up to reason with bnodes. >[snip a bizarre account of BNodes. I agree it is rather bizarre at first reading. It is, however, accurate, and reflects quite a lot of hard thinking by me and others that went into the design of the RDF graph syntax and semantics. And since SPARQL is chartered to be a query language for RDF, not for description logics, it is therefore relevant. >>(6) Another objection seems to be that to allow variables to bind >>to bnodes somehow introduces great additional complexity to the >>query-answering process. It seems to me that this must be wrong, > >First, the core point is that having any level of >non-distinguishness most certainly adds to the complexity. Whether >fully non-distinguished or semi-distinguished, it's just much harder >than having only fully distinguished variables. For example, it's >not known if undistinguished variables in the query, plus transitive >roles in the query, is decidable for OWL DL. Is it known whether the corresponding inference problem with bnodes in place of the variables is decideable? If not, this is not relevant to my point. If so, what is wrong with my argument? >Many bright people for several years have been working on that. In >the context of rules, this is very clear, SHOIQ + DL Safe rules >(SWRL rules where the variables are restricted to distinguished >variables) are decidable, whereas SWRL is not. Open defaults when >combined with a DL may be undecidable, whereas open defaults with >the variables treated as distinguished are (fairly robustly) >decidable. I'll let you google for DL Safe rules, but here's the >defaults link: > http://citeseer.ist.psu.edu/baader95embedding.html > >So, *never* allowing the variables to be purely distinguished makes >queries harder to answer even when we know how to answer them. This >is why racer only does distinguished variables. ? That seems like a nonsequiteur to me, but let it pass. >Some optimizations are very hard to work for nondistinguished >variables. I can speak from personal (and Evren's) experience that >having only distinguished variables makes things much easier. Im sure that is so. My argument was whether the difficulties of handling bindings to bnodes (which I agree is nonzero cost) are really as disastrous as you indicate. Bear in mind, I am *assuming* that an engine exists which can properly answer ASK queries with existentials in them. As I understand the argument given by Enrico, SPARQL for OWL-DL must be allowed to have queries with bnodes in them because these can be addressed by methods which would fail when the bnode is replaced with a query variable which can be bound to bnodes. My argument is that there must be something wrong with this argument, since one can define a successful engine of the second kind (that at times will deliver a bnode binding to a variable) *in terms of* one of the first kind, with what appears to be a small extra computational cost factor. I may be wrong, in which case I would be glad to see my mistake pointed out. But I am not claiming that existential query answering is inherently easy to do. It may be that the difference between us is that when I refer to an engine which allows bnode bindings to query variables, I do not mean to imply anything about completeness relative to a logic. I am making a purely pragmatic point about how to deliver answers to queries. > Heck they make it easier even in RDF as it avoids the need, on the >one hand if you allow full simple entailment (i.e., infinite bnodes >in the scoping set) having to minimize or leanify, or, on the other >hand, having to do the extra work the semantic framework does to >ensure restricting answers to told redundancy. If DISTINCT is >interpreted to eliminate told redundancy (which I would think it >should Well, I disagree, but your next point ... >) then having BNodes in answers makes things harder. ... I concede: if you want to require nonredundancy, then bnodes make your implementor's life harder. But the other side of this point is, that not providing this information means that your answering engine is seriously incomplete, and will refuse to provide information to a user when in fact it would be easy to answer the query. There is an inherent trade-off here between the implementor's workload and user's convenience, not surprisingly. If I ask for all people who own a blue car and live in Nantucket, and am told there are no answers, I will be seriously inconvenienced if I have to also ask if anyone *exists* who owns a blue car in Nantucket, in order to find out about the bnodes that your scared-to-be-redundant engine has refused to tell me about, even though the KB contained an explicit assertion to this effect. And the point is that the bnodes in RDF really are there: they aren't going to go away, so the fact that efficient engines exist which ignore them seems to me to be rather beside the point. >>because one can define the variable-binding option in terms of the >>other. Suppose that a query engine is capable of generating answers >>to ASK queries containing bnodes, then it can deliver appropriate >>answers to SELECT queries by the following pseudocode, where >>(gensymBnode) generates a new bnodeID and P[a/b] means the pattern >>P with a replaced by b: >> >>if ( ASK P[(gensymBnode)/?x] ) then return (bind ?x (gensymBnode)) >> >>Of course, this needs to be written more carefully to account for >>multiple variables, but you get the idea. > >I get the idea that you don't know what you're talking about, or, at >least, haven't thought it through. This would eliminate coreference >*between answers* which is the coolest feature of semi-distinguished >variables, but has a cost. I have thought coreference through, but I wanted to keep the email short. Coreference between answers in the same answer set would not be handled by this, indeed, but then it would not be handled by a simple ASK query either; and of course it would be completely impossible if the variables were required to never be bound to bnodes. I wasn't suggesting this as a practicable technique, only as an existence proof to counter the claim that allowing bnode bindings necessarily made everything much, much, harder. (And havnt you rather changed sides here? I thought you were arguing against allowing bnodes as bindings at all, but suddenly you seem to be arguing for the most sophisticated kind of such binding imaginable, where entire patterns of bnode bindings can be used to give subtle information about coreference between answers.) > Oh right, you've been confused on the scope of BNodes in answers >before, as I recall. >And accounting for coreference *within* a answer across distinct >variables is also non-trivial, and different from an ASK query. True, so in the worst case this would require a set of ASK queries with different patterns of bnode substitution for the variables in the pattern. This is still at worst a small integer multiplier. And the basic point is that this is what the query user is going to have to do, to make sure of getting all the answers: so even this slightly ludicrous algorithm can be viewed as a kind of helpful built-in user script for covering all the cases, when faced with a dumb-ass query engine that refuses to give all the answers that it can first time around. If the complexity is intrinsic and we can do no better, then so be it: but SOMETHING has to be able to deliver all the right answers SOMEHOW. Why should it necessarily fall to the user to have to scan through all the combinatorics, tying up web traffic with almost-duplicate queries? > (For one, if you have two answers, one with BNode cross reference >between distinct variables and one without, you have one more answer >than for just the ASK query. When each answer has complexity >NEXPTIME or much worse, well, practically speaking that can be a >pain. Practically speaking, I agree. But it *could* be done, Im sure it could be done a lot better than this with some thought. Thinking of this took about a minute. >) > >>And as written this does require doing an ASK for each pattern of >>variables in the query; but this is at worst a small multiple >>linear extra cost, and need only be done in cases where a real >>SELECT query on the variable has failed; and I am sure a more >>efficient method can be defined in practice, since the derivation >>of the inferences supporting the existential query will in any case >>involve generating suitable 'names'. > >The devil is in that "suitable", and you haven't thought it through. Again. Then please enlighten me. Seems to me on the face of it that one name is pretty much like another; and since a bnode/existential can be used to generalize a variety of individuals in different interpretations, why does this not provide a simple strategy: by default, assume that the answer will be an existential; in the case where all interpretations use the same identifier, use that instead. We aren't here talking about the innards of the reasoner, after all, only about what name it provides as output. >> (I can kind of see that in the presence of elaborate equality >>reasoning, problems might arise in determining whether or not >>multiple bindings are really the same. But even here, the >>possibility of providing a mere existential name - a bnode - does >>seem to provide an opportunity for computational savings with a >>possible loss of completeness but without sacrificing correctness.) > >A lot of things "seem" a certain way to you, but alas are not the >way they seem to you. No doubt. But since your insight into these matters is clearly superior to mine, perhaps you could instruct me, rather than simply rejoicing in my ignorance. >>(7) Finally, there seems to be a misapprehension that because any >>graph entails infinitely many other bnode-rich graphs which have it >>as an instance, that to allow bnodes as bindings will somehow open >>an infinite floodgate of redundant answers. > >If care is not taken, yes. But really the problem is determining >what the *right* number of answers is, and that's nontrivial. Well, we have the option, when writing the specs, to decide this by decree, as it were. For RDF/RDFS it is fine to simply restrict to the bnodes which actually occur in the scoping graph, i.e. to treat bnodes just like other names. (My own preference for the actual spec would be to not specify this *at all*, on the grounds that there is no 'right' answer for all purposes, but only remark on the possibilty of redundancy in answer sets and say that engines *should* (not *must*) reduce redundancy as far as possible, with the understanding that to eliminate it completely may be impractical. But this is not the position that the WG has adopted.) >It's *much* easier with only distinguished variables. Its much easier for the implementors, sure. But that is a bit like saying that a programming language spec should keep the language as simple as possible in order to make the compiler-writer's job easier, regardless of whether it can be used to write useful programs. If you look at it from the point of view of someone who wants a query answered, having to repeat every SELECT as a similar, but not identical, ASK in order to be told about the bnodes in the KB seems like an unacceptable cost to me. I would rather the engine at the other end did all that grunt work for me. >>And this would be true, if the spec were written in a very naive >>way so that entailment alone were a sufficient criterion for >>correctness of an answer binding. But this, of course, is why the >>SPARQL specs are written in the way that they are, to restrict >>answer bindings to a particular scoping set of possible bindings - >>in the case of normative SPARQL, the URIs and Bnodes which actually >>occur in the scoping graph. > >And this is computationally simpler than requiring true semantic >minimization, but it does mean that DISTINCT needs to be specified >correctly. If DISTINCT really does mean DISTINCT, then it will be a >much harder operation when dealing with BNodes. OK, then we need to be careful when describing DISTINCT. I would be happy to give DISTINCT a naif treatment in which it is essentially lexical-plus-datatypes. I'm not saying this all doesn't need care, but I still think that this option is better than just pretending that bnodes don't really exist. >If we don't allow that, then people have to work in user code to >figure out which Bnoded answers are actually redundant...which is >unfortunate in my opinion. If they care that much about non-redundancy, yes, they will. But if they don't, they won't. Which seems to me exactly as it should be. Its not the job of a query engine to clean up a messy KB or to provide guarantees of absolute perfection when the world is full of imperfect data. In many cases even in traditional DBs there is actual redundancy which never gets detected, which is why I get so many duplicate emails. But the world gets along, wasting some small percentage of effort in the name of an easy life. >>The separation between the scoping graph and the scoping set was >>written into the spec to allow for alternative strategies for >>handling OWL. Bijan notes that the spec as written therefore allows >>'OWL-DL-SPARQL' - which, to emphasize, is not itself defined by the >>spec - to restrict itself to bindiings to an arbitrarily small set >>of legal answer bindings, and therefore to restrict itself so as to >>disallow bindings to bnodes. This is technically correct, but >>wholly against the design intention of those definitions. > >Well, since Enrico was involved in the design of those definition, >and I know that THIS WAS HIS INTENT, you're just dead wrong. > >Again. > >And really, you're also full of crap to be pulling the "design >intention" move. How slimy. ?? I am genuinely amazed. I was after all involved in the design, and in fact I suggested the 'scoping set' construction; and certainly the above reflects my intentions (and my thinking during the admittedly complex process involving Enrico and others). Evidently Enrico and I weren't in total communication about this, but we were under a lot of pressure to get something set into a final form, so the first time we managed to agree, we stopped. But I am still amazed. I don't recall Enrico ever expounding this restriction-to-no-bnodes idea during the discussions, and I am sure I would have reacted strongly if I had heard or read it; but again my memory might be faulty (and I did miss some of the telecons, admittedly). If query variables cannot be bound to bnodes, then there is no need for the scoping graph at all, so the definitions could have been greatly simplified; and getting things like scoping between multiple answers straight was a large part of the entire discussion, which would all have been quite pointless if no bnode bindings to variables were being considered. As I understood the discussion, it began with the 'little-house' example which shows that OWL-DL-RDF could entail an existential (ie a pattern instance with a bnode) even when the original KB graph has no name to bind anything to, and more pointedly that no kind of 'closure' would provide such a name, so there can be no 'OWL-DL entailment lemma' like the RDF and RDFS entailment lemmas in the RDF semantics. So, the argument went, we must allow 'genuine existentials', ie bnodes, in the query pattern. Now, the idea of a 'scoping graph' was introduced into the SPARQL definitions to keep the scope of bnodeIDs in the answer distinct from those in the KB (actually, more correctly, to allow for the possibility of their being distinct), with the idea that the bnodes used as answer bindings would be drawn from this graph; and when it was pointed out that together with the above little-house point, this meant that to restrict to bnodes in any kind of KB - even a 'closure' under some rule set - would not give us enough bnodes to give all answer bindings, I suggested loosening the requirement to virtual vacuity, to allow *some specified set* of identifiers (which includes bnodes, in this mathematical level description); and honestly, what I was thinking of there was that the scoping set could be *bigger* than the set gotten by taking the identifiers in any kind of virtual graph, not radically smaller than it; since the problem that it was being put there to solve was, that there *weren't enough* bnodes in the scoping graph. So, I am sure this disconnect arose from a combination of narrow bandwidth and time pressure when we were trying to get this thing done. But it is interesting that people can agree on the form of a definition while having such sharply divergent ideas about what the implications of it are. >>It was never the intent of that construction to restrict the set of >>legal answer bindings: > >Pif and fle. Perhaps I should have said, it was never MY intention, or as far as I know the intention of the WG, although I think that by the time we had reached this part of the work most of the WG had given up trying to follow all the ramifications and just wanted Enrico and me to get something finished and agreed on. But it does have a lot of implications which AFAIK have not been thought through or discussed, about how RDF-based querying should relate to OWL-based querying. Surely you have to admit that it does seem odd that if a KB is legal OWL-DL-RDF and a query pattern is OWL-DL-RDF legal, that one can get an answer binding using RDF entailment but not using OWL-DL entailment. It seems almost non-monotonic. >Actually, it's a *nice feature* of the scoping set that it can allow >us to define a variety of types of variables, including this new one >and even allow us to define told redundancy. Told redundancy, yes, that was always one intention of the design. And in general I agree about allowing many flowers to bloom. But honestly, this idea of OWL-DL restricting to no-bnodes is a new one to me, and I would have howled if I had heard it earlier. >Funny thing, Pat, is that I don't want to get rid of >semi-distinguished variables...you (and eric) want to get rid of >distinguished variables. Well, not exactly. I don't mind there being variables in a query which are restricted to match only in all kinds of ways, including not to bnodes. But I also want to allow *users* to make this decision, not the software or the spec. And many people have asked, why do we want to have bnodes in queries when they can be given as answers? And I have to say, I don't have a good answer to give them, other than "the DL folk say that things will break in OWL-DL if we allow this", which, frankly, really does not strike me as a very good answer to be giving when we are discussing an RDF standard. >But they are useful. Oh, I suppose they'll sneak back in with a >hidous isBnode() filter hack, but please, goodness me, screw that. Well, I don't like that much either, but again argument would be more useful than rhetoric. >>it was, rather, to allow the scoping set to contain *extra* >>bnodeIDs, precisely in order to overcome the objection raised in >>the second paragraph above, i.e. to allow the query engine to >>invent new bnodes when they are required in order to provide an >>answer binding to a query which has a purely existential answer, >>for which no bnode which actually occurs in the scoping graph is an >>appropriate answer binding. >> >>(8) Although the restricted interpretation (where a scoping set for >>OWL-DL queries must contain no bnodes) does not violate any actual >>technical requirement of the current SPARQL draft, I would argue >>strongly against adopting it as a standard, to the point that I >>would argue for pre-emptively rendering it illegal by suitable >>wording to the current spec. > >The university of manchester will formally object to this move on >the following grounds: > > 1) There is no need to constrain future groups Fine: then let the SPARQL spec make *no reference at all* to any kind of semantics beyond RDF/S. That is what its charter says, after all. As I recall, you were one of the chorus who insisted that we provide a general entailment-based definition which would apply to all future extensions. > 2) Distinguished variables are useful (in particular, if you >don't care about purely existential answers, you should be able to >blow them off and reap the computational benefits) Agreed. The point was that *if* the engine accepts the responsibility for the purely existential query, then it should be willing to provide a bnode as an answer binding for the corresponding SELECT. Or, of course, you can just refuse to answer pure existentials, which might in many cases be a wise decision, I agree. > 3) Distinguished variables are a standard notion > 4) Distinguished variables are widely implemented (Racer, >Kaon2, and Pellet, which covers just about *all* the OWL DL query >answering implementation; which, in point of fact, covers all the >OWL query answering implementations, in any reasonable sense of the >term) Right, and the fact that there are only three means that the use of the word 'standard' is meaningless and tendentious, IMO. There simply are no established standards here to appeal to. I don't mean here to denigrate anyones work or implicitly trash anyone's favorite area of research, only that there simply isn't enough of it yet to be the prime source of guiding principles driving the SWeb standards effort. >; > 5) Distinguished variables are computationally and >implementationally easier. Again, the proposal does not disallow distinguished variables. What it does do, is rule out the case where the user is obliged to make an ASK query with a bnode in order to discover something they cannot discover by making a SELECT query with a variable. If you get no bnode bindings from the SELECT, then the corresponding ASK, with the bnode supplied in the query, should also fail. The ASK and SELECT queries should have answers that align rationally: either they both tell you when a thing exists, or neither of them do. After all, to repeat a very simple RDF-ish observation, this is exactly what bnodes were intended to be used for: to indicate existence when no URIref is used to name the thing which exists. >It's enormously amusing that you, Pat, who so championed the >implementors point of view before If I ever did that, it was by accident :-). Im always on the side of the user. IMO, the way the semantic web should have been designed is to have given users - that is, people writing ontologies and queries, and making websites - as expressive and unconstrained languages and as many options as possible, and then invited implementors to make the best of them that they can, and publish the talents and abilities (and the inadequacies or restrictions or incompletenesses) of their various engines, and let the world decide which of the various computationally feasible restrictions are in fact of the most use. That is, I would like to have seen a SWeb driven by writers of ontologies rather than implementors of engines, by expressibility and semantics rather than computation and decideability. Then we would have had a lot of different kinds of computation, much of it dirty and pragmatic, some of it no doubt opening up new ideas in how to handle Web-scale information sources, rather than just being conventional industrial inference engines with predictable performance, re-treaded for XML/RDF. (The most useful RDF engines have been very simple forward-chaining closure generators, which I for one did not even remotely anticipate.) This most definitely is not giving control to the implementors (by which, you mean the implementors of DL inference engines, I presume.) DLs would then have been correctly and appropriately seen as one very sweet spot in guaranteed inference performance, obtained by using logic in a highly restricted fashion, rather than the defining syntax of the entire SWeb. But there could be - indeed there are - other sweet spots. To introduce these now would require re-writing the entire set of standards, rather than just having some enterprising implementor offer a new kind of inference service. IMO, basing the SWeb around the narrow technology of description logics was a really bad decision, a classical example of the mistake of premature optimization. I greatly regret that I didn't have the courage and self-confidence to speak up more clearly against it when DAML was being designed. >, would now be so hostile to them now. Really, its not a question of hostility. Im just trying to reach a point where we can all agree and which seems technically feasible. There has been a lot of push-back on the idea of having bnodes in queries, and I have tried as best I can to put the official OWL-DL point of view when responding to these questions; but thinking about it, I wasn't persuaded by the arguments I had heard. In fact, I hadnt heard any arguments, only assertions. And I am never persuaded by claims along the lines of "I say it is so and you are too ignorant to tell me otherwise", which I seemed to detect in earlier responses. >Wait, I guess it's not so surprising after all. >>With the restricted interpretation, for example, we would have the >>absurd situation that answer bindings could be derived from a graph >>using RDF entailment which are not legal using OWL-DL entailment, >>even when the graph itself is legal OWL-DL/RDF. > >Entailment is not the only factor in defining a query language >semantics, as we've seen. So this comparison is nonsensical unless >you fully spell out the semantics. Once you fully spell out the >semantics it's not absurd. Well, I think it is, on strictly semantic grounds. Entailment plus (totally arbitrary) restrictions on answers are in fact the only parameters in the current general definition, and the arbitrariness of that parameter is more a reflection of our abdicating the topic than any clear mandate to choose freely. Clearly OWL-DL is a much stronger form of entailment than RDF (or than simple graph entailment), so it seems absurd that when the only operational difference between two queries is the entailment relation they are supposed to be using, that an OWL-DL query engine will be unable to give simple answers, which have trivially short entailment chains, that an RDF engine can give. In fact, it will be a sign that the OWL-DL engine with this behavior is incomplete: which in this case, of course, it will be. >>(9) A final quick word on the claim of prior art implicit in the >>wording of some of the earlier messages. > >Oh, HERE comes the appropriate condescending ad hominem! I will ignore the rest of this message in the interests of avoiding a public flame war. Suffice it to say that I do not apologize for anything in my email. Pat -- --------------------------------------------------------------------- IHMC (850)434 8903 or (650)494 3973 home 40 South Alcaniz St. (850)202 4416 office Pensacola (850)202 4440 fax FL 32502 (850)291 0667 cell phayesAT-SIGNihmc.us http://www.ihmc.us/users/phayes
Received on Friday, 4 August 2006 11:02:07 UTC