Re: bnodes as answer bindings from Pat Hayes on 2006-08-04 (public-rdf-dawg@w3.org from July to September 2006)

From: Pat Hayes <phayes@ihmc.us>
Date: Fri, 4 Aug 2006 04:01:46 -0700
To: Bijan Parsia <bparsia@cs.man.ac.uk>
Cc: Eric Prud'hommeaux <eric@w3.org>, Enrico Franconi <franconi@inf.unibz.it>, RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-Id: <p06230904c0f803008c24@[192.168.1.6]>
>I think you're confused too, Pat.

Perhaps, but let us get down to details.

>>This would therefore require that the two kinds of pattern be given 
>>fundamentally different semantics: intuitively, one might say that 
>>a bnode means 'some thing exists', but a query variable means 'some 
>>named thing exists', where the presence of the name is what makes 
>>the answer binding possible; and a bnodeID does not count as a 
>>'name' in the required sense.
>
>I'll note that restricting the domain of bindings (and thus of 
>correct answers) is significant even if you don't return 
>"existential" bindings. That is, in general, queries with only 
>distinguished variables (or, as it's sometimes said, when variables 
>are ranging over the "active domain

Can you give references for all this terminology that you cite? What 
exactly is the "active" domain? There is nothing in any semantic 
theory that I know of that distinguishes *things in the domain* on 
the basis of the kind of name that is used to refer to them with. The 
idea does not make sense, in any case: if bnodes were obliged to 
refer to a non-active domain while names refer to something else, 
then the troublesome redundancies would be eliminated.

>) are generally easier to answer. Even in RDF, considering only 
>simple entailment, this is true (as is shown by the work we need to 
>do to control redundancy).

Redundancy is a red herring here. Of course bnodes introduce new 
possibilities for semantic redundancy in answer bindings (and in RDF 
graphs themselves), but that does not in itself make answer 
computation harder. What it does make harder is computing 
*nonredundant* answer sets, of course: but then one might (and I 
would) argue that to require non-redundancy in answer sets when the 
original graphs may contain redundancies is the poor design decision 
which produces the difficulty here. And of course if the language 
admits true equality, even ground answers may contain semantic 
redundancies, and the computational burden of delivering nonredundant 
answer sets is overwhelming, quite independently from anything to do 
with bnodes. In all these cases, the task of computing nonredundant 
answer sets is on a par with the task of computing a nonredundant 
subset of a possibly redundant KB. IMO, none of this should be 
required by the specs defining query answering, which should be 
allowed to 'pass through' any redundancy which may be present in the 
source KB. This would then allow quite trivial extensions of the 
query strategies for the ground 'database' case to be used virtually 
unchanged with RDF, simply by treating source KB bnodes identically 
with URIrefs in the binding process.

>>(3) This idea is given terminological flesh by the distinction, 
>>introduced in 
>>http://lists.w3.org/Archives/Public/public-rdf-dawg/2006AprJun/0192
>>between 'distinguished' and 'undistinguished' variables, and the 
>>observation that SPARQL currently uses 'semi-distinguished' 
>>variables (which can bind to both "names" and bnodeIDs). In 
>>http://lists.w3.org/Archives/Public/public-rdf-dawg/2006JulSep/0018
>>this distinction is justified on the grounds that bnodes are not 
>>"names", so that it would not be appropriate to supply a bnodeID as 
>>an answer binding to a query variable; or at any rate, as is 
>>well-known to the cognoscenti, this is simply not the way that 
>>things are done:
>
>Maybe *this* is the appropriate condescending comment?

Its tone was admittedly an implied rebuke to that used in these messages.

>>http://lists.w3.org/Archives/Public/public-rdf-dawg/2006JulSep/0031.
>
>Do you *deny* that semi-distinguished variables are novel?

Yes. I have never previously heard of this terminology of 
"distinguished" vs. "nondistinguished". (You have everyone's 
permission at this point to roll your eyes in amusement at my 
profound ignorance, of course.) I would be interested to see where 
this terminology was first used, and what its history is. In a 
database context where there are no bnodes, the distinction would be 
vacuous. To me, the obvious generalization of the notion of binding a 
variable to a name in a DB, when we extend this to RDF, is to have 
variables which bind to URIrefs, literals and bnodes, since these are 
the three kinds of 'name' in RDF. The idea of a variable which binds 
only no non-bnodes strikes me as rather ridiculous, on the face of 
it, in an RDF context: it would mean that answer binding was not 
preserved under simple entailment, for example.

>  Note that I've never argued against them. I think that BNodes in 
>answers can be very useful for expressing coreference between 
>answers. But they are novel.

Ah, wait. Bnodes are novel, I agree. That is, bnodes make RDF into 
something other than simply a simplified database format. In fact, 
Bnodes mean that one has to re-think quite a lot of DB-inspired ideas 
when trying to apply them to RDF, which is why for example SPARQL is 
not simply SQL. But that is a different issue. Where was the 
'distinguished variable' contrast used before there were bnodes? Were 
all variables 'distinguished' ?

>>(4) None of the reasons cited in the previous paragraph seem to me 
>>to be convincing objections to the obvious reply to the objection 
>>in (2), which is that bnodeIDs *can* be given as bindings to query 
>>variables;
>
>Since I don't see that they were given as objections to all that, I 
>fail to see why you might have *expected* them to be convincing.

Then I apologize for my misunderstanding of your messages, as this 
seemed to be their main substantive point.

>In that debate, the main point was 1) explicating the behavior of Pellet

OK, let me just bow silently and pass on this, as the behavior of 
Pellet seems to me not a particularly interesting matter.

>  and 2) pointing out that whether the variables are 
>semi-distinguished or not is a function of the way that the semantic 
>framework of sparql is instantiated. If there are no BNodes in the 
>scoping set, perforce the query variables are distinguished.

True, although this seems to me to be an odd way to put it, since 
this restriction doesn't seem to me to provide any change in the 
status of the *variables*, or even the definitions of binding, etc..; 
but whatever.

>There *IS NO SPECIFIED SPARQL SEMANTIC FOR OWL DL*, and so Pellet 
>implements the behavior that is most common in the DL community

OK, fine. No need to shout. I really don't care what Pellet does; and 
since, as you point out, there is no specification for OWL-DL-SPARQL, 
whether or not Pellet would conform to it if there were one seems not 
worth discussing.

>, and those that are best understood. BNodes are always 
>nondistinguished in SPARQL.
>
>(Racer and KAON2 support only distinguished variables. So if we are 
>to describe the current STATE OF THE ART by looking at *existing 
>implementation*, then we have to at least *support* the notion of 
>distinguished variable.

I don't mind supporting the notion in the sense of making it 
*possible* for a query engine to treat variables in this way, but I 
don't like writing this restriction into the semantics. These 
restrictions don't have a semantic justification, but a pragmatic 
one. It seems to me that we should just call such reasoners 
incomplete. There's no stigma associated with being incomplete.

>Having only distinguished variables makes a query *MUCH* easier to answer.)

Well, yes, but only in the sense that work is a lot easier if you 
simply refuse to do it. The answers obtained by binding variables to 
bnodes are after all entailed by the KB, so to simply refuse to 
deliver them just because it makes the implementors life easier seems 
to me to be a little like, in Russell's phrase, extolling the 
advantages of theft over honest toil. Which is fine, let me add, as 
long as we are being honest about it. I don't have any objection to 
someone advertising a query service with the caveat that it does not 
deliver existential answers. But I do object to the spec itself being 
written so as to make this the standard, just because it makes 
implementor's lives easier. And at least, if an engine is capable of 
answering an existential ASK query correctly, it seems to me that it 
should be capable of providing a bnode as an answer binding to the 
corresponding SELECT.

>>and since this is so, any engine which can determine the entailment 
>>of a pattern with a bnode, could deliver such a binding as an 
>>answer to the similar query pattern with a variable in place of the 
>>bnode. That is, if ASK PATTERN[_:x] (with _:x scoped to the 
>>pattern) is entailed, then SELECT ?x PATTERN[?x] should succeed, 
>>possibly with ?x bound to some suitable bnodeID generated by the 
>>querying engine.  Adopting this as a design principle for querying 
>>languages in the RDF family would, I suggest, bring some overall 
>>regularity and simplicity to the spectrum of SPARQL versions
>
>At some other costs.

I believe them to be relatively small costs. If you think I am wrong, 
I would welcome detailed refutation of my argument, which admittedly 
is sketchy.

>>(with different values for the entailment and scoping set 
>>parameters.) The fact that bnodes exist in the language means that 
>>purely existential answers can indeed be given by the same 
>>variable-binding mechanism as is used to deliver name bindings. In 
>>other words, there should be no difference between 'a thing exists 
>>such that...' and 'a named thing exists such that...', since the 
>>"name" can itself be a bnodeID, if necessary one generated for the 
>>sole purpose of being a binding for the query variable.
>
>No one denied that this is possible, and that having 
>semidistinguished variables can be both interesting and useful. 
>Indeed, as I said one of my posts you sneered at.

I do not think I have sneered at anything you have written.

>But forcing people to always check for unnamed bindings makes query 
>answering *much much* harder.

Again, I am still unconvinced. Harder, I grant you, but not the *much 
much*. At the absolute worst case I can imagine, it would seem to be 
by a linear factor of N, where N is 2|(n-1) and n is the number of 
variables in the query pattern. But this is using a ridiculously 
simple method, and Im sure that a little analysis could make it into 
a close-to-trivial extra cost. And if we are willing to abandon the 
strict no-redundancy requirement on answer sets, which I think we 
should in any case, then the whole computation gets a lot easier.

>It's not even known if it's decidable in full generality for OWL DL.
>
>>Let me call this the 'existential name principle', as it amounts to 
>>the claim that the RDF language family makes no distinction between 
>>existence and named existence; since if something exists, the 
>>presence of bnodes means that we can, in effect, validly generate a 
>>name for it.
>
>I was going to refute this, but I think I'll just laugh at it.

I wish you would try to refute it, because it is in fact correct in 
logic and in RDF. If there is some peculiarity of description logics 
which makes it false there, I would very much like to have that 
pointed out explicitly. Think of this as an opportunity to educate, 
rather than laugh disdainfully at the ignorance of your 
correspondents.

>>In the rest of this message I will defend this idea against the 
>>objections raised in the above messages, and try to distinguish 
>>this issue from others with which it might be confused.
>>
>>(5) One objection to this idea, cited earlier, is that it embodies 
>>a logical confusion: bnodes are existentially bound *variables*, 
>>and are therefore not logical names. Now, while it is of course 
>>true that bnodes and URIrefs/literals (which I will collectively 
>>call 'names' here) have subtly different truth conditions, they are 
>>not so different as this contrast suggests. First, calling one a 
>>'name' and the other a 'variable' is merely a matter of surface 
>>terminology.
>
>Simple entailment shows that this is false.

Nonsense.

>>Many logical syntaxes make a lexical distinction but this is not 
>>necessary, and many others, such as ISO Common Logic, do not: a 
>>'variable' then is simply a name that is bound by a quantifier.
>
>That's quite a difference!

It is when you get to the truth-conditions for the quantifier: but 
while you are inside the scope of the quantifier, they are literally 
indistinguishable. And there are no quantifiers in RDF. Neither the 
syntax not the semantics makes any distinction between name and 
variable: all that matters is the free/bound distinction, which 
cannot be determined locally but must be provided from outside the 
immediate linguistic context. Which is my only point: the 
bnode/URIref distinction in RDF is essentially a scoping difference.

>>Second, more to the point, they have essentially the same 
>>semantics: they both in effect assert that something exists, and 
>>use a lexical token to refer to it.
>
>Well, one is a singular term and the other isn't.

Singular? Im not sure what you mean, but neither of them are singular 
terms in the linguistic sense, which implies uniqueness.

>Big difference to me.
>
>>(It is worth noting that there are logical syntactic conventions 
>>which do not use lexical tokens *of any kind* to show existence, 
>>such as Peirce's existential graphs.) The only difference is that 
>>the scope of a name token might extend beyond the local syntactic 
>>context, allowing other assertions to be made about the thing 
>>denoted by the name; whereas an existential 'variable' is a lexical 
>>token only within the context defined by the scope of the 
>>quantifier. In natural deduction logical systems, logical names and 
>>existentials are almost interconvertible, using the rules EI and EG:
>>
>>PHI[name]
>>----------------EG
>>(exists (x)PHI[x])
>>
>>(exists (x)PHI[x])
>>--------------------EI
>>PHI[name*]
>>when name* does not occur free in the derivation, ie is a 'new' name.
>
>And boy, does that make a HUGE DIFFERENCE.

To what, exactly? Not to the semantics, not to the classical proof 
theory. The difference is so small that some textbook logics don't 
even bother to make the distinction. I don't see why it is such a 
very big deal for query answering, since the cost of generating a 
gensym bnodeID is trivial (use URIrefs if you get desperate.) So, if 
you could replace bluster with argument, maybe we could get something 
useful out of this conversation. Im sure you are right that it makes 
some difference to something, and maybe for DLs the distinction is 
more critical and important than it is in classical logics or RDF: if 
so, I'd be glad to be told why and how. (Later: your reference to the 
Baader work on defaults helps answer this, thanks for that pointer.)

>Ok, I stop now.
>
>Aside from this logical sophistry on Pat's part, the brute fact is 
>that, even aside from the logical and computational differences, 
>BNodes are pragmatically quite different. Try reusing one from an 
>answer set in a fresh query. Try pasting one into the address bar of 
>a web browser.

Er... both suggestions are meaningless. Being blank, they have no 
textual form to do such things with. If you mean bnodeIDs, then yes, 
you need to take careful heed of the scoping rules in place for these 
IDs, defined by the document conventions for the particular class of 
IDs. That is why you should not be able to do the first (although we 
did consider the possibility, as you know, under pressure from 
Maryland), since the new query sets up a new document scope. And of 
course Web browsers are not set up to reason with bnodes.

>[snip a bizarre account of BNodes.

I agree it is rather bizarre at first reading. It is, however, 
accurate, and reflects quite a lot of hard thinking by me and others 
that went into the design of the RDF graph syntax and semantics. And 
since SPARQL is chartered to be a query language for RDF, not for 
description logics, it is therefore relevant.

>>(6) Another objection seems to be that to allow variables to bind 
>>to bnodes somehow introduces great additional complexity to the 
>>query-answering process. It seems to me that this must be wrong,
>
>First, the core point is that having any level of 
>non-distinguishness most certainly adds to the complexity. Whether 
>fully non-distinguished or semi-distinguished, it's just much harder 
>than having only fully distinguished variables. For example, it's 
>not known if undistinguished variables in the query, plus transitive 
>roles in the query, is decidable for OWL DL.

Is it known whether the corresponding inference problem with bnodes 
in place of the variables is decideable?  If not, this is not 
relevant to my point. If so, what is wrong with my argument?

>Many bright people for several years have been working on that. In 
>the context of rules, this is very clear, SHOIQ + DL Safe rules 
>(SWRL rules where the variables are restricted to distinguished 
>variables) are decidable, whereas SWRL is not. Open defaults when 
>combined with a DL may be undecidable, whereas open defaults with 
>the variables treated as distinguished are (fairly robustly) 
>decidable. I'll let you google for DL Safe rules, but here's the 
>defaults link:
>	http://citeseer.ist.psu.edu/baader95embedding.html
>
>So, *never* allowing the variables to be purely distinguished makes 
>queries harder to answer even when we know how to answer them. This 
>is why racer only does distinguished variables.

? That seems like a nonsequiteur to me, but let it pass.

>Some optimizations are very hard to work for nondistinguished 
>variables. I can speak from personal (and Evren's) experience that 
>having only distinguished variables makes things much easier.

Im sure that is so. My argument was whether the difficulties of 
handling bindings to bnodes (which I agree is nonzero cost) are 
really as disastrous as you indicate. Bear in mind, I am *assuming* 
that  an engine exists which can properly answer ASK queries with 
existentials in them. As I understand the argument given by Enrico, 
SPARQL for OWL-DL must be allowed to have queries with bnodes in them 
because these can be addressed by methods which would fail when the 
bnode is replaced with a query variable which can be bound to bnodes. 
My argument is that there must be something wrong with this argument, 
since one can define a successful engine of the second kind (that at 
times will deliver a bnode binding to a variable) *in terms of* one 
of the first kind, with what appears to be a small extra 
computational cost factor. I may be wrong, in which case I would be 
glad to see my mistake pointed out. But I am not claiming that 
existential query answering is inherently easy to do.

It may be that the difference between us is that when I refer to an 
engine which allows bnode bindings to query variables, I do not mean 
to imply anything about completeness relative to a logic. I am making 
a purely pragmatic point about how to deliver answers to queries.

>  Heck they make it easier even in RDF as it avoids the need, on the 
>one hand if you allow full simple entailment (i.e., infinite bnodes 
>in the scoping set) having to minimize or leanify, or, on the other 
>hand, having to do the extra work the semantic framework does to 
>ensure restricting answers to told redundancy. If DISTINCT is 
>interpreted to eliminate told redundancy (which I would think it 
>should

Well, I disagree, but your next point ...

>) then having BNodes in answers makes things harder.

... I concede: if you want to require nonredundancy, then bnodes make 
your implementor's life harder. But the other side of this point is, 
that not providing this information means that your answering engine 
is seriously incomplete, and will refuse to provide information to a 
user when in fact it would be easy to answer the query. There is an 
inherent trade-off here between the implementor's workload and user's 
convenience, not surprisingly. If I ask for all people who own a blue 
car and live in Nantucket, and am told there are no answers, I will 
be seriously inconvenienced if I have to also ask if anyone *exists* 
who owns a blue car in Nantucket, in order to find out about the 
bnodes that your scared-to-be-redundant engine has refused to tell me 
about, even though the KB contained an explicit assertion to this 
effect. And the point is that the bnodes in RDF really are there: 
they aren't going to go away, so the fact that efficient engines 
exist which ignore them seems to me to be rather beside the point.

>>because one can define the variable-binding option in terms of the 
>>other. Suppose that a query engine is capable of generating answers 
>>to ASK queries containing bnodes, then it can deliver appropriate 
>>answers to SELECT queries by the following pseudocode, where 
>>(gensymBnode) generates a new bnodeID and P[a/b] means the pattern 
>>P with a replaced by b:
>>
>>if (  ASK P[(gensymBnode)/?x] ) then return (bind ?x (gensymBnode))
>>
>>Of course, this needs to be written more carefully to account for 
>>multiple variables, but you get the idea.
>
>I get the idea that you don't know what you're talking about, or, at 
>least, haven't thought it through. This would eliminate coreference 
>*between answers* which is the coolest feature of semi-distinguished 
>variables, but has a cost.

I have thought coreference through, but I wanted to keep the email 
short. Coreference between answers in the same answer set would not 
be handled by this, indeed, but then it would not be handled by a 
simple ASK query either; and of course it would be completely 
impossible if the variables were required to never be bound to 
bnodes. I wasn't suggesting this as a practicable technique, only as 
an existence proof to counter the claim that allowing bnode bindings 
necessarily made everything much, much, harder.

(And havnt you rather changed sides here? I thought you were arguing 
against allowing bnodes as bindings at all, but suddenly you seem to 
be arguing for the most sophisticated kind of such binding 
imaginable, where entire patterns of bnode bindings can be used to 
give subtle information about coreference between answers.)

>  Oh right, you've been confused on the scope of BNodes in answers 
>before, as I recall.
>And accounting for coreference *within* a answer across distinct 
>variables is also non-trivial, and different from an ASK query.

True, so in the worst case this would require a set of ASK queries 
with different patterns of bnode substitution for the variables in 
the pattern. This is still at worst a small integer multiplier. And 
the basic point is that this is what the query user is going to have 
to do, to make sure of getting all the answers: so even this slightly 
ludicrous algorithm can be viewed as a kind of helpful built-in user 
script for covering all the cases, when faced with a dumb-ass query 
engine that refuses to give all the answers that it can first time 
around. If the complexity is intrinsic and we can do no better, then 
so be it: but SOMETHING has to be able to deliver all the right 
answers SOMEHOW. Why should it necessarily fall to the user to have 
to scan through all the combinatorics, tying up web traffic with 
almost-duplicate queries?

>  (For one, if you have two answers, one with BNode cross reference 
>between distinct variables and one without, you have one more answer 
>than for just the ASK query. When each answer has complexity 
>NEXPTIME or much worse, well, practically speaking that can be a 
>pain.

Practically speaking, I agree. But it *could* be done, Im sure it 
could be done a lot better than this with some thought. Thinking of 
this took about a minute.

>)
>
>>And as written this does require doing an ASK for each pattern of 
>>variables in the query; but this is at worst a small multiple 
>>linear extra cost, and need only be done in cases where a real 
>>SELECT query on the variable has failed; and I am sure a more 
>>efficient method can be defined in practice, since the derivation 
>>of the inferences supporting the existential query will in any case 
>>involve generating suitable 'names'.
>
>The devil is in that "suitable", and you haven't thought it through. Again.

Then please enlighten me. Seems to me on the face of it that one name 
is pretty much like another; and since a bnode/existential can be 
used to generalize a variety of individuals in different 
interpretations, why does this not provide a simple strategy: by 
default, assume that the answer will be an existential; in the case 
where all interpretations use the same identifier, use that instead. 
We aren't here talking about the innards of the reasoner, after all, 
only about what name it provides as output.

>>  (I can kind of see that in the presence of elaborate equality 
>>reasoning, problems might arise in determining whether or not 
>>multiple bindings are really the same. But even here, the 
>>possibility of providing a mere existential name - a bnode - does 
>>seem to provide an opportunity for computational savings with a 
>>possible loss of completeness but without sacrificing correctness.)
>
>A lot of things "seem" a certain way to you, but alas are not the 
>way they seem to you.

No doubt. But since your insight into these matters is clearly 
superior to mine, perhaps you could instruct me, rather than simply 
rejoicing in my ignorance.

>>(7) Finally, there seems to be a misapprehension that because any 
>>graph entails infinitely many other bnode-rich graphs which have it 
>>as an instance, that to allow bnodes as bindings will somehow open 
>>an infinite floodgate of redundant answers.
>
>If care is not taken, yes. But really the problem is determining 
>what the *right* number of answers is, and that's nontrivial.

Well, we have the option, when writing the specs, to decide this by 
decree, as it were. For RDF/RDFS it is fine to simply restrict to the 
bnodes which actually occur in the scoping graph, i.e. to treat 
bnodes just like other names. (My own preference for the actual spec 
would be to not specify this *at all*, on the grounds that there is 
no 'right' answer for all purposes, but only remark on the possibilty 
of redundancy in answer sets and say that engines *should* (not 
*must*) reduce redundancy as far as possible, with the understanding 
that to eliminate it completely may be impractical. But this is not 
the position that the WG has adopted.)

>It's *much* easier with only distinguished variables.

Its much easier for the implementors, sure. But that is a bit like 
saying that a programming language spec should keep the language as 
simple as possible in order to make the compiler-writer's job easier, 
regardless of whether it can be used to write useful programs. If you 
look at it from the point of view of someone who wants a query 
answered, having to repeat every SELECT as a similar, but not 
identical, ASK in order to be told about the bnodes in the KB seems 
like an unacceptable cost to me. I would rather the engine at the 
other end did all that grunt work for me.

>>And this would be true, if the spec were written in a very naive 
>>way so that entailment alone were a sufficient criterion for 
>>correctness of an answer binding. But this, of course, is why the 
>>SPARQL specs are written in the way that they are, to restrict 
>>answer bindings to a particular scoping set of possible bindings - 
>>in the case of normative SPARQL, the URIs and Bnodes which actually 
>>occur in the scoping graph.
>
>And this is computationally simpler than requiring true semantic 
>minimization, but it does mean that DISTINCT needs to be specified 
>correctly. If DISTINCT really does mean DISTINCT, then it will be a 
>much harder operation when dealing with BNodes.

OK, then we need to be careful when describing DISTINCT. I would be 
happy to give DISTINCT a naif treatment in which it is essentially 
lexical-plus-datatypes. I'm not saying this all doesn't need care, 
but I still think that this option is better than just pretending 
that bnodes don't really exist.

>If we don't allow that, then people have to work in user code to 
>figure out which Bnoded answers are actually redundant...which is 
>unfortunate in my opinion.

If they care that much about non-redundancy, yes, they will. But if 
they don't, they won't. Which seems to me exactly as it should be. 
Its not the job of a query engine to clean up a messy KB or to 
provide guarantees of absolute perfection when the world is full of 
imperfect data. In many cases even in traditional DBs there is actual 
redundancy which never gets detected, which is why I get so many 
duplicate emails. But the world gets along, wasting some small 
percentage of effort in the name of an easy life.

>>The separation between the scoping graph and the scoping set was 
>>written into the spec to allow for alternative strategies for 
>>handling OWL. Bijan notes that the spec as written therefore allows 
>>'OWL-DL-SPARQL' - which, to emphasize, is not itself defined by the 
>>spec - to restrict itself to bindiings to an arbitrarily small set 
>>of legal answer bindings, and therefore to restrict itself so as to 
>>disallow bindings to bnodes. This is technically correct, but 
>>wholly against the design intention of those definitions.
>
>Well, since Enrico was involved in the design of those definition, 
>and I know that THIS WAS HIS INTENT, you're just dead wrong.
>
>Again.
>
>And really, you're also full of crap to be pulling the "design 
>intention" move. How slimy.

?? I am genuinely amazed. I was after all involved in the design, and 
in fact I suggested the 'scoping set' construction; and certainly the 
above reflects my intentions (and my thinking during the admittedly 
complex process involving Enrico and others). Evidently Enrico and I 
weren't in total communication about this, but we were under a lot of 
pressure to get something set into a final form, so the first time we 
managed to agree, we stopped.

But I am still amazed. I don't recall Enrico ever expounding this 
restriction-to-no-bnodes idea during the discussions, and I am sure I 
would have reacted strongly if I had heard or read it; but again my 
memory might be faulty (and I did miss some of the telecons, 
admittedly). If query variables cannot be bound to bnodes, then there 
is no need for the scoping graph at all, so the definitions could 
have been greatly simplified; and getting things like scoping between 
multiple answers straight was a large part of the entire discussion, 
which would all have been quite pointless if no bnode bindings to 
variables were being considered.

As I understood the discussion, it began with the 'little-house' 
example which shows that OWL-DL-RDF could entail an existential (ie a 
pattern instance with a bnode) even when the original KB graph has no 
name to bind anything to, and more pointedly that no kind of 
'closure' would provide such a name, so there can be no 'OWL-DL 
entailment lemma' like the RDF and RDFS entailment lemmas in the RDF 
semantics. So, the argument went, we must allow 'genuine 
existentials', ie bnodes, in the query pattern. Now, the idea of a 
'scoping graph' was introduced into the SPARQL definitions to keep 
the scope of bnodeIDs in the answer distinct from those in the KB 
(actually, more correctly, to allow for the possibility of their 
being distinct), with the idea that the bnodes used as answer 
bindings would be drawn from this graph; and when it was pointed out 
that together with the above little-house point, this meant that to 
restrict to bnodes in any kind of KB - even a 'closure' under some 
rule set - would not give us enough bnodes to give all answer 
bindings, I suggested loosening the requirement to virtual vacuity, 
to allow *some specified set* of identifiers (which includes bnodes, 
in this mathematical level description); and honestly, what I was 
thinking of there was that the scoping set could be *bigger* than the 
set gotten by taking the identifiers in any kind of virtual graph, 
not radically smaller than it; since the problem that it was being 
put there to solve was, that there *weren't enough* bnodes in the 
scoping graph.

So, I am sure this disconnect arose from a combination of narrow 
bandwidth and time pressure when we were trying to get this thing 
done. But it is interesting that people can agree on the form of a 
definition while having such sharply divergent ideas about what the 
implications of it are.

>>It was never the intent of that construction to restrict the set of 
>>legal answer bindings:
>
>Pif and fle.

Perhaps I should have said, it was never MY intention, or as far as I 
know the intention of the WG, although I think that by the time we 
had reached this part of the work most of the WG had given up trying 
to follow all the ramifications and just wanted Enrico and me to get 
something finished and agreed on. But it does have a lot of 
implications which AFAIK have not been thought through or discussed, 
about how RDF-based querying should relate to OWL-based querying. 
Surely you have to admit that it does seem odd that if a KB is legal 
OWL-DL-RDF and a query pattern is OWL-DL-RDF legal, that one can get 
an answer binding using RDF entailment but not using OWL-DL 
entailment. It seems almost non-monotonic.

>Actually, it's a *nice feature* of the scoping set that it can allow 
>us to define a variety of types of variables, including this new one 
>and even allow us to define told redundancy.

Told redundancy, yes, that was always one intention of the design. 
And in general I agree about allowing many flowers to bloom. But 
honestly, this idea of OWL-DL restricting to no-bnodes is a new one 
to me, and I would have howled if I had heard it earlier.

>Funny thing, Pat, is that I don't want to get rid of 
>semi-distinguished variables...you (and eric) want to get rid of 
>distinguished variables.

Well, not exactly. I don't mind there being variables in a query 
which are restricted to match only in all kinds of ways, including 
not to bnodes. But I also want to allow *users* to make this 
decision, not the software or the spec. And many people have asked, 
why do we want to have bnodes in queries when they can be given as 
answers? And I have to say, I don't have a good answer to give them, 
other than "the DL folk say that things will break in OWL-DL if we 
allow this", which, frankly, really does not strike me as a very good 
answer to be giving when we are discussing an RDF standard.

>But they are useful. Oh, I suppose they'll sneak back in with a 
>hidous isBnode() filter hack, but please, goodness me, screw that.

Well, I don't like that much either, but again argument would be more 
useful than rhetoric.

>>it was, rather, to allow the scoping set to contain *extra* 
>>bnodeIDs, precisely in order to overcome the objection raised in 
>>the second paragraph above, i.e. to allow the query engine to 
>>invent new bnodes when they are required in order to provide an 
>>answer binding to a query which has a purely existential answer, 
>>for which no bnode which actually occurs in the scoping graph is an 
>>appropriate answer binding.
>>
>>(8) Although the restricted interpretation (where a scoping set for 
>>OWL-DL queries must contain no bnodes) does not violate any actual 
>>technical requirement of the current SPARQL draft, I would argue 
>>strongly against adopting it as a standard, to the point that I 
>>would argue for pre-emptively rendering it illegal by suitable 
>>wording to the current spec.
>
>The university of manchester will formally object to this move on 
>the following grounds:
>
>	1) There is no need to constrain future groups

Fine: then let the SPARQL spec make *no reference at all* to any kind 
of semantics beyond RDF/S. That is what its charter says, after all. 
As I recall, you were one of the chorus who insisted that we provide 
a general entailment-based definition which would apply to all future 
extensions.

>	2) Distinguished variables are useful (in particular, if you 
>don't care about purely existential answers, you should be able to 
>blow them off and reap the computational benefits)

Agreed. The point was that *if* the engine accepts the responsibility 
for the purely existential query, then it should be willing to 
provide a bnode as an answer binding for the corresponding SELECT. 
Or, of course, you can just refuse to answer pure existentials, which 
might in many cases be a wise decision, I agree.

>	3) Distinguished variables are a standard notion
>	4) Distinguished variables are widely implemented (Racer, 
>Kaon2, and Pellet, which covers just about *all* the OWL DL query 
>answering implementation; which, in point of fact, covers all the 
>OWL query answering implementations, in any reasonable sense of the 
>term)

Right, and the fact that there are only three means that the use of 
the word 'standard' is meaningless and tendentious, IMO. There simply 
are no established standards here to appeal to. I don't mean here to 
denigrate anyones work or implicitly trash anyone's favorite area of 
research, only that there simply isn't enough of it yet to be the 
prime source of guiding principles driving the SWeb standards effort.

>;
>	5) Distinguished variables are computationally and 
>implementationally easier.

Again, the proposal does not disallow distinguished variables. What 
it does do, is rule out the case where the user is obliged to make an 
ASK query with a bnode in order to discover something they cannot 
discover by making a SELECT query with a variable. If you get no 
bnode bindings from the SELECT, then the corresponding ASK, with the 
bnode supplied in the query, should also fail. The ASK and SELECT 
queries should have answers that align rationally: either they both 
tell you when a thing exists, or neither of them do. After all, to 
repeat a very simple RDF-ish observation, this is exactly what bnodes 
were intended to be used for: to indicate existence when no URIref is 
used to name the thing which exists.

>It's enormously amusing that you, Pat, who so championed the 
>implementors point of view before

If I ever did that, it was by accident :-). Im always on the side of 
the user. IMO, the way the semantic web should have been designed is 
to have given users - that is, people writing ontologies and queries, 
and making websites - as expressive and unconstrained languages and 
as many options as possible, and then invited implementors to make 
the best of them that they can, and publish the talents and abilities 
(and the inadequacies or restrictions or incompletenesses) of their 
various engines, and let the world decide which of the various 
computationally feasible restrictions are in fact of the most use. 
That is, I would like to have seen a SWeb driven by writers of 
ontologies rather than implementors of engines, by expressibility and 
semantics rather than computation and decideability. Then we would 
have had a lot of different kinds of computation, much of it dirty 
and pragmatic, some of it no doubt opening up new ideas in how to 
handle Web-scale information sources, rather than just being 
conventional industrial inference engines with predictable 
performance, re-treaded for XML/RDF. (The most useful RDF engines 
have been very simple forward-chaining closure generators, which I 
for one did not even remotely anticipate.) This most definitely is 
not giving control to the implementors (by which, you mean the 
implementors of DL inference engines, I presume.) DLs would then have 
been correctly and appropriately seen as one very sweet spot in 
guaranteed inference performance, obtained by using logic in a highly 
restricted fashion, rather than the defining syntax of the entire 
SWeb. But there could be - indeed there are - other sweet spots. To 
introduce these now would require re-writing the entire set of 
standards, rather than just having some enterprising implementor 
offer a new kind of inference service. IMO, basing the SWeb around 
the narrow technology of description logics was a really bad 
decision, a classical example of the mistake of premature 
optimization. I greatly regret that I didn't have the courage and 
self-confidence to speak up more clearly against it when DAML was 
being designed.

>, would now be so hostile to them now.

Really, its not a question of hostility. Im just trying to reach a 
point where we can all agree and which seems technically feasible. 
There has been a lot of push-back on the idea of having bnodes in 
queries, and I have tried as best I can to put the official OWL-DL 
point of view when responding to these questions; but thinking about 
it, I wasn't persuaded by the arguments I had heard. In fact, I hadnt 
heard any arguments, only assertions. And I am never persuaded by 
claims along the lines of "I say it is so and you are too ignorant to 
tell me otherwise", which I seemed to detect in earlier responses.

>Wait, I guess it's not so surprising after all.
>>With the restricted interpretation, for example, we would have the 
>>absurd situation that answer bindings could be derived from a graph 
>>using RDF entailment which are not legal using OWL-DL entailment, 
>>even when the graph itself is legal OWL-DL/RDF.
>
>Entailment is not the only factor in defining a query language 
>semantics, as we've seen. So this comparison is nonsensical unless 
>you fully spell out the semantics. Once you fully spell out the 
>semantics it's not absurd.

Well, I think it is, on strictly semantic grounds. Entailment plus 
(totally arbitrary) restrictions on answers are in fact the only 
parameters in the current general definition, and the arbitrariness 
of that parameter is more a reflection of our abdicating the topic 
than any clear mandate to choose freely. Clearly OWL-DL is a much 
stronger form of entailment than RDF (or than simple graph 
entailment), so it seems absurd that when the only operational 
difference between two queries is the entailment relation they are 
supposed to be using, that an OWL-DL query engine will be unable to 
give simple answers, which have trivially short entailment chains, 
that an RDF engine can give. In fact, it will be a sign that the 
OWL-DL engine with this behavior is incomplete: which in this case, 
of course, it will be.

>>(9) A final quick word on the claim of prior art implicit in the 
>>wording of some of the earlier messages.
>
>Oh, HERE comes the appropriate condescending ad hominem!

I will ignore the rest of this message in the interests of avoiding a 
public flame war. Suffice it to say that I do not apologize for 
anything in my email.

Pat


-- 
---------------------------------------------------------------------
IHMC		(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32502			(850)291 0667    cell
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Friday, 4 August 2006 11:02:07 UTC