- From: Bijan Parsia <bparsia@cs.man.ac.uk>
- Date: Mon, 7 Aug 2006 16:56:04 +0100
- To: Pat Hayes <phayes@ihmc.us>
- Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>
On Aug 7, 2006, at 4:58 AM, Pat Hayes wrote:

>> On Aug 4, 2006, at 11:11 PM, Enrico Franconi wrote:
>>
>>>
>>>> Can you give references for all this terminology that you cite?
>>>> What exactly is the "active" domain? There is nothing in any
>>>> semantic theory that I know of that distinguishes *things in the
>>>> domain* on the basis of the kind of name that is used to refer
>>>> to them with. The idea does not make sense, in any case: if
>>>> bnodes were obliged to refer to a non-active domain while names
>>>> refer to something else, then the troublesome redundancies would
>>>> be eliminated.
>>>
>>> The first entry in
>>> <http://scholar.google.com/scholar?q=%22active+domain%22+database>
>>> is a survey in DBs written 20 years ago.
>
> OK, thanks for that. I can't actually get the article on-line from
> this, and the abstract does not use 'distinguished' or 'active'
> anywhere. But I will continue to search.

I gave you a link that led to a use of it (in a Racer paper). I can get to that paper when inside the university network. If you are doing it from home, often the library research portal will give you a way to log in. You could also join the ACM ($200/year for full digital library access...which, if you don't have it via the university, you are missing out; great stuff in there) or at least register for the free limited access bit. I'm not sure if the free registration will give it to you and it's hard for me to test now that I'm at work.

Here's the key paragraph:

"""When defining an instance of a semantic schema, an active domain is associated with each node of the schema. The active domain of an atomic type holds all objects of that type that are currently in the database. This notion of active domain is extended to type constructor nodes below. erately narrow and differs from the usage of that term in some models, including SDM and TAXIS. The representation of aggregations in those models is generally based on attributes and is discussed in the next section. It should also be noted that """

The key notion, I would say, is being "in the database", i.e., *used* in the database. Since no elements of the domain get into a database without a name, and the ABox portion of a DL KB is generally considered to be analogous to the *data* of a database, you can see why the Racer folks use the term.

As you can see from the cases of terminological defaults, Horn rules, or cyclic queries with transitive rules (heck, even non-cyclic queries), focusing on the active domain can make a lot of difference.

In DL safe rules, the way they formalize it is to have a special predicate O which is true of all and only the named individuals in the KB. Then you can use the appearance of O(X) in the body, where X is a variable, to get the distinguished/nondistinguished distinction.

>>>> I have never previously heard of this terminology of
>>>> "distinguished" vs. "nondistinguished". (You have everyone's
>>>> permission at this point to roll your eyes in amusement at my
>>>> profound ignorance, of course.) I would be interested to see
>>>> where this terminology was first used, and what its history is.
>>>> In a database context where there are no bnodes, the distinction
>>>> would be vacuous.
>>>
>>> Ah. Second and third entries in
>>> <http://scholar.google.com/scholar?q=distinguished%20variables>
>>> are DB references from almost 30 years ago.
>
> Thanks again. Similar access problems.

If you look at Ian and Sergio's paper in ISWC 2002, you'll see the terms used. Plus if you look, for example, on the KAON2 site, you'll see this terminology used. (This doesn't give you the history.)

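To make the role of the O predicate concrete, here is a minimal sketch of how the three kinds of variable under discussion would behave over a toy KB. Everything in it is an illustrative assumption and not taken from any of the systems mentioned: the names `Element`, `FRIEND` and `friends_of` are hypothetical, and the KB is collapsed into a single fixed model containing one unnamed element, whereas real DL entailment quantifies over all models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    label: str
    named: bool  # True if the KB refers to this element by name


# Toy domain (assumed): two named individuals and one element that is
# only known to exist (think of a bnode or an existential witness).
JOHN = Element("John", named=True)
MARY = Element("Mary", named=True)
ANON = Element("_:f", named=False)

# The "friend" relation in the single illustrative model.
FRIEND = {(JOHN, MARY), (JOHN, ANON)}


def O(element):
    """DL-safe-rules style predicate: true of exactly the named individuals."""
    return element.named


def friends_of(subject, mode):
    """Answer the query 'subject friend ?l' under one reading of ?l."""
    matches = [obj for (subj, obj) in FRIEND if subj == subject]
    if mode == "distinguished":
        # ?l is reported and restricted to the active domain via O.
        return sorted(obj.label for obj in matches if O(obj))
    if mode == "semi-distinguished":
        # ?l is reported but may bind to any element, named or not.
        return sorted(obj.label for obj in matches)
    if mode == "nondistinguished":
        # ?l is purely existential: only the truth value comes back.
        return bool(matches)
    raise ValueError(f"unknown mode: {mode}")
```

Under these assumptions, `friends_of(JOHN, "distinguished")` reports only Mary, the semi-distinguished reading also reports the unnamed element, and the nondistinguished reading returns no binding at all, only whether some friend exists.
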
>> And Pat's own acquaintance with some variants of the latter
>> terminology:
>> http://daml.semanticweb.org/listarchive/joint-committee/1024.html
>> http://pride.daml.org/listarchive/joint-committee/1125.html
>>
>> and
>> http://daml.semanticweb.org/listarchive/joint-committee/1027.html
>
> Indeed, I had forgotten that we did use this terminology at one
> point in DQL; but we used it with a completely different meaning,
> which had nothing to do with what the variable is allowed to bind
> to. (In retrospect, a better term for the DQL notion would have
> been 'selected variable': it meant only a variable whose binding is
> returned in an answer.)

This is one aspect of distinguished variables, which corresponds to their being in the head. That's what being in the head means. (Roughly.)

> I note that in the DATALOG literature the term is used with yet a
> third meaning, viz. a variable which occurs in the head.

I believe I pointed this out to you. And I don't think it's exactly a third meaning. It coincides with both being in the head and being restricted to the active domain. It's just that they coincide in Datalog. In SPARQL, not listing variables in the SELECT clause is a projection, which now is distinguishable from making a variable nondistinguished. That is, in Datalog, if you project away, it's the same as making the variable not appear in the head (since it no longer appears in the answer set). But since *all* variables always are restricted to the active domain, it doesn't change anything.

> Not surprisingly, the phrase "distinguished variable" seems to be
> used for a variety of cases in which someone wishes to distinguish
> one variable from another.

Not at all. The distinguished/nondistinguished variable distinction is clearly a term of art in databases. When you extend it beyond the database context, you can choose to emphasize the "returns a binding" aspect and come up with bindings for purely existential answers, or you can keep with what I think is part of the spirit of the distinction and also restrict them to the active domain. Richard Fikes was clearly borrowing from the DB literature, but his use of it has not been adopted anywhere that I see.

> This however does not make it a widely used technical term, only a
> common English phrase.

You are joking of course. First off, this is a straw man. Even if it WERE not a *widely used* technical term, I never *said* it was. I said it was *standard*, which it is in both the database and the description logic communities. Since we were discussing a description logic answering system, all I really need to do is refer to that. It's definitely standard there, and in that community, widely used.

Clearly Enrico and I both know the terms this way. Is it an accident? I promise that I didn't learn it from Enrico.

> Apparently, in fact, these various uses - and I am sure one could
> easily find others - have very little, if anything, in common.

Uh...no. They quite clearly have history in common, and they have a substantive overlap in meaning.

>> I believe the most deeply nested quote is Richard Fikes, the next
>> level Pat, and the final line richard (in spite of the quote mark):
>>
>> """> >answer will include a binding for each distinguished variable.
>> I am
>>> >referring to the variables in the query pattern that are not
>>> >distinguished variables as "non-distinguished variables".
>>>
>>> undistinguished variables?
>>
>>> From a quick check on the Web, I find them being called
>> "nondistinguished variables"."""

See, Richard was getting a standard term!

>> I don't expect Pat to have remembered this. It was, after all, 5
>> years ago. It seems there is precedent for semi-distinguished
>> variables in DQL.
>
> This terminology of 'semi-distinguished' is silly.

I never claimed "semi-" or "quasi-distinguished" was a great name for them. But we need some name for them.

> These are simply *variables*, plain vanilla.

No. Variables include distinguished and nondistinguished, at least. Since we're defining several different behaviors, it helps not to appropriate the general term.

> A variable is a syntactic token whose role is to stand in for, or
> be replaced by, or be bound to, a piece of syntax so that the
> resulting expression is well-formed. This notion of variable is
> used a wide variety of contexts and has been so used for at least
> 50 years (lambda calculus, substitutional interpretation of
> quantifiers, production rules):

Citations? Preferably going back 50 years? That's me being cheeky, btw.

In any case, let me try an analogy. Let us say you were making a sorted logic where you wanted to also have variables which were not sorted, i.e., were not restricted to a type. You might call the first kind of variable "sorted" or "typed" variables, and the second, "untyped" or "unsorted" variables, even though the latter corresponds to the very notion of variable. See, in a *context* where one is making certain distinctions, sometimes it helps to have a more specialized name for the general concept, especially when the generalized concept no longer *quite* covers everything.

Note that distinguished variables have two aspects, appearing in the head (i.e., appearing in the answer set) *and* ranging over names. These two features are related. So "semi" seems appropriate.

> there is nothing new or exotic about it.

In the context of answering queries, it is. The only antecedent that I've seen is in that email exchange. Don't you think it's a bit odd for you to find RDF query (with BNodes) so radical and different that supposedly we can say little authoritative about it, on the one hand, and on the other that one of the key aspects of that strangeness is supposed to be completely pedestrian?

Semi-distinguished variables *bind* and *report* that binding (even if projected away) over arbitrary elements of the domain (or syntactic elements, if you will). Where are the algorithms for this? Complexity? Implementations? Please cite me a paper, other than DQL, which deals with this. In fact, I don't think that DQL quite captures what we have in SPARQL because of the scope of the variables between answers.

I would review this exchange:
http://www.daml.org/listarchive/joint-committee/1109.html

Particularly starting here:
http://www.daml.org/listarchive/joint-committee/1113.html

I found your intuitions as to the answers to such queries as:

"""KB
John rdf:type _:r .
_:r daml:onProperty friend .
_:r daml:minCardinality 3 .

Query
John friend ?l .
?l distinguished"""

vs.

"""KB
John rdf:type _:r .
_:r daml:onProperty friend .
_:r daml:minCardinality 3 .
John friend _:f .

Query
John friend ?l .
?l distinguished"""

to be *extremely* counterintuitive, given that the KBs are equivalent.

In this message:
http://www.daml.org/listarchive/joint-committee/1121.html
you basically say that you don't care whether equivalent KBs give the same answers. Perhaps you don't think that any longer, but I certainly think that for interoperability, we should strive to make the answers from different engines, at the very least, predictable. Frankly, I think that they should be "the same" insofar as we can assume that.

BTW, in <http://www.daml.org/listarchive/joint-committee/1121.html> you point out that you were acquainted with the concept (by Ian) of a distinguished variable:

"""Ian has argued strongly that it should not include all 'answers' that can be logically inferred from the KB, but only those which arise from a binding of a query variable to a term in the KB Herbrand universe, in order to keep the inferential burden on the server within DL-manageable bounds. I am happy with that; but given the resulting incompleteness, it seems silly to object to a proposal on the grounds that logically equivalent KBs may not always deliver the same answers."""

As for incompleteness in general, there are *always* more expressive queries one could ask. I don't think there is a clear notion of "completeness" independent of the semantics of the query. So, I think that the complaint about incompleteness is, in general, a red herring. If you don't like the *expressiveness* of the query (which I think is a more helpful way to think about it), that is a different story.

> Both 'distinguished' and 'undistinguished' variables, in the sense
> you are using these qualifiers, are variables which are restricted
> in some way to bind only to a certain class of syntactic instances.

Er...sort of. I guess. I don't think that's the happiest way to think about them. But ok.

> But to call an variable without any such restrictions applied to it
> 'semi' anything, and to claim it is something new, is daft.

It also has to appear in the head, i.e., to report back bindings. This is new, and there are no developed algorithms for it. And, as I pointed out to you, I believe in private email, there are subtleties involved in the DL case, if you are going to allow the sort of coreference that you have in the RDF case, that are reasonably challenging.

Oh, is "daft" permitted in the set of appropriate snarky/insulting terms? I mean, if I say that *your* email was "daft", will I get in trouble?

> These are just plain *variables*, and this idea is about as old as
> algebra.

I hope you see why your terminological rant is misguided. We are talking about variables in the specific context of query answering.
We are talking about variables which are neither distinguished nor non-distinguished but share a property of each: with distinguished variables, that of appearing in the head/answer set, and with non-distinguished variables, that of ranging over arbitrary elements of the domain, including ones not named in the KB. These are indeed new, and the techniques for dealing with them have not been developed.

They are a bit less new in the SPARQL/RDF context (if we ignore minimization) because, essentially, we have chosen to treat the BNodes in the KB as names, thus as identifying part of the active domain. I tend to think this is an abuse of the nature of BNodes as existential variables, but 1) it has computational benefits, 2) most users do not think of BNodes as existential variables, 3) it comports with implementations, 4) we can enable the minimizing behavior with "DISTINCT", which is reasonable. 1 with 4 allows us to give a kind of SQLy reading, where the "practical language" is allowed to depart from the dictates of the relational algebra for computational reasons.

> BTW, in the DQL sense of 'distinguished', "semi-distinguished"
> would be incoherent. (Do you return half the answer?)

See above for the origin. The "semi-" refers to the distinguishedness, not the answers.

To reiterate, I see all three as useful in both the RDF and the DL context. Distinguished variables are less useful in the RDF context because of the ease of thinking of the BNodes as names. Basically, you don't have a very interesting structure to the models induced by the BNodes; you pretty much just have to replicate the asserted relational structure. They are useful in the context of computing redundancies since, if you don't *care* about purely existential answers, and you want non-redundant results, it's useful to avoid the BNodes. But perhaps that is rare enough in the RDF case to make the use of a filter not very painful.

Is there a reason to continue this discussion on list? I mean, it's pretty much a debate about the history and use of terms.

Cheers,
Bijan.
Received on Monday, 7 August 2006 15:57:05 UTC