- From: Dan Brickley <danbri@w3.org>
- Date: Tue, 6 Nov 2001 18:23:17 -0500 (EST)
- To: Pat Hayes <phayes@ai.uwf.edu>
- cc: Richard Fikes <fikes@ksl.stanford.edu>, <w3c-rdfcore-wg@w3.org>
Very interesting and all, especially the blank nodes stuff. But I suspect
you really meant to send this to the DAML+OIL joint committee list.
Intrigued RDF Core WG members can read their archives linked from
http://www.daml.org/

BTW I'm seeing a *lot* of RDF query discussion happening offlist, or
scattered around the Web. A while back we set up rdf-rules@w3.org to
provide a home for such discussion. It never got widely announced (in
large part because of the difficulty scoping it w.r.t. the rdf-logic list).
I'll see about getting that done soon.

Dan

On Tue, 6 Nov 2001, Pat Hayes wrote:

> >I said last week that I would send out a message articulating the issues
> >we have been discussing regarding query-answering. This is that
> >message. My apologies for not sending the message out sooner. I am
> >back at home (and my father is recuperating well from his surgery), but
> >I did not have time to put this together sooner.
> >
> >As per my earlier messages, my student Yulin Li has made significant
> >contributions to the material in this message.
> >
> >Richard
> >
> >-----------------------
> >
> >This message is to articulate issues involved in specifying and
> >formalizing the information content of a "query-answering discourse"
> >between two agents in which one agent seeks information from a second
> >agent by sending a "query" to the second agent.
> >
> >I will refer to the agent sending the query as the "client" and the
> >agent receiving the query as the "server". I will refer to the response
> >sent by the server to the client as a "query result" and will assume
> >that a query result may contain one or more "query answers".
> >
> >
> >Knowledge Base
> >
> >I assume a query is posed with respect to a knowledge base that is a
> >DAML+OIL representation of a logical theory. Thus, a query needs to
> >include a reference to a DAML+OIL knowledge base. I will refer to that
> >knowledge base as the "query KB".
> >
> >
> >Query Premise
> >
> >We have discussed enabling a query to include a premise that is to be
> >added to the query KB so that the query is being asked of the query KB
> >unioned with the premise. A premise essentially facilitates queries of
> >the form if-then while still remaining within the expressiveness of
> >DAML+OIL.
> >
> >ISSUE: Do we want to enable the inclusion of a premise in a query, and
> >if so what is the form of a premise? I recommend that we do allow
> >premises and that they be an arbitrary DAML+OIL knowledge base.
> >
>
> I can't quite see the point of this. Presumably, this amounts to
> having the query KB be the conjunction of the original query KB and
> the premise, is that the idea? But we can do this simply by including
> a reference to the original query KB in the premise, then using the
> premise alone as the query KB. In the DAML+OIL world, any ontology can
> include any other ontology, simply by referring to it, so there is no
> need to invoke any special provision for conjoining two of them in a
> special way.
>
> >Query Pattern
> >
> >I assume a query contains a "query pattern" that specifies relationships
> >among unknown sets of objects in a domain of discourse. Each unknown
> >object is represented in the query pattern by a "query variable".
> >Answering a query with query variables x1,…,xn involves identifying
> >tuples of object constants
>
> Why only constants?
>
> >such that for any such tuple <c1,…,cn>, if
> >ci is substituted for xi in the query pattern for i=1,…,n, then the
> >resulting "query pattern instance" specifies a sentence that is entailed
> >by the query KB.
>
> That is the same as saying that the query is entailed by the KB, if
> those are existential variables. (The logic is already doing this for
> you, you don't need to do it all again :-)
>
> >For each such ci, the pair [xi, ci] is called a "query
> >variable binding" for the corresponding query variable xi.
> >Each set of query variable bindings specifies a candidate answer to the
> >query.
> >
> >ISSUE: What is the expressive power of the query pattern language? For
> >example, we might decide that a query pattern instance can specify a
> >sentence that is a disjunction of a conjunction of RDF statements or is
> >a negation of a conjunction of RDF sentences, or …. My recommendation
> >is that we define the query pattern language so that a query pattern
> >instance specifies a conjunction of sentences each of which is
> >representable in DAML+OIL (i.e., a conjunction of RDF statements).
> >
> >
> >Answer Mapping Function
> >
> >A query needs to specify a mapping of query variable bindings into query
> >answers.
>
> ?? What is a query answer, exactly? (I thought that bindings to query
> variables *were* query answers (??))
>
> >In particular, a query answer may include or make use of
> >bindings to only a subset of the query variables.
> >
> >ISSUE: What is the nature of the answer mapping function language? For
> >example, a query answer might be allowed to be any s-expression
>
> Whoa! How did S-expressions get in there? Aren't we talking about DAML ??
>
> >whose
> >atomic elements are bindings to specified variables. (E.g., map the
> >bindings [x1, c1], [x2,c2], [x3,c3], and [x4,c4] to the s-expression
> >"(c1 (c2 c3) c4)".) I think all that matters for our formalization and
> >core design work is that a query answer may make use of the bindings to
> >only a subset of the query variables. So, my recommendation is that for
> >now we consider a query answer to consist of a set of bindings for a
> >subset of the query variables
>
> That would be my natural inclination also, but I think it needs to be
> extended some. For example, we might want to distinguish between the
> case where the query is said to be OK but no bindings are provided,
> and the case where the query is simply answered with 'no' or 'fail'
> or whatever.
> Ian wants to distinguish the 'don't know' (i.e., unprovable
> from the KB) answer case from the 'no' (i.e., contradictory with the KB)
> case. So in general there might be other information in an answer
> than just the bindings alone.
>
> >, and that the query specifies which query
> >variables are in that subset. I will make that assumption in the
> >remainder of this document.
>
> Wait. The *query* specifies which query variables are in the set? So
> there could be query variables which are not being, as it were,
> queried? (Then why did you call them query variables?)
>
> >
> >ISSUE: What constants can be in query answer bindings? In particular,
> >can a query variable be bound in a query answer to an anonymous node in
> >the RDF graph?
>
> 1. No.
> 2. But in any case, an anonymous node is not a constant, so the
> question doesn't even arise.
>
> HOWEVER, what this does raise as an issue is, is such a binding
> considered to be to a *node* or to the label on the node? It's not
> easy to see how to transfer a node as a binding, but maybe we need to
> take that idea more seriously (?)
>
> >If so, what is the form and semantics of that binding?
> >Also, can a query variable be bound in a query answer to an object that
> >is entailed in the knowledge base (e.g., by a cardinality constraint)
> >but whose identity is not known by the server? If so, what is the form
> >and semantics of that binding?
>
> In general, *existential* variables in the antecedent (the query KB)
> should never be passed out as bindings, since they have no meaning
> outside their scope. So the issue here seems to me to be, how to
> respond to a query in a way that indicates that a binding to a query
> variable exists, without passing the binding itself. That is what
> that idea of having a <blank> binding was intended to do, of course.
>
> >Uniqueness of Answers
> >
> >In general, a query result may contain multiple query answers.
> >
> >ISSUE: What guarantees do we want to make about the distinctiveness of
> >multiple query answers in a query result? We may want to guarantee that
> >no two answers consist of identical sets of bindings.
>
> Why would we want to do that? This might be useful information, e.g. if
> one queries
>
>   [?x ?y](exists (?z)(and (P ?x ?z)(P ?z ?y)))
>
> (Sorry about the KIF; query variables in brackets)
>
> then getting back two copies of the binding might allow you to infer
> (if the KB is closed) that there are precisely two ?z's involved. (If
> P is parent-of, this detects a certain family arrangement which is
> universally frowned upon, for example.)
>
> >That would say
> >that if A1 and A2 are query answers in a query result, then there exists
> >a query variable xi for which the binding for xi in A1 is not the same
> >constant as the binding for xi in A2. A more difficult guarantee would
> >be that no two answers consist of equal sets of bindings. That would
> >say that if A1 and A2 are query answers in a query result, then there
> >exists a query variable xi for which the binding for xi in A1 is not
> >equal to (i.e., does not denote the same object as) the binding for xi
> >in A2. My recommendation is that we guarantee there are no identical
> >sets of bindings
>
> I disagree. This is both needless, and onerous for the implementer.
>
> >and that we enable a query to include an indicator as
> >to whether equal sets of bindings are acceptable.
>
> Again, I disagree. First, who is responsible for knowing that two
> things are equal? The KB may not have the resources to be able to
> prove this, but the querying engine might. But in any case, what is
> the utility? We can allow people to include statements of equality in
> the query if this issue bothers them.
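
Pat's point about duplicate bindings can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the facts, the individual names, and the reading of P as parent-of over a closed KB. The query is the KIF one above, with ?x and ?y as query variables and ?z existential.

```python
from collections import Counter

# Hypothetical closed KB: P(a, b) is read as "a is a parent of b".
P = {
    ("ann", "bob"),
    ("ann", "cal"),
    ("bob", "dee"),
    ("cal", "dee"),
}

def chained_answers(facts):
    """Return one (x, y) binding per witness z with P(x, z) and P(z, y)."""
    return [(x, y) for (x, z) in facts for (z2, y) in facts if z == z2]

answers = chained_answers(P)   # duplicates kept: one copy per witness z
counts = Counter(answers)      # number of witnesses behind each binding
distinct = set(answers)        # what a "no identical answers" rule would return
```

With these made-up facts the binding (ann, dee) comes back twice, once for each ?z. Collapsing the result to `distinct` discards exactly the information Pat describes as potentially useful.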
>
> >
> >
> >Number of Answers
> >
> >I assume our query language needs to enable a query to specify what is
> >being asked for and our query result language needs to enable a query
> >result to contain the information requested in a query.
>
> You haven't said what this means, though. What IS the information
> requested in a query? (Not just a debating point, this seems like a
> central issue that we need to get clear, as many other things would
> then be also clarified, I think.)
>
> >That
> >specification in a query would include how many query answers are being
> >requested and what information is being requested about how many query
> >answers there are. Also, the query result language needs to enable
> >including in a query result the information requested about how many
> >query answers there are.
> >
> >ISSUE: Whether and how to include in a query the number of query answers
> >being requested. For example, we might enable a query to include a
> >"# answers requested" that could be either a non-negative integer, the
> >constant "All", or the constant "Enumerator". If the number of answers
> >requested is an integer n, the query is a request for as many query
> >answers as the server can deduce up to n. If the number of answers
> >requested is "All", the request is for as many query answers as the
> >server can deduce. If the number of answers requested is "Enumerator",
> >the request is for a process handle (continuation) that can be sent back
> >to the server in a subsequent message to request that the server produce
> >a "batch" of k query answers. Some convention is needed for the case
> >where the number of answers requested is "All" and the server can deduce
> >an infinite number of answers. Perhaps the convention would be that the
> >server returns a process handle in that case.
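
The three proposed values for "# answers requested" could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the names `Enumerator`, `next_batch` and `answer_query` are invented here, and the server's deduction is stood in for by a plain iterable of answers.

```python
from itertools import islice

class Enumerator:
    """Hypothetical process handle that hands out answers in batches."""

    def __init__(self, answers):
        self._answers = iter(answers)

    def next_batch(self, k):
        # Return up to k further answers; an empty list means none are left.
        return list(islice(self._answers, k))

def answer_query(answers, requested):
    """Serve a query given the answers the server can deduce.

    answers:   iterable of answers (may be lazy, even unbounded)
    requested: a non-negative int, "All", or "Enumerator"
    """
    if requested == "Enumerator":
        return Enumerator(answers)
    if requested == "All":
        # Only terminates for a finite answer set; the message leaves the
        # infinite case open (perhaps a handle would be returned instead).
        return list(answers)
    if isinstance(requested, int) and requested >= 0:
        return list(islice(answers, requested))
    raise ValueError("unsupported '# answers requested' value")
```

A client asking for "Enumerator" would then call `next_batch(k)` on the returned handle repeatedly, which also copes with an unbounded answer stream.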
> >
> >ISSUE: Whether and how to include in a query what information is being
> >requested about how many query answers there are and in a query result
> >the information requested. For example, we might enable a query to
> >include a request for information as to how many query answers there are
> >("# answers?") and for a query result to include a specification of how
> >many query answers are entailed ("# answers entailed") in the form of an
> >ordered pair denoting a closed interval. For example, a result
> >containing the ordered pair [4,0-0] as the "# answers entailed" means
> >that the server has determined that there are at least 4 answers to the
> >query. (I am using "0-0" here to stand for the "infinity" symbol.)
> >Note that a value of [0,0-0] for "# answers entailed" is always true.
> >When a query includes a request of "# answers", the server is being
> >asked to deduce whatever it can about how many answers there are. Note
> >that a query can specify that "# answers requested" is zero so that the
> >only information being requested is the number of answers. Also, note
> >that a query result could contain a value for "# answers" even when that
> >information was not requested in the query.
>
> This all seems rather like overkill. Do we really want to get into
> infinite ordinals?
>
> Why not have the following distinctions:
> query forms:
> 1. one answer (i.e., first one found)/
> 2. (up to) N answers/
> 3. all answers.
> replies:
> 1. an answer or 'not provable'/
> 2. list of M answers, for some M<=N; if M<N then this means these are
> all the answers/
> 3. some list of answers.
> Issues: (1) what if there are infinitely many answers? Should we
> provide some way to indicate this even though not all the bindings
> can be provided?
> (2) what about time? For example, it might be useful to get the first
> answer when it is available, even if more have been requested. Which
> suggests another dimension, viz.
> do you want the answers all at once
> when the entire list is done, or each one when it is fresh?
>
> If we think about the query/answer business as a process, then maybe
> we could specify the various query options in the form of process
> descriptions of some kind. For example (making up the syntax here as
> I go along), we might specify 'all answers' as:
>
>   (fetch answer) repeat.
>
> where an exception of the form 'no more' indicates that you have them
> all. Up to N would be
>
>   set n=0; (fetch answer; n=n+1) repeat until n=N
>
> This could be a kind of API protocol between the querying and replying agents.
>
> >
> >Justification For Answers
> >
> >Under the assumption that our query language needs to enable a query to
> >specify what is being asked for and our query result language needs to
> >enable a query result to contain the information requested in a query,
> >the query language needs to enable inclusion in a query a request for a
> >justification for each query answer and the query result language needs
> >to enable inclusion of such justifications in a query result.
>
> I don't see how this follows. Again, I think we need to get clearer
> on what exactly counts as the information requested. If I hold up a
> query then I am asking the KB to prove it for me. What do I want
> back: the fact that it has been proved, some of the bindings in the
> proof, the expressions used as assumptions in the proof, the entire
> proof itself, all the proofs, or what? The last is probably the most
> general form, but it would be overkill for many querying
> applications. Still, it might be worth thinking of 'information
> requested' as being ultimately the entire set of proofs, and then
> defining various kinds of 'pruning' or simplification of this. That
> would certainly cover the variable-bindings, for example, and it
> might also cover justifications. (BTW, what is a justification,
> exactly??)
>
> >ISSUE: Whether and how to include provisions for justifications to be
> >requested in a query and be included in a query result. What kind of
> >justification language(s) do we include in our query results language?
>
> I really would rather not keep inventing new languages for all this
> stuff. We really don't need to.
>
> >
> >Knowledge Base Structure Queries
> >
> >There seems to be a clear need to ask queries about the knowledge base
> >itself as an artifact. The most compelling example involves determining
> >the "direct" subclasses of a class or subproperties of a property.
> >
> >ISSUE: Whether and how to include such queries in our query language
> >and query results language. The primary issue seems to be to what
> >extent we allow intermingling of "structural queries" with "entailment
> >queries". That is, we probably don't want to allow a request for direct
> >subclasses to appear anywhere in a query pattern that an RDF statement
> >(with variables) could appear.
>
> Well, it might be OK as long as it was clearly distinguishable as
> being about the KB rather than entailed by it.
>
> Hmmm. I need to think about this a bit more.
>
> Pat
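
To make the "direct subclasses" example concrete, here is a rough Python sketch. It rests on assumptions the message does not state: "direct" is taken to mean a subclass with no other named class strictly in between, the hierarchy is acyclic, and only asserted subclass links are consulted, with no DAML+OIL reasoning. The class names and both function names are made up.

```python
def superclasses(cls, asserted):
    """All classes reachable from cls by following asserted subclass links."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        for sub, sup in asserted:
            if sub == current and sup not in seen:
                seen.add(sup)
                stack.append(sup)
    return seen

def direct_subclasses(cls, asserted):
    """Subclasses of cls that have no other subclass of cls above them."""
    names = {c for pair in asserted for c in pair}
    subs = {c for c in names if c != cls and cls in superclasses(c, asserted)}
    return {
        c for c in subs
        if not any(m != c and m in superclasses(c, asserted) for m in subs)
    }

# Hypothetical asserted hierarchy; ("Dog", "Animal") is a redundant link.
asserted = {
    ("Dog", "Mammal"),
    ("Cat", "Mammal"),
    ("Mammal", "Animal"),
    ("Dog", "Animal"),
}
```

Here Dog is a subclass of Animal but not a direct one, because Mammal sits in between. That distinction depends on how the KB is structured rather than on what it entails, which is why such a request sits awkwardly inside an entailment query pattern.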
Received on Tuesday, 6 November 2001 18:23:19 UTC