Re: Query Language Issues from Dan Brickley on 2001-11-06 (w3c-rdfcore-wg@w3.org from November 2001)

From: Dan Brickley <danbri@w3.org>
Date: Tue, 6 Nov 2001 18:23:17 -0500 (EST)
To: Pat Hayes <phayes@ai.uwf.edu>
cc: Richard Fikes <fikes@ksl.stanford.edu>, <w3c-rdfcore-wg@w3.org>
Message-ID: <Pine.LNX.4.30.0111061819390.17483-100000@tux.w3.org>
Very interesting and all, especially the blank nodes stuff. But I suspect
you really meant to send this to the DAML+OIL joint committee list.
Intrigued RDF Core WG members can read their archives linked from
http://www.daml.org/

BTW I'm seeing a *lot* of RDF query discussion happening offlist, or scattered
around the Web. A while back we set up rdf-rules@w3.org to provide a home
for such discussion. It never got widely announced (in large part because
of the difficulty scoping it w.r.t. rdf-logic list). I'll see about
getting that done soon.

Dan

On Tue, 6 Nov 2001, Pat Hayes wrote:

> >I said last week that I would send out a message articulating the issues
> >we have been discussing regarding query-answering.  This is that
> >message.  My apologies for not sending the message out sooner.  I am
> >back at home (and my father is recuperating well from his surgery), but
> >I did not have time to put this together sooner.
> >
> >As per my earlier messages, my student Yulin Li has been significant
> >contributions to the material in this message.
> >
> >Richard
> >
> >-----------------------
> >
> >This message is to articulate issues involved in specifying and
> >formalizing the information content of a "query-answering discourse"
> >between two agents in which one agent seeks information from a second
> >agent by sending a "query" to the second agent.
> >
> >I will refer to the agent sending the query as the "client" and the
> >agent receiving the query as the "server".  I will refer to the response
> >sent by the server to the client as a "query result" and will assume
> >that a query result may contain one or more "query answers".
> >
> >
> >Knowledge Base
> >
> >I assume a query is posed with respect to a knowledge base that is a
> >DAML+OIL representation of a logical theory.  Thus, a query needs to
> >include a reference to a DAML+OIL knowledge base.  I will refer to that
> >knowledge base as the "query KB".
> >
> >
> >Query Premise
> >
> >We have discussed enabling a query to include a premise that is to be
> >added to the query KB so that the query is being asked of the query KB
> >unioned with the premise.  A premise essentially facilitates queries of
> >the form if-then while still remaining within the expressiveness of
> >DAML+OIL.
> >
> >ISSUE: Do we want to enable the inclusion of a premise in a query, and
> >if so what is the form of a premise?  I recommend that we do allow
> >premises and that they be an arbitrary DAML+OIL knowledge base.
> >
>
> I can't quite see the point of this. Presumably, this amounts to
> having the query KB be the conjunction of the original query KB and
> the premis, is that the idea? But we can do this simply by including
> a reference to the original query KB into the premis, then using the
> premis alone as the query KB. In the DAML+OIL world, any ontology can
> include any other ontology, simply by referring to it, so there is no
> need to invoke any special provision for conjoining two of them in a
> special way.
>
> >Query Pattern
> >
> >I assume a query contains a "query pattern" that specifies relationships
> >among unknown sets of objects in a domain of discourse.  Each unknown
> >object is represented in the query pattern by a "query variable".
> >Answering a query with query variables x1,…,xn involves identifying
> >tuples of object constants
>
> Why only constants?
>
> >such that for any such tuple  <c1,…,cn>, if
> >ci is substituted for xi in the query pattern for i=1,…,n, then the
> >resulting "query pattern instance" specifies a sentence that is entailed
> >by the query KB.
>
> That is the same as saying that the query is entailed by the KB, if
> those are existential variables. (The logic is already doing this for
> you, you don't need to do it all again :-)
>
> >For each such ci, the pair [xi, ci] is called a "query
> >variable binding" for the corresponding query variable xi.  Each set of
> >query variable bindings specifies a candidate answer to the query.
> >
> >ISSUE: What is the expressive power of the query pattern language?  For
> >example, we might decide that a query pattern instance can specify a
> >sentence that is a disjunction of a conjunction of RDF statements or is
> >a negation of a conjunction of RDF sentences, or ….  My recommendation
> >is that we define the query pattern language so that a query pattern
> >instance specifies a conjunction of sentences each of which is
> >representable in DAML+OIL (i.e., a conjunction of RDF statements).
> >
> >
> >Answer Mapping Function
> >
> >A query needs to specify a mapping of query variable bindings into query
> >answers.
>
> ?? What is a query answer, exactly? (I thought that bindings to query
> variables *were* query answers (??))
>
> >In particular, a query answer may include or make use of
> >bindings to only a subset of the query variables.
> >
> >ISSUE:  What is the nature of the answer mapping function language?  For
> >example, a query answer might be allowed to be any s-expression
>
> Whoa! How did Sexpressions get in there? Aren't we talking about DAML ??
>
> >whose
> >atomic elements are bindings to specified variables.  (E.g., map the
> >bindings [x1, c1], [x2,c2], [x3,c3], and [x4,c4] to the s-expression
> >"(c1 (c2 c3) c4)".)  I think all that matters for our formalization and
> >core design work is that a query answer may make use of the bindings to
> >only a subset of the query variables.  So, my recommendation is that for
> >now we consider a query answer to consist of a set of bindings for a
> >subset of the query variables
>
> That would be my natural inclination also, but I think it needs to be
> extended some. For example, we might want to distinguish between the
> case where the query is said to be OK but no bindings are provided,
> from the case where the query is simply answered with 'no' or 'fail'
> or whatever. Ian wants to distinguish the 'don't know' (ie unprovable
> from KB) answer case from the 'no' (ie, contradictory with the KB)
> cases. So in general there might be other information in an answer
> than just the bindings alone.
>
> >, and that the query specifies which query
> >variables are in that subset.  I will make that assumption in the
> >remainder of this document.
>
> Wait. The *query* specifies which query variables are in the set? So
> there could be query variables which are not being, as it were,
> queried? (Then why did you call them query variables?)
>
> >
> >ISSUE:  What constants can be in query answer bindings?  In particular,
> >can a query variable be bound in a query answer to an anonymous node in
> >the RDF graph?
>
> 1. No.
> 2. But in any case, an anonymous node is not a constant, so the
> question doesn't even arise.
>
> HOWEVER, what this does raise as an issue is, is such a binding
> considered to be to a *node* or to the label on the node? Its not
> easy to see how to transfer a node as a binding, but maybe we need to
> take that idea more seriously (?)
>
> >  If so, what is the form and semantics of that binding?
> >Also, can a query variable be bound in a query answer to an object that
> >is entailed in the knowledge base (e.g., by a cardinality constraint)
> >but whose identity is not known by the server?  If so, what is the form
> >and semantics of that binding?
>
> In general, *existential* variables in the antecedent (the query KB)
> should never be passed out as bindings, since they have no meaning
> outside their scope. So the issue here seems to me to be, how to
> respond to a query in a way that indicates that a binding to query
> variable exists, without passing the binding itself. That is what
> that idea of having a <blank> binding was intended to do, of course.
>
> >Uniqueness of Answers
> >
> >In general, a query result may contain multiple query answers.
> >
> >ISSUE: What guarantees do we want to make about the distinctiveness of
> >multiple query answers in a query result?  We may want to guarantee that
> >no two answers consist of identical sets of bindings.
>
> Why would we want to do that? This might be useful information, eg if
> one queries
> [?x ?y](exists (?z)(and (P ?x ?z)(P ?z ?y)))
>
> (Sorry about the KIF; query variables in brackets)
>
> then getting back two copies of the binding might allow you to infer
> (if the KB is closed) that there are precisely two ?z's involved. (If
> P is parent-of, this detects  a certain family arrangement which is
> universally frowned upon, for example.)
>
> >  That would say
> >that if A1 and A2 are query answers in a query result, then there exists
> >a query variable xi for which the binding for xi in A1 is not the same
> >constant as the binding for xi in A2.  A more difficult guarantee would
> >be that no two answers consist of equal sets of bindings.  That would
> >say that if A1 and A2 are query answers in a query result, then there
> >exists a query variable xi for which the binding for xi in A1 is not
> >equal to (i.e., does not denote the same object as) the binding for xi
> >in A2.  My recommendation is that we guarantee there are no identical
> >sets of bindings
>
> I disagree. This is both needless, and onerous for the implementer.
>
> >and that we enable a query to include an indicator as
> >to whether equal sets of bindings are acceptable.
>
> Again, I disagree. First, who is responsible for knowing that two
> things are equal? The KB may not have the resources to be able to
> prove this, but the querying engine might. But in any case, what is
> the utility? We can allow people to include statements of equality in
> the query if this issue bothers them.
>
> >
> >
> >Number of Answers
> >
> >I assume our query language needs to enable a query to specify what is
> >being asked for and our query result language needs to enable a query
> >result to contain the information requested in a query.
>
> You havn't said what this means, though. What IS the information
> requested in a query? (Not just a debating point, this seems like a
> central issue that we need to get clear, as many other things would
> then be also clarified, I think.)
>
> >That
> >specification in a query would include how many query answers are being
> >requested and what information is being requested about how many query
> >answers there are.  Also, the query result language needs to enable
> >including in a query result the information requested about how many
> >query answers there are.
> >
> >ISSUE: Whether and how to include in a query the number of query answers
> >being requested.  For example, we might enable a query to include a "#
> >answers requested" that could be either a non-negative integer, the
> >constant "All", or the constant "Enumerator".  If the number of answers
> >requested is an integer n, the query is a request for as many query
> >answers as the server can deduce up to n.  If the number of answers
> >requested is "All", the request is for as many query answers as the
> >server can deduce.  If the number of answers requested is "Enumerator",
> >the request is for a process handle (continuation) that can be sent back
> >to the server in a subsequent message to request that the server produce
> >a "batch" of k query answers.  Some convention is needed for the case
> >where the number of answers requested is "All" and the server can deduce
> >an infinite number of answers.  Perhaps the convention would be that the
> >server returns a process handle in that case.
> >
> >ISSUE: Whether and how to include in a query what information is being
> >requested about how many query answers there are and in a query result
> >the information requested.  For example, we might enable a query to
> >include a request for information as to how many query answers there are
> >("# answers?") and for a query result to include a specification of how
> >many query answers are entailed ("# answers entailed") in the form of an
> >ordered paired denoting a closed interval.  For example, a result
> >containing the ordered pair [4,0-0] as the "# answers entailed" means
> >that the server has determined that there are at least 4 answers to the
> >query.  (I am using "0-0" here to stand for the "infinity" symbol.)
> >Note that a value of [0,0-0] for "# answers entailed" is always true.
> >When a query includes a request of "# answers", the server is being
> >asked to deduce whatever it can about how many answers there are.   Note
> >that a query can specify that "# answers requested" is zero so that the
> >only information being requested is the number of answers.  Also, note
> >that a query result could contain a value for "# answers" even when that
> >information was not requested in the query.
>
> This all seems rather like overkill. Do we really want to get into
> infinite ordinals?
>
> Why not have the following distinctions:
> query forms:
> 1.  one answer (ie first one found)/
> 2. (up to) N answers/
> 3. all answers.
> replies:
> 1. an answer or 'not provable'/
> 2. list of M answers, for some M<=N; if M<N then this means these are
> all the answers/
> 3. some list of answers.
> Issues: (1) what if there are infinitely many answers? Should we
> provide some way to indicate this even though not all the bindings
> can be provided?
> (2) what about time? For example, it might be useful to get the first
> answer when it is available, even if more have been requested. Which
> suggests another dimension, viz. do you want the answers all at once
> when the entire list is done, or each one when it is fresh?
>
> If we think about the query/answer business as a process, then maybe
> we could specify the various query options in the form of process
> descriptions of some kind. For example (making up the syntax here as
> I go along), we might specify 'all answers' as:
> (fetch answer) repeat.
> where an exception of the form 'no more' indicates that you have them
> all. Up to N would be
> set n=0; (fetch answer; n=n+1) repeat until n=N
>
> This could be a kind of API protocol between the querying and replying agents.
>
> >
> >Justification For Answers
> >
> >Under the assumption that our query language needs to enable a query to
> >specify what is being asked for and our query result language needs to
> >enable a query result to contain the information requested in a query,
> >the query language needs to enable inclusion in a query a request for a
> >justification for each query answer and the query result language needs
> >to enable inclusion of such justifications in a query result.
>
> I don't see how this follows.  Again, I think we need to get clearer
> on what exactly counts as the information requested. If I hold up a
> query then I am asking the KB to prove it for me. What do I want
> back: the fact that it has been proved, some of the bindings in the
> proof, the expressions used as assumptions in the proof, the entire
> proof itself, all the proofs, or what? The last is probably the most
> general form, but it would be overkill for many querying
> applications. Still, it might be worth thinking of 'information
> requested' as being ultimately the entire set of proofs, and then
> defining various kinds of 'pruning' or simplification of this. That
> would certainly cover the variable-bindings, for example, and it
> might also cover justifications. (BTW, what is a justification,
> exactly??)
>
> >ISSUE: Whether and how to include provisions for justifications to be
> >requested in a query and be included in a query result.  What kind of
> >justification language(s) do we include in our query results language?
>
> I really would rather not keep inventing new languages for all this
> stuff. We really don't need to.
>
> >
> >Knowledge Base Structure Queries
> >
> >There seems to be a clear need to ask queries about the knowledge base
> >itself as an artifact.  The most compelling example involves determining
> >the "direct" subclasses of a class or subproperties of a property.
> >
> >ISSUE:  Whether and how to include such queries in our query language
> >and query results language.  The primary issue seems to be to what
> >extent we allow intermingling of "structural queries" with "entailment
> >queries".  That is, we probably don't want to allow a request for direct
> >subclasses to appear anywhere in a query pattern that an RDF statement
> >(with variables) could appear.
>
> Well, it might be OK as long as it was clearly distinguishable as
> being about the KB rather than entailed by it.
>
> Hmmm. I need to think about this a bit more.
>
> Pat
>
>
>
>
>
Received on Tuesday, 6 November 2001 18:23:19 UTC