How many answers? (was: Re: existential answers to queries (was: Re: on behalf of sandro)) from Pat Hayes on 2001-10-25 (w3c-rdfcore-wg@w3.org from October 2001)

From: Pat Hayes <phayes@ai.uwf.edu>
Date: Thu, 25 Oct 2001 13:57:25 -0500
To: Richard Fikes <fikes@KSL.Stanford.EDU>
Cc: w3c-rdfcore-wg@w3.org
Message-Id: <p05101091b7fdff733fc5@[205.160.76.193]>
>
>One of the points I think is important in the query language is for a
>query to include a component that says how many answers are being asked
>for.

OK, as long as it is possible to omit it and still get a sensible answer.

>(Call it a value for property "answersRequested".)  Is it all the
>parents?

Or all the parents that the KB knows about? OR do you want to know if 
the KB *knows* that this is all there are, as opposed to just not 
knowing about any others?

>  Is it at most one parent?  If the request is for all the
>parents, and all the server knows is that there are two of them, then
>that information should be stateable in our answer language.

I can see all kinds of issues here.
case1.
Query has simple 'logical' form eg: ?[x](Foo(x))
Answers can range in amount of closure information provided, eg
yes
yes [[x/a]] (but I could go on....)
yes [[x/a]  [x/b]] and no more in KB.
yes [[x/a]  [x/b]] and guaranteed no others.
yes [[x/blank] [x/blank]]  (meaning, at least two)
etc?
case2.
Query has to select kind of answer required, eg
?[x:allCases](Foo(x)) :SearchToEnd
Answer is always unique.

Of these, 1 seems preferable to me. The querier isn't obliged to pack 
all the selecting into the query, and there is a clear distinction 
between the entailed query and the meta-information about the 
querying process.  The only apparent problem with 1 is that all 
queries are open-ended in the effort required, no way to say 'give me 
a quick answer'. I think that could be handled by assuming a 
sequential answering protocol, so that one might get a simple answer 
first which could be elaborated later.

One problem I can see with either case is that in general, a vector 
of queries requires a *matrix* of answers (or alternative queries), 
if we are going to allow arbitrary patterns of 'not yet finished' in 
the responses (or queries).

What I mean is, suppose the query is ?[x, y] (...), and suppose that 
the querying engine has found, say,
[[x/a y/a] [x/b y/c]]. Has it finished looking? Well, there are 
several senses. Maybe it knows that there are no other bindings of y 
that would be paired with the x/a binding, for example, but it hasn't 
yet finished checking all the x bindings. Or perhaps it knows that 
there are no other x bindings that would pair with the y/c binding, 
but it hasnt finished checking other y bindings, and its not yet sure 
about all the x bindings for the case y/a. Any and all such 
combinations are possible states of incompleteness in its search 
process. Here's a way to represent this: think of a binary matrix 
with possible bindings for x defining the columns and those for y the 
rows. Then any binding is an entry in this table, and any row or 
column may or may not have been completely surveyed. So if the table 
is nxm, then there are n.m possible bindings and 2|(n+m) possible 
states of search incompleteness, all of which need to be somehow 
expressible in a perfectly general-purpose query or answer language. 
(Maybe we can just not have full generality, eg just distinguish 
between 'no more answer bindings' and 'maybe more...' But the kinds 
of queries being considered seem to require more than this, eg if we 
want to ask who are all of Clyde's parents, we seem to need to be 
able to focus in on one column or row of the matrix and insist on 
that one being completed. )

>  That could
>be done with a separate component of the result that says how many
>answers the server can conclude there are (exactly 2 in this case)
>(using a prop, say "answersFound", whose value is a closed interval) or
>it could be done by letting the bindings returned contain some kind of
>surrogates for the 2 parents.  I thought that's what was being done by
>proposals (b) and (c) in which query variables are being bound to such
>surrogates.  The server then communicates that it knows there are two
>parents by returning two sets of bindings, each containing an
>existential variable surrogate.

Right, that is what I had in mind.

>  In either case, the server also needs
>to be able to say in the query result when it knows that the bindings
>returned (or the value of answersFound) are provably all of the answers.

Provably all, or just all that it can find after a complete search? 
(Provably all makes the query a lot harder to answer.)

>
>Your example queries seem useful to consider:
>
>>  1. Simple existential query:
>>  Q: Is there anything in the soup?
>>  K: Yes.
>
>The query could be "exists ?x RDF(contains MySoup ?x)".  Note: no query
>variables.

Right.

>The query result would contain one empty set of bindings indicating one
>affirmative answer and a value of "All" for "answersFound".

If there are no query variables, there is nothing to bind, right?

>  > 2.  Existential query with answer variable binding:
>>  Q: Is there anything in the soup, and if so what is it called?
>>  K: Yes, but I'm not going to tell you what I call it.
>
>The query could be "RDF(contains MySoup ?x)".  The query as stated is
>ambiguous as to whether everything that can be found in the soup is to
>be returned; i.e., it doesn't indicate how many answers are wanted.
>Note: ?x is a query variable because it is free.

I would rather have an explicit query binder. But OK for now.

>If all the server could conclude is that there exists something in the
>soup, then the query result would either contain one set of bindings
>with a binding of ?x to a surrogate for the existential variable, or if
>we are not allowing such surrogates (i.e., proposal (a)), then there
>would be no bindings and answersFound would have a value [1 o-o].  In
>either case, the result would indicate that the server does not know
>whether the answers returned are all of the answers.
>
>>  Q: Is there anything in the soup, and if so what is it called?
>>  K: Yes, and it is called Fred.
>>  Q: What kind of thing is Fred?
>>  K: Fred is a frog.
>
>The query could be "RDF(contains MySoup ?x)".  The query as stated is
>ambiguous as to whether everything that can be found in the soup is to
>be returned; i.e., it doesn't indicate how many answers are wanted.
>Note: ?x is a query variable because it is free.
>
>If the server concludes that Fred is the only thing in the soup, then
>the query result would contain one set of bindings with a binding of ?x
>to Fred, answersFound would have a value 1, the result would indicate
>that the answers returned are all of the answers.
>
>Since the client now knows the name of the thing in the soup (i.e.,
>Fred), it can pose follow-up queries to the server to obtain whatever
>information it wants to know about Fred.  In the example, the follow-up
>query is "RDF(type Fred ?c)".

The point of the frog called Fred was only to illustrate the 
difference between a plain assertion of existence, and handing back a 
constant name (even a skolem constant). The latter allows the 
questioner to ask for more information about the thing, while the 
base existential does not.

Pat

-- 
---------------------------------------------------------------------
IHMC					(850)434 8903   home
40 South Alcaniz St.			(850)202 4416   office
Pensacola,  FL 32501			(850)202 4440   fax
phayes@ai.uwf.edu 
http://www.coginst.uwf.edu/~phayes
Received on Thursday, 25 October 2001 14:57:34 UTC