Informal DQL Specification
DAML Joint Committee
Richard Fikes, Pat Hayes, Ian Horrocks, editors
May 24, 2002
DQL is a
formal language and protocol for posing queries from a querying agent (which we
refer to as the "client") to an answering agent (which we refer to as
the "server"). A DQL query
contains a "query pattern" that is a collection of DAML+OIL sentences
in which literals and/or resources have been replaced by variables. A query includes a specification of which of
the variables in the query pattern are designated as "distinguished
variables". We consider
query-answering using DQL to mean determining answers, each of which includes a
binding for every distinguished variable, such that the sentences produced
by applying the bindings to the query pattern, with the remaining variables in
the query pattern considered to be existentially quantified, are entailed by
the knowledge base with respect to which the query was answered.
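The following is a minimal sketch of this notion of query answering, not part of the specification. It assumes a toy answer KB of explicit triples and approximates entailment by simple membership in that set; the names `KB`, `unify`, `solutions`, and `answers` and the example data are hypothetical. Non-distinguished variables are treated as existentially quantified by projecting them away.

```python
# Hypothetical toy answer KB: explicit (subject, predicate, object) triples.
# Assumption: "entailed by the KB" is approximated by "present in the KB".
KB = {
    ("Joe", "hasParent", "Ann"),
    ("Joe", "hasParent", "Bob"),
    ("Ann", "hometown", "Paris"),
    ("Bob", "hometown", "Rome"),
}


def is_variable(term):
    """Variables are written with a leading '?', as in ?p."""
    return term.startswith("?")


def unify(pattern_triple, fact, bindings):
    """Extend bindings so that pattern_triple equals fact, or return None."""
    extended = dict(bindings)
    for term, value in zip(pattern_triple, fact):
        if is_variable(term):
            if extended.setdefault(term, value) != value:
                return None  # variable already bound to something else
        elif term != value:
            return None  # constant mismatch
    return extended


def solutions(pattern, kb, bindings=None):
    """Yield every assignment of all variables that satisfies the pattern."""
    bindings = {} if bindings is None else bindings
    if not pattern:
        yield bindings
        return
    for fact in kb:
        extended = unify(pattern[0], fact, bindings)
        if extended is not None:
            yield from solutions(pattern[1:], kb, extended)


def answers(pattern, distinguished, kb):
    """Keep bindings for distinguished variables only; the rest are existential."""
    projected = {tuple(s[v] for v in distinguished) for s in solutions(pattern, kb)}
    return [dict(zip(distinguished, values)) for values in sorted(projected)]


PATTERN = [("Joe", "hasParent", "?p"), ("?p", "hometown", "?h")]
print(answers(PATTERN, ["?p"], KB))  # [{'?p': 'Ann'}, {'?p': 'Bob'}]
```

With an empty list of distinguished variables, `answers` returns a single empty binding set when the pattern holds and an empty list otherwise, which corresponds to a yes/no query.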
A DQL
query can optionally specify a DAML+OIL knowledge base that is referred to as
the "answer KB". If an answer
KB is specified in the query, the query is considered to be posed with respect
to that knowledge base. If no answer KB
is specified in the query, the server is free to select an answer KB. It is presumed that a query-answering server
will provide information as to the knowledge bases it considers when selecting an
answer KB and may also provide additional facilities in the query protocol for
a client to constrain and guide that choice.
Each
binding in a query answer is a description of a node in the RDF graph of the
answer KB. That is, DQL is designed for
answering queries of the form "What nodes of the answer KB denote objects
that make the query pattern true?"
Or, in the case that there are no distinguished variables in the query
pattern, the query is of the form "Is the query pattern true in the answer
KB?". A binding is a “minimal
identifying description” corresponding to the smallest connected subgraph of
the RDF graph of the answer KB that contains the node being described for which
all “tip” nodes (i.e., nodes not in a loop in the graph) are either literals or
have an associated URI.
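One possible reading of the "minimal identifying description" can be sketched as a graph traversal. The sketch assumes that anonymous nodes are written with a `_:` prefix, that every other node is a literal or has a URI, and that the description is represented as a set of triples; the function name `mid` and the example graph are hypothetical.

```python
# Hypothetical RDF graph as (subject, predicate, object) triples.
# Assumption: nodes starting with "_:" are anonymous; all others are
# literals or have an associated URI and therefore end the description.
GRAPH = {
    ("_:a", "parentOf", "_:b"),
    ("_:b", "sisterOf", "Bill"),
    ("Bill", "hometown", "Paris"),
}


def is_anonymous(node):
    return node.startswith("_:")


def mid(node, triples):
    """Return the node itself if it is named, else the arcs that describe it."""
    if not is_anonymous(node):
        return node
    description, pending, visited = set(), [node], set()
    while pending:
        current = pending.pop()
        if current in visited:
            continue
        visited.add(current)
        for s, p, o in triples:
            if current in (s, o):
                description.add((s, p, o))
                # Keep expanding only through anonymous neighbours.
                for neighbour in (s, o):
                    if is_anonymous(neighbour) and neighbour not in visited:
                        pending.append(neighbour)
    return description


print(mid("_:a", GRAPH))
```

Here the description of `_:a` includes the arcs of `_:b` because `_:b` is also anonymous, but it stops at `Bill`, so the arc giving Bill's hometown is not part of it.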
DQL
specifies a core set of protocol elements that are to be used by a client to
obtain query answers from a server.
Specifically, DQL specifies that a client initiates a query-answering
dialogue with a server by sending the server a DQL query. The server is expected to respond by sending
answers to the client one or more at a time along with a “server continuation”
that is either a “process handle” which the client can use to request
additional answers or a token indicating that the server will not provide any
more answers to the query. The token
can be 'none', meaning that the server is claiming that there are no
further answers entailed by the answer KB, or 'unknown', meaning that
the server is making no claims as to whether there are more answers entailed by
the answer KB. No attempt is made here
to specify a complete interagent protocol (e.g., with provisions for time-outs,
error handling, resource budgets, etc.).
Query-answering servers are required to support the specified core
protocol elements and are not constrained by the DQL specification as to how
additional protocol functionality is provided.
A client
initiates a query-answering dialogue with a server by sending the server a “DQL
query” that is specified as follows:
· Query Pattern - A DQL query necessarily includes a "query pattern" that specifies relationships among unknown sets of objects in a domain of discourse. Each unknown object is represented in the query pattern by a variable. A query pattern is a collection of DAML+OIL sentences in which some of the literals and resources have been replaced by variables.
· Distinguished Variables – A DQL query necessarily includes a specification of which of the variables that occur in the query pattern are "distinguished variables". We refer to the variables in the query pattern that are not distinguished variables as "non-distinguished variables".
· Knowledge Base - A DQL query optionally includes a specification of a DAML+OIL knowledge base that is referred to as the "answer KB". If an answer KB is specified in the query, the query is considered to be posed with respect to that knowledge base. If no answer KB is specified in the query, the server is free to select an answer KB. The answer KB can be a single knowledge base or a collection of knowledge bases, where a collection of knowledge bases is interpreted to mean the knowledge base consisting of the union of the knowledge bases specified in the collection. Each knowledge base is specified by a URI or is included in the query.[1]
· Query Premise - A DQL query can optionally include a "query premise" to facilitate if-then queries while still remaining within the expressiveness of DAML+OIL. A query premise is an arbitrary DAML+OIL knowledge base that is either specified by a URI or included in the query. When a query premise is specified, the sentences in the query premise are considered to be included in the answer KB.
· Answer Justification Request – A DQL query can optionally include a request for a justification for each query answer. The content and structure of a justification for a query answer have not yet been designed. The intent is to specify various types of justifications that can be requested in a query. The types will include at least the following two: a full proof of the answer, and a set of sentences from the answer KB from which the answer can be proved.
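The components of a DQL query listed above could be carried in a record such as the one sketched below. This is an illustration only: the specification does not define a concrete syntax, and the class name `DQLQuery`, its field names, the triple representation of sentences, and the `?` prefix for variables are all assumptions.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple

Triple = Tuple[str, str, str]  # assumed stand-in for a DAML+OIL sentence


@dataclass(frozen=True)
class DQLQuery:
    # Required components.
    query_pattern: Tuple[Triple, ...]
    distinguished_variables: FrozenSet[str]
    # Optional components; None means "not specified in the query".
    answer_kb: Optional[Tuple[str, ...]] = None  # KB URIs, read as their union
    query_premise: Optional[str] = None          # URI of the premise KB
    justification_request: Optional[str] = None  # justification type, if any

    def variables(self):
        """All variables (terms starting with '?') in the query pattern."""
        return frozenset(
            term for triple in self.query_pattern for term in triple
            if term.startswith("?")
        )

    def __post_init__(self):
        # Distinguished variables must occur in the query pattern.
        stray = self.distinguished_variables - self.variables()
        if stray:
            raise ValueError(f"not in query pattern: {sorted(stray)}")

    def non_distinguished_variables(self):
        return self.variables() - self.distinguished_variables


query = DQLQuery(
    query_pattern=(("Joe", "hasParent", "?p"), ("?p", "hometown", "?h")),
    distinguished_variables=frozenset({"?p"}),
)
print(query.non_distinguished_variables())  # frozenset({'?h'})
```

Leaving `answer_kb` as `None` models the case in which the server is free to select the answer KB.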
A server
that has received a DQL query or a DQL query continuation from a client is
expected to respond by sending the client a “DQL answer bundle” that is
specified as follows:
· Query Answers – A DQL answer bundle necessarily includes a (possibly empty) collection of query answers. Each “DQL query answer” is specified as follows:
· Query Bindings - A DQL query answer necessarily includes a set of "query bindings" for all of the query pattern's distinguished variables. Each binding is a “minimal identifying description” (MID) corresponding to the smallest connected subgraph of the RDF graph of the answer KB that contains the node being described for which all “tip” nodes (i.e., nodes not in a loop in the graph) are either literals or have an associated URI. In the case where the node is a literal or has an associated URI, the binding is simply the literal or the URI. In the case of an anonymous node, the binding is a description (in the Description Logic sense) consisting of the arcs coming into and going out from the node in the graph. Such a description might say, for example, "a parent of Joe that has Paris as a hometown and two male siblings". The MID of a node in effect consists of the conjunction of the RDF statements defined by the arcs into and out of the node, where each node in the description is specified by its associated URI, by its associated literal, or by its MID. That is, if an anonymous node is related to another anonymous node, then the MID of either of those nodes will include the description of the other. For example, a MID might be "a parent of a sister of Bill", where neither the parent nor the sister has a name. A MID of an anonymous node will contain a variable for that node. For example, the MID "a parent of Joe that has Paris as a hometown and two male siblings" would be "(and (parentOf ?p Joe) (hometownOf ?p Paris) ...". We assume that if a MID of an anonymous node is a binding for a distinguished variable ?v, then ?v is the variable in the MID for that node.
Applying a binding that is a MID of an anonymous node to a query pattern means conjoining the MID with the query pattern (and leaving in the query pattern the distinguished variable for which the MID is a binding). An answer's query bindings are such that the sentences produced by applying the bindings to the query pattern, with the remaining variables in the query pattern (including those in the conjoined MIDs) considered to be existentially quantified, are entailed by the knowledge base with respect to which the query was answered.
· Answer Justification – A DQL query answer necessarily includes an “answer justification” exactly when the query includes an answer justification request. The content and structure of an answer justification have not yet been designed.
· Server Continuation – A DQL answer bundle necessarily includes a “server continuation” which is either a token indicating that the server will not provide any more answers to the query or a “DQL process handle” that when returned to the server enables it to continue its query-answering process from the point where it stopped when it produced the answer bundle containing the continuation. When the server will not provide any more answers to the query, the server continuation is either 'none', indicating that there are no further answers entailed by the answer KB, or 'unknown', indicating that more answers may be entailed by the answer KB. When the collection of query answers is empty, the server continuation must be either the token 'none' or the token 'unknown'.
· Answer KB – A DQL answer bundle for a query that does not include a specific answer KB necessarily includes a specification of the "answer KB" with respect to which the query is being answered. The answer KB can be a single knowledge base or a collection of knowledge bases, where a collection of knowledge bases is interpreted to mean the knowledge base consisting of the union of the knowledge bases specified in the collection. Each knowledge base is specified by a URI.
· Query - A DQL answer bundle necessarily includes the query to which it is a response.
· Server – A DQL answer bundle necessarily includes a URI identifying the server that produced it.
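The constraints on an answer bundle can be checked mechanically. The sketch below assumes a record layout that the specification does not prescribe; the names `AnswerBundle`, `ProcessHandle`, and `TERMINATION_TOKENS` and the example server URI are hypothetical. It enforces only the rules on the server continuation, and not the rule that the answer KB must be reported when the query did not specify one.

```python
from dataclasses import dataclass
from typing import Tuple, Union

TERMINATION_TOKENS = ("none", "unknown")


@dataclass(frozen=True)
class ProcessHandle:
    """Opaque value the client returns to the server to ask for more answers."""
    value: str


@dataclass(frozen=True)
class AnswerBundle:
    answers: Tuple[dict, ...]                # possibly empty
    continuation: Union[str, ProcessHandle]  # termination token or handle
    query: object                            # the query being answered
    server: str                              # URI identifying the server
    answer_kb: Tuple[str, ...] = ()          # KB URIs, read as their union

    def __post_init__(self):
        terminated = self.continuation in TERMINATION_TOKENS
        if not terminated and not isinstance(self.continuation, ProcessHandle):
            raise ValueError("continuation must be a token or a process handle")
        # An empty collection of answers must come with a termination token.
        if not self.answers and not terminated:
            raise ValueError("empty bundle must end with 'none' or 'unknown'")


bundle = AnswerBundle(
    answers=({"?p": "Ann"},),
    continuation=ProcessHandle("h1"),
    query="(hypothetical query)",
    server="http://example.org/dql-server",  # made-up URI for illustration
)
print(bundle.continuation)
```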
When
multiple answers for a query are produced, each answer is required to have a
unique set of bindings in the sense that no two sets of bindings have identical
bindings for every distinguished variable.[2]
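A server could enforce this uniqueness requirement with a filter along the following lines. The sketch assumes that each answer is a mapping from distinguished variables to hashable bindings such as URIs or literals; the function name `distinct_answers` and the example URIs are hypothetical.

```python
def distinct_answers(answers, distinguished):
    """Drop answers whose bindings match an earlier answer on every variable."""
    seen = set()
    unique = []
    for bindings in answers:
        key = tuple(bindings[v] for v in distinguished)
        if key not in seen:
            seen.add(key)
            unique.append(bindings)
    return unique


candidates = [
    {"?x": "ex:MarkTwain"},
    {"?x": "ex:MarkTwain"},       # identical bindings: removed
    {"?x": "ex:SamuelClemens"},   # different binding: kept, even if it
                                  # denotes the same object as the first
]
print(distinct_answers(candidates, ["?x"]))
```

The comparison is purely on the bindings themselves, so two answers whose bindings denote the same object are both retained.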
A client
that has received a DQL answer bundle containing a DQL process handle can request
more answers to the query from the server that sent the answer bundle by
sending that server a “DQL query continuation”, specified as follows:
· Server Process Handle – A DQL query continuation necessarily includes the DQL process handle from the server continuation of the server’s most recently produced answer bundle for a given query.
This protocol can be used in more elaborate querying protocols that find all answers, the first N answers, etc., without requiring all query-answering servers to maintain the machinery needed to support such functionality, and while keeping the basic querying language arithmetic-free. The protocol gives the server the freedom to produce one or more of the answers to a query each time a request is received from a client. The server can either provide a “process handle” that enables a client to ask for additional answers or provide all the answers it is going to produce in a single response.
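As an illustration of how a "first N answers" facility might be layered on the core protocol by the client, the sketch below pairs a toy in-memory server with a client loop. The class `ToyServer`, its methods `query` and `continue_query`, the string form of the process handles, and the function `first_n_answers` are all assumptions made for the example; answer bundles are reduced to a pair of answers and a server continuation.

```python
TERMINATION_TOKENS = ("none", "unknown")


class ToyServer:
    """Hypothetical server that releases precomputed answers in batches."""

    def __init__(self, answers, batch_size=2):
        self._answers = list(answers)
        self._batch_size = batch_size
        self._positions = {}  # process handle -> index of next answer
        self._issued = 0

    def _bundle(self, start):
        end = start + self._batch_size
        batch = self._answers[start:end]
        if end >= len(self._answers):
            return batch, "none"  # claims there are no further answers
        handle = f"h{self._issued}"
        self._issued += 1
        self._positions[handle] = end
        return batch, handle

    def query(self, query):
        return self._bundle(0)

    def continue_query(self, handle):
        # Each handle is used once; only the most recent one is valid.
        return self._bundle(self._positions.pop(handle))


def first_n_answers(server, query, limit):
    """Request continuations until `limit` answers arrive or the server stops."""
    collected = []
    batch, continuation = server.query(query)
    while True:
        collected.extend(batch)
        if len(collected) >= limit or continuation in TERMINATION_TOKENS:
            return collected[:limit], continuation
        batch, continuation = server.continue_query(continuation)


server = ToyServer(["a", "b", "c", "d", "e"], batch_size=2)
print(first_n_answers(server, "(hypothetical query)", 3))  # (['a', 'b', 'c'], 'h1')
```

The counting is done entirely by the client, so the server needs no arithmetic support beyond choosing how many answers to place in each bundle.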
The language and protocol contain no explicit constructs
for asking how many (or how many more) answers there are to a given query. Defining what is meant by “how many” is problematic
in that there can be multiple bindings for a given distinguished variable that
all denote the same object in the domain of discourse, so that the number of answer bindings
for a given distinguished variable will in general differ from the number of
objects in the domain of discourse that the variable can denote.
The core protocol could reasonably be extended to support “how many”
queries, where “how many” means how many answers containing distinct sets of
bindings the server can produce. The difficulty
of a server determining how many answers it can produce to a query without
actually producing the answers has been the primary rationale for not including
a “how many” construct in the query language.
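The gap between counting bindings and counting objects can be shown with a small example. It assumes a hypothetical table `DENOTES` recording which object each URI denotes, information that a server would in general have to derive from equalities entailed by the answer KB; the URIs and object identifiers are made up.

```python
# Hypothetical denotation table: two of the URIs name the same object.
DENOTES = {
    "ex:MarkTwain": "person-1",
    "ex:SamuelClemens": "person-1",
    "ex:JaneAusten": "person-2",
}


def count_bindings(bindings):
    """Number of distinct bindings returned for one distinguished variable."""
    return len(set(bindings))


def count_objects(bindings, denotes):
    """Number of distinct domain objects those bindings denote."""
    return len({denotes[b] for b in bindings})


bindings = ["ex:MarkTwain", "ex:SamuelClemens", "ex:JaneAusten"]
print(count_bindings(bindings), count_objects(bindings, DENOTES))  # 3 2
```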
[1] Consideration is being given to developing a knowledge base description ontology and allowing a knowledge base specification in a query to describe a class of knowledge bases with the interpretation being that the answer KB for that query can be any instance of that class.
[2] Note that this uniqueness requirement does not prevent two sets of bindings having bindings that denote identical objects for every distinguished variable.