Informal DQL Specification

DAML Joint Committee

Richard Fikes, Pat Hayes, Ian Horrocks, editors

May 24, 2002

Overview

DQL is a formal language and protocol for posing queries from a querying agent (which we refer to as the "client") to an answering agent (which we refer to as the "server").  A DQL query contains a "query pattern" that is a collection of DAML+OIL sentences in which literals and/or resources have been replaced by variables.  A query includes a specification of which of the variables in the query pattern are designated as "distinguished variables".  We consider query-answering using DQL to mean determining answers, each of which includes a binding for every distinguished variable, such that the sentences produced by applying the bindings to the query pattern, with the remaining variables in the query pattern considered to be existentially quantified, are entailed by the knowledge base with respect to which the query was answered.

A DQL query can optionally specify a DAML+OIL knowledge base that is referred to as the "answer KB".  If an answer KB is specified in the query, the query is considered to be posed with respect to that knowledge base.  If no answer KB is specified in the query, the server is free to select an answer KB.  It is presumed that a query-answering server will provide information as to the knowledge bases it considers when selecting an answer KB and may also provide additional facilities in the query protocol for a client to constrain and guide that choice.

Each binding in a query answer is a description of a node in the RDF graph of the answer KB.  That is, DQL is designed for answering queries of the form "What nodes of the answer KB denote objects that make the query pattern true?" or, in the case that there are no distinguished variables in the query pattern, of the form "Is the query pattern true in the answer KB?"  A binding is a “minimal identifying description” corresponding to the smallest connected subgraph of the RDF graph of the answer KB that contains the node being described and for which all “tip” nodes (i.e., nodes not in a loop in the graph) are either literals or have an associated URI.
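
As an informal illustration of the notion of a query answer described above, the following Python sketch enumerates bindings for the distinguished variables of a query pattern.  It rests on simplifying assumptions that are not part of DQL: sentences are modeled as (subject, predicate, object) triples, variables are strings beginning with "?", bindings are plain names rather than minimal identifying descriptions, and entailment is approximated by simple membership of ground triples in the answer KB.  The function name find_answers and the ex: names are hypothetical.

```python
from typing import Dict, Iterator, List, Set, Tuple

Triple = Tuple[str, str, str]


def is_variable(term: str) -> bool:
    """Assumed convention: variables are written with a leading '?'."""
    return term.startswith("?")


def find_answers(pattern: List[Triple], distinguished: Set[str],
                 kb: Set[Triple]) -> Iterator[Dict[str, str]]:
    """Yield one set of bindings per distinct assignment of the distinguished variables.

    Non-distinguished variables are treated as existentially quantified:
    they must match something, but what they match is not reported.
    """
    def match(remaining: List[Triple], env: Dict[str, str]) -> Iterator[Dict[str, str]]:
        if not remaining:
            yield env
            return
        first, rest = remaining[0], remaining[1:]
        for fact in kb:
            new_env = dict(env)
            for term, value in zip(first, fact):
                if is_variable(term):
                    # Bind the variable, or check it against its earlier binding.
                    if new_env.setdefault(term, value) != value:
                        break
                elif term != value:
                    break
            else:
                # Every position of the triple matched this fact.
                yield from match(rest, new_env)

    ordered = sorted(distinguished)
    seen = set()
    for env in match(list(pattern), {}):
        key = tuple(env[v] for v in ordered)
        if key not in seen:  # report each set of bindings only once
            seen.add(key)
            yield dict(zip(ordered, key))


# Toy answer KB (assumed data, for illustration only).
kb = {
    ("ex:Mary", "ex:parentOf", "ex:Joe"),
    ("ex:Mary", "ex:hometown", "ex:Paris"),
    ("ex:Ann", "ex:parentOf", "ex:Joe"),
}
pattern = [("?p", "ex:parentOf", "ex:Joe"), ("?p", "ex:hometown", "?h")]

# "What nodes make the pattern true?" with ?p distinguished and ?h existential.
parents = list(find_answers(pattern, {"?p"}, kb))

# "Is the pattern true?" with no distinguished variables.
is_true = list(find_answers(pattern, set(), kb))
```

With no distinguished variables, the sketch yields a single empty set of bindings when the pattern holds and nothing otherwise, which mirrors the yes/no reading of such a query.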

DQL specifies a core set of protocol elements that are to be used by a client to obtain query answers from a server.  Specifically, DQL specifies that a client initiates a query-answering dialogue with a server by sending the server a DQL query.  The server is expected to respond by sending answers to the client one or more at a time along with a “server continuation” that is either a “process handle” which the client can use to request additional answers or a token indicating that the server will not provide any more answers to the query.  The token can be 'none', meaning that the server is claiming that there are no further answers entailed by the answer KB, or 'unknown', meaning that the server is making no claims as to whether there are more answers entailed by the answer KB.  No attempt is made here to specify a complete interagent protocol (e.g., with provisions for time-outs, error handling, resource budgets, etc.).  Query answering servers are required to support the specified core protocol elements and are not constrained by the DQL specification as to how additional protocol functionality is provided.

Detailed Specification

DQL Query

A client initiates a query-answering dialogue with a server by sending the server a “DQL query” that is specified as follows:

·        Query Pattern - A DQL query necessarily includes a "query pattern" that specifies relationships among unknown sets of objects in a domain of discourse.  Each unknown object is represented in the query pattern by a variable.  A query pattern is a collection of DAML+OIL sentences in which some of the literals and resources have been replaced by variables.

·        Distinguished Variables – A DQL query necessarily includes a specification of which of the variables that occur in the query pattern are "distinguished variables".  We refer to the variables in the query pattern that are not distinguished variables as "non-distinguished variables".

·        Knowledge Base - A DQL query optionally includes a specification of a DAML+OIL knowledge base that is referred to as the "answer KB".  If an answer KB is specified in the query, the query is considered to be posed with respect to that knowledge base.  If no answer KB is specified in the query, the server is free to select an answer KB.  The answer KB can be a single knowledge base or a collection of knowledge bases, where a collection of knowledge bases is interpreted to mean the knowledge base consisting of the union of the knowledge bases specified in the collection.  Each knowledge base is specified by a URI or is included in the query.[1]

·        Query Premise - A DQL query can optionally include a "query premise" to facilitate if-then queries while still remaining within the expressiveness of DAML+OIL.  A query premise is an arbitrary DAML+OIL knowledge base that is either specified by a URI or included in the query.  When a query premise is specified, the sentences in the query premise are considered to be included in the answer KB.

·        Answer Justification Request – A DQL query can optionally include a request for a justification for each query answer.  The content and structure of a justification for a query answer have not yet been designed.  The intent is to specify various types of justifications that can be requested in a query.  The types will include at least the following two: a full proof of the answer, and a set of sentences from the answer KB from which the answer can be proved.
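
The components of a DQL query listed above could be collected into a record along the following lines.  This is only a sketch: DQL does not prescribe a concrete syntax here, and the class name DQLQuery, its field names, the triple encoding of sentences, and the "?" prefix for variables are all assumptions made for illustration.  For brevity the sketch covers only knowledge bases specified by URI, not those included in the query itself.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple

Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class DQLQuery:
    query_pattern: Tuple[Triple, ...]            # required
    distinguished_vars: FrozenSet[str]           # required (may be empty)
    answer_kb: Tuple[str, ...] = ()              # optional; URIs, empty means the server chooses
    query_premise: Optional[str] = None          # optional; URI of a premise knowledge base
    justification_request: Optional[str] = None  # optional; justification types not yet designed

    def __post_init__(self) -> None:
        # Distinguished variables must be variables that occur in the query pattern.
        unknown = self.distinguished_vars - self.pattern_vars
        if unknown:
            raise ValueError(f"not in query pattern: {sorted(unknown)}")

    @property
    def pattern_vars(self) -> FrozenSet[str]:
        """All variables occurring in the query pattern (assumed '?' prefix)."""
        return frozenset(
            term for triple in self.query_pattern for term in triple
            if term.startswith("?")
        )

    @property
    def non_distinguished_vars(self) -> FrozenSet[str]:
        """Variables of the pattern that are not distinguished."""
        return self.pattern_vars - self.distinguished_vars


query = DQLQuery(
    query_pattern=(("?p", "ex:parentOf", "ex:Joe"), ("?p", "ex:hometown", "?h")),
    distinguished_vars=frozenset({"?p"}),
)
```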

DQL Answer Bundle

A server that has received a DQL query or a DQL query continuation from a client is expected to respond by sending the client a “DQL answer bundle” that is specified as follows:

·        Query Answers – A DQL answer bundle necessarily includes a (possibly empty) collection of query answers.  Each “DQL query answer” is specified as follows:

·        Query Bindings - A DQL query answer necessarily includes a set of "query bindings" for all of the query pattern's distinguished variables.  Each binding is a “minimal identifying description” (MID) corresponding to the smallest connected subgraph of the RDF graph of the answer KB that contains the node being described and for which all “tip” nodes (i.e., nodes not in a loop in the graph) are either literals or have an associated URI.  In the case where the node is a literal or has an associated URI, the binding is simply the literal or the URI.  In the case of an anonymous node, the binding is a description (in the Description Logic sense) consisting of the arcs coming into and going out from the node in the graph.  Such a description might say, for example, "a parent of Joe that has Paris as a hometown and two male siblings".  The MID of a node in effect consists of the conjunction of the RDF statements defined by the arcs into and out of the node, where each node in the description is specified by its associated URI, its associated literal, or its MID.  That is, if an anonymous node is related to another anonymous node, then the MID of either of those nodes will include the description of the other.  For example, a MID might be "a parent of a sister of Bill", where neither the parent nor the sister has a name.  A MID of an anonymous node will contain a variable for that node.  For example, the MID "a parent of Joe that has Paris as a hometown and two male siblings" would be "(and (parentOf ?p Joe) (hometownOf ?p Paris) ...)".  We assume that if a MID of an anonymous node is a binding for a distinguished variable ?v, then ?v is the variable in the MID for that node.

Applying a binding that is a MID of an anonymous node to a query pattern means conjoining the MID with the query pattern (and leaving in the query pattern the distinguished variable for which the MID is a binding).  An answer's query bindings are such that the sentences produced by applying the bindings to the query pattern, with the remaining variables in the query pattern (including those in the conjoined MIDs) considered to be existentially quantified, are entailed by the knowledge base with respect to which the query was answered.

·        Answer Justification – A DQL query answer necessarily includes an “answer justification” exactly when the query includes an answer justification request.  The content and structure of an answer justification have not yet been designed.

·        Server Continuation – A DQL answer bundle necessarily includes a “server continuation” which is either a token indicating that the server will not provide any more answers to the query or a “DQL process handle” that, when returned to the server, enables it to continue its query-answering process from the point where it stopped when it produced the answer bundle containing the continuation.  When the server will not provide any more answers to the query, the server continuation is either 'none', indicating that there are no further answers entailed by the answer KB, or 'unknown', indicating that more answers may be entailed by the answer KB.  When the collection of query answers is empty, the server continuation must be either the token 'none' or the token 'unknown'.

·        Answer KB – A DQL answer bundle for a query that does not include a specific answer KB necessarily includes a specification of the "answer KB" with respect to which the query is being answered.  The answer KB can be a single knowledge base or a collection of knowledge bases, where a collection of knowledge bases is interpreted to mean the knowledge base consisting of the union of the knowledge bases specified in the collection.  Each knowledge base is specified by a URI.

·        Query - A DQL answer bundle necessarily includes the query to which it is a response.

·        Server – A DQL answer bundle necessarily includes a URI identifying the server that produced the bundle.

When multiple answers for a query are produced, each answer is required to have a unique set of bindings in the sense that no two sets of bindings have identical bindings for every distinguished variable.[2]    
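
Two of the constraints stated above lend themselves to a mechanical check: an empty collection of answers must be accompanied by 'none' or 'unknown', and no two answers may have identical bindings for every distinguished variable.  The Python sketch below shows one way a client might check a single answer bundle.  The names AnswerBundle, ProcessHandle and check_bundle are hypothetical, bindings are assumed to be comparable as strings (a URI, a literal, or a serialized MID), and the answer KB and query components of the bundle are omitted for brevity.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Union

TERMINATION_TOKENS = ("none", "unknown")


@dataclass(frozen=True)
class ProcessHandle:
    """Opaque token the client returns to the server to request more answers."""
    value: str


@dataclass
class AnswerBundle:
    answers: List[Dict[str, str]]            # each maps a distinguished variable to its binding
    continuation: Union[str, ProcessHandle]  # 'none', 'unknown', or a process handle
    server_uri: str


def check_bundle(bundle: AnswerBundle, distinguished_vars: Iterable[str]) -> List[str]:
    """Return a list of problems found in the bundle (empty if none were found)."""
    problems: List[str] = []
    ordered = sorted(distinguished_vars)
    has_handle = isinstance(bundle.continuation, ProcessHandle)

    if not has_handle and bundle.continuation not in TERMINATION_TOKENS:
        problems.append("continuation is neither a process handle nor a termination token")
    if not bundle.answers and has_handle:
        problems.append("an empty answer collection requires 'none' or 'unknown'")

    seen = set()
    for answer in bundle.answers:
        missing = [v for v in ordered if v not in answer]
        if missing:
            problems.append(f"answer lacks bindings for {missing}")
            continue
        key = tuple(answer[v] for v in ordered)
        if key in seen:
            problems.append(f"duplicate set of bindings: {key}")
        seen.add(key)
    return problems
```

The check compares bindings syntactically, so it says nothing about whether two different bindings denote the same object.  A client that wanted to enforce uniqueness across all the bundles of one dialogue would need to carry the set of seen bindings from one bundle to the next.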

DQL Query Continuation

A client that has received a DQL answer bundle containing a DQL process handle can request more answers to the query from the server that sent the answer bundle by sending that server a “DQL query continuation”, specified as follows: 

·        Server Process Handle – A DQL query continuation necessarily includes the DQL process handle from the server continuation of the server’s most recently produced answer bundle for a given query.
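
The dialogue described in this section can be sketched as a simple client loop: send the query, then keep returning the most recent process handle until the server replies with 'none' or 'unknown'.  The Python code below is a minimal sketch under stated assumptions.  ToyServer is a hypothetical in-memory stand-in that ignores the query content and hands out pre-computed answers in fixed-size batches, and the method names query and continue_query are invented for illustration; DQL itself does not define a transport or an API.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

Answer = Dict[str, str]


@dataclass(frozen=True)
class ProcessHandle:
    """Opaque handle; only the server that issued it can interpret it."""
    id: int


Continuation = Union[str, ProcessHandle]  # 'none', 'unknown', or a handle


class ToyServer:
    """Hypothetical server returning pre-computed answers in batches."""

    def __init__(self, all_answers: List[Answer], batch_size: int = 2) -> None:
        self._answers = list(all_answers)
        self._batch_size = batch_size
        self._cursors: Dict[int, int] = {}  # handle id -> next answer offset
        self._next_id = 0

    def _bundle(self, start: int) -> Tuple[List[Answer], Continuation]:
        end = start + self._batch_size
        answers = self._answers[start:end]
        if end >= len(self._answers):
            return answers, "none"  # server claims there are no further answers
        handle = ProcessHandle(self._next_id)
        self._next_id += 1
        self._cursors[handle.id] = end
        return answers, handle

    def query(self, query: object) -> Tuple[List[Answer], Continuation]:
        return self._bundle(0)

    def continue_query(self, handle: ProcessHandle) -> Tuple[List[Answer], Continuation]:
        # Each handle is usable once: it comes from the most recent bundle.
        return self._bundle(self._cursors.pop(handle.id))


def collect_answers(server: ToyServer, query: object) -> Tuple[List[Answer], str]:
    """Run the dialogue to completion; return all answers and the final token."""
    answers, continuation = server.query(query)
    collected = list(answers)
    while isinstance(continuation, ProcessHandle):
        answers, continuation = server.continue_query(continuation)
        collected.extend(answers)
    return collected, continuation
```

A client is of course free to stop before the server returns a termination token; the loop above simply shows the case in which it asks for everything the server is willing to provide.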

Notes and Commentary

Layered Protocols

This protocol can be used as the basis for more elaborate querying protocols that find all answers, the first N answers, etc., without requiring every query-answering server to maintain the machinery needed to support such functionality, while also keeping the basic querying language arithmetic-free.  The protocol gives the server the freedom to produce one or more of the answers to a query each time a request is received from a client.  The server can either provide a “process handle” that enables a client to ask for additional answers or provide all the answers it is going to produce in a single response.
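
As one illustration of such layering, a "first N answers" facility can be built entirely on the client side on top of the core exchange.  The Python sketch below wraps the dialogue in a generator and takes N answers from it, issuing continuation requests only when more answers are actually needed.  Everything named here is an assumption for illustration: the toy server made by make_toy_server returns (answers, continuation) pairs, uses an integer offset as its process handle, and counts the requests it receives.

```python
from itertools import islice
from typing import Callable, Dict, Iterator, List, Tuple, Union

Answer = Dict[str, str]
Bundle = Tuple[List[Answer], Union[str, int]]  # continuation: token or handle


def iter_answers(first_request: Callable[[], Bundle],
                 continue_request: Callable[[int], Bundle]) -> Iterator[Answer]:
    """Lazily yield answers, requesting a new bundle only when one is needed."""
    answers, continuation = first_request()
    while True:
        yield from answers
        if continuation in ("none", "unknown"):
            return
        answers, continuation = continue_request(continuation)


def first_n(n: int, first_request: Callable[[], Bundle],
            continue_request: Callable[[int], Bundle]) -> List[Answer]:
    """Client-side 'first N answers' built only from the core protocol."""
    return list(islice(iter_answers(first_request, continue_request), n))


def make_toy_server(all_answers: List[Answer], batch_size: int):
    """Hypothetical server; the handle is simply the next answer offset."""
    calls = {"count": 0}

    def bundle(start: int) -> Bundle:
        calls["count"] += 1
        end = start + batch_size
        continuation = "none" if end >= len(all_answers) else end
        return all_answers[start:end], continuation

    return (lambda: bundle(0)), bundle, calls


data = [{"?x": f"ex:item{i}"} for i in range(10)]
first_request, continue_request, calls = make_toy_server(data, batch_size=3)
top4 = first_n(4, first_request, continue_request)
```

In this run the client makes two requests to obtain four answers and never asks for the remaining ones, and the server needs no notion of N.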

“How Many” Queries

The language and protocol contain no explicit constructs for asking how many (or how many more) answers there are to a given query.  Defining what is meant by “how many” is problematic in that there can be multiple bindings for a given distinguished variable that all denote the same object in the domain of discourse, so that how many answer bindings there are for a given distinguished variable will in general differ from how many answer objects in the domain of discourse that variable can denote.  The core protocol could reasonably be extended to support “how many” queries, where “how many” means how many answers containing distinct sets of bindings the server can produce.  The difficulty of a server determining how many answers it can produce to a query without actually producing the answers has been the primary rationale for not including a “how many” construct in the query language.
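
The gap between the two readings of “how many” can be made concrete with a small Python sketch.  It counts distinct sets of bindings on the one hand and, on the other, distinct denoted objects for one variable, given a table saying which names denote the same object.  The function names, the ex: names, and the denotes table are assumptions for illustration; in general a client would not have such a table available.

```python
from typing import Dict, Iterable, List

Answer = Dict[str, str]


def count_binding_sets(answers: List[Answer], distinguished_vars: Iterable[str]) -> int:
    """Number of answers with distinct sets of bindings (syntactic comparison)."""
    ordered = sorted(distinguished_vars)
    return len({tuple(answer[v] for v in ordered) for answer in answers})


def count_denoted_objects(answers: List[Answer], var: str, denotes: Dict[str, str]) -> int:
    """Number of distinct objects denoted by the bindings of one variable.

    `denotes` is an assumed table mapping a name to a canonical name for the
    object it denotes; names absent from the table denote themselves.
    """
    return len({denotes.get(answer[var], answer[var]) for answer in answers})


answers = [
    {"?x": "ex:MorningStar"},
    {"?x": "ex:EveningStar"},
    {"?x": "ex:Mars"},
]
denotes = {"ex:MorningStar": "ex:Venus", "ex:EveningStar": "ex:Venus"}

n_bindings = count_binding_sets(answers, {"?x"})           # three distinct bindings
n_objects = count_denoted_objects(answers, "?x", denotes)  # two distinct objects
```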



[1] Consideration is being given to developing a knowledge base description ontology and allowing a knowledge base specification in a query to describe a class of knowledge bases with the interpretation being that the answer KB for that query can be any instance of that class.

[2] Note that this uniqueness requirement does not prevent two sets of bindings having bindings that denote identical objects for every distinguished variable.