- From: Fred Zemke <fred.zemke@oracle.com>
- Date: Thu, 12 Jan 2006 13:37:34 -0800
- To: public-rdf-dawg-comments@w3.org
The set of RDF terms is unbounded, including all RDF literals, whether or not they appear in the graph. In addition, the set of IRIs may be an unbounded set (or perhaps you intend only those IRIs that appear in the graph; this should be clarified). Given that the set of RDF terms is unbounded, what is the result if a variable in the SELECT list is completely unconstrained by the WHERE clause? For example SELECT ?x WHERE { FILTER (?x = ?x) } It seems that the answer to this query must consist of all IRIs (possibly just all IRIs in the default graph), all RDF literals and all blank nodes in the default graph. It's probably an undecidable problem to characterize all such runaway queries (equivalent to the halting problem?). Perhaps we can expect that the user will not be interested in formulating such queries. However, they may still arise accidentally, for example, through typos, as in this query: SELECT ?name WHERE { [] v:isnamed ?nam }. And, from an implementation standpoint, it would be desirable if the definition of a solution could be restricted so that all solutions in some sense begin with triples in the graph, to avoid the theoretical need to enumerate all conceivable IRIs and RDF literals. Provisionally I propose the concept of "known RDF term" consisting of all IRIs in the graph, all RDF literals in the graph, and all blank nodes in the graph. Using this notion, a pattern solution would be a mapping of variables and SPARQL blank nodes to known RDF terms. However, this would introduce a different problem. Consider the query: tell me the price plus tax of the items for sale in the graph. Currently this can be written SELECT ?z WHERE { [] v:price ?x . [] v:tax ?y . FILTER ( ?z = ?x + ?y ) } If we restrict the range of variables to just values already in the graph, it is quite possible that there will be a price plus tax that is not a value in the graph. To solve this, I see three possible directions: 1. a LET capability as in XQuery. The preceding query might then be written SELECT ?z WHERE { [] v:price ?x . [] v:tax ?y . LET ?z = ?x + ?y } 2. An implicit LET capability that allows values to become known RDF terms, using the FILTER clause as in the example above, with adequate syntactic definitions to make this precise and sound. Specifying such syntactic requirements is messy, but basically it might be that there is an equality predicate with a single variable on the left hand side, the equality predicate is the sole contents of the FILTER clause or is only conjoined using && to other predicates, and the variable on the left hand side appears in only one such predicate. Additional rules regarding placement relative to UNION and OPTIONAL may also be necessary. 3. permit expressions in the SELECT list. In that case the query can be formulated SELECT ?x + ?y WHERE { [] v:price ?x . [] v:tax ?y } Of the three alternatives, I think users will find the third the most convenient. My second choice would be a LET clause. I think implicit LET may be too complicated to specify correctly. Finally, coming back to the problem of runaway queries caused by typos, this could be ameliorated by either a warning or perhaps an outright error if the query result involves an unbound variable. Fred Zemke
Received on Thursday, 12 January 2006 21:37:41 UTC