- From: Lee Feigenbaum <lee@thefigtrees.net>
- Date: Thu, 12 Apr 2012 07:57:31 -0400
- To: W Martens <martens.wim@gmail.com>
- CC: public-rdf-dawg-comments@w3.org
Hello Wim and Katja, Thanks very much for your comment on the challenges of efficiently evaluating property paths given the current semantics. We have received several comments along these lines. The Working Group agreed to adopt an alternative design for property paths that we believe balances several of our design goals: Use case expressivity Users' learning curve / intuition Ease of implementation Potential for efficient evaluation Schedule The changes from the current Last Call working draft are as follows: The semantics of *, +, and ? are changed to be non-counting (they no longer preserve duplicates) The /, |, and ! remain unchanged as in the current draft (they preserve duplicates) The curly brace forms -- {n}, {n,m}, {n,}, {,m} -- have all been removed There are several motivations for these design changes: Changing the semantics of * and + to non-counting is expected to address the evaluation performance concerns raised by your comment. The /, |, and ! operators are often used as shortcuts for writing out equivalent graph patterns longhand. By leaving these operators with counting semantics (just as the equivalent graph pattern expansions have), SPARQL 1.1 property paths will continue to yield intuitive results. Two examples that the Working Group has considered in coming to this conclusion are: SELECT (sum(?cost) AS ?total) { :order :hasItem/:price ?cost } and SELECT ?member { ?list rdf:rest*/rdf:first ?member } ...when applied to a list with duplicate items, such as (1 2 1). (This type of query was one motivation for the property paths feature that the WG documented in the SPARQL 1.1 New Features & Rationale document: http://www.w3.org/TR/sparql-features/#Property_paths) In both cases, users' intuitive expectations demands that '/' preserve duplicates. The group decided to remove the curly brace forms due to a lack of experience with appropriate counting/non-counting semantics for these forms. We believe that implementations will likely extend SPARQL 1.1 Property Paths with support for these forms, and we will gather implementation experience to use for future standardization work. We've included this item on a list of work items for a future working group - http://www.w3.org/2009/sparql/wiki/Future_Work_Items We would be grateful if you would acknowledge that your comment has been answered by sending a reply to this mailing list. Lee On behalf of the SPARQL Working Group On 2/20/2012 8:51 AM, W Martens wrote: > Dear DAWG members, > > we tried to post the following message on public-rdf-dawg last week, but it > doesn't seem to appear. Perhaps it was not the correct mailing list for this > kind of comment. > > > We have some comments regarding the semantics of property paths, which > give some extra information about the issues already raised by Marcelo > Arenas, Sebastian Conca, and Jorge Perez in their post a few weeks ago > (http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2012Jan/0009.html). > > > We have noticed that the Working Group is already discussing the issue of > property path semantics intensely and we would like to share our findings > to aid in the discussion. We performed a mostly foundational study of the > current semantics of property paths in a paper which has been accepted at > PODS 2012. A preprint is available for download at > http://www.theoinf.uni-bayreuth.de/download/pods12submission.pdf. > > This work was conducted independently from the work by Arenas, > Conca, and Perez and it is remarkable that, in many cases, we arrive to > very similar conclusions. In what comes next, we will try not to repeat > what has been said before, but focus on giving new information. > > > > On the current semantics of Property Paths > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > The current semantics of property paths in SPARQL 1.1 relies on two > choices which have a dramatic impact on the complexity of query > evaluation. > > In the paper, we refer to these as > (A) the "simple walk requirement" > (B) the necessity of "counting paths" for computing query answers > > > To clarify (A) and (B): > > By simple walk semantics, we refer to the definitions of ZeroOrMorePath > and OneOrMorePath in Section 18.4 in the working draft (SPARQL Algebra). > These definitions state that the answers to these operators are obtained > by "repeated use of path such that any nodes in the graph are traversed > once only". A path that traverses any node in the graph only once (and > possibly returns to its first node) is a simple walk. > > Condition (B) has already been mentioned in the post by Arenas, Conca, > and Perez. > > > > Implications of the current semantics on Property Path evaluation > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > Each of requirements (A) and (B) make property paths hard to evaluate. > > More precisely, each of these requirements makes the evaluation problem > of property paths NP-hard or #P-hard, already for very, very simple queries. > Details are in the paper. > > > Can this be solved? > ~~~~~~~~~~~~~~~ > > > We discovered that there could be an alternative semantics for property > paths that alleviates the before mentioned efficiency problems. In > particular, consider the following conditions, which might be > alternatives for (A) and (B): > > (A') regular path semantics > (B') only check the existence of a path > > Option (A') would be the same than (A), but no longer require that any > node in the graph is traversed once only. > Option (B') is what [Arenas, Conca, and Perez] refer to as "existential > semantics". > > > Our findings regarding the trade-offs between (A) / (A') and (B) / (B') > can be summarized as follows: > > > Simple walk semantics vs Regular path semantics > ----------------------------------------------- > If the default behavior of a SPARQL property path would have conditions > (A') and (B'), then our paper gives an algorithm to evaluate SPARQL > property paths in polynomial time combined complexity (Theorem 3.2). The > algorithm in Theorem 3.2 can be easily adapted to compute all pairs (X,Y) > that should be returned by a SELECT X,Y WHERE { X :p Y } query in > polynomial time. Notice that this is a *double exponential* improvement > wrt. current implementations. > Furthermore, :p can be an arbitrary property path, including counters. > This means that, for evaluating a property path of the form p[200,300], > our algorithm only needs polynomial time in the number of bits needed to > represent the number 300. > > However, this improvement is not possible under requirement (A). > Requirement (A) already makes it NP-complete to test whether a given pair > (X,Y) should be in the answer of the above SELECT query or not. > > Therefore, our feeling is that the choice between (A) or (A') is > worthwhile reconsidering since (A') could dramatically improve > performance. > > > To Count or not? > ---------------- >> From a user perspective, it makes sense to have the option to choose > between (B) or (B'). > Sometimes, you like to count duplicates in an answer to a query and > sometimes you don't. > > On this topic, we largely agree with the work of [Arenas, Conca, and > Perez]. They suggest to apply (B') as a default semantics to property > paths, and to use a keyword in the case that the user desires to have > duplicates in the answer. > > > > > > > We would also like to send the message to reconsider the issue of > property path semantics and we are very happy to observe that the > working group is already discussing this matter. Also, we are open > to continue the discussion with the group and with our colleagues > Marcelo Arenas, Sebastian Conca, and Jorge Perez and, if > necessary, cooperate with the group to make a proposal that can > have an efficient evaluation method. > > > Best regards, > > Wim Martens > Katja Losemann > > >
Received on Thursday, 12 April 2012 11:58:00 UTC