- From: Matthew Perry <matthew.perry@oracle.com>
- Date: Wed, 26 Jan 2011 08:53:12 -0500
- To: Andy Seaborne <andy.seaborne@epimorphics.com>
- CC: SPARQL Working Group <public-rdf-dawg@w3.org>
I think this should work for the sum age query. SELECT SUM(?A) WHERE { ?F :age ?A { SELECT DISTINCT ?F WHERE { :me (:friend)+ ?F } }} It seems that, in general, DISTINCT will work as a there exists query when URIs and blank nodes are the endpoint of a path because we have a one-to-one mapping between URI/BN values and graph vertices, but we get a one-to-many mapping from literal values to vertices. - Matt On 1/26/2011 8:24 AM, Andy Seaborne wrote: > Here's an example. based on items in a (simple) shopping basket. There are two things that are :item1 in the order (via compound and directly - think "buying a spare part"). > > The order cost is 6, uniqueness would make it 2. > > (I've sent a version of this to Jorge offlist as discussions aren't possible on the comments list). > > Andy > > ==== D1.ttl > > @prefix : <http://example/> . > > :order :contains :thing1 . > :order :contains :compound1 . > > :thing1 :unitOf :item1 . > :thing2 :unitOf :item2 . > :thing3 :unitOf :item1 . > > :item2 :price 2 . > :item1 :price 2 . > > :compound1 :contains :thing2 . > :compound1 :contains :thing3 . > > > ==== Q1.rq > > PREFIX : <http://example/> > > SELECT (SUM(?itemPrice) AS ?price) > { > :order :contains+/:unitOf/:price ?itemPrice . > } > > > > -------- Original Message -------- > Subject: Re: Comments about the semantics of property paths > Date: Tue, 25 Jan 2011 15:55:08 -0300 > From: jorge perez <jorge.perez.rojas@gmail.com> > To: Andy Seaborne <andy.seaborne@epimorphics.com> > CC: public-rdf-dawg-comments@w3.org > > Hello Andy, > > Thanks for your email. Yes, my comments have been answered. An > additional comment is below. > > On Tue, Jan 25, 2011 at 1:08 PM, Andy Seaborne > <andy.seaborne@epimorphics.com> wrote: >> Hi Jorge, >> >> XPath is designed for XML processing where XML nodes and values are treated >> in different ways. XPath evaluation returns distinct XML nodes, but >> duplicate values. One evaluation of an XPath expression can't mix XML nodes >> and values - see the numbered list in [1]. "XQuery 1.0 and XPath 2.0 >> Functions and Operators" has an operation fn:distinct-values to make values >> unique in a sequence [2]. >> >> An RDF graph does not have this distinction of nodes and values. Graph nodes >> (vertexes) are IRIs, blank nodes or literals with no separation. Repetition >> of literals is significant, consider SUM applied to a purchase order where >> two items have the same price, so multiple paths to the same endpoint do >> matter. > > Aggregation is actually another reason of why multiple paths to the > same endpoint *do not* have to be considered. > > Consider a network of friends, and assume that you want to obtain the > SUM of the age of all your network (friends of your friends). Then a > very natural way to do this is with the query (simplified syntax) > > SUM (?A) > :me (:friend)+/:age ?A > > The query is navigating to all the friends of my friends, then to the > age value of every one, and then taking the SUM. Isn't this natural? > But, consider the following data > > :me :friend :f1 > :me :friend :f2 > :f1 :friend :f2 > :f1 :age 20 > :f2 :age 20 > > I would expect 40 as the result of the above query, but the expression > > :me (:fiend)+/:age ?A > > returns > > ?A > 20 (for the path :me->:f1) > 20 (for the path :me->:f2) > 20 (for the path :me->:f1->:f2) > > and thus, the answer of the SUM would be 60. How do you explain the > result of this query to a user? Notice that using DISTINCT does not > solve the problem, since with DISTINCT you would obtain 20 as the SUM > which is also wrong. > > Is there a way to correctly answer the above query with the current > design of property paths? > > Thanks, > - jorge > >> >> SPARQL property paths do not apply uniqueness to property paths and the >> property path expression is, where appropriate, the same the expansion in >> terms of triple patterns. It is not a matter of efficiency because the >> answers concerning duplicate literal values would be rather unexpected if >> only distinct values were returned. >> >> This leaves the ArbitraryLengthPath operation for the use of "+" in paths. >> This traverses cycles once by terminating the search on encountering an edge >> already traversed for that evaluation of ArbitraryLengthPath. In an earlier >> design, cycle termination was by detecting visiting nodes but the WG >> considers the edge traversal a better choice. The new design is one more >> step of evaluation on a cycle than the first design and leaves better >> prospects for future standardization. >> >> SPARQL has the keyword DISTINCT so an application can choose between >> duplicates and no duplicates. A query engine can exploit this if it chooses >> to; use with sub-queries mean that solution modifiers can be applied to >> specific parts of the query such as a path. >> >> An implementation is free to implement evaluation in anyway it chooses >> proved it results in the same answers. The WG felt that using an algorithm >> was the most helpful way to specify the feature, especially to implementers. >> >> Property paths have been implemented in a number of systems (see [3] for a >> partial list) and found to be useful. >> >> We would be grateful if you would acknowledge that your comment has been >> answered by sending a reply to this mailing list. >> >> Andy >> On behalf of the SPARQL working group. >> >> [1] http://www.w3.org/TR/xpath20/#id-path-expressions >> [2] http://www.w3.org/TR/xpath-functions/#func-distinct-values >> [3] http://esw.w3.org/SPARQL/Extensions/Paths >> >> On 15/12/10 18:34, jorge perez wrote: >>> >>> Hello Andy, >>> >>> Thank you very much for your response and for considering my comments, >>> and sorry for the late reply. >>> >>> There is a couple of comments that you have not answered. >>> >>> "" >>> As a separate but very important issue, notice that the XPath language >>> does not consider duplicate paths when evaluating expressions (XPath >>> is evaluated in the "there exists" way that I mentioned before). Thus, >>> counting paths in SPARQL would be somewhat in contradiction with >>> previously proposed path languages considered by the W3C. >>> "" >>> >>> I think that if this W3C Recommendation is in discordance with a >>> previous Recommendation about a similar topic, then DAWG should have >>> strong reasons for that, and make them clear in the specification. The >>> specification should also advice the reader about this issue. >>> >>> Besides that comment, you have said nothing about efficiency of >>> evaluation. Notice that this not related to a particular way of >>> implementing the language. It is about the huge efficiency impact that >>> any implementation will suffer in practice. You have not acknowledge >>> that in your response. Have you consider this as an issue? >>> >>> Another comment that is not covered by your response is whether there >>> exists a use case that demand counting different paths. In your >>> response, it seems that the reason for counting paths is to make >>> easier the job of the implementors (by reusing algebra operators). >>> Opposite to what the group think, I think that not counting paths >>> gives the implementor more freedom since paths could be implemented in >>> several different ways, being just one of them by reusing algebra >>> operators. Can you please clarify whether there are use cases about >>> this? This would help a lot. >>> >>> If you respond to the comments above I can consider my comments answered. >>> >>> I have a couple of additional words. Please do not consider them as a >>> formal objection to the process, but just as my opinion. >>> >>> I still strongly disagree with your design decisions about property >>> paths. In particular, I insist that it is a mistake to define the >>> semantics in the presence of cycles in a non-standard way and by >>> forcing a particular algorithm to evaluate them. In your response you >>> say that there can be corner cases, but it is not only a problem of >>> corner cases. From my point of view it will become a problem of >>> adoption of the standard. In this point I think that the group should >>> not neglect that there is a lot of related (theoretical and practical) >>> work in this area that have handled cycles in a completely different >>> way. >>> >>> To conclude, I do think that the property-paths material in the >>> current specification is far from being mature. Considering that the >>> group is in a tight schedule, I think that it would be better to not >>> include property paths in this round of standardization, than >>> including them in their current form. >>> >>> Thank you very much for considering my comments. >>> - jorge >>> >>> On Thu, Dec 2, 2010 at 8:20 AM, Andy Seaborne >>> <andy.seaborne@epimorphics.com> wrote: >>>> >>>> Jorge, >>>> >>>> Thank very much for your comments. >>>> >>>> The working group considered a number of factors in designing the >>>> property >>>> path features. In addition to the points you raise, the WG also included >>>> consideration that, while this working group is not adding a path >>>> datatype >>>> (needed to inquire about any path matched later in the query), nor the >>>> specific case of access to path length, the WG should leave open as many >>>> possibilities here for future work. Another factor in the design is the >>>> relationship of some property path expressions to triple pattern forms. >>>> >>>> Although not specifying returning the path length of a match, nor >>>> specifying >>>> returning the matched path itself, the WG felt that, on balance, the >>>> design >>>> in the working draft gave maximum scope for any later standardization >>>> work. >>>> The issue of path length particularly was considered as a feature for >>>> this >>>> round of work but, when considered against all the other work items the >>>> WG >>>> has taken on, it didn't make the final list of work items. This lead to >>>> the >>>> conclusion that counting path possibilities, not a "there exists" >>>> condition, >>>> was the better choice for this round of standardization. Adding access >>>> the >>>> the path matched is better served if all paths are considered. >>>> >>>> Another consideration was the relationship of property paths and existing >>>> queries using triple patterns. >>>> >>>> { ?x :p{2} ?y } >>>> >>>> and >>>> >>>> { ?x :p ?Z . ?Z :p ?y }, with ?Z projected away. >>>> >>>> The WG decided to make these equivalent, including in terms of numbers of >>>> solutions. This gives the semantics of many path forms in terms of SPARQL >>>> graph pattern operators. This was felt to be intuitive and to utilize the >>>> capabilities of query engines: rather that requiring yet another >>>> mechanism, >>>> the equivalence means that join-technology (for example) can be used to >>>> solve the pattern. >>>> >>>> This then leaves the issue of cycles in the "+" operator. The design is >>>> one >>>> in which the cycles in "+" operator are handled by traversing a directed >>>> edge (triple in the data) once. This will be explained in the final >>>> version >>>> of the query specification - there is a placeholder for it in the current >>>> editors working draft. The current working draft has been clarified to >>>> use >>>> "multiset-union" for the union in the ArbitraryLengthPath definition. >>>> >>>> This overall design is a tradeoff of implementation, future >>>> possibilities, >>>> and equivalence of patterns on graphs. The WG is aware that there can be >>>> corner cases can arise where different intuitions are not compatible. On >>>> balance, the WG feels that the current design is most suitable for this >>>> round of standardization. >>>> >>>> Again, that you for your helpful comments. >>>> >>>> We would be grateful if you would acknowledge that your comment has been >>>> answered by sending a reply to this mailing list. >>>> >>>> Andy >>>> on behalf of the SPARQL Working Group. >>>> >> >
Received on Wednesday, 26 January 2011 13:54:47 UTC