- From: Matthew Perry <matthew.perry@oracle.com>
- Date: Wed, 26 Jan 2011 12:56:18 -0500
- CC: SPARQL Working Group <public-rdf-dawg@w3.org>
Forget that last part ... temporary brain freeze. - Matt On 1/26/2011 8:53 AM, Matthew Perry wrote: > I think this should work for the sum age query. > > SELECT SUM(?A) > WHERE > { ?F :age ?A > { SELECT DISTINCT ?F > WHERE > { :me (:friend)+ ?F } }} > > It seems that, in general, DISTINCT will work as a there exists query when URIs and blank nodes are the endpoint of a path because we have a one-to-one mapping between URI/BN values and graph vertices, but we get a one-to-many mapping from literal values to vertices. > > - Matt > > > On 1/26/2011 8:24 AM, Andy Seaborne wrote: >> Here's an example. based on items in a (simple) shopping basket. There are two things that are :item1 in the order (via compound and directly - think "buying a spare part"). >> >> The order cost is 6, uniqueness would make it 2. >> >> (I've sent a version of this to Jorge offlist as discussions aren't possible on the comments list). >> >> Andy >> >> ==== D1.ttl >> >> @prefix : <http://example/> . >> >> :order :contains :thing1 . >> :order :contains :compound1 . >> >> :thing1 :unitOf :item1 . >> :thing2 :unitOf :item2 . >> :thing3 :unitOf :item1 . >> >> :item2 :price 2 . >> :item1 :price 2 . >> >> :compound1 :contains :thing2 . >> :compound1 :contains :thing3 . >> >> >> ==== Q1.rq >> >> PREFIX : <http://example/> >> >> SELECT (SUM(?itemPrice) AS ?price) >> { >> :order :contains+/:unitOf/:price ?itemPrice . >> } >> >> >> >> -------- Original Message -------- >> Subject: Re: Comments about the semantics of property paths >> Date: Tue, 25 Jan 2011 15:55:08 -0300 >> From: jorge perez <jorge.perez.rojas@gmail.com> >> To: Andy Seaborne <andy.seaborne@epimorphics.com> >> CC: public-rdf-dawg-comments@w3.org >> >> Hello Andy, >> >> Thanks for your email. Yes, my comments have been answered. An >> additional comment is below. >> >> On Tue, Jan 25, 2011 at 1:08 PM, Andy Seaborne >> <andy.seaborne@epimorphics.com> wrote: >>> Hi Jorge, >>> >>> XPath is designed for XML processing where XML nodes and values are treated >>> in different ways. XPath evaluation returns distinct XML nodes, but >>> duplicate values. One evaluation of an XPath expression can't mix XML nodes >>> and values - see the numbered list in [1]. "XQuery 1.0 and XPath 2.0 >>> Functions and Operators" has an operation fn:distinct-values to make values >>> unique in a sequence [2]. >>> >>> An RDF graph does not have this distinction of nodes and values. Graph nodes >>> (vertexes) are IRIs, blank nodes or literals with no separation. Repetition >>> of literals is significant, consider SUM applied to a purchase order where >>> two items have the same price, so multiple paths to the same endpoint do >>> matter. >> >> Aggregation is actually another reason of why multiple paths to the >> same endpoint *do not* have to be considered. >> >> Consider a network of friends, and assume that you want to obtain the >> SUM of the age of all your network (friends of your friends). Then a >> very natural way to do this is with the query (simplified syntax) >> >> SUM (?A) >> :me (:friend)+/:age ?A >> >> The query is navigating to all the friends of my friends, then to the >> age value of every one, and then taking the SUM. Isn't this natural? >> But, consider the following data >> >> :me :friend :f1 >> :me :friend :f2 >> :f1 :friend :f2 >> :f1 :age 20 >> :f2 :age 20 >> >> I would expect 40 as the result of the above query, but the expression >> >> :me (:fiend)+/:age ?A >> >> returns >> >> ?A >> 20 (for the path :me->:f1) >> 20 (for the path :me->:f2) >> 20 (for the path :me->:f1->:f2) >> >> and thus, the answer of the SUM would be 60. How do you explain the >> result of this query to a user? Notice that using DISTINCT does not >> solve the problem, since with DISTINCT you would obtain 20 as the SUM >> which is also wrong. >> >> Is there a way to correctly answer the above query with the current >> design of property paths? >> >> Thanks, >> - jorge >> >>> >>> SPARQL property paths do not apply uniqueness to property paths and the >>> property path expression is, where appropriate, the same the expansion in >>> terms of triple patterns. It is not a matter of efficiency because the >>> answers concerning duplicate literal values would be rather unexpected if >>> only distinct values were returned. >>> >>> This leaves the ArbitraryLengthPath operation for the use of "+" in paths. >>> This traverses cycles once by terminating the search on encountering an edge >>> already traversed for that evaluation of ArbitraryLengthPath. In an earlier >>> design, cycle termination was by detecting visiting nodes but the WG >>> considers the edge traversal a better choice. The new design is one more >>> step of evaluation on a cycle than the first design and leaves better >>> prospects for future standardization. >>> >>> SPARQL has the keyword DISTINCT so an application can choose between >>> duplicates and no duplicates. A query engine can exploit this if it chooses >>> to; use with sub-queries mean that solution modifiers can be applied to >>> specific parts of the query such as a path. >>> >>> An implementation is free to implement evaluation in anyway it chooses >>> proved it results in the same answers. The WG felt that using an algorithm >>> was the most helpful way to specify the feature, especially to implementers. >>> >>> Property paths have been implemented in a number of systems (see [3] for a >>> partial list) and found to be useful. >>> >>> We would be grateful if you would acknowledge that your comment has been >>> answered by sending a reply to this mailing list. >>> >>> Andy >>> On behalf of the SPARQL working group. >>> >>> [1] http://www.w3.org/TR/xpath20/#id-path-expressions >>> [2] http://www.w3.org/TR/xpath-functions/#func-distinct-values >>> [3] http://esw.w3.org/SPARQL/Extensions/Paths >>> >>> On 15/12/10 18:34, jorge perez wrote: >>>> >>>> Hello Andy, >>>> >>>> Thank you very much for your response and for considering my comments, >>>> and sorry for the late reply. >>>> >>>> There is a couple of comments that you have not answered. >>>> >>>> "" >>>> As a separate but very important issue, notice that the XPath language >>>> does not consider duplicate paths when evaluating expressions (XPath >>>> is evaluated in the "there exists" way that I mentioned before). Thus, >>>> counting paths in SPARQL would be somewhat in contradiction with >>>> previously proposed path languages considered by the W3C. >>>> "" >>>> >>>> I think that if this W3C Recommendation is in discordance with a >>>> previous Recommendation about a similar topic, then DAWG should have >>>> strong reasons for that, and make them clear in the specification. The >>>> specification should also advice the reader about this issue. >>>> >>>> Besides that comment, you have said nothing about efficiency of >>>> evaluation. Notice that this not related to a particular way of >>>> implementing the language. It is about the huge efficiency impact that >>>> any implementation will suffer in practice. You have not acknowledge >>>> that in your response. Have you consider this as an issue? >>>> >>>> Another comment that is not covered by your response is whether there >>>> exists a use case that demand counting different paths. In your >>>> response, it seems that the reason for counting paths is to make >>>> easier the job of the implementors (by reusing algebra operators). >>>> Opposite to what the group think, I think that not counting paths >>>> gives the implementor more freedom since paths could be implemented in >>>> several different ways, being just one of them by reusing algebra >>>> operators. Can you please clarify whether there are use cases about >>>> this? This would help a lot. >>>> >>>> If you respond to the comments above I can consider my comments answered. >>>> >>>> I have a couple of additional words. Please do not consider them as a >>>> formal objection to the process, but just as my opinion. >>>> >>>> I still strongly disagree with your design decisions about property >>>> paths. In particular, I insist that it is a mistake to define the >>>> semantics in the presence of cycles in a non-standard way and by >>>> forcing a particular algorithm to evaluate them. In your response you >>>> say that there can be corner cases, but it is not only a problem of >>>> corner cases. From my point of view it will become a problem of >>>> adoption of the standard. In this point I think that the group should >>>> not neglect that there is a lot of related (theoretical and practical) >>>> work in this area that have handled cycles in a completely different >>>> way. >>>> >>>> To conclude, I do think that the property-paths material in the >>>> current specification is far from being mature. Considering that the >>>> group is in a tight schedule, I think that it would be better to not >>>> include property paths in this round of standardization, than >>>> including them in their current form. >>>> >>>> Thank you very much for considering my comments. >>>> - jorge >>>> >>>> On Thu, Dec 2, 2010 at 8:20 AM, Andy Seaborne >>>> <andy.seaborne@epimorphics.com> wrote: >>>>> >>>>> Jorge, >>>>> >>>>> Thank very much for your comments. >>>>> >>>>> The working group considered a number of factors in designing the >>>>> property >>>>> path features. In addition to the points you raise, the WG also included >>>>> consideration that, while this working group is not adding a path >>>>> datatype >>>>> (needed to inquire about any path matched later in the query), nor the >>>>> specific case of access to path length, the WG should leave open as many >>>>> possibilities here for future work. Another factor in the design is the >>>>> relationship of some property path expressions to triple pattern forms. >>>>> >>>>> Although not specifying returning the path length of a match, nor >>>>> specifying >>>>> returning the matched path itself, the WG felt that, on balance, the >>>>> design >>>>> in the working draft gave maximum scope for any later standardization >>>>> work. >>>>> The issue of path length particularly was considered as a feature for >>>>> this >>>>> round of work but, when considered against all the other work items the >>>>> WG >>>>> has taken on, it didn't make the final list of work items. This lead to >>>>> the >>>>> conclusion that counting path possibilities, not a "there exists" >>>>> condition, >>>>> was the better choice for this round of standardization. Adding access >>>>> the >>>>> the path matched is better served if all paths are considered. >>>>> >>>>> Another consideration was the relationship of property paths and existing >>>>> queries using triple patterns. >>>>> >>>>> { ?x :p{2} ?y } >>>>> >>>>> and >>>>> >>>>> { ?x :p ?Z . ?Z :p ?y }, with ?Z projected away. >>>>> >>>>> The WG decided to make these equivalent, including in terms of numbers of >>>>> solutions. This gives the semantics of many path forms in terms of SPARQL >>>>> graph pattern operators. This was felt to be intuitive and to utilize the >>>>> capabilities of query engines: rather that requiring yet another >>>>> mechanism, >>>>> the equivalence means that join-technology (for example) can be used to >>>>> solve the pattern. >>>>> >>>>> This then leaves the issue of cycles in the "+" operator. The design is >>>>> one >>>>> in which the cycles in "+" operator are handled by traversing a directed >>>>> edge (triple in the data) once. This will be explained in the final >>>>> version >>>>> of the query specification - there is a placeholder for it in the current >>>>> editors working draft. The current working draft has been clarified to >>>>> use >>>>> "multiset-union" for the union in the ArbitraryLengthPath definition. >>>>> >>>>> This overall design is a tradeoff of implementation, future >>>>> possibilities, >>>>> and equivalence of patterns on graphs. The WG is aware that there can be >>>>> corner cases can arise where different intuitions are not compatible. On >>>>> balance, the WG feels that the current design is most suitable for this >>>>> round of standardization. >>>>> >>>>> Again, that you for your helpful comments. >>>>> >>>>> We would be grateful if you would acknowledge that your comment has been >>>>> answered by sending a reply to this mailing list. >>>>> >>>>> Andy >>>>> on behalf of the SPARQL Working Group. >>>>> >>> >> > >
Received on Wednesday, 26 January 2011 17:57:34 UTC