Re: Fwd: Re: Comments about the semantics of property paths from Matthew Perry on 2011-01-26 (public-rdf-dawg@w3.org from January to March 2011)

From: Matthew Perry <matthew.perry@oracle.com>
Date: Wed, 26 Jan 2011 08:53:12 -0500
To: Andy Seaborne <andy.seaborne@epimorphics.com>
CC: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <4D402748.1060209@oracle.com>
I think this should work for the sum age query.

SELECT SUM(?A)
WHERE
{ ?F :age ?A
  { SELECT DISTINCT ?F
    WHERE
     { :me (:friend)+ ?F } }}

It seems that, in general, DISTINCT will work as a there exists query when URIs and blank nodes are the endpoint of a path because we have a one-to-one mapping between URI/BN values and graph vertices, but we get a one-to-many mapping from literal values to vertices.

- Matt


On 1/26/2011 8:24 AM, Andy Seaborne wrote:
> Here's an example. based on items in a (simple) shopping basket. There are two things that are :item1 in the order (via compound and directly - think "buying a spare part").
>
> The order cost is 6, uniqueness would make it 2.
>
> (I've sent a version of this to Jorge offlist as discussions aren't possible on the comments list).
>
>     Andy
>
> ==== D1.ttl
>
> @prefix : <http://example/> .
>
> :order :contains :thing1 .
> :order :contains :compound1 .
>
> :thing1 :unitOf :item1 .
> :thing2 :unitOf :item2 .
> :thing3 :unitOf :item1 .
>
> :item2 :price 2 .
> :item1 :price 2 .
>
> :compound1 :contains :thing2 .
> :compound1 :contains :thing3 .
>
>
> ==== Q1.rq
>
> PREFIX : <http://example/>
>
> SELECT (SUM(?itemPrice) AS ?price)
> {
>   :order :contains+/:unitOf/:price ?itemPrice .
> }
>
>
>
> -------- Original Message --------
> Subject: Re: Comments about the semantics of property paths
> Date: Tue, 25 Jan 2011 15:55:08 -0300
> From: jorge perez <jorge.perez.rojas@gmail.com>
> To: Andy Seaborne <andy.seaborne@epimorphics.com>
> CC: public-rdf-dawg-comments@w3.org
>
> Hello Andy,
>
> Thanks for your email. Yes, my comments have been answered. An
> additional comment is below.
>
> On Tue, Jan 25, 2011 at 1:08 PM, Andy Seaborne
> <andy.seaborne@epimorphics.com> wrote:
>> Hi Jorge,
>>
>> XPath is designed for XML processing where XML nodes and values are treated
>> in different ways. XPath evaluation returns distinct XML nodes, but
>> duplicate values. One evaluation of an XPath expression can't mix XML nodes
>> and values - see the numbered list in [1]. "XQuery 1.0 and XPath 2.0
>> Functions and Operators" has an operation fn:distinct-values to make values
>> unique in a sequence [2].
>>
>> An RDF graph does not have this distinction of nodes and values. Graph nodes
>> (vertexes) are IRIs, blank nodes or literals with no separation. Repetition
>> of literals is significant, consider SUM applied to a purchase order where
>> two items have the same price, so multiple paths to the same endpoint do
>> matter.
>
> Aggregation is actually another reason of why multiple paths to the
> same endpoint *do not* have to be considered.
>
> Consider a network of friends, and assume that you want to obtain the
> SUM of the age of all your network (friends of your friends). Then a
> very natural way to do this is with the query (simplified syntax)
>
> SUM (?A)
> :me (:friend)+/:age ?A
>
> The query is navigating to all the friends of my friends, then to the
> age value of every one, and then taking the SUM. Isn't this natural?
> But, consider the following data
>
> :me :friend :f1
> :me :friend :f2
> :f1 :friend :f2
> :f1 :age 20
> :f2 :age 20
>
> I would expect 40 as the result of the above query, but the expression
>
> :me (:fiend)+/:age ?A
>
> returns
>
> ?A
> 20 (for the path :me->:f1)
> 20 (for the path :me->:f2)
> 20 (for the path :me->:f1->:f2)
>
> and thus, the answer of the SUM would be 60. How do you explain the
> result of this query to a user? Notice that using DISTINCT does not
> solve the problem, since with DISTINCT you would obtain 20 as the SUM
> which is also wrong.
>
> Is there a way to correctly answer the above query with the current
> design of property paths?
>
> Thanks,
> - jorge
>
>>
>> SPARQL property paths do not apply uniqueness to property paths and the
>> property path expression is, where appropriate, the same the expansion in
>> terms of triple patterns. It is not a matter of efficiency because the
>> answers concerning duplicate literal values would be rather unexpected if
>> only distinct values were returned.
>>
>> This leaves the ArbitraryLengthPath operation for the use of "+" in paths.
>> This traverses cycles once by terminating the search on encountering an edge
>> already traversed for that evaluation of ArbitraryLengthPath. In an earlier
>> design, cycle termination was by detecting visiting nodes but the WG
>> considers the edge traversal a better choice. The new design is one more
>> step of evaluation on a cycle than the first design and leaves better
>> prospects for future standardization.
>>
>> SPARQL has the keyword DISTINCT so an application can choose between
>> duplicates and no duplicates. A query engine can exploit this if it chooses
>> to; use with sub-queries mean that solution modifiers can be applied to
>> specific parts of the query such as a path.
>>
>> An implementation is free to implement evaluation in anyway it chooses
>> proved it results in the same answers. The WG felt that using an algorithm
>> was the most helpful way to specify the feature, especially to implementers.
>>
>> Property paths have been implemented in a number of systems (see [3] for a
>> partial list) and found to be useful.
>>
>> We would be grateful if you would acknowledge that your comment has been
>> answered by sending a reply to this mailing list.
>>
>> Andy
>> On behalf of the SPARQL working group.
>>
>> [1] http://www.w3.org/TR/xpath20/#id-path-expressions
>> [2] http://www.w3.org/TR/xpath-functions/#func-distinct-values
>> [3] http://esw.w3.org/SPARQL/Extensions/Paths
>>
>> On 15/12/10 18:34, jorge perez wrote:
>>>
>>> Hello Andy,
>>>
>>> Thank you very much for your response and for considering my comments,
>>> and sorry for the late reply.
>>>
>>> There is a couple of comments that you have not answered.
>>>
>>> ""
>>> As a separate but very important issue, notice that the XPath language
>>> does not consider duplicate paths when evaluating expressions (XPath
>>> is evaluated in the "there exists" way that I mentioned before). Thus,
>>> counting paths in SPARQL would be somewhat in contradiction with
>>> previously proposed path languages considered by the W3C.
>>> ""
>>>
>>> I think that if this W3C Recommendation is in discordance with a
>>> previous Recommendation about a similar topic, then DAWG should have
>>> strong reasons for that, and make them clear in the specification. The
>>> specification should also advice the reader about this issue.
>>>
>>> Besides that comment, you have said nothing about efficiency of
>>> evaluation. Notice that this not related to a particular way of
>>> implementing the language. It is about the huge efficiency impact that
>>> any implementation will suffer in practice. You have not acknowledge
>>> that in your response. Have you consider this as an issue?
>>>
>>> Another comment that is not covered by your response is whether there
>>> exists a use case that demand counting different paths. In your
>>> response, it seems that the reason for counting paths is to make
>>> easier the job of the implementors (by reusing algebra operators).
>>> Opposite to what the group think, I think that not counting paths
>>> gives the implementor more freedom since paths could be implemented in
>>> several different ways, being just one of them by reusing algebra
>>> operators. Can you please clarify whether there are use cases about
>>> this? This would help a lot.
>>>
>>> If you respond to the comments above I can consider my comments answered.
>>>
>>> I have a couple of additional words. Please do not consider them as a
>>> formal objection to the process, but just as my opinion.
>>>
>>> I still strongly disagree with your design decisions about property
>>> paths. In particular, I insist that it is a mistake to define the
>>> semantics in the presence of cycles in a non-standard way and by
>>> forcing a particular algorithm to evaluate them. In your response you
>>> say that there can be corner cases, but it is not only a problem of
>>> corner cases. From my point of view it will become a problem of
>>> adoption of the standard. In this point I think that the group should
>>> not neglect that there is a lot of related (theoretical and practical)
>>> work in this area that have handled cycles in a completely different
>>> way.
>>>
>>> To conclude, I do think that the property-paths material in the
>>> current specification is far from being mature. Considering that the
>>> group is in a tight schedule, I think that it would be better to not
>>> include property paths in this round of standardization, than
>>> including them in their current form.
>>>
>>> Thank you very much for considering my comments.
>>> - jorge
>>>
>>> On Thu, Dec 2, 2010 at 8:20 AM, Andy Seaborne
>>> <andy.seaborne@epimorphics.com>  wrote:
>>>>
>>>> Jorge,
>>>>
>>>> Thank very much for your comments.
>>>>
>>>> The working group considered a number of factors in designing the
>>>> property
>>>> path features. In addition to the points you raise, the WG also included
>>>> consideration that, while this working group is not adding a path
>>>> datatype
>>>> (needed to inquire about any path matched later in the query), nor the
>>>> specific case of access to path length, the WG should leave open as many
>>>> possibilities here for future work. Another factor in the design is the
>>>> relationship of some property path expressions to triple pattern forms.
>>>>
>>>> Although not specifying returning the path length of a match, nor
>>>> specifying
>>>> returning the matched path itself, the WG felt that, on balance, the
>>>> design
>>>> in the working draft gave maximum scope for any later standardization
>>>> work.
>>>> The issue of path length particularly was considered as a feature for
>>>> this
>>>> round of work but, when considered against all the other work items the
>>>> WG
>>>> has taken on, it didn't make the final list of work items. This lead to
>>>> the
>>>> conclusion that counting path possibilities, not a "there exists"
>>>> condition,
>>>> was the better choice for this round of standardization. Adding access
>>>> the
>>>> the path matched is better served if all paths are considered.
>>>>
>>>> Another consideration was the relationship of property paths and existing
>>>> queries using triple patterns.
>>>>
>>>> { ?x :p{2} ?y }
>>>>
>>>> and
>>>>
>>>> { ?x :p ?Z . ?Z :p ?y }, with ?Z projected away.
>>>>
>>>> The WG decided to make these equivalent, including in terms of numbers of
>>>> solutions. This gives the semantics of many path forms in terms of SPARQL
>>>> graph pattern operators. This was felt to be intuitive and to utilize the
>>>> capabilities of query engines: rather that requiring yet another
>>>> mechanism,
>>>> the equivalence means that join-technology (for example) can be used to
>>>> solve the pattern.
>>>>
>>>> This then leaves the issue of cycles in the "+" operator. The design is
>>>> one
>>>> in which the cycles in "+" operator are handled by traversing a directed
>>>> edge (triple in the data) once. This will be explained in the final
>>>> version
>>>> of the query specification - there is a placeholder for it in the current
>>>> editors working draft. The current working draft has been clarified to
>>>> use
>>>> "multiset-union" for the union in the ArbitraryLengthPath definition.
>>>>
>>>> This overall design is a tradeoff of implementation, future
>>>> possibilities,
>>>> and equivalence of patterns on graphs. The WG is aware that there can be
>>>> corner cases can arise where different intuitions are not compatible. On
>>>> balance, the WG feels that the current design is most suitable for this
>>>> round of standardization.
>>>>
>>>> Again, that you for your helpful comments.
>>>>
>>>> We would be grateful if you would acknowledge that your comment has been
>>>> answered by sending a reply to this mailing list.
>>>>
>>>> Andy
>>>> on behalf of the SPARQL Working Group.
>>>>
>>
>
Received on Wednesday, 26 January 2011 13:54:47 UTC