Re: Comments about the semantics of property paths from Andy Seaborne on 2011-01-25 (public-rdf-dawg-comments@w3.org from January 2011)

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Tue, 25 Jan 2011 16:08:02 +0000
To: jorge perez <jorge.perez.rojas@gmail.com>
CC: public-rdf-dawg-comments@w3.org
Message-ID: <4D3EF562.3040107@epimorphics.com>
Hi Jorge,

XPath is designed for XML processing where XML nodes and values are 
treated in different ways. XPath evaluation returns distinct XML nodes, 
but duplicate values. One evaluation of an XPath expression can't mix 
XML nodes and values - see the numbered list in [1]. "XQuery 1.0 and 
XPath 2.0 Functions and Operators" has an operation fn:distinct-values 
to make values unique in a sequence [2].

An RDF graph does not have this distinction of nodes and values. Graph 
nodes (vertexes) are IRIs, blank nodes or literals with no separation. 
Repetition of literals is significant, consider SUM applied to a 
purchase order where two items have the same price, so multiple paths to 
the same endpoint do matter.

SPARQL property paths do not apply uniqueness to property paths and the 
property path expression is, where appropriate, the same the expansion 
in terms of triple patterns. It is not a matter of efficiency because 
the answers concerning duplicate literal values would be rather 
unexpected if only distinct values were returned.

This leaves the ArbitraryLengthPath operation for the use of "+" in 
paths. This traverses cycles once by terminating the search on 
encountering an edge already traversed for that evaluation of 
ArbitraryLengthPath. In an earlier design, cycle termination was by 
detecting visiting nodes but the WG considers the edge traversal a 
better choice. The new design is one more step of evaluation on a cycle 
than the first design and leaves better prospects for future 
standardization.

SPARQL has the keyword DISTINCT so an application can choose between 
duplicates and no duplicates. A query engine can exploit this if it 
chooses to; use with sub-queries mean that solution modifiers can be 
applied to specific parts of the query such as a path.

An implementation is free to implement evaluation in anyway it chooses 
proved it results in the same answers. The WG felt that using an 
algorithm was the most helpful way to specify the feature, especially to 
implementers.

Property paths have been implemented in a number of systems (see [3] for 
a partial list) and found to be useful.

We would be grateful if you would acknowledge that your comment has been 
answered by sending a reply to this mailing list.

Andy
On behalf of the SPARQL working group.

[1] http://www.w3.org/TR/xpath20/#id-path-expressions
[2] http://www.w3.org/TR/xpath-functions/#func-distinct-values
[3] http://esw.w3.org/SPARQL/Extensions/Paths

On 15/12/10 18:34, jorge perez wrote:
> Hello Andy,
>
> Thank you very much for your response and for considering my comments,
> and sorry for the late reply.
>
> There is a couple of comments that you have not answered.
>
> ""
> As a separate but very important issue, notice that the XPath language
> does not consider duplicate paths when evaluating expressions (XPath
> is evaluated in the "there exists" way that I mentioned before). Thus,
> counting paths in SPARQL would be somewhat in contradiction with
> previously proposed path languages considered by the W3C.
> ""
>
> I think that if this W3C Recommendation is in discordance with a
> previous Recommendation about a similar topic, then DAWG should have
> strong reasons for that, and make them clear in the specification. The
> specification should also advice the reader about this issue.
>
> Besides that comment, you have said nothing about efficiency of
> evaluation. Notice that this not related to a particular way of
> implementing the language. It is about the huge efficiency impact that
> any implementation will suffer in practice. You have not acknowledge
> that in your response. Have you consider this as an issue?
>
> Another comment that is not covered by your response is whether there
> exists a use case that demand counting different paths. In your
> response, it seems that the reason for counting paths is to make
> easier the job of the implementors (by reusing algebra operators).
> Opposite to what the group think, I think that not counting paths
> gives the implementor more freedom since paths could be implemented in
> several different ways, being just one of them by reusing algebra
> operators. Can you please clarify whether there are use cases about
> this? This would help a lot.
>
> If you respond to the comments above I can consider my comments answered.
>
> I have a couple of additional words. Please do not consider them as a
> formal objection to the process, but just as my opinion.
>
> I still strongly disagree with your design decisions about property
> paths. In particular, I insist that it is a mistake to define the
> semantics in the presence of cycles in a non-standard way and by
> forcing a particular algorithm to evaluate them. In your response you
> say that there can be corner cases, but it is not only a problem of
> corner cases. From my point of view it will become a problem of
> adoption of the standard. In this point I think that the group should
> not neglect that there is a lot of related (theoretical and practical)
> work in this area that have handled cycles in a completely different
> way.
>
> To conclude, I do think that the property-paths material in the
> current specification is far from being mature. Considering that the
> group is in a tight schedule, I think that it would be better to not
> include property paths in this round of standardization, than
> including them in their current form.
>
> Thank you very much for considering my comments.
> - jorge
>
> On Thu, Dec 2, 2010 at 8:20 AM, Andy Seaborne
> <andy.seaborne@epimorphics.com>  wrote:
>> Jorge,
>>
>> Thank very much for your comments.
>>
>> The working group considered a number of factors in designing the property
>> path features. In addition to the points you raise, the WG also included
>> consideration that, while this working group is not adding a path datatype
>> (needed to inquire about any path matched later in the query), nor the
>> specific case of access to path length, the WG should leave open as many
>> possibilities here for future work. Another factor in the design is the
>> relationship of some property path expressions to triple pattern forms.
>>
>> Although not specifying returning the path length of a match, nor specifying
>> returning the matched path itself, the WG felt that, on balance, the design
>> in the working draft gave maximum scope for any later standardization work.
>> The issue of path length particularly was considered as a feature for this
>> round of work but, when considered against all the other work items the WG
>> has taken on, it didn't make the final list of work items. This lead to the
>> conclusion that counting path possibilities, not a "there exists" condition,
>> was the better choice for this round of standardization. Adding access the
>> the path matched is better served if all paths are considered.
>>
>> Another consideration was the relationship of property paths and existing
>> queries using triple patterns.
>>
>> { ?x :p{2} ?y }
>>
>> and
>>
>> { ?x :p ?Z . ?Z :p ?y }, with ?Z projected away.
>>
>> The WG decided to make these equivalent, including in terms of numbers of
>> solutions. This gives the semantics of many path forms in terms of SPARQL
>> graph pattern operators. This was felt to be intuitive and to utilize the
>> capabilities of query engines: rather that requiring yet another mechanism,
>> the equivalence means that join-technology (for example) can be used to
>> solve the pattern.
>>
>> This then leaves the issue of cycles in the "+" operator. The design is one
>> in which the cycles in "+" operator are handled by traversing a directed
>> edge (triple in the data) once. This will be explained in the final version
>> of the query specification - there is a placeholder for it in the current
>> editors working draft. The current working draft has been clarified to use
>> "multiset-union" for the union in the ArbitraryLengthPath definition.
>>
>> This overall design is a tradeoff of implementation, future possibilities,
>> and equivalence of patterns on graphs. The WG is aware that there can be
>> corner cases can arise where different intuitions are not compatible. On
>> balance, the WG feels that the current design is most suitable for this
>> round of standardization.
>>
>> Again, that you for your helpful comments.
>>
>> We would be grateful if you would acknowledge that your comment has been
>> answered by sending a reply to this mailing list.
>>
>> Andy
>> on behalf of the SPARQL Working Group.
>>
Received on Tuesday, 25 January 2011 16:08:39 UTC