Re: Possible long response for WW-1 from Sandro Hawke on 2012-05-08 (public-rdf-dawg@w3.org from April to June 2012)

From: Sandro Hawke <sandro@w3.org>
Date: Tue, 08 May 2012 09:32:45 -0400
To: Andy Seaborne <andy.seaborne@epimorphics.com>
Cc: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <1336483965.24034.65.camel@waldron>
On Thu, 2012-04-26 at 12:57 +0100, Andy Seaborne wrote:
> Here is some possible draft content for a response to William Waites. 
> His message provides the opportunity to highlight the judgement calls 
> the design makes to balance property paths considered in isolation and 
> property paths considered along side other SPARQL 1.1 features.  He is 
> using the analogy to strings of (non)equivalence of "aa*" and "a+" which 
> is a consideration, but not the only one as I see it.
> 
> Whether this is best sent as a WG formal response or as a discussion 
> item on the comments list (or elsewhere) I don't know - I leave that to 
> suggestions; I'm willing to have a discussion with people if they are 
> prepared to discuss the use cases.
> 
>  Andy
> 
> http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2012Apr/0011.html
> 
> On 19/04/12 14:59, William Waites wrote:
>  > I was happy to see the revisions earlier in the week to the SPARQL 1.1
>  > working draft that saw the default semantics of the *, + and ?
>  > operators changed from counting to existential.
>  >
>  > Is it to be expected that a similar revision will be forthcoming for
>  > the simple walk vs. regular path problem signalled by Wim Martens et
>  > al?
>  >
>  > As I understand it, the non-standard W3C simple-walk semantics mean
>  > that evaluating path expressions containing those operators is
>  > intractable even with counting semantics. See Wim's earlier mail at
>  > [1].
>  >
>  > I understand that the WG is already over time and there is pressure to
>  > carve the spec into stone, but it seems to me better to be late than
>  > to release something containing a known serious error especially when
>  > the fix is clear.
>  >
>  > I would also like to point out -- and this is the reason I call the
>  > simple-walk semantics non-standard -- that basic things like the
>  > equivalence of "aa*" and "a+" are not true under the current
>  > draft. Apart from questions of tractability, this could be quite
>  > confusing to users.
>  >
>  > Cheers,
>  > -w
>  >
>  > [1] 
> http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2012Feb/0029.html
> 
> William,
> 
> Using the analogy with regular expressions over strings needs some care. 
>   There are significant ways in which a graph does not behave like a 
> string.  For example, in RDF, when a literal value, 5 say, is used as 
> the object of multiple triples, there is only one occurrence of the 
> literal as a graph node (literals are "tidy" in the language of the RDF 
> 2004 Working Group). When there is a divergence of between string-like 
> and graph-like characteristics there is a design decision to be made. 
> Duplicate nodes is one such point.
> 
> Consider this example of a simple purchase order, with two items.
> 
> ---- Data 1 ----
> @prefix : <http://example/> .
> 
> :order :hasItem :item1 .
> :order :hasItem :item2 .
> 
> :item1 :price 5 .
> :item2 :price 2 .
> ---- Data 1 ----
> 
> 
> SPARQL 1.1 introduces grouping and aggregation - it's one of the major 
> features missing and a major use feature request [F&R].  These queries 
> give the total price of the order.
> 
> SELECT (SUM(?price) AS ?total)
> WHERE { :order :item/:price ?price }

That should be   :hastItem/:price  I think.   And below, I guess you
changed :item to :hasItem in the data but not the queries.

    -- Sandro

> SELECT (SUM(?price) AS ?total)
> WHERE { :order :item [ :price ?price ] }
>
> SELECT (SUM(?price) AS ?total)
> WHERE
> { :order :item ?x .
>    ?x :price ?price }
> 
> The first uses property paths.
> 
> ---------
> | total |
> =========
> | 7     |
> ---------
> 
> 
> ---- Data 2 ----
> @prefix : <http://example/> .
> 
> :order :hasItem :item1 .
> :order :hasItem :item2 .
> 
> :item1 :price 5 . # Same price.
> :item2 :price 5 .
> ---- Data 2 ----
> 
> Query 1:
> ---------
> | total |
> =========
> | 5     |
> ---------
> 
> Queries 2 and 3:
> ---------
> | total |
> =========
> | 10    |
> ---------
> 
> In data 2, the price of the items is the same.  The first query now does 
> not return the total price of the order if only connectivity is 
> considered.  That is considered to be confusing.
> 
> Query 1 is the same as:
> 
> SELECT (SUM(DISTINCT ?price) AS ?total)
> WHERE
> { :order :item ?x .
>    ?x :price ?price
> }
> 
> using DISTINCT in the sum() aggregation.  This shows it's possible to 
> write a SPARQL query with non-counting results given the current 
> proposed design (option 6).  A subquery with DISTINCT is the more 
> general case but it does suggest that the short syntax ought to favour 
> the application writer task.  The converse is not possible - starting 
> with unique results and modifying the results for duplicates.

> Property paths can be divided into syntactic short forms and 
> connectivity operators.  There are distinct features; they could have 
> been described in two different sections of the query spec.
> 
> The connection is that short forms can be used as a sub-part of a 
> connectivity property path expression.  The working group could have 
> decided not to cover that and only have connectivity via a single 
> property.
> 
> 
> When combined, the semantics of the connectivity wins and { ?x (:a/:b)* 
> ?y } returns a *set* of matches for (?x, ?y).  Computational complexity 
> is not increased because the "/" operator can be evaluated in context to 
> only yield the necessary unique results which is a simple optimization.
> 
> So there is a choice for the sequence property path operator - syntactic 
> short cut, or connectivity operator (putting everything as a 
> connectivity path).  Which is the best choice is not a technical issue - 
> it's a judgement.  The analogous equivalence is aa* and a+ (assuming"/" 
> for the sequence on connectivity use cases - it could be a different 
> operator) needs to be treated with care - it's not automatic that it is 
> desirable when the whole collection of desirable features for SPARQL 1.1 
> is considered.
> 
> If considered in isolation, and considering only whether a pattern 
> matches or not, then connectivity semantics (non-counting), can be 
> justified on design symmetry grounds.
> 
> Looked at in wider context other factors come into play: there has been 
> a lack of recognition of this wider context.  SPARQL 1.1 is a not a 
> fundamental redesign of SPARQL around property paths.
> 
> The working group has decided that the syntactic short cuts and 
> arbitrary length connectivity operators is the best choice in the 
> context of SPARQL 1.1 overall.
> 
> As an aside, XPath [XP] treats sequences of atomic values differently to 
> sequences of nodes.  Sequences of atomic values are not distinct, 
> precisely so operations like "sum" and "count" do the expected thing. 
> There is an operation in XPath/XQuery functions and operators to make a 
> sequence of atomic distinct (fn:distinct-values).  Obviously, the 
> reverse case of distinct being expanded to duplicates does not work.
> 
> In RDF we can not make the distinction between literals and resources - 
> again we have a design decision point.
>
> One final example:
> 
> An example in the F&R document is list access.  In property paths, this 
> might address that use case:
> 
> { :list rdf:rest*/rdf:first ?member }
> 
> and lists can have duplicates.  The judgement is again whether the short 
> form should make the use case easily addressable or choose symmetry and 
> expect the application writer to know to use the longer form:
> 
> { :list rdf:rest* ?x .
>    ?x rdf:first ?member }
> 
> when duplicates are possible.
> 
> 
>  Andy
> 
> 
> [F&R] http://www.w3.org/TR/sparql-features/
> [XP] http://www.w3.org/TR/xpath20/#id-path-expressions , bullet 2.
> 
>
Received on Tuesday, 8 May 2012 13:33:00 UTC