- From: Sandro Hawke <sandro@w3.org>
- Date: Tue, 08 May 2012 09:32:45 -0400
- To: Andy Seaborne <andy.seaborne@epimorphics.com>
- Cc: SPARQL Working Group <public-rdf-dawg@w3.org>
On Thu, 2012-04-26 at 12:57 +0100, Andy Seaborne wrote:
> Here is some possible draft content for a response to William Waites.
> His message provides the opportunity to highlight the judgement calls
> the design makes to balance property paths considered in isolation and
> property paths considered alongside other SPARQL 1.1 features. He is
> using the analogy with strings, in particular the (non)equivalence of
> "aa*" and "a+", which is a consideration, but not the only one as I see it.
>
> Whether this is best sent as a WG formal response or as a discussion
> item on the comments list (or elsewhere) I don't know; I'm open to
> suggestions. I'm willing to have a discussion with people if they are
> prepared to discuss the use cases.
>
> Andy
>
> http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2012Apr/0011.html
>
> On 19/04/12 14:59, William Waites wrote:
> > I was happy to see the revisions earlier in the week to the SPARQL 1.1
> > working draft that saw the default semantics of the *, + and ?
> > operators changed from counting to existential.
> >
> > Is it to be expected that a similar revision will be forthcoming for
> > the simple walk vs. regular path problem signalled by Wim Martens et
> > al?
> >
> > As I understand it, the non-standard W3C simple-walk semantics mean
> > that evaluating path expressions containing those operators is
> > intractable even with counting semantics. See Wim's earlier mail at
> > [1].
> >
> > I understand that the WG is already over time and there is pressure to
> > carve the spec into stone, but it seems to me better to be late than
> > to release something containing a known serious error especially when
> > the fix is clear.
> >
> > I would also like to point out -- and this is the reason I call the
> > simple-walk semantics non-standard -- that basic things like the
> > equivalence of "aa*" and "a+" are not true under the current
> > draft. Apart from questions of tractability, this could be quite
> > confusing to users.
> >
> > Cheers,
> > -w
> >
> > [1] http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2012Feb/0029.html
>
> William,
>
> Using the analogy with regular expressions over strings needs some care.
> There are significant ways in which a graph does not behave like a
> string. For example, in RDF, when a literal value, 5 say, is used as
> the object of multiple triples, there is only one occurrence of the
> literal as a graph node (literals are "tidy" in the language of the RDF
> 2004 Working Group). When there is a divergence between string-like
> and graph-like characteristics, there is a design decision to be made.
> Duplicate nodes are one such point.
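A minimal Python sketch of the "tidy literal" point, assuming a toy model in which a graph is just a set of (subject, predicate, object) tuples with prefixed names as strings (an illustration, not an RDF library). The literal 5 is one graph node even when two triples use it, whereas a string keeps each occurrence of a symbol at its own position.

```python
# Toy model (an assumption, not an RDF library): a graph is a set of
# (subject, predicate, object) triples; prefixed names are plain strings
# and the literal 5 is a Python int.
graph = {
    (":item1", ":price", 5),
    (":item2", ":price", 5),
}

# Graph nodes are all subjects and objects. Because they are collected into
# a set, the literal 5 is one node even though two triples use it.
nodes = {s for s, _, _ in graph} | {o for _, _, o in graph}
literal_5_nodes = sum(1 for n in nodes if n == 5)

# A string keeps every occurrence of a symbol at its own position.
occurrences_of_a = [i for i, ch in enumerate("aa") if ch == "a"]

print(len(nodes), literal_5_nodes, occurrences_of_a)  # 3 1 [0, 1]
```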
>
> Consider this example of a simple purchase order, with two items.
>
> ---- Data 1 ----
> @prefix : <http://example/> .
>
> :order :hasItem :item1 .
> :order :hasItem :item2 .
>
> :item1 :price 5 .
> :item2 :price 2 .
> ---- Data 1 ----
>
>
> SPARQL 1.1 introduces grouping and aggregation; this was one of the major
> missing features and a major user feature request [F&R]. These queries
> give the total price of the order.
>
> SELECT (SUM(?price) AS ?total)
> WHERE { :order :item/:price ?price }
That should be :hasItem/:price I think. And below, I guess you
changed :item to :hasItem in the data but not the queries.
-- Sandro
> SELECT (SUM(?price) AS ?total)
> WHERE { :order :item [ :price ?price ] }
>
> SELECT (SUM(?price) AS ?total)
> WHERE
> { :order :item ?x .
> ?x :price ?price }
>
> The first uses property paths.
>
> ---------
> | total |
> =========
> | 7 |
> ---------
>
>
> ---- Data 2 ----
> @prefix : <http://example/> .
>
> :order :hasItem :item1 .
> :order :hasItem :item2 .
>
> :item1 :price 5 . # Same price.
> :item2 :price 5 .
> ---- Data 2 ----
>
> Query 1:
> ---------
> | total |
> =========
> | 5 |
> ---------
>
> Queries 2 and 3:
> ---------
> | total |
> =========
> | 10 |
> ---------
>
> In data 2, the price of the items is the same. If only connectivity is
> considered, the first query no longer returns the total price of the
> order: the two prices of 5 collapse into one match. That would be confusing.
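The difference between the two readings of the sequence path can be sketched in Python. This assumes a toy triple-set model of Data 1 and Data 2 (using :hasItem, as in the data) and two hypothetical helpers: `seq_counting` keeps one result per intermediate item (a bag), while `seq_connectivity` reports each endpoint once (a set). Summing each reproduces the totals in the tables above.

```python
# Toy model of Data 1 and Data 2 (an assumption, not an RDF library):
# each graph is a set of (subject, predicate, object) triples.
DATA1 = {
    (":order", ":hasItem", ":item1"),
    (":order", ":hasItem", ":item2"),
    (":item1", ":price", 5),
    (":item2", ":price", 2),
}
DATA2 = {
    (":order", ":hasItem", ":item1"),
    (":order", ":hasItem", ":item2"),
    (":item1", ":price", 5),
    (":item2", ":price", 5),
}


def objects(graph, subject, predicate):
    """All o such that (subject, predicate, o) is a triple in graph."""
    return [o for s, p, o in graph if s == subject and p == predicate]


def seq_counting(graph, start, p1, p2):
    """start p1/p2 ?o with one result per intermediate node (a bag)."""
    return [o for mid in objects(graph, start, p1) for o in objects(graph, mid, p2)]


def seq_connectivity(graph, start, p1, p2):
    """start p1/p2 ?o as pure connectivity: each endpoint reported once."""
    return set(seq_counting(graph, start, p1, p2))


for name, g in (("Data 1", DATA1), ("Data 2", DATA2)):
    bag = seq_counting(g, ":order", ":hasItem", ":price")
    connected = seq_connectivity(g, ":order", ":hasItem", ":price")
    print(name, sum(bag), sum(connected))
# Data 1 7 7
# Data 2 10 5
```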
>
> Under connectivity semantics, Query 1 is the same as:
>
> SELECT (SUM(DISTINCT ?price) AS ?total)
> WHERE
> { :order :item ?x .
> ?x :price ?price
> }
>
> using DISTINCT in the SUM() aggregate. This shows it is possible to
> write a SPARQL query with non-counting results given the current
> proposed design (option 6). A subquery with DISTINCT is the more
> general case, but it does suggest that the short syntax ought to favour
> the application writer's task. The converse is not possible: starting
> with unique results, the duplicates cannot be recovered.
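A small Python sketch of why the asymmetry holds, assuming hypothetical solution rows for the two-triple-pattern query over Data 2. Deduplicating a bag (the analogue of SUM(DISTINCT ?price)) is always possible, but it is a lossy step: different bags map to the same set, so nothing can rebuild the duplicates afterwards.

```python
# Hypothetical solutions of { :order :hasItem ?x . ?x :price ?price } over
# Data 2: one row per item, so the projected prices form a bag.
rows = [
    {"x": ":item1", "price": 5},
    {"x": ":item2", "price": 5},
]
prices = [row["price"] for row in rows]

total = sum(prices)                # analogue of SUM(?price)
total_distinct = sum(set(prices))  # analogue of SUM(DISTINCT ?price)
print(total, total_distinct)       # 10 5

# The converse fails: different bags collapse to the same set, so once the
# results are unique there is no way to recover how many duplicates existed.
one_item, two_items = [5], [5, 5]
print(set(one_item) == set(two_items), sum(one_item) == sum(two_items))  # True False
```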
>
> Property paths can be divided into syntactic short forms and
> connectivity operators. These are distinct features; they could have
> been described in two different sections of the query spec.
>
> The connection is that short forms can be used as a sub-part of a
> connectivity property path expression. The working group could have
> decided not to cover that and only have connectivity via a single
> property.
>
>
> When combined, the connectivity semantics wins and { ?x (:a/:b)* ?y }
> returns a *set* of matches for (?x, ?y). Computational complexity
> is not increased because the "/" operator can be evaluated in context to
> yield only the necessary unique results, which is a simple optimization.
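One way to picture that optimization is a breadth-first search with a visited set, sketched below in Python. The graph, node names and the helpers `step_ab` and `star_ab` are hypothetical, and this is not the normative evaluation procedure from the spec. A single :a/:b step may produce the same endpoint twice, but duplicates are discarded as soon as they appear, so each node is expanded at most once and the cycle back to x terminates.

```python
from collections import deque

# Hypothetical graph: successors for predicate :a and for predicate :b.
A = {"x": ["m1", "m2"], "y": ["m3"], "z": ["m4"]}
B = {"m1": ["y"], "m2": ["y"], "m3": ["z"], "m4": ["x"]}  # m4 closes a cycle


def step_ab(node):
    """One :a/:b step; may yield the same endpoint more than once."""
    for mid in A.get(node, []):
        for end in B.get(mid, []):
            yield end


def star_ab(start):
    """Nodes reachable from start via (:a/:b)*, each reported once."""
    seen = {start}             # zero-length path: start matches itself
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in step_ab(node):
            if nxt not in seen:   # duplicate endpoints are dropped here
                seen.add(nxt)
                queue.append(nxt)
    return seen


print(list(step_ab("x")))    # ['y', 'y']
print(sorted(star_ab("x")))  # ['x', 'y', 'z']
```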
>
> So there is a choice for the sequence property path operator: a syntactic
> short cut, or a connectivity operator (making everything a
> connectivity path). Which is the best choice is not a technical issue;
> it's a judgement. The analogue of the equivalence of aa* and a+ (assuming
> "/" is used for sequence in the connectivity use cases; it could be a
> different operator) needs to be treated with care: it is not automatically
> desirable when the whole collection of desirable features for SPARQL 1.1
> is considered.
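The aa* versus a+ point can be made concrete with a Python sketch of the mixed design described above. It assumes "/" keeps one result per intermediate node while "*" and "+" report each endpoint once. The graph and the helpers `a_star`, `a_plus` and `a_then_a_star` are hypothetical. The two forms reach the same set of nodes, but :a/:a* reports z twice because z is reachable through two intermediate nodes.

```python
from collections import Counter

# Hypothetical graph for a single predicate :a, as node -> successors.
A = {"s": ["y1", "y2"], "y1": ["z"], "y2": ["z"]}


def a_star(node):
    """:a* from node as a set (connectivity), including the zero-length path."""
    seen, stack = {node}, [node]
    while stack:
        for nxt in A.get(stack.pop(), []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def a_plus(node):
    """:a+ from node as a set: one :a step, then :a* from each successor."""
    return set().union(*(a_star(n) for n in A.get(node, [])))


def a_then_a_star(node):
    """:a/:a* with a counting "/": one result per intermediate node (a bag)."""
    return Counter(end for mid in A.get(node, []) for end in a_star(mid))


print(sorted(a_plus("s")))                     # ['y1', 'y2', 'z']
print(a_then_a_star("s")["z"])                 # 2
print(set(a_then_a_star("s")) == a_plus("s"))  # True
```

Under pure connectivity semantics both forms would collapse to the same set; under the mixed design they agree only when duplicates are ignored.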
>
> If considered in isolation, and considering only whether a pattern
> matches or not, then connectivity (non-counting) semantics can be
> justified on design symmetry grounds.
>
> Looked at in a wider context, other factors come into play, and there has
> been a lack of recognition of this wider context. SPARQL 1.1 is not a
> fundamental redesign of SPARQL around property paths.
>
> The working group has decided that combining syntactic short cuts with
> arbitrary-length connectivity operators is the best choice in the
> context of SPARQL 1.1 overall.
>
> As an aside, XPath [XP] treats sequences of atomic values differently to
> sequences of nodes. Sequences of atomic values are not distinct,
> precisely so operations like "sum" and "count" do the expected thing.
> There is a function in XPath/XQuery Functions and Operators to make a
> sequence of atomic values distinct (fn:distinct-values). Obviously, the
> reverse, expanding a distinct sequence back into one with duplicates,
> does not work.
>
> In RDF we cannot make the distinction between literals and resources;
> again we have a design decision point.
>
> One final example:
>
> An example in the F&R document is list access. In property paths, this
> might address that use case:
>
> { :list rdf:rest*/rdf:first ?member }
>
> and lists can have duplicates. The judgement is again whether the short
> form should make the use case easily addressable or choose symmetry and
> expect the application writer to know to use the longer form:
>
> { :list rdf:rest* ?x .
> ?x rdf:first ?member }
>
> when duplicates are possible.
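A Python sketch of the list case, assuming a hypothetical encoding of the RDF list (1 2 1) as cons cells, with "b1" and "b2" standing in for blank nodes and `rest_star` as a made-up helper. Because each cell is a distinct node, a counting "/" yields one member per cell and keeps the repeated 1, exactly like the longer two-pattern form; treating the whole path as pure connectivity would report 1 only once.

```python
# Hypothetical encoding of the RDF list (1 2 1): ":list" is the first cell,
# "b1" and "b2" stand in for the blank nodes of the later cells.
FIRST = {":list": 1, "b1": 2, "b2": 1}
REST = {":list": "b1", "b1": "b2", "b2": "rdf:nil"}


def rest_star(node):
    """Cells reachable via rdf:rest* (zero or more steps), in list order.

    Assumes a well-formed, acyclic list, so no visited set is needed.
    Python keeps list order here; SPARQL results have no guaranteed order.
    """
    cells = [node]
    while node in REST:
        node = REST[node]
        cells.append(node)
    return cells


# Counting "/": one ?member per list cell; rdf:nil has no rdf:first and drops out.
members = [FIRST[c] for c in rest_star(":list") if c in FIRST]
# Pure connectivity over the whole path would report each value once.
members_connectivity = set(members)

print(members)                       # [1, 2, 1]
print(sorted(members_connectivity))  # [1, 2]
```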
>
>
> Andy
>
>
> [F&R] http://www.w3.org/TR/sparql-features/
> [XP] http://www.w3.org/TR/xpath20/#id-path-expressions , bullet 2.
>
>
Received on Tuesday, 8 May 2012 13:33:00 UTC