- From: Sandro Hawke <sandro@w3.org>
- Date: Tue, 08 May 2012 09:32:45 -0400
- To: Andy Seaborne <andy.seaborne@epimorphics.com>
- Cc: SPARQL Working Group <public-rdf-dawg@w3.org>
On Thu, 2012-04-26 at 12:57 +0100, Andy Seaborne wrote: > Here is some possible draft content for a response to William Waites. > His message provides the opportunity to highlight the judgement calls > the design makes to balance property paths considered in isolation and > property paths considered along side other SPARQL 1.1 features. He is > using the analogy to strings of (non)equivalence of "aa*" and "a+" which > is a consideration, but not the only one as I see it. > > Whether this is best sent as a WG formal response or as a discussion > item on the comments list (or elsewhere) I don't know - I leave that to > suggestions; I'm willing to have a discussion with people if they are > prepared to discuss the use cases. > > Andy > > http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2012Apr/0011.html > > On 19/04/12 14:59, William Waites wrote: > > I was happy to see the revisions earlier in the week to the SPARQL 1.1 > > working draft that saw the default semantics of the *, + and ? > > operators changed from counting to existential. > > > > Is it to be expected that a similar revision will be forthcoming for > > the simple walk vs. regular path problem signalled by Wim Martens et > > al? > > > > As I understand it, the non-standard W3C simple-walk semantics mean > > that evaluating path expressions containing those operators is > > intractable even with counting semantics. See Wim's earlier mail at > > [1]. > > > > I understand that the WG is already over time and there is pressure to > > carve the spec into stone, but it seems to me better to be late than > > to release something containing a known serious error especially when > > the fix is clear. > > > > I would also like to point out -- and this is the reason I call the > > simple-walk semantics non-standard -- that basic things like the > > equivalence of "aa*" and "a+" are not true under the current > > draft. Apart from questions of tractability, this could be quite > > confusing to users. > > > > Cheers, > > -w > > > > [1] > http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2012Feb/0029.html > > William, > > Using the analogy with regular expressions over strings needs some care. > There are significant ways in which a graph does not behave like a > string. For example, in RDF, when a literal value, 5 say, is used as > the object of multiple triples, there is only one occurrence of the > literal as a graph node (literals are "tidy" in the language of the RDF > 2004 Working Group). When there is a divergence of between string-like > and graph-like characteristics there is a design decision to be made. > Duplicate nodes is one such point. > > Consider this example of a simple purchase order, with two items. > > ---- Data 1 ---- > @prefix : <http://example/> . > > :order :hasItem :item1 . > :order :hasItem :item2 . > > :item1 :price 5 . > :item2 :price 2 . > ---- Data 1 ---- > > > SPARQL 1.1 introduces grouping and aggregation - it's one of the major > features missing and a major use feature request [F&R]. These queries > give the total price of the order. > > SELECT (SUM(?price) AS ?total) > WHERE { :order :item/:price ?price } That should be :hastItem/:price I think. And below, I guess you changed :item to :hasItem in the data but not the queries. -- Sandro > SELECT (SUM(?price) AS ?total) > WHERE { :order :item [ :price ?price ] } > > SELECT (SUM(?price) AS ?total) > WHERE > { :order :item ?x . > ?x :price ?price } > > The first uses property paths. > > --------- > | total | > ========= > | 7 | > --------- > > > ---- Data 2 ---- > @prefix : <http://example/> . > > :order :hasItem :item1 . > :order :hasItem :item2 . > > :item1 :price 5 . # Same price. > :item2 :price 5 . > ---- Data 2 ---- > > Query 1: > --------- > | total | > ========= > | 5 | > --------- > > Queries 2 and 3: > --------- > | total | > ========= > | 10 | > --------- > > In data 2, the price of the items is the same. The first query now does > not return the total price of the order if only connectivity is > considered. That is considered to be confusing. > > Query 1 is the same as: > > SELECT (SUM(DISTINCT ?price) AS ?total) > WHERE > { :order :item ?x . > ?x :price ?price > } > > using DISTINCT in the sum() aggregation. This shows it's possible to > write a SPARQL query with non-counting results given the current > proposed design (option 6). A subquery with DISTINCT is the more > general case but it does suggest that the short syntax ought to favour > the application writer task. The converse is not possible - starting > with unique results and modifying the results for duplicates. > Property paths can be divided into syntactic short forms and > connectivity operators. There are distinct features; they could have > been described in two different sections of the query spec. > > The connection is that short forms can be used as a sub-part of a > connectivity property path expression. The working group could have > decided not to cover that and only have connectivity via a single > property. > > > When combined, the semantics of the connectivity wins and { ?x (:a/:b)* > ?y } returns a *set* of matches for (?x, ?y). Computational complexity > is not increased because the "/" operator can be evaluated in context to > only yield the necessary unique results which is a simple optimization. > > So there is a choice for the sequence property path operator - syntactic > short cut, or connectivity operator (putting everything as a > connectivity path). Which is the best choice is not a technical issue - > it's a judgement. The analogous equivalence is aa* and a+ (assuming"/" > for the sequence on connectivity use cases - it could be a different > operator) needs to be treated with care - it's not automatic that it is > desirable when the whole collection of desirable features for SPARQL 1.1 > is considered. > > If considered in isolation, and considering only whether a pattern > matches or not, then connectivity semantics (non-counting), can be > justified on design symmetry grounds. > > Looked at in wider context other factors come into play: there has been > a lack of recognition of this wider context. SPARQL 1.1 is a not a > fundamental redesign of SPARQL around property paths. > > The working group has decided that the syntactic short cuts and > arbitrary length connectivity operators is the best choice in the > context of SPARQL 1.1 overall. > > As an aside, XPath [XP] treats sequences of atomic values differently to > sequences of nodes. Sequences of atomic values are not distinct, > precisely so operations like "sum" and "count" do the expected thing. > There is an operation in XPath/XQuery functions and operators to make a > sequence of atomic distinct (fn:distinct-values). Obviously, the > reverse case of distinct being expanded to duplicates does not work. > > In RDF we can not make the distinction between literals and resources - > again we have a design decision point. > > One final example: > > An example in the F&R document is list access. In property paths, this > might address that use case: > > { :list rdf:rest*/rdf:first ?member } > > and lists can have duplicates. The judgement is again whether the short > form should make the use case easily addressable or choose symmetry and > expect the application writer to know to use the longer form: > > { :list rdf:rest* ?x . > ?x rdf:first ?member } > > when duplicates are possible. > > > Andy > > > [F&R] http://www.w3.org/TR/sparql-features/ > [XP] http://www.w3.org/TR/xpath20/#id-path-expressions , bullet 2. > >
Received on Tuesday, 8 May 2012 13:33:00 UTC