property paths

From: Gregory Williams <greg@evilfunhouse.com>
Date: Mon, 26 Mar 2012 15:30:46 -0400
Message-Id: <DD068F30-02F4-4D3F-A785-7CEF245A32E4@evilfunhouse.com>
To: SPARQL Working Group <public-rdf-dawg@w3.org>
Having unfortunately missed last week's call where there was a resolution about property paths, I wanted to at least send some thoughts on the course we seem to be taking.

After having done some re-implementation over the last week to address some bugs in my implementation, I added support for distinct paths using (what I think is) the new DISTINCT(path) syntax. After this work, I think I agree with Andy that it's not a big implementation burden (and as noted by many commenters, has some obvious benefits for performance). That being said, I think I also agree with Steve's concerns: I suspect my ease in implementing the second path semantics is a result of not trying to have a high-performance implementation. Doing optimization on property paths well is probably pretty hard, and having a second set of path semantics certainly doesn't help.

Beyond the implementation burden, I'm very concerned that we're rushing into this. Assuming that two path semantics are required at this point (it seems we've mostly agreed on this), I'm worried about the direction we seem to be taking syntactically. From last week's resolution, I take it that the current thinking on design is that something like { ?s pathexpr ?o } will do the counting semantics (the old design) and { ?s DISTINCT(pathexpr) ?o } will do the distinct semantics. Was there any discussion last week about similar syntax for the counting semantics? I know at one point there was discussion of an ALLPATHS keyword.
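To make the cardinality difference concrete, here is a minimal Python sketch of the two semantics for a simple sequence path. It is only an illustration under stated assumptions: the toy graph, the predicate names p and q, and the helper functions are all hypothetical, and it models only a two-step sequence path rather than the full path algebra. The distinct version here simply deduplicates the counting result, so it shows the difference in answers, not the performance benefit a real implementation might get.

```python
from collections import Counter

# Hypothetical toy graph: a diamond with two routes from "a" to "d".
TRIPLES = [
    ("a", "p", "b"),
    ("a", "p", "c"),
    ("b", "q", "d"),
    ("c", "q", "d"),
]


def step(triples, predicate):
    """Return every (subject, object) pair linked by one predicate, as a bag."""
    return [(s, o) for s, p, o in triples if p == predicate]


def sequence_counting(triples, first, second):
    """Counting (bag) semantics for first/second: one solution per route."""
    return [
        (s, o)
        for s, mid in step(triples, first)
        for mid2, o in step(triples, second)
        if mid == mid2
    ]


def sequence_distinct(triples, first, second):
    """Distinct (set) semantics: each (start, end) pair appears at most once."""
    return sorted(set(sequence_counting(triples, first, second)))


counting = sequence_counting(TRIPLES, "p", "q")
distinct = sequence_distinct(TRIPLES, "p", "q")

print(Counter(counting))  # how many times each pair is returned
print(distinct)
```

With this graph, the counting version returns the pair ("a", "d") twice, once per route, while the distinct version returns it once. That duplicate row is the result cardinality difference at stake in the choice of default.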

I know there was concern from some that DISTINCT() is a rather wordy way of asking for the distinct semantics. I'm concerned that we may end up in a situation where it turns out that most people do end up wanting the distinct semantics, and with our current design, we will have done the Huffman coding all wrong -- distinct semantics will require a long keyword, while the counting semantics won't. An alternative I had thought about was having keywords for both semantics (e.g. DISTINCT and ALLPATHS), and leaving the choice of semantics for queries not specifying a keyword to either the implementation or a future WG. I think that might make some implementors happy, at the expense of result cardinality differences between implementations in the near term (that is, until a future WG might nail down which semantics should be the default).

thanks,
.greg
Received on Monday, 26 March 2012 19:31:11 GMT
