Re: SPARQL WG action on property paths

On Tue, Apr 3, 2012 at 11:19 AM, Lee Feigenbaum <lee@thefigtrees.net> wrote:
> Hi Wim, Jorge, Jeen, Marcelo, and Sebastian,
>
> (Please note that this is not an official working group response to your
> respective comments on property paths in the current SPARQL 1.1 Query last
> call working draft.)
>
> I want to thank you all again for your research, experiences, suggestions,
> and comments on SPARQL 1.1 property paths. They've been very valuable to the
> working group.
>
> The group has spent some time in the past few weeks considering various
> options in an attempt to address the implementation and evaluation
> challenges that you have all raised while still respecting our group's
> schedule, implementers' burdens, and the use cases we've identified for
> property paths.
>
> Today, we reached consensus within the group on an approach that we feel
> addresses your concerns while still leaving room for implementation
> experience going forward to inform additional design decisions in the
> future.
>
> We haven't yet worked this design into the query document, which is why this
> isn't an official WG response to your comments. Yet before we go ahead and
> publish a new Last call, we'd like to know if you support this new design
> and if you believe that it does indeed address your comments.
>
> The design is summarized in these two emails by Andy Seaborne:
>
> http://lists.w3.org/Archives/Public/public-rdf-dawg/2012JanMar/0285.html
> http://lists.w3.org/Archives/Public/public-rdf-dawg/2012JanMar/0286.html
>
> I'd very much appreciate it if you can take a look at this and let me know
> what you think.

Hi Lee,

I have followed the discussion regarding property paths in detail for
more than one year, including the two messages linked above. As for
this latest proposal, I do not think it is a good design decision.
Having some property-path operators count and others not is just not
natural. From my point of view, it would be really difficult to
explain to users exactly what is going on with the semantics. Thus, I
am not OK with this new proposal. Personally, I still do not
understand the need for counting at all. To be honest, I cannot see
any really strong use case for making counting
the default (and, moreover, Marcelo in his previous email
showed that all the use cases proposed so far can be more naturally
expressed with an existential semantics plus ordinary SPARQL
operators). As far as I can see, having a counting semantics for
property paths was just "an accident" when the group decided to define
property paths by translating them into SPARQL 1.0 operators. At that
time the group did not have enough information to make a clear choice.
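
A minimal Python sketch of how counting can fall out of such a
translation. It assumes a sequence path p/q is rewritten as a join of
two triple patterns through a fresh intermediate variable that is then
projected away. The toy graph, the predicate names and the helper
`eval_sequence` are hypothetical and are not taken from the spec or
from any implementation.

```python
from collections import Counter

# Hypothetical toy graph: two different intermediate nodes link "a" to "d".
TRIPLES = [
    ("a", "p", "b"),
    ("a", "p", "c"),
    ("b", "q", "d"),
    ("c", "q", "d"),
]

def eval_sequence(triples, first, second):
    """Evaluate the path first/second as a join of two triple patterns.

    Mirrors rewriting  ?x first/second ?y  into
    ?x first ?tmp . ?tmp second ?y  and projecting out ?tmp.
    Returns a list (bag) of (x, y) pairs, one per matching ?tmp.
    """
    return [
        (x, y)
        for (x, p1, tmp) in triples if p1 == first
        for (tmp2, p2, y) in triples if p2 == second and tmp2 == tmp
    ]

bag = eval_sequence(TRIPLES, "p", "q")
counting = Counter(bag)   # bag semantics: one copy per intermediate node
existential = set(bag)    # existential semantics: is there a path at all?

print(counting)
print(existential)
```

In this sketch the pair ("a", "d") appears twice under the bag reading
and once under the existential reading, even though the query never
asked for a count.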

On the other hand, and contrary to what I think has been said in
some discussions, there is a lot of implementation experience in
different contexts with path queries under existential semantics, and
also a huge amount of research. There is even implementation
experience in SPARQL. Please see:

Gleen: http://sig.biostr.washington.edu/projects/ontviews/gleen/
PSPARQL: http://exmo.inrialpes.fr/software/psparql/
RPL: http://rpl.pms.ifi.lmu.de/

All three implement path queries with existential semantics
(non-counting), and they work great!

In contrast, there is no experience in implementing path queries that
count, and current implementations of the SPARQL 1.1 spec give different
results for the same queries (see
http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2012Feb/0006.html).
This shows that a counting semantics is difficult to understand even
for experienced developers. Moreover, this topic is still an open
research question. Please note that the two papers that we have made
public to the group are going to be presented at two of the most
important conferences on the Web (WWW 2012) and databases
(PODS 2012), and are only the first efforts in trying to understand
the issue.

On the positive side, and only if the group insists on the need for
counting for some property path operators, I personally prefer the
proposal of DISTINCT/ALL over path expressions (that was also on the
mailing list), but only if DISTINCT is the default. Please note that
this kind of design is not really different from some SQL operators.
Just recall the "UNION ALL" in SQL. The rationale is that UNION is
essentially a "set" operator, and that is the natural way to define
it. Thus if you want to retain duplicates in a SQL UNION query,
an additional keyword should be provided. My personal view here is
that for path queries it should be similar: the natural semantics
(used for years in graph databases, XML and also in the RDF/SPARQL
context) is an existential semantics (no duplicates), thus if you want
to retain duplicates (in whatever form the group decides to count
duplicates) you should provide an additional keyword such as ALL.
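
A rough Python sketch of what "DISTINCT by default, ALL on request"
could look like for a path operator. The function `reachable`, its
`keep_all` flag and the toy graph are hypothetical. Counting one result
per distinct path in an acyclic graph is only one of the possible ways
to count duplicates and is an assumption of this sketch, not the
group's definition.

```python
from collections import Counter

# Hypothetical acyclic toy graph: adjacency list for a single predicate.
EDGES = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d"],
    "d": [],
}

def reachable(edges, start, keep_all=False):
    """Nodes reachable from start via one or more edges.

    keep_all=False (default): existential semantics, each node once.
    keep_all=True: one result per distinct path (assumes an acyclic
    graph; with cycles this enumeration would not terminate).
    """
    results = []
    stack = list(edges[start])
    while stack:
        node = stack.pop()
        results.append(node)          # one entry per path reaching node
        stack.extend(edges[node])
    if keep_all:
        return Counter(results)       # bag: node -> number of paths
    return set(results)               # set: duplicates collapsed

print(reachable(EDGES, "a"))
print(reachable(EDGES, "a", keep_all=True))
```

As with UNION versus UNION ALL, the plain call returns a set, and the
caller has to ask explicitly (here with `keep_all=True`) to get the
duplicates back.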

Please let me know if it is OK with you if I forward this response
together with your message to the public-rdf-dawg-comments list (I
think we can attract more commenters and opinions to this subject if
we openly discuss it).

Cheers,
- jorge


>
> thanks,
> Lee

Received on Friday, 6 April 2012 06:28:24 UTC