Re: summary on options for JP-4 Comment about the semantics of property paths from Lee Feigenbaum on 2012-02-11 (public-rdf-dawg@w3.org from January to March 2012)

From: Lee Feigenbaum <lee@thefigtrees.net>
Date: Sat, 11 Feb 2012 13:12:48 -0500
To: Andy Seaborne <andy.seaborne@epimorphics.com>
CC: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <4F36AFA0.5060703@thefigtrees.net>
Andy, I really appreciate all this work.

The existence of these 4 design options is the (sort of) reason that I 
prefer option 1. We just don't know which distinct approach would be 
best. As Greg said, let's let it be as-is, and see how implementations 
and use cases shake out in the real world.

Lee

On 2/11/2012 11:55 AM, Andy Seaborne wrote:
> == Short version
>
> I can live with either option.
>
> I prefer option 2 when done properly but as things stand, I have to say
> that I can't see option 2 + quick 3LC as viable.
>
> I've implemented, experimentally, a form of option 2. It's available to
> all.
>
> == Consequences of option 2 (DISTINCT)
>
> Before we decide, I want to point out what I think are the implications
> and consequences of this option. I've done a quick implementation (for
> functionality, not performance) and looked at the impact on the spec.
>
> This is not to put barriers in the way; it is because we do need to have
> a single proposal here before deciding on this route. There are
> different choices within the description Axel gave and details to work out.
>
> Design 1:
> DISTINCT() is a new path operator and makes everything inside the ()
> yield unique solutions. This is probably what we have been meaning but
> it's not the only choice.
>
> Design 2:
> As design 1 but DISTINCT is only allowed at the top level of a path.
> Probably it similar to SELECT DISTINCT bracketing (needs checking though).
>
> Matt's
> { :a :p1{2,3}/DISTINCT(:p2*)/:p3{1,2} ?d }
> fails this.
>
> Design 3:
> DISTINCT() modifies the * and + operators to make them deal in unique
> solutions but leaves other operators used inside () alone.
>
> This design preserves the property path transformation rewrite rules.
>
> DISTINCT is a modifier on * and + alone; I have to check whether this is
> just a change to ALP or not. (ALP is defined in terms of node->var so
> what happens about ?x :p* ?y, two vars?)
>
> It is very confusing syntax. It would be better to have unique-*
> operators, say !* and !+. Not sure what happens to {n,m}
>
> I'm not sure of all the consequences of this is P in P!* can itself be a
> compound path.
>
> Design 4:
> Only DISTINCT(:p*) and DISTINCT(:p+) are allowed. :p is an IRI.
>
> == Spec
>
> The changes to the descriptive section look OK.
>
> The formal section is impacted more widely.
>
> The definition of path evaluation is defined by transformation for most
> of the operators, leaving operators for *, +, {0} and
> NegatedPropertySet. There are no operators for "/", "|", "^" or much of
> {n,m} in the evaulation.
>
> This is how we maximised the interactions of property paths and entailment.
>
> If we add DISTINCT(path) for any path, we can either leave the
> definitions as-is and wrap in a PathDistinct that evaluates like
> (distinct (project ... )). [This is an outline and must to be checked.]
>
> We may need to redo ALP or the operators for * to and + to make it clear
> what happens in the uniqueness case although this is technically
> unnecessary.
>
> However, that is just doing the minimum and I don't think it makes for a
> clear spec because it is not explicit about combined expressions and
> cardinality.
>
> A proper incorporation is to define operators for all the path
> expressions, with cardinality rules and execution. We can note
> transformations for non-distinct cases.
>
> This then accounts for the cardinality of mixed expressions like:
>
> { :a :p1{2,3}/DISTINCT(:p2*)/:p3{1,2} ?d }
>
> Also, the effect on implementation might be to push them to have to
> implement all the path operators at runtime, not rewrite them to other
> forms to leave a few core path operators. I don't know how other
> implementations work.
>
> == Implementation
>
> I've done a version of ARQ that implements design 1. The grammar change
> is trivial: change PathPrimary to have the DISTINCT(path) case.
>
> The evaluation change is trivial but that's because of the how ARQ
> works. It already had a complete evaluator that works on un-transformed
> property paths so I was reusing that.
>
> The implementation is to add a new path expression node type for
> distinct. It's a modifier like {n,m} or *, just the syntax is prefix
> function form.
>
> My current implementation is not performant; it's simple and functional
> for experimentation - it simply makes unique the results of evaluating
> the subpart so it's not exploiting the DISTINCT condition yet and does
> not change the evaluation costs. Because ARQ uses a recursive evaluator
> without caching of partial results, it looks simple to add a different
> evaluator.
>
> The "caching" (memo-izing) is important for a time-space tradeoff of
> evaluation and will change the complexity growth function even in the
> counting case but it's not the UC that ARQ is targeted at.
>
> Andy
>
> On 09/02/12 21:10, Axel Polleres wrote:
>> (in completion of ACTION-587)
>>
>> Dear all, as discussed in the last Telco, we have several options on
>> how to proceed with addressing comment JP-4 [1]. If possible, I
>> would like to get consensus on how to proceed here in the next
>> Telco.
>>
>> In the previous Telco [2], we seemed to have consensus that we do not
>> aim to switch the default behaviour from counting semantics to
>> distinct paths.
>>
>> Now two possibilities to proceed were discussed:
>>
>> Option 1... keep everything as it is in the grammar, and explain
>> which DISTINCT path subqueries can be optimized: As outlined in my
>> email below, it might not be entirely trivial to argue in response to
>> the comment that this would be equivalent to the JP-4 proposed
>> semantics, I am not 100% sure whether/how to define a rewriting to
>> wrap all path expressions into DISTINCT subqueries, such that it
>> would be equivalent to their semantics (e.g. regarding bnode []
>> shortcuts).
>>
>> Option 2 ... add DISTINCT around paths: It seems that sticking to our
>> intended semantics and allowing - orthogonally to their ALLPATHS
>> keyword proposal the keyword DISTINCT( ) around path expressions
>> switching to existential paths semantics would be equivalent to the
>> JP-4 existential paths semantics as outlined in Section 7.1 of their
>> paper, and thus optimizable.
>>
>> Unlike someone sees a 3rd alternative, I would like to propose to
>> decide between those two options next time and proceed, discussion
>> prior to the call on email would be appreciated.
>>
>> Option 2 might be easier to implement, but also requires us to go for
>> another LC round, as it would change the grammar. I think, in case we
>> skip PR and manage to republish very soon, we would still manage to
>> stay within time limits, but I would also like to know the team
>> contacts' opinion on that.
>
>
Received on Saturday, 11 February 2012 18:13:15 UTC