- From: Andy Seaborne <andy.seaborne@epimorphics.com>
- Date: Sat, 11 Feb 2012 16:55:19 +0000
- To: SPARQL Working Group <public-rdf-dawg@w3.org>
== Short version
I can live with either option.
I prefer option 2 when done properly but as things stand, I have to say
that I can't see option 2 + quick 3LC as viable.
I've implemented, experimentally, a form of option 2. It's available to
all.
== Consequences of option 2 (DISTINCT)
Before we decide, I want to point out what I think are the implications
and consequences of this option. I've done a quick implementation (for
functionality, not performance) and looked at the impact on the spec.
This is not to put barriers in the way; it is because we do need to have
a single proposal here before deciding on this route. There are
different choices within the description Axel gave and details to work out.
Design 1:
DISTINCT() is a new path operator and makes everything inside the ()
yield unique solutions. This is probably what we have been meaning but
it's not the only choice.
Design 2:
As design 1 but DISTINCT is only allowed at the top level of a path.
Probably it similar to SELECT DISTINCT bracketing (needs checking though).
Matt's
{ :a :p1{2,3}/DISTINCT(:p2*)/:p3{1,2} ?d }
fails this.
Design 3:
DISTINCT() modifies the * and + operators to make them deal in unique
solutions but leaves other operators used inside () alone.
This design preserves the property path transformation rewrite rules.
DISTINCT is a modifier on * and + alone; I have to check whether this is
just a change to ALP or not. (ALP is defined in terms of node->var so
what happens about ?x :p* ?y, two vars?)
It is very confusing syntax. It would be better to have unique-*
operators, say !* and !+. Not sure what happens to {n,m}
I'm not sure of all the consequences of this is P in P!* can itself be a
compound path.
Design 4:
Only DISTINCT(:p*) and DISTINCT(:p+) are allowed. :p is an IRI.
== Spec
The changes to the descriptive section look OK.
The formal section is impacted more widely.
The definition of path evaluation is defined by transformation for most
of the operators, leaving operators for *, +, {0} and
NegatedPropertySet. There are no operators for "/", "|", "^" or much of
{n,m} in the evaulation.
This is how we maximised the interactions of property paths and entailment.
If we add DISTINCT(path) for any path, we can either leave the
definitions as-is and wrap in a PathDistinct that evaluates like
(distinct (project ... )). [This is an outline and must to be checked.]
We may need to redo ALP or the operators for * to and + to make it clear
what happens in the uniqueness case although this is technically
unnecessary.
However, that is just doing the minimum and I don't think it makes for a
clear spec because it is not explicit about combined expressions and
cardinality.
A proper incorporation is to define operators for all the path
expressions, with cardinality rules and execution. We can note
transformations for non-distinct cases.
This then accounts for the cardinality of mixed expressions like:
{ :a :p1{2,3}/DISTINCT(:p2*)/:p3{1,2} ?d }
Also, the effect on implementation might be to push them to have to
implement all the path operators at runtime, not rewrite them to other
forms to leave a few core path operators. I don't know how other
implementations work.
== Implementation
I've done a version of ARQ that implements design 1. The grammar change
is trivial: change PathPrimary to have the DISTINCT(path) case.
The evaluation change is trivial but that's because of the how ARQ
works. It already had a complete evaluator that works on un-transformed
property paths so I was reusing that.
The implementation is to add a new path expression node type for
distinct. It's a modifier like {n,m} or *, just the syntax is prefix
function form.
My current implementation is not performant; it's simple and functional
for experimentation - it simply makes unique the results of evaluating
the subpart so it's not exploiting the DISTINCT condition yet and does
not change the evaluation costs. Because ARQ uses a recursive evaluator
without caching of partial results, it looks simple to add a different
evaluator.
The "caching" (memo-izing) is important for a time-space tradeoff of
evaluation and will change the complexity growth function even in the
counting case but it's not the UC that ARQ is targeted at.
Andy
On 09/02/12 21:10, Axel Polleres wrote:
> (in completion of ACTION-587)
>
> Dear all, as discussed in the last Telco, we have several options on
> how to proceed with addressing comment JP-4 [1]. If possible, I
> would like to get consensus on how to proceed here in the next
> Telco.
>
> In the previous Telco [2], we seemed to have consensus that we do not
> aim to switch the default behaviour from counting semantics to
> distinct paths.
>
> Now two possibilities to proceed were discussed:
>
> Option 1... keep everything as it is in the grammar, and explain
> which DISTINCT path subqueries can be optimized: As outlined in my
> email below, it might not be entirely trivial to argue in response to
> the comment that this would be equivalent to the JP-4 proposed
> semantics, I am not 100% sure whether/how to define a rewriting to
> wrap all path expressions into DISTINCT subqueries, such that it
> would be equivalent to their semantics (e.g. regarding bnode []
> shortcuts).
>
> Option 2 ... add DISTINCT around paths: It seems that sticking to our
> intended semantics and allowing - orthogonally to their ALLPATHS
> keyword proposal the keyword DISTINCT( ) around path expressions
> switching to existential paths semantics would be equivalent to the
> JP-4 existential paths semantics as outlined in Section 7.1 of their
> paper, and thus optimizable.
>
> Unlike someone sees a 3rd alternative, I would like to propose to
> decide between those two options next time and proceed, discussion
> prior to the call on email would be appreciated.
>
> Option 2 might be easier to implement, but also requires us to go for
> another LC round, as it would change the grammar. I think, in case we
> skip PR and manage to republish very soon, we would still manage to
> stay within time limits, but I would also like to know the team
> contacts' opinion on that.
Received on Saturday, 11 February 2012 16:55:46 UTC