- From: Andy Seaborne <andy.seaborne@epimorphics.com>
- Date: Sat, 11 Feb 2012 16:55:19 +0000
- To: SPARQL Working Group <public-rdf-dawg@w3.org>
== Short version I can live with either option. I prefer option 2 when done properly but as things stand, I have to say that I can't see option 2 + quick 3LC as viable. I've implemented, experimentally, a form of option 2. It's available to all. == Consequences of option 2 (DISTINCT) Before we decide, I want to point out what I think are the implications and consequences of this option. I've done a quick implementation (for functionality, not performance) and looked at the impact on the spec. This is not to put barriers in the way; it is because we do need to have a single proposal here before deciding on this route. There are different choices within the description Axel gave and details to work out. Design 1: DISTINCT() is a new path operator and makes everything inside the () yield unique solutions. This is probably what we have been meaning but it's not the only choice. Design 2: As design 1 but DISTINCT is only allowed at the top level of a path. Probably it similar to SELECT DISTINCT bracketing (needs checking though). Matt's { :a :p1{2,3}/DISTINCT(:p2*)/:p3{1,2} ?d } fails this. Design 3: DISTINCT() modifies the * and + operators to make them deal in unique solutions but leaves other operators used inside () alone. This design preserves the property path transformation rewrite rules. DISTINCT is a modifier on * and + alone; I have to check whether this is just a change to ALP or not. (ALP is defined in terms of node->var so what happens about ?x :p* ?y, two vars?) It is very confusing syntax. It would be better to have unique-* operators, say !* and !+. Not sure what happens to {n,m} I'm not sure of all the consequences of this is P in P!* can itself be a compound path. Design 4: Only DISTINCT(:p*) and DISTINCT(:p+) are allowed. :p is an IRI. == Spec The changes to the descriptive section look OK. The formal section is impacted more widely. The definition of path evaluation is defined by transformation for most of the operators, leaving operators for *, +, {0} and NegatedPropertySet. There are no operators for "/", "|", "^" or much of {n,m} in the evaulation. This is how we maximised the interactions of property paths and entailment. If we add DISTINCT(path) for any path, we can either leave the definitions as-is and wrap in a PathDistinct that evaluates like (distinct (project ... )). [This is an outline and must to be checked.] We may need to redo ALP or the operators for * to and + to make it clear what happens in the uniqueness case although this is technically unnecessary. However, that is just doing the minimum and I don't think it makes for a clear spec because it is not explicit about combined expressions and cardinality. A proper incorporation is to define operators for all the path expressions, with cardinality rules and execution. We can note transformations for non-distinct cases. This then accounts for the cardinality of mixed expressions like: { :a :p1{2,3}/DISTINCT(:p2*)/:p3{1,2} ?d } Also, the effect on implementation might be to push them to have to implement all the path operators at runtime, not rewrite them to other forms to leave a few core path operators. I don't know how other implementations work. == Implementation I've done a version of ARQ that implements design 1. The grammar change is trivial: change PathPrimary to have the DISTINCT(path) case. The evaluation change is trivial but that's because of the how ARQ works. It already had a complete evaluator that works on un-transformed property paths so I was reusing that. The implementation is to add a new path expression node type for distinct. It's a modifier like {n,m} or *, just the syntax is prefix function form. My current implementation is not performant; it's simple and functional for experimentation - it simply makes unique the results of evaluating the subpart so it's not exploiting the DISTINCT condition yet and does not change the evaluation costs. Because ARQ uses a recursive evaluator without caching of partial results, it looks simple to add a different evaluator. The "caching" (memo-izing) is important for a time-space tradeoff of evaluation and will change the complexity growth function even in the counting case but it's not the UC that ARQ is targeted at. Andy On 09/02/12 21:10, Axel Polleres wrote: > (in completion of ACTION-587) > > Dear all, as discussed in the last Telco, we have several options on > how to proceed with addressing comment JP-4 [1]. If possible, I > would like to get consensus on how to proceed here in the next > Telco. > > In the previous Telco [2], we seemed to have consensus that we do not > aim to switch the default behaviour from counting semantics to > distinct paths. > > Now two possibilities to proceed were discussed: > > Option 1... keep everything as it is in the grammar, and explain > which DISTINCT path subqueries can be optimized: As outlined in my > email below, it might not be entirely trivial to argue in response to > the comment that this would be equivalent to the JP-4 proposed > semantics, I am not 100% sure whether/how to define a rewriting to > wrap all path expressions into DISTINCT subqueries, such that it > would be equivalent to their semantics (e.g. regarding bnode [] > shortcuts). > > Option 2 ... add DISTINCT around paths: It seems that sticking to our > intended semantics and allowing - orthogonally to their ALLPATHS > keyword proposal the keyword DISTINCT( ) around path expressions > switching to existential paths semantics would be equivalent to the > JP-4 existential paths semantics as outlined in Section 7.1 of their > paper, and thus optimizable. > > Unlike someone sees a 3rd alternative, I would like to propose to > decide between those two options next time and proceed, discussion > prior to the call on email would be appreciated. > > Option 2 might be easier to implement, but also requires us to go for > another LC round, as it would change the grammar. I think, in case we > skip PR and manage to republish very soon, we would still manage to > stay within time limits, but I would also like to know the team > contacts' opinion on that.
Received on Saturday, 11 February 2012 16:55:46 UTC