- From: Lee Feigenbaum <lee@thefigtrees.net>
- Date: Sat, 11 Feb 2012 13:12:48 -0500
- To: Andy Seaborne <andy.seaborne@epimorphics.com>
- CC: SPARQL Working Group <public-rdf-dawg@w3.org>
Andy, I really appreciate all this work. The existence of these 4 design options is the (sort of) reason that I prefer option 1. We just don't know which distinct approach would be best. As Greg said, let's let it be as-is, and see how implementations and use cases shake out in the real world. Lee On 2/11/2012 11:55 AM, Andy Seaborne wrote: > == Short version > > I can live with either option. > > I prefer option 2 when done properly but as things stand, I have to say > that I can't see option 2 + quick 3LC as viable. > > I've implemented, experimentally, a form of option 2. It's available to > all. > > == Consequences of option 2 (DISTINCT) > > Before we decide, I want to point out what I think are the implications > and consequences of this option. I've done a quick implementation (for > functionality, not performance) and looked at the impact on the spec. > > This is not to put barriers in the way; it is because we do need to have > a single proposal here before deciding on this route. There are > different choices within the description Axel gave and details to work out. > > Design 1: > DISTINCT() is a new path operator and makes everything inside the () > yield unique solutions. This is probably what we have been meaning but > it's not the only choice. > > Design 2: > As design 1 but DISTINCT is only allowed at the top level of a path. > Probably it similar to SELECT DISTINCT bracketing (needs checking though). > > Matt's > { :a :p1{2,3}/DISTINCT(:p2*)/:p3{1,2} ?d } > fails this. > > Design 3: > DISTINCT() modifies the * and + operators to make them deal in unique > solutions but leaves other operators used inside () alone. > > This design preserves the property path transformation rewrite rules. > > DISTINCT is a modifier on * and + alone; I have to check whether this is > just a change to ALP or not. (ALP is defined in terms of node->var so > what happens about ?x :p* ?y, two vars?) > > It is very confusing syntax. It would be better to have unique-* > operators, say !* and !+. Not sure what happens to {n,m} > > I'm not sure of all the consequences of this is P in P!* can itself be a > compound path. > > Design 4: > Only DISTINCT(:p*) and DISTINCT(:p+) are allowed. :p is an IRI. > > == Spec > > The changes to the descriptive section look OK. > > The formal section is impacted more widely. > > The definition of path evaluation is defined by transformation for most > of the operators, leaving operators for *, +, {0} and > NegatedPropertySet. There are no operators for "/", "|", "^" or much of > {n,m} in the evaulation. > > This is how we maximised the interactions of property paths and entailment. > > If we add DISTINCT(path) for any path, we can either leave the > definitions as-is and wrap in a PathDistinct that evaluates like > (distinct (project ... )). [This is an outline and must to be checked.] > > We may need to redo ALP or the operators for * to and + to make it clear > what happens in the uniqueness case although this is technically > unnecessary. > > However, that is just doing the minimum and I don't think it makes for a > clear spec because it is not explicit about combined expressions and > cardinality. > > A proper incorporation is to define operators for all the path > expressions, with cardinality rules and execution. We can note > transformations for non-distinct cases. > > This then accounts for the cardinality of mixed expressions like: > > { :a :p1{2,3}/DISTINCT(:p2*)/:p3{1,2} ?d } > > Also, the effect on implementation might be to push them to have to > implement all the path operators at runtime, not rewrite them to other > forms to leave a few core path operators. I don't know how other > implementations work. > > == Implementation > > I've done a version of ARQ that implements design 1. The grammar change > is trivial: change PathPrimary to have the DISTINCT(path) case. > > The evaluation change is trivial but that's because of the how ARQ > works. It already had a complete evaluator that works on un-transformed > property paths so I was reusing that. > > The implementation is to add a new path expression node type for > distinct. It's a modifier like {n,m} or *, just the syntax is prefix > function form. > > My current implementation is not performant; it's simple and functional > for experimentation - it simply makes unique the results of evaluating > the subpart so it's not exploiting the DISTINCT condition yet and does > not change the evaluation costs. Because ARQ uses a recursive evaluator > without caching of partial results, it looks simple to add a different > evaluator. > > The "caching" (memo-izing) is important for a time-space tradeoff of > evaluation and will change the complexity growth function even in the > counting case but it's not the UC that ARQ is targeted at. > > Andy > > On 09/02/12 21:10, Axel Polleres wrote: >> (in completion of ACTION-587) >> >> Dear all, as discussed in the last Telco, we have several options on >> how to proceed with addressing comment JP-4 [1]. If possible, I >> would like to get consensus on how to proceed here in the next >> Telco. >> >> In the previous Telco [2], we seemed to have consensus that we do not >> aim to switch the default behaviour from counting semantics to >> distinct paths. >> >> Now two possibilities to proceed were discussed: >> >> Option 1... keep everything as it is in the grammar, and explain >> which DISTINCT path subqueries can be optimized: As outlined in my >> email below, it might not be entirely trivial to argue in response to >> the comment that this would be equivalent to the JP-4 proposed >> semantics, I am not 100% sure whether/how to define a rewriting to >> wrap all path expressions into DISTINCT subqueries, such that it >> would be equivalent to their semantics (e.g. regarding bnode [] >> shortcuts). >> >> Option 2 ... add DISTINCT around paths: It seems that sticking to our >> intended semantics and allowing - orthogonally to their ALLPATHS >> keyword proposal the keyword DISTINCT( ) around path expressions >> switching to existential paths semantics would be equivalent to the >> JP-4 existential paths semantics as outlined in Section 7.1 of their >> paper, and thus optimizable. >> >> Unlike someone sees a 3rd alternative, I would like to propose to >> decide between those two options next time and proceed, discussion >> prior to the call on email would be appreciated. >> >> Option 2 might be easier to implement, but also requires us to go for >> another LC round, as it would change the grammar. I think, in case we >> skip PR and manage to republish very soon, we would still manage to >> stay within time limits, but I would also like to know the team >> contacts' opinion on that. > >
Received on Saturday, 11 February 2012 18:13:15 UTC