Jorge Perez' informal comments (was: RE: Description and thoughts behind option 6 (part 1 of 2)) from Polleres, Axel on 2012-04-03 (public-rdf-dawg@w3.org from April to June 2012)

From: Polleres, Axel <axel.polleres@siemens.com>
Date: Tue, 3 Apr 2012 07:12:07 +0200
To: Lee Feigenbaum <lee@thefigtrees.net>, Andy Seaborne <andy.seaborne@epimorphics.com>
CC: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <9DA51FFE5E84464082D7A089342DEEE80137B42B366E@ATVIES9917WMSX.ww300.siemens.net>
Dear all,

As input to the discussion, I am forwarding at the end of this mail, with the commenter's permission, my informal discussion with Jorge Perez on JP-4 about the previous proposal involving +,*,{*},{+}.

I think some of his points might also be relevant to Option 6:

a) Jorge's question about the semantics of 
 (:a | :b)*
  should be answered, i.e., whether it counts the 
  duplicates of (:a | :b) and then discards only the duplicates 
  generated by * or whether it just discards all the duplicates. 

b) Jorge seems to have a strong preference for restricting 
   counting/non-counting to the path level, i.e., ALL()/DISTINCT(). 
   That would be current options 7) and 8); note, however, that his 
   remark was made against the previous options (which involved {+},{*}).

My guess is that the design of Option 6, having neither {*} and {+} nor {n,m}, might resolve the last part of Jorge's response below, since infinite paths aren't really an issue anymore with the new semantics of +,*, right? Also, by dropping {n,m} we may resolve Wim's (WM-1) concern. As mentioned in my previous mail, I'd personally prefer approaching all three commenters with a digest of options *the group can live with* rather than picking only one. From my side and to my current knowledge, I am ok with any of Options 6), 7), and 8).

Best,
Axel

--------------------------------------
 
Hello Axel,

On Tue, Mar 6, 2012 at 12:58 PM, Axel Polleres <axel@polleres.net> wrote:
> Hi Jorge, Marcelo, (offlist)
>
> As you may have noticed, the group is not inactive about your comment, but discussing it intensively.
> The reason for silence over the official channels is that we want the issue solved before giving an official answer. This is the proposal where we stand at the moment [1]...
>
> Essentially, it suggests changing "*" and "+" to non-counting semantics, as you proposed,
> and adding counting versions "{*}" and "{+}".
>
> Most importantly, I would like to "test the waters" whether you're ok with this way forward in principle,
> and please understand that the group is under particular time pressure.
> Please let me know as soon as possible!


I must admit that I need to think a bit more about it, but at first
glance I think that this solution opens up some new issues. For instance,
what is the meaning of something like

(:a | :b)*

Are you going to count the duplicates of (:a | :b) and then discard
only the duplicates generated by *? Or are you going to just discard
all the duplicates? In either case I think that it would be very
confusing for users. My guess is that things would become more
complicated if * and {*} are combined with other operators.
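
To make the two readings concrete, here is a minimal Python sketch on a
hypothetical two-triple graph. The graph, the function names, and in
particular the way the first reading is implemented are assumptions made
for illustration only; the thread does not fix either semantics.

```python
from collections import Counter

# Hypothetical toy graph: two edges from x to y, one via :a and one via :b.
TRIPLES = [("x", ":a", "y"), ("x", ":b", "y")]

def step_alt(node, predicates):
    """Bag of nodes reached by one step of (p1 | p2 | ...), duplicates kept."""
    return Counter(o for s, p, o in TRIPLES if s == node and p in predicates)

def star_discard_all(start, predicates):
    """Second reading: a plain set of reachable nodes, no multiplicities."""
    seen, todo = {start}, [start]
    while todo:
        node = todo.pop()
        for nxt in step_alt(node, predicates):
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return Counter(seen)

def star_keep_inner(start, predicates):
    """First reading (one possible interpretation, an assumption):
    each node is expanded only once, so * itself adds no duplicates,
    but the multiplicity of every inner (p1 | p2) step is kept."""
    result, expanded, todo = Counter({start: 1}), {start}, [start]
    while todo:
        node = todo.pop()
        for nxt, count in step_alt(node, predicates).items():
            result[nxt] += count
            if nxt not in expanded:
                expanded.add(nxt)
                todo.append(nxt)
    return result

print(star_discard_all("x", {":a", ":b"}))
print(star_keep_inner("x", {":a", ":b"}))
```

On this graph the second reading returns x and y once each, while the
first reading, as sketched, returns y twice because both :a and :b lead
from x to y.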

Also please note that Wim Martens showed in his paper that * is not
the only problem: expressions like path(n,m) also suffer from
extreme complexity problems, and they are not covered by the new
solution.

So, coming to your question: unfortunately I am not OK with this new
solution having *, +, {*}, {+}. However, the only strong argument
against this decision would be the work of Wim, which is outside the
scope of our message. So I would consider my comment answered if the
answer is along the lines of adding *, {*} (I would say, "I do not agree,
but I acknowledge that the group considered the issue raised by my
comment").

On the positive side, I personally prefer the proposal of
DISTINCT/ALL-PATHS over path expressions (that was also on the mailing
list), something like

DISTINCT( (:a | :b)* )
ALL-PATHS( (:a | :b)* )

but only if DISTINCT and ALL-PATHS are used as top-level modifiers of
path expressions. That is, something like (:a/DISTINCT(:b*))* should
not be allowed, since otherwise one would run into a problem similar
to that of having * and {*}.

Also please note that adding DISTINCT and ALL-PATHS at the top level
would affect just a very minor part of the current grammar.
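
The top-level restriction can be sketched as a simple check in Python.
This is a naive string-based illustration, not a parser for the SPARQL
grammar; the function names are hypothetical, and the substring test
would misfire on a prefixed name that happens to contain a modifier
keyword.

```python
MODIFIERS = ("DISTINCT", "ALL-PATHS")

def _closes_at_end(text):
    """True if the parenthesis opened at text[0] is closed by the last character."""
    depth = 0
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                return i == len(text) - 1
    return False

def split_top_level(expr):
    """Return (modifier or None, inner path) for a path expression string."""
    text = expr.strip()
    for mod in MODIFIERS:
        if text.startswith(mod):
            rest = text[len(mod):].lstrip()
            # The modifier counts as top-level only if it wraps the whole path.
            if rest.startswith("(") and _closes_at_end(rest):
                return mod, rest[1:-1].strip()
    return None, text

def modifiers_only_at_top(expr):
    """Accept a modifier as the outermost wrapper, reject any nested one."""
    _, inner = split_top_level(expr)
    return not any(mod in inner for mod in MODIFIERS)

print(modifiers_only_at_top("DISTINCT( (:a | :b)* )"))   # accepted
print(modifiers_only_at_top("(:a/DISTINCT(:b*))*"))      # rejected
```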

I think that a fast way out of this whole problem is to
define both options in the specification, ALL-PATHS and DISTINCT, as
top-level modifiers of path expressions, and say that when neither
ALL-PATHS nor DISTINCT is specified, one of them is picked by
default. As you may guess, I would like the default to be DISTINCT
:-). This way of defining things would also allow a subsequent group
to define new modifiers like SIMPLE-PATHS or ALL-SIMPLE-PATHS, or some
other constructor that allows, for example, selecting paths up to some
length (for optimization or some other purposes).

In any case, and whatever the decision turns out to be, my main concern is
still the semantics of ALL-PATHS(...) (or {*}). The question is, what
are you really counting when there are infinitely many paths? We have
discussed this a bit with Marcelo but we haven't reached a clear consensus
on what to count. Nevertheless, if you do not make ALL-PATHS the
default, then there is no problem at all regarding which semantics it has,
and users may decide to use this modifier at their own risk.
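
The counting concern can be illustrated with a small Python sketch on a
hypothetical two-node cycle. The graph and the function names are
assumptions; the sketch only shows that the number of walks keeps growing
with the length bound while the set of reachable nodes stays finite, and
it does not propose any particular ALL-PATHS semantics.

```python
# Hypothetical graph with a cycle: x -> y -> x.
EDGES = {"x": ["y"], "y": ["x"]}

def count_walks(start, end, max_len):
    """Number of walks from start to end using between 1 and max_len edges."""
    total = 0
    frontier = {start: 1}  # node -> number of walks of the current length ending there
    for _ in range(max_len):
        nxt = {}
        for node, n in frontier.items():
            for succ in EDGES.get(node, []):
                nxt[succ] = nxt.get(succ, 0) + n
        frontier = nxt
        total += frontier.get(end, 0)
    return total

def reachable(start):
    """Set of nodes reachable from start in one or more steps."""
    seen, todo = set(), [start]
    while todo:
        for succ in EDGES.get(todo.pop(), []):
            if succ not in seen:
                seen.add(succ)
                todo.append(succ)
    return seen

for bound in (1, 5, 11):
    print(bound, count_walks("x", "y", bound))
print(reachable("x"))
```

With bounds 1, 5 and 11 the walk count from x to y is 1, 3 and 6, so
without a bound there is no finite number to report, whereas the
reachable set is always just x and y.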

Hope my message helps (and I am open to continue with the discussion,
since it interests me a lot!).

Cheers,
- jorge
Received on Tuesday, 3 April 2012 05:15:30 UTC