Re: fed review from Lee Feigenbaum on 2011-07-20 (public-rdf-dawg@w3.org from July to September 2011)

From: Lee Feigenbaum <lee@thefigtrees.net>
Date: Tue, 19 Jul 2011 22:57:51 -0400
To: Carlos Buil Aranda <cbuil@fi.upm.es>
CC: Gregory Williams <greg@evilfunhouse.com>, SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <4E26442F.6020405@thefigtrees.net>

On 7/19/2011 5:02 PM, Carlos Buil Aranda wrote:
> just summarizing a bit, the main concern is the use of variables in
> SERVICE. The problem lies in the specification of the semantics and
> in the boundedness condition, which suggests an order for executing the
> query. First, the way the execution of SERVICE VAR is defined is wrong
> under the current join semantics, because it is not possible to
> evaluate Join(G, Service(VAR, G, Transform(P), SilentOp)) while VAR
> does not yet have a value. Second, the boundedness condition has to be
> completed for all the SPARQL 1.1 operators. Am I right?
>
> So, the possible solutions are:
>   - For the SERVICE VAR semantics
>     - add a new operation that could allow the evaluation (operation
> which wouldn't be bottom-up)
>     - define its whole semantics in a new way
>   - For the boundedness restriction
>    - specify all cases: it may take a while
>    - remove it? I do not think this is a good idea, how do we specify
> then that a variable is bound (which is needed for the evaluation
> semantics of SERVICE VAR)?
>
> something missing? any other option?

As someone who was a bit uncomfortable including it in the first place, 
I'll add another, dramatic option: remove SERVICE VAR from the 
specification altogether.

Lee

>
> Carlos
>
> PS I notice that there are still pending issues in the previous email to
> be addressed, this is just a summary of what I think is the most
> important topic of the email
>
> 2011/7/19 Gregory Williams <greg@evilfunhouse.com>
>
>
>     On Jul 19, 2011, at 1:06 PM, Carlos Buil Aranda wrote:
>
>      > 2011/7/18 Gregory Williams <greg@evilfunhouse.com>
>      >> >>
>      >> >> "Let G := Join(G, Service(VAR, G, Transform(P), SilentOp))"
>      >> >>      I don't think this works, as the evaluation semantics
>     should try to evaluate the Service() pattern without access to the
>     results of evaluating the G pattern (which are needed to bind VAR)
>      >> > I do not completely understand what you mean.
>      >>
>      >> What I mean here is that to execute the Service() part of this
>     expression in a bottom-up fashion, it must be able to evaluate
>     without data from G. The bottom-up semantics are defined by Query
>     1.1 section 18.5 ("Evaluation of Join(P1, P2)") and by Federation
>     1.1 section 3.1 ("Definition: Evaluation of a Service Pattern"). In
>     this case, the left-hand-side of the join (P1) is the pattern before
>     the SERVICE block (P2). I don't think you can evaluate Service(VAR,
>     G, Transform(P), SilentOp), because you can't invoke a service
>     operation on a variable. You need to substitute VAR for an actual
>     URL, but the URLs that need to be substituted are produced in an
>     entirely separate evaluation (eval(D(G), P1)).
>      >>
>      > thanks, now I see the problem. One solution could be to add a
>     restriction under which P1 must be evaluated before P2 whenever P2
>     is a SERVICE pattern and only P1 contains VAR. This would make
>     explicit that VAR must already be bound before that join happens.
>     The query could be represented as a tree in which each leaf is a
>     pattern and the inner nodes are the operators.
>
>     I don't think such a restriction would work with the current
>     evaluation semantics. To do that, you'll need to not use Join() and
>     define your own join operation that isn't bottom-up.
>
>      >> I think this is a big problem, and it relates to another of my
>     comments:
>      >>
>      >> >> "foreach i in Ω(?var->i)"
>      >> >>      Where does Ω come from in this definition? I think it's
>     meant to refer to results from a join that is outside the scope of
>     this operation.
>      >> > yes, I will make that explicit.
>      >>
>      >> I don't think making it explicit will help, because as currently
>     defined it simply can't work with the existing Join() evaluation
>     semantics. Have I misunderstood something?
>      >>
>      > no, you are right. If we define Ω as a set of solutions that were
>     previously generated that would work, right?
>
>     Only if you're not using Join().
>
>
>      >> >> "if IRI is a service URL"
>      >> >> "if IRI is a SPARQL service"
>      >> >>      How do I know if it's a SPARQL service URL or just some
>     other URL?
>      >> > You can't, users are responsible for knowing what they query in
>     the same way users should know what data they want to query in a SELECT.
>      >>
>      >> Users being responsible for knowing that is irrelevant if the
>     spec is using language like "if IRI is a service URL" and "if IRI is
>     a SPARQL service". Where is the spec language defining what happens
>     if the IRI *isn't* a SPARQL service?
>      > you are right, I will add a note in the definition of what
>     happens if the IRI isn't a SPARQL service. It will make the query
>     fail unless SILENT is present. Is that ok?
>
>     I would think the best action would be to simply drop the wording
>     about "is a service URL". The translation/evaluation should proceed
>     only on the distinction between IRI and VAR, not on what *type* of
>     IRI it is. Let failures during the service invocation handle the
>     cases where the IRI isn't actually a "service URL".
>
>
>      >> >> "UNBOUND is not a possible value for ?Xi in BindingValues"
>      >> >>      I don't know what "not a possible value" means. "?Xi is
>     not unbound in BindingValues"?
>      >> > it is related to the issue you noticed in the service04.arq
>     test. I will fix that.
>      >>
>      >> I'm not sure this is connected to the service04 issue. My
>     concern was with the use of "possible" in the description. I would
>     think UNBOUND is always a "possible" value, it just might not
>     actually be present in "BindingValues". This might just be me being
>     pedantic, but I'd prefer a different wording that made more explicit
>     that the condition here is that UNBOUND can't appear in the
>     BindingValues clause for the ?Xi variable.
>      > If you think that a rewording is necessary, that is fine with
>     me. I'm not a native English speaker so any suggestion/correction
>     is very welcome.
>
>     Sure. How about:
>
>     "* P = P1 BINDINGS ?X1 ... ?Xn {BindingValues } and ?X is either
>     strongly bound within P1 or ?X = ?Xi and UNBOUND is not a value
>     bound to ?Xi in BindingValues."
>
>     ?
>
>      >> Also, I think there are a lot more cases than described where
>     it's simply not possible to tell if the variable is bound at the
>     (syntactic) point in the query where the SERVICE is used. The
>     combination of BIND, RAND, IF, EXISTS, select expressions, extension
>     functions, etc. make it impossible to know if a variable is going to
>     be bound ahead of time, and these cases aren't mentioned. The
>     definition of "strongly bound" seems intentionally conservative, so
>     maybe these are all cases meant to be an error. If that's the case,
>     I think this needs to be pointed out explicitly.
>      > I think it is possible to tell syntactically whether a variable
>     may be bound, but I have not worked out all the cases you are
>     pointing out. The idea is that a variable is bound if the pattern
>     that contains that variable can be executed beforehand. You are
>     right that there are many cases and each of them has to be studied
>     in detail. The boundedness condition has to be checked in detail.
>     The boundedness condition could go in as a note until each case is
>     studied, which could be done after LC, do you agree? If I'm
>     mistaken we can propose something different to make sure that a
>     variable is bound at execution time.
>
>     I'm worried that the boundedness condition being "checked in detail"
>     might go well beyond our current timeline.
>
>     The two definitions (strong boundedness and service safeness) are
>     defined in section 3.1, which is referenced as the conformance
>     criteria, but I still don't know how these definitions relate to
>     conformance.
>
>      >> The discussion of "service-safeness" and "boundedness" (which
>     elsewhere is actually 'strong boundedness') in section 2.4 seems
>     rather disconnected from the rest of the text. These two things are
>     defined at the end of section 3.1, but there isn't any text in 3.1
>     that refers to them. After these definitions are included, only in
>     section 4.1 is "service safeness" mentioned, and then only weakly
>     ("The Service Safeness definition ***suggests the use*** of a
>     specific order in the execution", emphasis mine). MUST a conforming
>     implementation execute patterns in an order suggested by the
>     "service safeness" definition? I think this either needs much
>     stronger definitions and normative text, or we should consider
>     dropping the variable-endpoint form of federation entirely (punting
>     it until next time, I suppose).
>      > I do not think that dropping the variable-endpoint is a good
>     idea, I find it very handy, for instance to gather data from a set
>     of endpoints and make a copy in a specific server. I can regroup
>     everything in a SERVICE VAR section, using the safeness definition,
>     making explicit that there are cases that might not have been
>     considered, and warning that an order in the execution is needed.
>     After LC add everything needed.
>
>     I'm not disputing that it's "very handy," but I think it's
>     underspecified enough that I'm worried sorting out all the issues
>     could impact our schedule.
>
>     I'm not sure what you mean by "After LC add everything needed," but
>     I wouldn't want to publish the spec in its current form while
>     relying on the publication of some future Note to sort out problems.
>     I'd be surprised if that were acceptable for a Rec.
>
>      >> Section 3.1 "Definition: Evaluation of a Service Pattern" says
>     "Execution failures cause the query to fail." Section 4.1 says "If a
>     solution does not bind the variable, or binds it to something which
>     cannot resolve to a SPARQL service, that solution is eliminated."
>     How does "execution failure" differ from not being able to "resolve
>     [a URL] to a SPARQL service"? If you get back an HTTP 400 or 500 (or,
>     I guess, any other response code without a valid protocol response
>     body), how is an implementation supposed to determine if this is an
>     "execution failure" or a situation where the endpoint URL being used
>     failed to "resolve to a SPARQL service"?
>      >
>      > I will add the HTTP response codes accordingly.
>
>     My point here was that there's no way to distinguish the two cases,
>     but for one you're saying to drop the result, and the other you're
>     saying to abort the query. I think both cases (being
>     indistinguishable) need to result in the same action. Moreover, I
>     think "execution failure" and "resolve to a SPARQL service" need to
>     be defined properly.
>
>
>     thanks,
>     .greg
>
>
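
For readers following the quoted thread, the Join() problem discussed above can be illustrated with a small sketch. This is not the specification's algebra and not a proposal from anyone in the thread; it is a minimal Python illustration under stated assumptions. Solution mappings are modelled as plain dicts, and the names ENDPOINTS, call_service, eval_service_bottom_up and dependent_service_join are hypothetical, as are the example.org URLs. The in-memory ENDPOINTS table stands in for remote SPARQL services, so nothing here touches the network. Dropping solutions that leave the variable unbound follows the section 4.1 wording quoted above; letting a failed call raise follows the section 3.1 wording.

```python
from typing import Dict, List

Solution = Dict[str, str]  # one solution mapping: variable name -> value

def compatible(m1: Solution, m2: Solution) -> bool:
    """Two mappings are compatible if they agree on every shared variable."""
    return all(m1[v] == m2[v] for v in m1.keys() & m2.keys())

def join(omega1: List[Solution], omega2: List[Solution]) -> List[Solution]:
    """Bottom-up join: both operands are already fully evaluated."""
    return [{**m1, **m2} for m1 in omega1 for m2 in omega2 if compatible(m1, m2)]

# Hypothetical stand-in for remote endpoints and the results they return.
ENDPOINTS: Dict[str, List[Solution]] = {
    "http://example.org/a": [{"?name": "Alice"}],
    "http://example.org/b": [{"?name": "Bob"}],
}

def call_service(iri: str) -> List[Solution]:
    """Simulated service call; an unknown IRI is treated as a failure."""
    if iri not in ENDPOINTS:
        raise LookupError(f"service call failed for {iri}")
    return ENDPOINTS[iri]

def eval_service_bottom_up(endpoint: str) -> List[Solution]:
    """Evaluate a SERVICE pattern in isolation, as a Join() operand must be.
    With a variable endpoint there is no IRI to call yet."""
    if endpoint.startswith("?"):
        raise ValueError("cannot invoke a service on an unbound variable")
    return call_service(endpoint)

def dependent_service_join(omega1: List[Solution], var: str) -> List[Solution]:
    """A join that is not bottom-up: the left side is evaluated first and
    each of its solutions supplies the endpoint for the right side."""
    results: List[Solution] = []
    for mu in omega1:
        iri = mu.get(var)
        if iri is None:
            continue  # solution does not bind var, so it is eliminated
        results.extend(join([mu], call_service(iri)))
    return results
```

Calling eval_service_bottom_up("?ep") raises, which is the situation described for Join(G, Service(VAR, ...)): the operand cannot be evaluated without the solutions of G. Calling dependent_service_join with the solutions of the left-hand pattern works, but only because it abandons the independent evaluation of the two operands.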
Received on Wednesday, 20 July 2011 02:58:33 UTC