Fwd: fed review

2011/7/18 Gregory Williams <greg@evilfunhouse.com>

>  I'm very concerned by the issues I've highlighted regarding the evaluation
> semantics and the conditions surrounding the variable-endpoint form of
> federation. Details below, along with some followup to the other issues.
>
> You already note that this example "requires the first SERVICE to be a
> federated query processor in order to be executed". Would you consider
> adding a reference to the service description document as a way to determine
> if this condition is in fact true? The SD document defines a feature for
> this exact situation:
> http://www.w3.org/TR/sparql11-service-description/#sd-basicfederatedquery
>

sure, I will add it, no problem.

>
>
> >> Section 3.1
>
> I just noticed a typo in 3.1: s/addtion/addition/
>

ok, I will fix it.

>
> >>
> >> "Let G := Join(G, Service(VAR, G, Transform(P), SilentOp))"
> >>      I don't think this works, as the evaluation semantics should try to
> evaluate the Service() pattern without access to the results of evaluating
> the G pattern (which are needed to bind VAR)
> > I do not completely understand what you mean.
>
> What I mean here is that to execute the Service() part of this expression
> in a bottom-up fashion, it must be able to evaluate without data from G. The
> bottom-up semantics are defined by Query 1.1 section 18.5 ("Evaluation of
> Join(P1, P2)") and by Federation 1.1 section 3.1 ("Definition: Evaluation of
> a Service Pattern"). In this case, the left-hand-side of the join (P1) is
> the pattern before the SERVICE block (P2). I don't think you can evaluate
> Service(VAR, G, Transform(P), SilentOp), because you can't invoke a service
> operation on a variable. You need to substitute VAR for an actual URL, but
> the URLs that need to be substituted are produced in an entirely separate
> evaluation (eval(D(G), P1)).
>

thanks, now I see the problem. One solution could be to add a restriction that
P1 must be evaluated before P2 whenever P2 is a SERVICE pattern whose endpoint
is VAR and only P1 contains VAR. This would make explicit that VAR must be
bound before that join happens. The query could be represented as a tree in
which each leaf is a pattern and each internal node is an operator.
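A minimal Python sketch of that tree idea follows. Leaves are patterns and internal nodes are operators. A SERVICE whose endpoint is a variable is accepted only when that variable appears in a pattern evaluated before it, meaning the left-hand side of an enclosing Join. The node classes and the check_service_order helper are hypothetical. Treating every variable on the left-hand side as bound is a simplification that only holds for plain triple patterns and joins.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical, minimal algebra tree: leaves are patterns, nodes are operators.
@dataclass
class Triple:
    s: str
    p: str
    o: str


@dataclass
class Join:
    left: "Node"   # P1, evaluated first
    right: "Node"  # P2


@dataclass
class Service:
    endpoint: str  # an IRI, or a variable such as "?svc"
    pattern: "Node"


Node = Union[Triple, Join, Service]


def is_var(term: str) -> bool:
    return term.startswith("?")


def variables(node: Node) -> set[str]:
    """All variables mentioned in a pattern (an over-approximation of 'bound')."""
    if isinstance(node, Triple):
        return {t for t in (node.s, node.p, node.o) if is_var(t)}
    if isinstance(node, Join):
        return variables(node.left) | variables(node.right)
    extra = {node.endpoint} if is_var(node.endpoint) else set()
    return variables(node.pattern) | extra


def check_service_order(node: Node, bound: frozenset = frozenset()) -> bool:
    """True if every SERVICE ?VAR has ?VAR supplied by a pattern evaluated before it."""
    if isinstance(node, Triple):
        return True
    if isinstance(node, Join):
        # The left child is evaluated first; its variables are visible to the right child.
        return (check_service_order(node.left, bound)
                and check_service_order(node.right, bound | variables(node.left)))
    if is_var(node.endpoint) and node.endpoint not in bound:
        return False
    return check_service_order(node.pattern, bound)
```

This check only constrains the dependency between P1 and the SERVICE node. An implementation would still be free to reorder anything that does not break that dependency.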

>
> I think this is a big problem, and it relates to another of my comments:
>
> >> "foreach i in Ω(?var->i)"
> >>      Where does Ω come from in this definition? I think it's meant to
> refer to results from a join that is outside the scope of this operation.
> > yes, I will make that explicit.
>
> I don't think making it explicit will help, because as currently defined it
> simply can't work with the existing Join() evaluation semantics. Have I
> misunderstood something?
>
no, you are right. If we define Ω as a set of previously generated solutions,
that would work, right?
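Under that reading, "foreach i in Ω(?var->i)" could be sketched in Python as below. Ω is passed in as solutions that have already been computed. Each distinct IRI bound to ?var is invoked once, and the remote answers are joined back with the compatible solutions. The invoke callback and the string-valued pattern are hypothetical placeholders for a real protocol call. Solutions that do not bind ?var are dropped, following the Section 4.1 text quoted later in this thread.

```python
from typing import Callable

Solution = dict[str, str]


def compatible(a: Solution, b: Solution) -> bool:
    # Two solutions are compatible if they agree on every shared variable.
    return all(a[v] == b[v] for v in a.keys() & b.keys())


def eval_service_var(
    omega: list[Solution],
    var: str,
    pattern: str,
    invoke: Callable[[str, str], list[Solution]],  # hypothetical remote call
) -> list[Solution]:
    """Evaluate SERVICE ?var { pattern } against Ω, the previously generated solutions."""
    results: list[Solution] = []
    for iri in sorted({mu[var] for mu in omega if var in mu}):
        remote = invoke(iri, pattern)  # one invocation per distinct endpoint
        for mu in omega:
            if mu.get(var) != iri:
                continue  # solutions that do not bind ?var to this IRI are skipped
            for r in remote:
                if compatible(mu, r):
                    results.append({**mu, **r})
    return results
```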

>
> >> "if IRI is a service URL"
> >> "if IRI is a SPARQL service"
> >>      How do I know if it's a SPARQL service URL or just some other URL?
> > You can't; users are responsible for knowing what they query, in the same
> way users should know what data they want to query in a SELECT
>
> Users being responsible for knowing that is irrelevant if the spec is using
> language like "if IRI is a service URL" and "if IRI is a SPARQL service".
> Where is the spec language defining what happens if the IRI *isn't* a SPARQL
> service?

you are right, I will add a note to the definition saying what happens if
the IRI isn't a SPARQL service: the query fails unless SILENT is present. Is
that OK?
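A small Python sketch of the proposed failure rule is below. The ServiceError exception and the call callback are hypothetical. The sketch also assumes that SILENT makes a failed invocation yield Ω0, the solution set with one empty solution defined in the draft's definition header.

```python
from typing import Callable

Solution = dict[str, str]


class ServiceError(Exception):
    """Hypothetical error: the IRI did not behave like a SPARQL service."""


def invocation(iri: str, pattern: str, silent: bool,
               call: Callable[[str, str], list[Solution]]) -> list[Solution]:
    try:
        return call(iri, pattern)
    except ServiceError:
        if silent:
            # Assumed behaviour: SILENT swallows the failure and yields Ω0 = [{}].
            return [{}]
        raise  # without SILENT, the whole query fails
```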

>
>
> >> "eval(D(G), Service(IRI,G,P,SilentOp)) = Invocation( IRI, vars, P,
> Bindings(G, vars), SilentOp )"
> >>      Where does 'vars' come from here?
> > it comes from the definition header:
> > Definition: Evaluation of a Service Pattern
> >
> >    if IRI is a service URL and vars is the set of variables in-scope in
> pattern P, Ω0 a solution set with one empty solution.
>
> OK. I missed that.
>
> >> "with no default-graph-uri or named-graph-uri"
> >>      Why aren't these allowable in the service IRI?
> > Because the idea of SERVICE is to query remote SPARQL endpoints, not
> named graphs.
>
> Yes, I understand that. What I'm asking is why we shouldn't allow users to
> specify named and default graphs in the service URL for use during the
> remote service invocation. Something like:
>
> SERVICE <http://example.org/sparql?default-graph-uri=http%3A%2F%2Fwww.other.example%2Fbooks>
> {
>  ?s ?p ?o
> }
>
For me it is OK to allow graphs there (I did not think much about it when
I joined the group). I can add a note saying that it is possible to query
graphs other than the default one. Any other opinions about it?
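The Python sketch below builds a service URL like the one in Greg's example. It uses the SPARQL Protocol's default-graph-uri and named-graph-uri query parameters. The service_url helper is hypothetical, and percent-encoding is done with the standard library.

```python
from urllib.parse import urlencode


def service_url(endpoint: str, default_graphs=(), named_graphs=()) -> str:
    """Attach SPARQL Protocol dataset parameters to an endpoint URL (hypothetical helper)."""
    params = [("default-graph-uri", g) for g in default_graphs]
    params += [("named-graph-uri", g) for g in named_graphs]
    if not params:
        return endpoint
    sep = "&" if "?" in endpoint else "?"
    return endpoint + sep + urlencode(params)  # percent-encodes ':' and '/'


print(service_url("http://example.org/sparql",
                  default_graphs=["http://www.other.example/books"]))
```

This prints the same IRI used inside SERVICE <...> above.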

>
> >> "Definition: Strongly bound variable"
> >>      I think there's missing clauses for BIND and property paths (e.g.
> ?s :p{0} ?o should result in both ?s and ?o being strongly bound).
> >>
> >> "P = SELECT E1 ... En WHERE { P1 } and ?X is strongly bound in P1 and ?X
> = Ei"
> >>      Should include the required 'AS ?var' syntax for expressions that
> aren't variables
> >>      Should include the option for the select expression to strongly
> bind the variable: (if "(Ej AS ?X)" is one of the select expressions)
> >>      This ignores the possibility of ?X being strongly bound in GROUP BY
> or HAVING clauses
> >>
> >> "P = P1 GROUP BY E1 ... En such that either there is an Ei of the form
> ?X or ?X is strongly bound in P1"
> >>      Needs to also consider grouping expressions that are aliased
> ("GROUP BY (Ej AS ?X)")
> >>
> >> "P = P1 HAVING ( E1 ) and ?X is strongly bound within P1"
> >>      This ignores the possibility of ?X being strongly bound in a GROUP
> BY clause.
> >>
> > yes, you are right, I will add all the suggestions to the boundedness
> definition.
>
> This seemed like it was missing a lot of conditions. Are we sure we've got
> them all now? Can somebody with fresh eyes look over this, please?
>
yes, please
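To help whoever reviews the definition, here is a deliberately partial Python sketch of a conservative "strongly bound" check. It covers some of the clauses raised above: property paths bind both ends, BIND with a constant, and aliased select expressions. It follows the draft's rule that SERVICE does not strongly bind anything. GROUP BY, HAVING, FILTER and general expressions are left out. All class and function names are hypothetical, and terms are plain strings.

```python
from dataclasses import dataclass


@dataclass
class Triple:
    s: str
    p: str
    o: str


@dataclass
class Path:        # any property path, e.g. ?s :p{0} ?o; both ends get bound
    s: str
    path: str
    o: str


@dataclass
class Group:       # a group of patterns, as in "P1 ... Pn"
    parts: list


@dataclass
class LeftJoin:    # P1 OPTIONAL { P2 }
    left: object
    right: object


@dataclass
class UnionP:      # P1 UNION P2
    left: object
    right: object


@dataclass
class Bind:        # P BIND(term AS ?var), restricted here to a single term
    pattern: object
    term: str
    var: str


@dataclass
class Select:      # SELECT items WHERE { pattern }; items are (term, alias or None)
    items: list
    pattern: object


@dataclass
class Service:
    endpoint: str
    pattern: object


def is_var(t: str) -> bool:
    return t.startswith("?")


def strongly_bound(x: str, p) -> bool:
    if isinstance(p, Triple):
        return x in (p.s, p.p, p.o)
    if isinstance(p, Path):
        return x in (p.s, p.o)
    if isinstance(p, Group):
        return any(strongly_bound(x, q) for q in p.parts)
    if isinstance(p, LeftJoin):
        return strongly_bound(x, p.left)  # the OPTIONAL side may leave x unbound
    if isinstance(p, UnionP):
        return strongly_bound(x, p.left) and strongly_bound(x, p.right)
    if isinstance(p, Bind):
        if x == p.var:
            # A constant always binds; a copied variable binds only if it is strongly bound.
            return not is_var(p.term) or strongly_bound(p.term, p.pattern)
        return strongly_bound(x, p.pattern)
    if isinstance(p, Select):
        for term, alias in p.items:
            if (alias or term) == x:
                return not is_var(term) or strongly_bound(term, p.pattern)
        return False  # x is not projected
    if isinstance(p, Service):
        return False  # the draft's conservative rule, questioned in this thread
    raise TypeError(f"unsupported pattern: {p!r}")
```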

>
> >> "UNBOUND is not a possible value for ?Xi in BindingValues"
> >>      I don't know what "not a possible value" means. "?Xi is not unbound
> in BindingValues"?
> > it is related to the issue you noticed in the service04.arq test. I will
> fix that.
>
> I'm not sure this is connected to the service04 issue. My concern was with
> the use of "possible" in the description. I would think UNBOUND is always a
> "possible" value, it just might not actually be present in "BindingValues".
> This might just be me being pedantic, but I'd prefer a different wording
> that made more explicit that the condition here is that UNBOUND can't appear
> in the BindingValues clause for the ?Xi variable.
>
If you think a rewording is necessary, that is fine with me. I'm not a
native English speaker, so any suggestion or correction is very welcome.
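One way to read Greg's proposed wording is "UNBOUND does not appear in the BindingValues column for ?Xi". The Python sketch below checks exactly that condition. It uses a hypothetical representation of the BINDINGS clause: a list of variables plus rows, with None standing for UNBOUND.

```python
from typing import Optional, Sequence

# Hypothetical representation: None stands for UNBOUND in a BINDINGS row.
Row = Sequence[Optional[str]]


def never_unbound(var: str, variables: Sequence[str], rows: Sequence[Row]) -> bool:
    """True if UNBOUND does not appear in the BINDINGS column for var."""
    if var not in variables:
        return False  # the BINDINGS clause does not bind var at all
    i = list(variables).index(var)
    return all(row[i] is not None for row in rows)
```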

>
> >> Section 4.1
> >>
> >> "It is considered a syntax error to use a variable as the first argument
> of a ServiceGraphPattern if that variable is not bound (at least optionally)
> before the execution of the SERVICE pattern"
> >>      How is a query writer supposed to know in what order evaluation
> takes place? Asserting a syntax error based on evaluation order seems overly
> confusing.
> > the boundedness condition allows checking whether a variable is going to be
> bound or not; it implicitly determines the execution order, so it would be
> possible to raise a syntax error. Maybe a syntax error is not the best kind
> of error to use here; any ideas?
>
> My main concern here is the reference to the actual "execution". The
> wording here would seem to imply that implementations cannot re-order joins,
> for example.
>
I did not mean to imply this at all. This is related to your previous
comment about where Ω comes from. I think that requiring a pattern to be
evaluated before the SERVICE VAR evaluation would work while still allowing
reordering.

>
> Also, I think there are a lot more cases than described where it's simply
> not possible to tell if the variable is bound at the (syntactic) point in
> the query where the SERVICE is used. The combination of BIND, RAND, IF,
> EXISTS, select expressions, extension functions, etc. make it impossible to
> know if a variable is going to be bound ahead of time, and these cases
> aren't mentioned. The definition of "strongly bound" seems intentionally
> conservative, so maybe these are all cases meant to be an error. If that's
> the case, I think this needs to be pointed out explicitly.
>
I think it is possible to decide syntactically whether a variable may be
bound or not, but I have not worked out all the cases you point out. The idea
is that a variable is bound if the pattern that contains it can be executed
beforehand. You are right that there are many cases and each of them has to
be studied in detail. The boundedness condition could go in as a note until
each case has been studied, which could be done after LC; do you agree? If I'm
mistaken, we can propose a different mechanism to make sure that a variable
is bound at execution time.

>
> The discussion of "service-safeness" and "boundedness" (which elsewhere is
> actually 'strong boundedness') in section 2.4 seems rather disconnected from
> the rest of the text. These two things are defined at the end of section
> 3.1, but there isn't any text in 3.1 that refers to them. After these
> definitions are included, only in section 4.1 is "service safeness"
> mentioned, and then only weakly ("The Service Safeness definition
> ***suggests the use*** of a specific order in the execution", emphasis
> mine). MUST a conforming implementation execute patterns in an order
> suggested by the "service safeness" definition? I think this either needs
> much stronger definitions and normative text, or we should consider dropping
> the variable-endpoint form of federation entirely (punting it until next
> time, I suppose).
>
I do not think that dropping the variable-endpoint form is a good idea. I find
it very handy, for instance, to gather data from a set of endpoints and copy
it into a specific server. I can regroup everything into a SERVICE VAR
section, using the safeness definition, stating explicitly that there may be
cases we have not considered, and warning that a particular execution order
is needed. After LC we can add everything else that is needed.

>
>
>
> Re-reading the text about service-safeness, I notice a few more issues:
>
> "A variable ?X is strongly bound within a graph pattern P if ... P =
> SERVICE t { P1 } then ?X is not strongly bound in P1 (It is not possible to
> guarantee that a variable will be bounded after a SERVICE execution)."
>
> This isn't worded correctly. The whole list is introduced as a set of
> conditions under which ?x is strongly bound, but this list item turns right
> around and says ?x is not strongly bound. Moreover, I'm not sure why this
> wouldn't guarantee that ?X is strongly bound if it is strongly bound in P1.
> Without the use of SILENT, either 1) there are going to be no results, 2)
> there are results where ?X is bound, or 3) the entire query evaluation
> fails. Is that correct?
>
I will check the wording and reword it accordingly. Give me some time to
look at it carefully.


> "* P is a list group graph patterns P1 ... Pn and ?X is strongly bound
> within some Pi."
>
> I'm not sure where the phrase "list group graph pattern" comes from (are P1
> ... Pn graph patterns contained by a single group graph pattern?).
> Intuitively I understand what's going on here, but I don't think it's
> well-defined. If you're talking about a list of graph patterns, that seems
> like a syntax-level thing, but you also talk about graph patterns like "P1
> FILTER ( E1 )" which I don't think should be understood at the syntax-level,
> because something like "P1 FILTER(E1) P2" (where P1 and P2 are triple
> patterns) is really like "P1 . P2 . FILTER(E1)". I don't think the actual
> evaluation will be hurt by this (since the "list group graph pattern"
> condition will end up getting the right answer), but the intermediate steps
> end up being confusing and/or wrong with respect to what variables are
> actually strongly bound.
>
I will check this in detail.
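Greg's point that "P1 FILTER(E1) P2" behaves like "P1 . P2 . FILTER(E1)" follows from the SPARQL 1.1 Query rule that a FILTER constrains the whole group in which it appears. A Python sketch of that normalization step is below. It would let the boundedness rules look only at the non-filter patterns of a group. The Filter class and the string patterns are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Filter:
    expr: str


def normalize_group(elements: list) -> tuple[list, list]:
    """Split a group's elements into (patterns, filters); filters scope over the whole group."""
    patterns = [e for e in elements if not isinstance(e, Filter)]
    filters = [e for e in elements if isinstance(e, Filter)]
    return patterns, filters
```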

>
>
>
> Section 3.1 "Definition: Evaluation of a Service Pattern" says "Execution
> failures cause the query to fail." Section 4.1 says "If a solution does not
> bind the variable, or binds it to something which cannot resolve to a SPARQL
> service, that solution is eliminated." How does "execution failure" differ
> from not being able to "resolve [a URL] to a SPARQL service"? If you get
> back a HTTP 400 or 500 (or, I guess, any other response code without a valid
> protocol response body), how is an implementation supposed to determine if
> this is an "execution failure" or a situation where the endpoint URL being
> used failed to "resolve to a SPARQL service"?
>
I will add the relevant HTTP response codes accordingly.
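A hypothetical policy for separating the two cases is sketched in Python below. A 2xx response with a SPARQL results media type is a success. A 2xx response with any other media type is treated as "not a SPARQL service". 400 and 500 are treated as execution failures, since the SPARQL Protocol uses them for malformed queries and refused requests. Any other status is also treated as "not a SPARQL service". The set of media types checked is an assumption, and this does not resolve Greg's ambiguity: a non-SPARQL server that returns a 400 or 500 would still be classified as an execution failure.

```python
SPARQL_RESULT_TYPES = {  # assumed subset of SPARQL results media types
    "application/sparql-results+xml",
    "application/sparql-results+json",
}


def classify_response(status: int, content_type: str) -> str:
    """Hypothetical policy: 'ok', 'execution-failure', or 'not-a-sparql-service'."""
    media = content_type.split(";")[0].strip().lower()
    if 200 <= status < 300:
        return "ok" if media in SPARQL_RESULT_TYPES else "not-a-sparql-service"
    if status in (400, 500):
        return "execution-failure"
    return "not-a-sparql-service"
```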

>
>
>
>
> thanks,
>
Again, thank you for looking into this in such detail.

Carlos


> .greg
>
>
>

Received on Tuesday, 19 July 2011 17:07:43 UTC