Re: fed review from Gregory Williams on 2011-07-19 (public-rdf-dawg@w3.org from July to September 2011)

From: Gregory Williams <greg@evilfunhouse.com>
Date: Tue, 19 Jul 2011 13:25:23 -0400
To: Carlos Buil Aranda <cbuil@fi.upm.es>
Cc: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-Id: <01A74175-498B-4257-81DF-A9F761CFE65B@evilfunhouse.com>
On Jul 19, 2011, at 1:06 PM, Carlos Buil Aranda wrote:

> 2011/7/18 Gregory Williams <greg@evilfunhouse.com>
>> >>
>> >> "Let G := Join(G, Service(VAR, G, Transform(P), SilentOp))"
>> >>      I don't think this works, as the evaluation semantics should try to evaluate the Service() pattern without access to the results of evaluating the G pattern (which are needed to bind VAR)
>> > I do not completely understand what you mean.
>> 
>> What I mean here is that to execute the Service() part of this expression in a bottom-up fashion, it must be able to evaluate without data from G. The bottom-up semantics are defined by Query 1.1 section 18.5 ("Evaluation of Join(P1, P2)") and by Federation 1.1 section 3.1 ("Definition: Evaluation of a Service Pattern"). In this case, the left-hand-side of the join (P1) is the pattern before the SERVICE block (P2). I don't think you can evaluate Service(VAR, G, Transform(P), SilentOp), because you can't invoke a service operation on a variable. You need to substitute an actual URL for VAR, but the URLs that need to be substituted are produced in an entirely separate evaluation (eval(D(G), P1)).
>> 
> Thanks, now I see the problem. One solution could be to add a restriction that P1 must have been evaluated before P2 when P2 is a SERVICE pattern and only P1 contains VAR. This would make explicit that VAR must exist before that join happens. The query could be represented as a tree in which each leaf is a pattern and the nodes are the operators.

I don't think such a restriction would work with the current evaluation semantics. To do that, you'll need to not use Join() and define your own join operation that isn't bottom-up.
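
To illustrate the point, here is a minimal Python sketch of bottom-up Join evaluation. It is only a sketch under stated assumptions: the function names, the representation of solution mappings as dicts, and the `invoke` callback are all hypothetical and do not come from either spec. It shows that the Service operand is evaluated with no access to the solutions of the other operand, so a variable endpoint has nothing to be substituted with.

```python
# Illustrative sketch only; all names here are hypothetical, not from the spec.

def compatible(m1, m2):
    """Two solution mappings are compatible if they agree on shared variables."""
    return all(m2[v] == t for v, t in m1.items() if v in m2)

def join(omega1, omega2):
    """Join(Omega1, Omega2): merge every compatible pair of solution mappings."""
    return [{**m1, **m2} for m1 in omega1 for m2 in omega2 if compatible(m1, m2)]

def eval_service(endpoint, invoke):
    """Evaluate a Service operand in isolation, as bottom-up evaluation requires."""
    if endpoint.startswith("?"):
        # No solutions from the other Join operand are visible here,
        # so there is nothing to substitute for the variable.
        raise ValueError(f"cannot invoke a service on unbound variable {endpoint}")
    return invoke(endpoint)

def eval_join(eval_p1, eval_p2):
    """Bottom-up Join: each operand is evaluated independently, then combined."""
    omega1 = eval_p1()
    omega2 = eval_p2()  # gets no access to omega1
    return join(omega1, omega2)
```

With a constant IRI as the endpoint, `eval_join` works as expected. With `"?ep"` as the endpoint, `eval_service` fails even when the other operand would have bound `?ep`, because those bindings only meet the Service results after both operands have been evaluated.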

>> I think this is a big problem, and it relates to another of my comments:
>> 
>> >> "foreach i in Ω(?var->i)"
>> >>      Where does Ω come from in this definition? I think it's meant to refer to results from a join that is outside the scope of this operation.
>> > yes, I will make that explicit.
>> 
>> I don't think making it explicit will help, because as currently defined it simply can't work with the existing Join() evaluation semantics. Have I misunderstood something?
>> 
> No, you are right. If we define Ω as a set of previously generated solutions, that would work, right?

Only if you're not using Join().


>> >> "if IRI is a service URL"
>> >> "if IRI is a SPARQL service"
>> >>      How do I know if it's a SPARQL service URL or just some other URL?
>> > You can't; users are responsible for knowing what they query, in the same way that users should know what data they want to query in a SELECT.
>> 
>> Users being responsible for knowing that is irrelevant if the spec is using language like "if IRI is a service URL" and "if IRI is a SPARQL service". Where is the spec language defining what happens if the IRI *isn't* a SPARQL service?
> You are right, I will add a note to the definition about what happens if the IRI isn't a SPARQL service. It will make the query fail unless SILENT is present. Is that OK?

I would think the best action would be to simply drop the wording about "is a service URL". The translation/evaluation should proceed only on the distinction between IRI and VAR, not on what *type* of IRI it is. Let failures during the service invocation handle the cases where the IRI isn't actually a "service URL".


>> >> "UNBOUND is not a possible value for ?Xi in BindingValues"
>> >>      I don't know what "not a possible value" means. "?Xi is not unbound in BindingValues"?
>> > it is related to the issue you noticed in the service04.arq test. I will fix that.
>> 
>> I'm not sure this is connected to the service04 issue. My concern was with the use of "possible" in the description. I would think UNBOUND is always a "possible" value, it just might not actually be present in "BindingValues". This might just be me being pedantic, but I'd prefer a different wording that makes more explicit that the condition here is that UNBOUND can't appear in the BindingValues clause for the ?Xi variable.
> If you think that a rewording is necessary, that is OK with me. I'm not a native English speaker, so any suggestion/correction is very welcome.

Sure. How about:

"* P = P1 BINDINGS ?X1 ... ?Xn {BindingValues } and ?X is either strongly bound within P1 or ?X = ?Xi and UNBOUND is not a value bound to ?Xi in BindingValues."

?
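
As a rough illustration of the second half of that condition (?X = ?Xi and UNBOUND never appears for ?Xi), here is a small Python sketch. The function name, the list-of-tuples layout for BindingValues, and the use of `None` as a stand-in for UNBOUND are assumptions made for the example. It does not cover the "strongly bound within P1" half.

```python
UNDEF = None  # stand-in for the UNBOUND marker in a BINDINGS row (assumption)

def bound_by_bindings(var, variables, rows):
    """True if `var` is one of ?X1..?Xn and no row leaves it UNBOUND."""
    if var not in variables:
        return False
    i = variables.index(var)
    return all(row[i] is not UNDEF for row in rows)

variables = ["?service", "?name"]
rows = [
    ("http://a.example/sparql", "A"),
    ("http://b.example/sparql", UNDEF),
]
print(bound_by_bindings("?service", variables, rows))  # True
print(bound_by_bindings("?name", variables, rows))     # False
```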

>> Also, I think there are a lot more cases than described where it's simply not possible to tell if the variable is bound at the (syntactic) point in the query where the SERVICE is used. The combination of BIND, RAND, IF, EXISTS, select expressions, extension functions, etc. makes it impossible to know ahead of time if a variable is going to be bound, and these cases aren't mentioned. The definition of "strongly bound" seems intentionally conservative, so maybe these are all cases meant to be an error. If that's the case, I think this needs to be pointed out explicitly.
> I think it is possible to tell syntactically whether a variable may be bound or not, but I have not worked out all the cases you are pointing out. The idea is that a variable is bound if the pattern that contains that variable can be executed beforehand. You are right that there are many cases and each of them has to be studied in detail. The boundedness condition has to be checked in detail. The boundedness condition could go in as a note until each case is studied, which could be done after LC; do you agree? If I'm mistaken, we can propose something different to make sure that a variable is bound at execution time.

I'm worried that the boundedness condition being "checked in detail" might go well beyond our current timeline.

The two definitions (strong boundedness and service safeness) are given in section 3.1, which is referenced as the conformance criteria, but I still don't know how these definitions relate to conformance.

>> The discussion of "service-safeness" and "boundedness" (which elsewhere is actually 'strong boundedness') in section 2.4 seems rather disconnected from the rest of the text. These two things are defined at the end of section 3.1, but there isn't any text in 3.1 that refers to them. After these definitions are included, only in section 4.1 is "service safeness" mentioned, and then only weakly ("The Service Safeness definition ***suggests the use*** of a specific order in the execution", emphasis mine). MUST a conforming implementation execute patterns in an order suggested by the "service safeness" definition? I think this either needs much stronger definitions and normative text, or we should consider dropping the variable-endpoint form of federation entirely (punting it until next time, I suppose).
> I do not think that dropping the variable-endpoint form is a good idea; I find it very handy, for instance to gather data from a set of endpoints and make a copy on a specific server. I can regroup everything in a SERVICE VAR section, using the safeness definition, making explicit that there are cases that might not have been considered, and warning that an order in the execution is needed. After LC add everything needed.

I'm not disputing that it's "very handy," but I think it's underspecified enough that I'm worried sorting out all the issues could impact our schedule.

I'm not sure what you mean by "After LC add everything needed," but I wouldn't want to publish the spec in its current form while relying on the publication of some future Note to sort out problems. I'd be surprised if that were acceptable for a Rec.

>> Section 3.1 "Definition: Evaluation of a Service Pattern" says "Execution failures cause the query to fail." Section 4.1 says "If a solution does not bind the variable, or binds it to something which cannot resolve to a SPARQL service, that solution is eliminated." How does "execution failure" differ from not being able to "resolve [a URL] to a SPARQL service"? If you get back an HTTP 400 or 500 (or, I guess, any other response code without a valid protocol response body), how is an implementation supposed to determine if this is an "execution failure" or a situation where the endpoint URL being used failed to "resolve to a SPARQL service"?
> 
> I will add the HTTP response codes accordingly. 

My point here was that there's no way to distinguish the two cases, but for one you're saying to drop the result, and the other you're saying to abort the query. I think both cases (being indistinguishable) need to result in the same action. Moreover, I think "execution failure" and "resolve to a SPARQL service" need to be defined properly.
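
One way to picture this is the following Python sketch, in which every failure mode of a service invocation collapses into a single outcome. Everything in it is an assumption made for illustration: the `ServiceFailure` name, the `http_post` and `parse_results` callbacks, and the `(status, body)` response shape are hypothetical. What the caller then does with the failure (drop the solution or abort the query) is deliberately left open, since that is the part the spec needs to define.

```python
# Hypothetical sketch: names and the response shape are assumptions, not spec text.

class ServiceFailure(Exception):
    """Any failed SERVICE invocation, whatever the underlying cause."""

def invoke_service(iri, pattern, http_post, parse_results):
    """Send `pattern` to `iri`; every failure mode raises the same ServiceFailure."""
    try:
        status, body = http_post(iri, pattern)
    except OSError as err:          # DNS failure, refused connection, ...
        raise ServiceFailure(iri) from err
    if status != 200:               # 400, 404, 500: the cause cannot be told apart
        raise ServiceFailure(iri)
    try:
        return parse_results(body)  # body may not be a valid protocol response
    except ValueError as err:
        raise ServiceFailure(iri) from err
```

The caller sees only `ServiceFailure`, so it cannot apply one rule to "execution failure" and another to "does not resolve to a SPARQL service".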


thanks,
.greg
Received on Tuesday, 19 July 2011 18:56:04 UTC