Re: fed review from Carlos Buil Aranda on 2011-07-19 (public-rdf-dawg@w3.org from July to September 2011)

From: Carlos Buil Aranda <cbuil@fi.upm.es>
Date: Tue, 19 Jul 2011 17:02:16 -0400
To: Gregory Williams <greg@evilfunhouse.com>
Cc: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <CABdcz9FD7tswh+KNjT_2LzZ5rgtkgjp2H8FZ3Eajs9CbKiXtrQ@mail.gmail.com>
Just summarizing a bit: the main concern is the use of variables in SERVICE.
The problem lies in the specification of the semantics and in the boundedness
condition, which suggests an order for executing the query. First, the way
the execution of SERVICE VAR is defined is wrong under the current join
semantics: it is not possible to evaluate Join(G, Service(VAR, G,
Transform(P), SilentOp)) bottom-up, because VAR does not yet have a value when
Service() is evaluated. Second, the boundedness condition has to be completed
for all the SPARQL 1.1 operators. Am I right?
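
To make the first point concrete, here is a minimal sketch of bottom-up
evaluation (my own toy model in Python, not text from either spec). The
class names BGP, Service and Join, and the convention that an endpoint
starting with "?" is a variable, are assumptions made only for this
illustration; a BGP simply carries precomputed solutions instead of matching
against a graph.

```python
from dataclasses import dataclass


@dataclass
class BGP:
    solutions: list  # precomputed solution mappings (stand-in for graph matching)


@dataclass
class Service:
    endpoint: str  # an IRI, or a variable name such as "?ep" (toy convention)
    pattern: BGP


@dataclass
class Join:
    left: object
    right: object


def compatible(m1, m2):
    """Two mappings are compatible if they agree on every shared variable."""
    return all(m2[k] == v for k, v in m1.items() if k in m2)


def evaluate(node):
    """Bottom-up evaluation: each operand is evaluated on its own."""
    if isinstance(node, BGP):
        return node.solutions
    if isinstance(node, Service):
        if node.endpoint.startswith("?"):
            # No solutions from the left-hand side are visible here.
            raise ValueError(f"no value available for {node.endpoint}")
        return evaluate(node.pattern)  # stand-in for the remote call
    if isinstance(node, Join):
        left = evaluate(node.left)
        right = evaluate(node.right)  # evaluated without access to `left`
        return [{**a, **b} for a in left for b in right if compatible(a, b)]
    raise TypeError(f"unknown node: {node!r}")
```

With a constant IRI the join evaluates normally. With Service("?ep", ...) on
the right-hand side, evaluate() raises before the join ever sees the left-hand
solutions that would bind ?ep, which is the situation described above.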

So, the possible solutions are:
 - For the SERVICE VAR semantics
   - add a new operation that allows the evaluation (an operation that
     would not be bottom-up)
   - define its whole semantics in a new way
 - For the boundedness restriction
   - specify all cases: this may take a long time
   - remove it? I do not think this is a good idea: how would we then specify
     that a variable is bound (which is needed for the evaluation semantics
     of SERVICE VAR)?
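
As a rough illustration of the first option (an operation that is not
bottom-up), the following Python sketch feeds each left-hand solution into the
service call. The names service_join, call_service, FAKE_ENDPOINTS and
fake_call are hypothetical and exist only for this example. Dropping solutions
that do not bind the variable is an assumption here, not a settled decision.

```python
def compatible(m1, m2):
    """Two mappings are compatible if they agree on every shared variable."""
    return all(m2[k] == v for k, v in m1.items() if k in m2)


def service_join(left_solutions, var, call_service):
    """Evaluate the left side first, then call one service per solution.

    `call_service` is a hypothetical callable: endpoint IRI -> list of mappings.
    """
    results = []
    for mu in left_solutions:
        endpoint = mu.get(var)
        if endpoint is None:
            continue  # assumption: a solution that does not bind `var` is dropped
        for remote in call_service(endpoint):
            if compatible(mu, remote):
                results.append({**mu, **remote})
    return results


# Hypothetical in-memory stand-ins for remote endpoints.
FAKE_ENDPOINTS = {
    "http://a.example/sparql": [{"?x": 1}],
    "http://b.example/sparql": [{"?x": 2}, {"?x": 3}],
}


def fake_call(endpoint):
    return FAKE_ENDPOINTS[endpoint]
```

For example, service_join([{"?ep": "http://a.example/sparql"}], "?ep",
fake_call) calls only the first fake endpoint and merges its single solution
with the left-hand mapping. The order of evaluation is fixed by the operation
itself, which is exactly what Join() does not give us.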

Is anything missing? Any other options?

Carlos

PS: I notice that there are still pending issues from the previous email to be
addressed; this is just a summary of what I think is the most important
topic of that email.

2011/7/19 Gregory Williams <greg@evilfunhouse.com>

>
> On Jul 19, 2011, at 1:06 PM, Carlos Buil Aranda wrote:
>
> > 2011/7/18 Gregory Williams <greg@evilfunhouse.com>
> >> >>
> >> >> "Let G := Join(G, Service(VAR, G, Transform(P), SilentOp))"
> >> >>      I don't think this works, as the evaluation semantics should try
> to evaluate the Service() pattern without access to the results of
> evaluating the G pattern (which are needed to bind VAR)
> >> > I do not completely understand what you mean.
> >>
> >> What I mean here is that to execute the Service() part of this
> expression in a bottom-up fashion, it must be able to evaluate without data
> from G. The bottom-up semantics are defined by Query 1.1 section 18.5
> ("Evaluation of Join(P1, P2)") and by Federation 1.1 section 3.1
> ("Definition: Evaluation of a Service Pattern"). In this case, the
> left-hand-side of the join (P1) is the pattern before the SERVICE block
> (P2). I don't think you can evaluate Service(VAR, G, Transform(P),
> SilentOp), because you can't invoke a service operation on a variable. You
> need to substitute VAR for an actual URL, but the URLs that need to be
> substituted are produced in an entirely separate evaluation (eval(D(G),
> P1)).
> >>
> > thanks, now I see the problem. One solution could be to add a restriction
> that P1 must be evaluated before P2 when P2 is a SERVICE pattern and
> only P1 contains VAR. This would make explicit that VAR must have a value
> before that join happens. The query could be represented as a tree in which
> each leaf is a pattern and the internal nodes are the operators.
>
> I don't think such a restriction would work with the current evaluation
> semantics. To do that, you'll need to not use Join() and define your own
> join operation that isn't bottom-up.
>
> >> I think this is a big problem, and it relates to another of my comments:
> >>
> >> >> "foreach i in Ω(?var->i)"
> >> >>      Where does Ω come from in this definition? I think it's meant to
> refer to results from a join that is outside the scope of this operation.
> >> > yes, I will make that explicit.
> >>
> >> I don't think making it explicit will help, because as currently defined
> it simply can't work with the existing Join() evaluation semantics. Have I
> misunderstood something?
> >>
> > no, you are right. If we define Ω as a set of solutions that were
> previously generated that would work, right?
>
> Only if you're not using Join().
>
>
> >> >> "if IRI is a service URL"
> >> >> "if IRI is a SPARQL service"
> >> >>      How do I know if it's a SPARQL service URL or just some other
> URL?
> >> > You can't; users are responsible for knowing what they query, in the same
> way users should know what data they want to query in a SELECT
> >>
> >> Users being responsible for knowing that is irrelevant if the spec is
> using language like "if IRI is a service URL" and "if IRI is a SPARQL
> service". Where is the spec language defining what happens if the IRI
> *isn't* a SPARQL service?
> > you are right, I will add a note in the definition of what happens if the IRI
> isn't a SPARQL service. It will make the query fail unless SILENT is
> present. Is that ok?
>
> I would think the best action would be to simply drop the wording about "is
> a service URL". The translation/evaluation should proceed only on the
> distinction between IRI and VAR, not on what *type* of IRI it is. Let
> failures during the service invocation handle the cases where the IRI isn't
> actually a "service URL".
>
>
> >> >> "UNBOUND is not a possible value for ?Xi in BindingValues"
> >> >>      I don't know what "not a possible value" means. "?Xi is not
> unbound in BindingValues"?
> >> > it is related to the issue you noticed in the service04.arq test. I
> will fix that.
> >>
> >> I'm not sure this is connected to the service04 issue. My concern was
> with the use of "possible" in the description. I would think UNBOUND is
> always a "possible" value, it just might not actually be present in
> "BindingValues". This might just be me being pedantic, but I'd prefer a
> different wording that made more explicit that the condition here is that
> UNBOUND can't appear in the BindingValues clause for the ?Xi variable.
> > If you think that a rewording is necessary, that is fine with me. I'm not a
> native English speaker so any suggestion/correction is very welcome.
>
> Sure. How about:
>
> "* P = P1 BINDINGS ?X1 ... ?Xn {BindingValues } and ?X is either strongly
> bound within P1 or ?X = ?Xi and UNBOUND is not a value bound to ?Xi in
> BindingValues."
>
> ?
>
> >> Also, I think there are a lot more cases than described where it's
> simply not possible to tell if the variable is bound at the (syntactic)
> point in the query where the SERVICE is used. The combination of BIND, RAND,
> IF, EXISTS, select expressions, extension functions, etc. make it impossible
> to know if a variable is going to be bound ahead of time, and these cases
> aren't mentioned. The definition of "strongly bound" seems intentionally
> conservative, so maybe these are all cases meant to be an error. If that's
> the case, I think this needs to be pointed out explicitly.
> > I think it is possible to tell syntactically whether a variable may be
> bound or not, but I have not worked out all the cases you are pointing out.
> The idea is that a variable is bound if the pattern that contains that
> variable can be executed beforehand. You are right that there are many cases
> and each of them has to be studied in detail. The boundedness condition has
> to be checked in detail. The boundedness condition could go in as a note
> until each case is studied, which could be done after LC; do you agree? If
> I'm mistaken we can propose a different way to make sure that a variable is
> bound at execution time.
>
> I'm worried that the boundedness condition being "checked in detail" might
> go well beyond our current timeline.
>
> The two definitions (strong boundedness and service safeness) are defined
> in section 3.1, which is referenced as the conformance criteria, but I still
> don't know how these definitions relate to conformance.
>
> >> The discussion of "service-safeness" and "boundedness" (which elsewhere
> is actually 'strong boundedness') in section 2.4 seems rather disconnected
> from the rest of the text. These two things are defined at the end of
> section 3.1, but there isn't any text in 3.1 that refers to them. After
> these definitions are included, only in section 4.1 is "service safeness"
> mentioned, and then only weakly ("The Service Safeness definition
> ***suggests the use*** of a specific order in the execution", emphasis
> mine). MUST a conforming implementation execute patterns in an order
> suggested by the "service safeness" definition? I think this either needs
> much stronger definitions and normative text, or we should consider dropping
> the variable-endpoint form of federation entirely (punting it until next
> time, I suppose).
> > I do not think that dropping the variable-endpoint form is a good idea. I
> find it very handy, for instance to gather data from a set of endpoints and
> make a copy in a specific server. I can regroup everything in a SERVICE VAR
> section, using the safeness definition, stating explicitly that there are
> cases that might not have been considered, and warning that an order in the
> execution is needed. After LC add everything needed.
>
> I'm not arguing that it's "very handy," but I think it's underspecified
> enough that I'm worried sorting out all the issues could impact our
> schedule.
>
> I'm not sure what you mean by "After LC add everything needed," but I
> wouldn't want to publish the spec in its current form while relying on the
> publication of some future Note to sort out problems. I'd be surprised if
> that were acceptable for a Rec.
>
> >> Section 3.1 "Definition: Evaluation of a Service Pattern" says
> "Execution failures cause the query to fail." Section 4.1 says "If a
> solution does not bind the variable, or binds it to something which cannot
> resolve to a SPARQL service, that solution is eliminated." How does
> "execution failure" differ from not being able to "resolve [a URL] to a
> SPARQL service"? If you get back a HTTP 400 or 500 (or, I guess, any other
> response code without a valid protocol response body), how is an
> implementation supposed to determine if this is an "execution failure" or a
> situation where the endpoint URL being used failed to "resolve to a SPARQL
> service"?
> >
> > I will add the HTTP response codes accordingly.
>
> My point here was that there's no way to distinguish the two cases, but for
> one you're saying to drop the result, and the other you're saying to abort
> the query. I think both cases (being indistinguishable) need to result in
> the same action. Moreover, I think "execution failure" and "resolve to a
> SPARQL service" need to be defined properly.
>
>
> thanks,
> .greg
>
>
Received on Tuesday, 19 July 2011 21:10:05 UTC