Re: Federation Query (was: Re: SPARQL WG Agenda - Tuesda 2011-08-09) from Carlos Buil Aranda on 2011-08-19 (public-rdf-dawg@w3.org from July to September 2011)

From: Carlos Buil Aranda <cbuil@fi.upm.es>
Date: Fri, 19 Aug 2011 08:42:11 -0400
To: Gregory Williams <greg@evilfunhouse.com>
Cc: Lee Feigenbaum <lee@thefigtrees.net>, SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <CABdcz9F8cFBkrTPyeBehAdgNGT3OVuvTFKi7rgEOwjDZAqt6vQ@mail.gmail.com>
> 1 Introduction

>
> "direct a portion of a query to a particular SPARQL endpoint to be executed
> against local graphs" -- "local" to whom?
>
Removed "local graphs"; the text now just points to a remote SPARQL endpoint.

>
> 1.1.1 Namespaces
>
> "This document uses the same conventions as and terminology from the SPARQL
> 1.1 Query document" -- terminology is discussed in 1.1.3, so its mention
> here seems strange.
>
removed the reference to terminology

>
> 1.1.3 Terminology
>
> "IRI" just links to the query document. I assume it should link to
> something more specific?
>
fixed

>
> "There are three variables: x, y and z (shown as column headers)..." -- is
> this talking about the example already described in 1.1.2?
>
removed example, it should be in the results section

>
> 2 SPARQL 1.1 Federated Query Extension
>
> "invoke a portion of a SPARQL query against a remote SPARQL protocol
> service" -- should this link to the protocol document?
>
removed the reference to the protocol; I think it is better this way, and
this was also raised in Alex's comments

>
> 2.1 Simple query to a remote SPARQL endpoint
>
> "... and join the returned data with the data from the local RDF data
> store." -- Should "local RDF data store" instead talk about an RDF Dataset?
> "RDF data store" isn't defined anywhere (although it is used offhandedly
> once in Query).
>
now it says dataset

>
> "Consider a query to find the names of the people we know." -- "we"? I
> think this would be better with concrete names (e.g. 'the people Bob
> knows').
>
fixed, also in Alex's comments

>
> ":people15  foaf:name     "Alice" ." -- I think this would read better if
> :people15 was instead :person15 (likewise for the other instances).
> Obviously this is a minor and subjective issue.
>
for the time being I will keep it as it is

>
> 2.2 SPARQL query with OPTIONAL to two remote SPARQL endpoints
>
> "@prefix : http://example.org/ ." -- missing <> around the IRI in both
> data examples.
>
fixed

>
> 2.3 Service Execution Failure
>
> "...under such circumstances the invoking query containing a SERVICE
> pattern fails as a whole." -- s/invoking/invoked/
>
fixed

>
> "SPARQL 1.1 allows to explicitly allow failed SERVICE requests by the
> keyword SILENT ." -- too many "allows", and I'm not sure "SPARQL 1.1" is
> something that is doing the allowing here. Perhaps "Queries may explicitly
> allow failed SERVICE requests with the use of the SILENT keyword" or
> something similar.
>
reworded

>
> "The SILENT keyword indicates that error encountered while ..." --
> s/error/errors/
>
fixed

>
> "... if an error happens when" -- s/happens when/occurs while/
>
fixed
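
A minimal sketch of the SILENT behaviour discussed above, assuming a failed
SERVICE call with SILENT yields a single solution with no bindings and a
failure without SILENT aborts the query. The names ServiceError,
invoke_service and unreachable are hypothetical and not taken from the draft.

```python
from typing import Callable, Dict, List

Solution = Dict[str, str]  # variable name -> RDF term (as a string)


class ServiceError(Exception):
    """Hypothetical error raised when a remote endpoint cannot be reached."""


def invoke_service(call: Callable[[], List[Solution]], silent: bool) -> List[Solution]:
    """Run a SERVICE call; with silent=True a failure yields one empty solution."""
    try:
        return call()
    except ServiceError:
        if silent:
            # A single solution with no bindings joins with anything,
            # so the rest of the query is unaffected by the failure.
            return [{}]
        raise  # without SILENT the error propagates and the query fails


def unreachable() -> List[Solution]:
    """Stand-in for an endpoint that is down (assumption for illustration)."""
    raise ServiceError("endpoint unreachable")


print(invoke_service(unreachable, silent=True))
```

Calling invoke_service(unreachable, silent=False) re-raises ServiceError,
which corresponds to the query failing as a whole.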

>
> 2.4 Interplay of SERVICE and BINDINGS (Informative)
>
> " @prefix : http://example.org/ ." -- missing <> in both data examples.
>
fixed

>
> "... may return a large number of results (if the endpoint
> http://example.org/sparql contains a very large database)" -- this sounds
> awkward to me because the text just defined for me what that endpoint holds,
> and I wouldn't consider 3 results "a large number". Perhaps s/a large number
> of results/more results than necessary/ or some other wording that gets the
> point across without asserting that the toy example given actually
> demonstrates "too large" or "too many" results being returned.
>
I reworded a bit

>
> "... or not return all of them: many existing SPARQL endpoints ..." -- I
> think this should be a separate sentence.
>
fixed

>
> "... may miss the ones matching subjects ?s from the default graph." -- I
> know what's being said here, but "the default graph" seems too imprecise as
> there are two default graphs being dealt with in the example. Similarly in
> the next sentence: "first the bindings from the default graph are
> evaluated". Perhaps discuss the "local default graph"?
>
removed local default graph

>
> "... a query planner for federated queries, may decide ..." -- s/,//
>
fixed

>
> "SELECT ?s ?o { ?s a foaf:Person } " -- why is ?o projected? The query
> results table shown for it does not have an ?o column, and it is
> unnecessary.
>
?o removed

>
> "PREFIX foaf:   <http://xmlns.com/foaf/0.1/> SELECT * {?s foaf:knows ?o }
> BINDINGS ?s { (:a) (:b) }" -- missing a PREFIX declaration for :. Also, the
> results for this should be shown immediately, emphasizing the effect of the
> BINDINGS clause, before showing the entire query again.
>
I do not agree; I think the way it is now is more consistent with the rest
of the document, and the results can be foreseen before getting to the end
of the section
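
A rough sketch of the effect of the BINDINGS clause on the quoted query,
assuming ?s is bound in every solution so that the clause simply keeps the
solutions whose ?s value is listed. The sample solutions and the function
name apply_bindings are made up for illustration.

```python
from typing import Dict, List, Set

Solution = Dict[str, str]  # variable name -> RDF term (as a string)


def apply_bindings(solutions: List[Solution], var: str, allowed: Set[str]) -> List[Solution]:
    """Keep only the solutions whose value for `var` appears in the BINDINGS data."""
    return [s for s in solutions if s.get(var) in allowed]


# Hypothetical answers to { ?s foaf:knows ?o } before BINDINGS is applied.
solutions = [
    {"s": ":a", "o": ":b"},
    {"s": ":b", "o": ":c"},
    {"s": ":c", "o": ":a"},
]

# BINDINGS ?s { (:a) (:b) }
restricted = apply_bindings(solutions, "s", {":a", ":b"})
for row in restricted:
    print(row["s"], row["o"])
```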

>
> 3.1 Translation to the SPARQL Algebra
>
> " If E is of the form SERVICE SILENT IRI {P}" -- this seems like it only
> covers the SILENT case. Is there a way to emphasize that both SILENT and
> non-SILENT forms are handled similarly? The connection between the presence
> of SILENT and the SilentOp argument needs to be defined.
>
I'm not sure how to do that, any proposal? Maybe just removing SILENT? I do
not know

>
> "Let G := Join(G, Service(IRI, G, Transform(P), SilentOp))" -- this seems
> REALLY strange to me. On the right hand side of the assignment, G is being
> used as both the LHS of the Join, and an argument to Service(). I take it
> this is meant to aid in the construction of service invocations using
> BINDINGS clauses? If so, I don't think that belongs in the evaluation
> semantics.
>
you are right, removed the extra G in the RHS
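
To illustrate the corrected form, Join(G, Service(IRI, Transform(P),
SilentOp)), here is a small sketch of a join over solution mappings, assuming
two mappings are compatible when they agree on every shared variable. The
sample data and function names are hypothetical.

```python
from typing import Dict, List

Solution = Dict[str, str]  # variable name -> RDF term (as a string)


def compatible(m1: Solution, m2: Solution) -> bool:
    """Two mappings are compatible if they agree on every variable they share."""
    return all(m2[v] == t for v, t in m1.items() if v in m2)


def join(left: List[Solution], right: List[Solution]) -> List[Solution]:
    """Merge every compatible pair of mappings from the two sides."""
    return [{**m1, **m2} for m1 in left for m2 in right if compatible(m1, m2)]


# Hypothetical data: `local` stands for G, `remote` for the Service(...) result.
local = [{"s": ":a"}, {"s": ":b"}]
remote = [{"s": ":a", "o": ":b"}, {"s": ":c", "o": ":a"}]

print(join(local, remote))
```

The Service() side is evaluated on its own and only then joined with G, which
is why G does not need to appear among the arguments of Service(). An empty
mapping is compatible with every mapping, so join(local, [{}]) returns local
unchanged.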

>
> 3.2 SPARQL 1.1 Simple Federation Extension Algebra
>
> "if IRI" -- what does this mean?
>
that IRI is an IRI, any suggestion on how to describe this? I just added a
sentence before to specify that, not sure if it is enough

>
> "eval(D(G), Service(IRI,G,P,SilentOp)) = Invocation( IRI, vars, P,
> Bindings(G, vars), SilentOp )" -- the inclusion of "Bindings(G, vars)" isn't
> explained anywhere, and it's not clear to me exactly how it's meant to be
> evaluated. As stated above, I don't think it belongs in the evaluation
> semantics.
>
removed the extra G

>
> "* with no default-graph-uri or named-graph-uri" -- I'm still asking if we
> should allow their inclusion in the endpoint IRI.
>
I think it is written this way because SERVICE is meant to use only the
endpoint URL. Can't a subquery be used to access a specific graph in the
remote endpoint? If that is possible, I'd leave it as it is

>
> 3.2.1 SERVICE Examples
>
> The examples in this section aren't using the 4-argument form of Service()
> that is defined in section 3.2. I believe they are missing the LHS (G)
> argument, which as I've said is how I believe the definition should be.
> Either way, though, this needs alignment with 3.2.
>
I aligned it, G removed

>
> In my browser the text sizes of the two examples in this section are
> different. Is that true universally? If so, can it be fixed?
>
fixed

>
> 4 SERVICE and GRAPH Variables (Informative)
>
> "A variable used in place of a service IRI or a GRAPH pattern..." -- I'm
> not sure how to interpret the "GRAPH pattern" part of this statement.
>
I removed GRAPH

>
> "... about project endpointsWe assume ..." -- s/endpointsWe/endpoints. We/
>
fixed

>
> I think either this section needs to be dropped (as we aren't really
> defining the semantics) or it needs to be fleshed out. Before diving in to
> an example, the text needs to explain that an implementation could make use
> of the "data about which services contain data about project endpoints" to
> invoke service calls on each of those endpoints. Right now those dots aren't
> connected.
>
I dropped the definition part since this is only an informative section; now
there are only examples.

>
> "_:project1  doap:name    "Querying remote RDF Data" ." -- In the data
> example, this has "Querying", but the results example below has "Query".
>
fixed

>
> The use of bnodes in the remote endpoints, while not used in these
> examples, brings up another point. Should this document be mentioning that
> bnodes need to be made unique across service invocations, or is that obvious
> enough from the rest of the specs? I'd worry that without such a note, a
> naive implementation might end up joining _:a from one service with _:a from
> another service, which shouldn't happen.
>
I agree. What about: "Note that bnodes in each remote SPARQL endpoint are
unique to that endpoint. Using variables that return bnodes in SERVICE will
make the join fail."
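
One way an implementation might keep bnodes from different service
invocations apart is to relabel them per invocation before joining. This is
only a sketch under that assumption; the function name scope_bnodes and the
label scheme are hypothetical.

```python
from typing import Dict, List

Solution = Dict[str, str]  # variable name -> RDF term (as a string)


def scope_bnodes(solutions: List[Solution], invocation_id: int) -> List[Solution]:
    """Relabel blank nodes so that labels are unique to one service invocation."""
    def relabel(term: str) -> str:
        if term.startswith("_:"):
            # Prefix the original label with the invocation it came from.
            return f"_:inv{invocation_id}_{term[2:]}"
        return term

    return [{var: relabel(term) for var, term in sol.items()} for sol in solutions]


# Two endpoints that both happen to return a bnode labelled _:a.
from_first = scope_bnodes([{"x": "_:a"}], invocation_id=1)
from_second = scope_bnodes([{"x": "_:a"}], invocation_id=2)

print(from_first[0]["x"], from_second[0]["x"])
```

After relabelling, the two values differ, so a join on ?x no longer matches
_:a from one service with _:a from the other.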

>
> "A SERVICE or GRAPH clause involving a variable can be executed..." --
> again, I don't know what the "GRAPH clause" part is about.
>
GRAPH removed

>
> "foreach i in Ω(?var->i)" -- This has the same major problem as I mentioned
> last time: you can't use Ω here as it's bottom-up evaluation and not
> available at this point. Saying below the evaluation semantics that "the
> exact mechanism for [determine the possible target SPARQL query services] is
> not defined" isn't acceptable when you've just tried to define it, but the
> definition isn't valid. I thought we had agreed that we weren't going to
> define the evaluation semantics for variable-endpoint SERVICE queries?
>
I removed this part; there are now just examples, as the section is
informative

>
> "It must be done in a way that is compatible with the rest of the query
> results, such as constraints on variables from other graph pattern matching
> in the query." -- Why is this true, and not just potentially massively
> inefficient? Joins higher up the evaluation should take care of filtering
> out any results stemming from a service invocation on an endpoint that
> wasn't "compatible with the rest of the query results".
>
I removed that sentence.

>
> "The example above should be executed in an specific order. " -- SHOULD?
> MUST? MAY? I'm not sure "should" is appropriate here (insofar as any
> discussion of variable-endpoint service evaluation is appropriate).
>
Don't know which word would be better, any suggestion?

>
> "One possible query engine ..." -- s/query engine/approach/
>
fixed
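
A sketch of one possible approach to a SERVICE clause with a variable
endpoint, assuming the part of the query that binds the endpoint variable is
evaluated first and each bound endpoint is then invoked in turn. This is
illustrative only and does not define evaluation semantics; the endpoint
IRIs, the remote data and the function name are made up.

```python
from typing import Callable, Dict, List

Solution = Dict[str, str]  # variable name -> RDF term (as a string)


def eval_variable_service(
    seeds: List[Solution],
    service_var: str,
    invoke: Callable[[str], List[Solution]],
) -> List[Solution]:
    """For each seed solution, call the endpoint bound to service_var and merge."""
    results: List[Solution] = []
    for seed in seeds:
        for remote in invoke(seed[service_var]):
            # Keep only remote solutions that agree with the seed on shared variables.
            if all(seed.get(v, t) == t for v, t in remote.items()):
                results.append({**seed, **remote})
    return results


# Hypothetical remote data, keyed by endpoint IRI.
REMOTE = {
    "http://projects1.example.org/sparql": [{"projectName": "Querying remote RDF Data"}],
    "http://projects2.example.org/sparql": [{"projectName": "Query optimization"}],
}

# Seed solutions: the endpoints found by the first part of the query.
seeds = [{"service": iri} for iri in REMOTE]
rows = eval_variable_service(seeds, "service", lambda iri: REMOTE[iri])
for row in rows:
    print(row["service"], row["projectName"])
```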

sorry for the late reply,

Carlos
Received on Friday, 19 August 2011 12:43:09 UTC