Federation Query (was: Re: SPARQL WG Agenda - Tuesday 2011-08-09) from Gregory Williams on 2011-08-08 (public-rdf-dawg@w3.org from July to September 2011)

From: Gregory Williams <greg@evilfunhouse.com>
Date: Mon, 8 Aug 2011 18:26:21 -0400
To: Lee Feigenbaum <lee@thefigtrees.net>
Cc: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-Id: <3A28F88D-DF43-430B-B318-58193C5A4BEC@evilfunhouse.com>

On Aug 8, 2011, at 2:20 PM, Lee Feigenbaum wrote:

> Federated query
> Where are we with Greg's review and comments?

I've just read the latest draft, and I think this document still has some major issues (the biggest being the still-wrong evaluation semantics for variable-endpoint service queries, and the introduction of BINDINGS-related information into the algebra and eval semantics). My review is below.

thanks,
.greg

1 Introduction

"direct a portion of a query to a particular SPARQL endpoint to be executed against local graphs" -- "local" to whom?

1.1.1 Namespaces

"This document uses the same conventions as and terminology from the SPARQL 1.1 Query document" -- terminology is discussed in 1.1.3, so its mention here seems strange.

1.1.3 Terminology

"IRI" just links to the query document. I assume it should link to something more specific?

"There are three variables: x, y and z (shown as column headers)..." -- is this talking about the example already described in 1.1.2?

2 SPARQL 1.1 Federated Query Extension

"invoke a portion of a SPARQL query against a remote SPARQL protocol service" -- should this link to the protocol document?

2.1 Simple query to a remote SPARQL endpoint

"... and join the returned data with the data from the local RDF data store." -- Should "local RDF data store" instead talk about an RDF Dataset? "RDF data store" isn't defined anywhere (although it is used offhandedly once in Query).

"Consider a query to find the names of the people we know." -- "we"? I think this would be better with concrete names (e.g. 'the people Bob knows').

":people15 foaf:name "Alice" ." -- I think this would read better if :people15 was instead :person15 (likewise for the other instances). Obviously this is a minor and subjective issue.

2.2 SPARQL query with OPTIONAL to two remote SPARQL endpoints

"@prefix : http://example.org/ ." -- missing <> around the IRI in both data examples.

2.3 Service Execution Failure

"...under such circumstances the invoking query containing a SERVICE pattern fails as a whole." -- s/invoking/invoked/

"SPARQL 1.1 allows to explicitly allow failed SERVICE requests by the keyword SILENT ." -- too many "allows", and I'm not sure "SPARQL 1.1" is something that is doing the allowing here. Perhaps "Queries may explicitly allow failed SERVICE requests with the use of the SILENT keyword" or something similar.

"The SILENT keyword indicates that error encountered while ..." -- s/error/errors/

"... if an error happens when" -- s/happens when/occurs while/
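For reference, here is a minimal Python sketch of the behaviour I read into SILENT. It assumes that a failed SILENT call contributes a single empty solution mapping (so it joins with everything), while a non-SILENT failure aborts the whole query. invoke_service, ServiceError and the call stub are hypothetical names, and solution mappings are modelled as dicts.

```python
class ServiceError(Exception):
    """Raised by the (hypothetical) remote call when the endpoint fails."""


def invoke_service(call, silent):
    """Evaluate one SERVICE pattern.

    `call` stands in for the remote request: it returns a list of solution
    mappings (dicts from variable name to term) or raises ServiceError.
    """
    try:
        return call()
    except ServiceError:
        if silent:
            # SILENT (assumed): act as if the service returned one empty
            # solution mapping, which is compatible with every other mapping.
            return [{}]
        # Without SILENT the error propagates and the query fails as a whole.
        raise


def failing_call():
    raise ServiceError("endpoint unreachable")


print(invoke_service(failing_call, silent=True))  # [{}]
```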

2.4 Interplay of SERVICE and BINDINGS (Informative)

" @prefix : http://example.org/ ." -- missing <> in both data examples.

"... may return a large number of results (if the endpoint http://example.org/sparql contains a very large database)" -- this sounds awkward to me because the text just defined for me what that endpoint holds, and I wouldn't consider 3 results "a large number". Perhaps s/a large number of results/more results than necessary/ or some other wording that gets the point across without asserting that the toy example given actually demonstrates "too large" or "too many" results being returned.

"... or not return all of them: many existing SPARQL endpoints ..." -- I think this should be a separate sentence.

"... may miss the ones matching subjects ?s from the default graph." -- I know what's being said here, but "the default graph" seems too imprecise as there are two default graphs being dealt with in the example. Similarly in the next sentence: "first the bindings from the default graph are evaluated". Perhaps discuss the "local default graph"?

"... a query planner for federated queries, may decide ..." -- s/,//

"SELECT ?s ?o { ?s a foaf:Person } " -- why is ?o projected? The query results table shown for it does not have an ?o column, and it is unnecessary.

"PREFIX foaf: <http://xmlns.com/foaf/0.1/> SELECT * {?s foaf:knows ?o } BINDINGS ?s { (:a) (:b) }" -- missing a PREFIX declaration for :. Also, the results for this should be shown immediately, emphasizing the effect of the BINDINGS clause, before showing the entire query again.
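To show the kind of effect I'd like the results table to make obvious, here is a minimal Python sketch. It assumes BINDINGS behaves as a join with an inline table of solution mappings. The three remote rows are made-up data, not the draft's example.

```python
def compatible(m1, m2):
    """Two solution mappings are compatible if they agree on shared variables."""
    return all(m1[v] == m2[v] for v in m1.keys() & m2.keys())


def join(left, right):
    """Join of two solution sequences: merge every compatible pair."""
    return [{**l, **r} for l in left for r in right if compatible(l, r)]


# Hypothetical remote matches for { ?s foaf:knows ?o } (invented data).
remote = [
    {"s": ":a", "o": ":b"},
    {"s": ":b", "o": ":c"},
    {"s": ":c", "o": ":d"},
]

# BINDINGS ?s { (:a) (:b) } modelled as an inline table of solution mappings.
bindings = [{"s": ":a"}, {"s": ":b"}]

# Only the rows whose ?s appears in the BINDINGS table survive.
for row in join(remote, bindings):
    print(row)
```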

3.1 Translation to the SPARQL Algebra

" If E is of the form SERVICE SILENT IRI {P}" -- this seems like it only covers the SILENT case. Is there a way to emphasize that both SILENT and non-SILENT forms are handled similarly? The connection between the presence of SILENT and the SilentOp argument needs to be defined.

"Let G := Join(G, Service(IRI, G, Transform(P), SilentOp))" -- this seems REALLY strange to me. On the right hand side of the assignment, G is being used as both the LHS of the Join, and an argument to Service(). I take it this is meant to aid in the construction of service invocations using BINDINGS clauses? If so, I don't think that belongs in the evaluation semantics.
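To illustrate why I think G doesn't belong inside Service(), here is a minimal Python sketch. It assumes a 3-argument Service(IRI, P, SilentOp) that is evaluated bottom-up, independently of G, and it uses list-of-dict solution mappings and a stubbed endpoint (REMOTE_ANSWER and invoke are hypothetical). Shipping G's bindings to the endpoint then becomes purely an optimisation that, in this simple case where every local solution binds ?s, yields the same join.

```python
def compatible(m1, m2):
    return all(m1[v] == m2[v] for v in m1.keys() & m2.keys())


def join(left, right):
    return [{**l, **r} for l in left for r in right if compatible(l, r)]


# Stand-in for a remote endpoint: its full answer to the SERVICE pattern P.
REMOTE_ANSWER = [{"s": ":a", "o": ":b"}, {"s": ":c", "o": ":d"}]


def invoke(bindings=None):
    """Evaluate P 'remotely', optionally restricted by a shipped BINDINGS table."""
    if bindings is None:
        return list(REMOTE_ANSWER)
    return join(REMOTE_ANSWER, bindings)


local = [{"s": ":a"}, {"s": ":e"}]  # eval of G, hypothetical data

# Semantics: Join(G, Service(IRI, P, SilentOp)); P never sees G.
plain = join(local, invoke())

# Optimisation: send G's values of ?s along as BINDINGS, then join as usual.
shipped = [{"s": m["s"]} for m in local]
optimised = join(local, invoke(shipped))

print(plain == optimised)  # True
```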

3.2 SPARQL 1.1 Simple Federation Extension Algebra

"if IRI" -- what does this mean?

"eval(D(G), Service(IRI,G,P,SilentOp)) = Invocation( IRI, vars, P, Bindings(G, vars), SilentOp )" -- the inclusion of "Bindings(G, vars)" isn't explained anywhere, and it's not clear to me exactly how it's meant to be evaluated. As stated above, I don't think it belongs in the evaluation semantics.

"* with no default-graph-uri or named-graph-uri" -- I'm still asking if we should allow their inclusion in the endpoint IRI.

3.2.1 SERVICE Examples

The examples in this section aren't using the 4-argument form of Service() that is defined in section 3.2. They omit the LHS (G) argument, which, as I've said, is how I believe the definition should be. Either way, though, this needs alignment with 3.2.

In my browser the text sizes of the two examples in this section are different. Is that true universally? If so, can it be fixed?

4 SERVICE and GRAPH Variables (Informative)

"A variable used in place of a service IRI or a GRAPH pattern..." -- I'm not sure how to interpret the "GRAPH pattern" part of this statement.

"... about project endpointsWe assume ..." -- s/endpointsWe/endpoints. We/

I think either this section needs to be dropped (as we aren't really defining the semantics) or it needs to be fleshed out. Before diving into an example, the text needs to explain that an implementation could make use of the "data about which services contain data about project endpoints" to invoke service calls on each of those endpoints. Right now those dots aren't connected.

"_:project1 doap:name "Querying remote RDF Data" ." -- In the data example, this has "Querying", but the results example below has "Query".

Bnodes in the remote endpoints' data, while absent from these examples, raise another point. Should this document mention that bnodes need to be made unique across service invocations, or is that obvious enough from the rest of the specs? I'd worry that without such a note, a naive implementation might end up joining _:a from one service with _:a from another service, which shouldn't happen.
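A minimal Python sketch of the kind of relabelling I have in mind, assuming each SERVICE invocation gets a fresh id and that terms beginning with "_:" are blank nodes (a simplification; relabel_bnodes and the "s<n>_" label prefix are hypothetical). Within one invocation the same label maps to the same new label, so joins inside that result still work. Across invocations the labels can no longer collide.

```python
import itertools

_fresh_call_ids = itertools.count()


def relabel_bnodes(solutions):
    """Rename the blank nodes returned by one SERVICE invocation so they
    cannot collide with blank nodes from any other invocation."""
    call_id = next(_fresh_call_ids)
    renamed = {}  # old label -> new label, scoped to this invocation only
    out = []
    for mu in solutions:
        new_mu = {}
        for var, term in mu.items():
            if term.startswith("_:"):
                # Reuse the new label if this bnode was already seen in this call.
                term = renamed.setdefault(term, f"_:s{call_id}_{term[2:]}")
            new_mu[var] = term
        out.append(new_mu)
    return out


from_service_1 = relabel_bnodes([{"p": "_:a", "name": '"Alice"'}])
from_service_2 = relabel_bnodes([{"p": "_:a", "name": '"Bob"'}])

# The two _:a nodes are now distinct, so a join on ?p will not merge them.
print(from_service_1[0]["p"] != from_service_2[0]["p"])  # True
```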

"A SERVICE or GRAPH clause involving a variable can be executed..." -- again, I don't know what the "GRAPH clause" part is about.

"foreach i in Ω(?var->i)" -- This has the same major problem as I mentioned last time: you can't use Ω here as it's bottom-up evaluation and not available at this point. Saying below the evaluation semantics that "the exact mechanism for [determine the possible target SPARQL query services] is not defined" isn't acceptable when you've just tried to define it, but the definition isn't valid. I thought we had agreed that we weren't going to define the evaluation semantics for variable-endpoint SERVICE queries?

"It must be done in a way that is compatible with the rest of the query results, such as constraints on variables from other graph pattern matching in the query." -- Why is this true, and not just potentially massively inefficient? Joins higher up the evaluation should take care of filtering out any results stemming from a service invocation on an endpoint that wasn't "compatible with the rest of the query results".
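A minimal Python sketch of the point about joins, assuming the candidate endpoints are simply a fixed list handed to the engine by some unspecified means (ENDPOINTS and eval_service_var are hypothetical, and nothing here uses Ω). Each endpoint is called independently of the rest of the query, with ?ep bound to the endpoint that produced each row. The endpoint the local data never mentions contributes rows that the enclosing join discards, so the cost of such a call is wasted work, not a wrong answer.

```python
def compatible(m1, m2):
    return all(m1[v] == m2[v] for v in m1.keys() & m2.keys())


def join(left, right):
    return [{**l, **r} for l in left for r in right if compatible(l, r)]


# Hypothetical per-endpoint answers to the SERVICE pattern P.
ENDPOINTS = {
    "<http://a.example/sparql>": [{"name": '"Project A"'}],
    "<http://b.example/sparql>": [{"name": '"Project B"'}],
}


def eval_service_var(var, candidates):
    """SERVICE ?var {P}: call every candidate endpoint bottom-up and bind
    ?var to the endpoint that produced each solution."""
    out = []
    for ep in candidates:
        out.extend({**mu, var: ep} for mu in ENDPOINTS[ep])
    return out


# The local part of the query binds ?ep only to endpoint A.
local = [{"ep": "<http://a.example/sparql>"}]

answers = join(local, eval_service_var("ep", list(ENDPOINTS)))
print(answers)
```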

"The example above should be executed in an specific order. " -- SHOULD? MUST? MAY? I'm not sure "should" is appropriate here (insofar as any discussion of variable-endpoint service evaluation is appropriate).

"One possible query engine ..." -- s/query engine/approach/

Received on Monday, 8 August 2011 22:27:31 UTC