ACTION-284 Fed. Query review (part 1) from Axel Polleres on 2011-02-14 (public-rdf-dawg@w3.org from January to March 2011)

From: Axel Polleres <axel.polleres@deri.org>
Date: Mon, 14 Feb 2011 02:00:16 +0000
To: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-Id: <4F1520CC-0155-4FB3-91E4-58683AED7DA0@deri.org>
This partially completes my ACTION-284 on reviewing fed. query. Find part 1 of my review below.

I didn't really get to the meat of Carlos' changes yet, I believe, but mainly have feedback on the examples so far. In general, I think that the examples should make clearer what they illustrate; apart from that, I have some editorial feedback.


------------------------------------

1) Remove:
"Please refer to the errata for this document, which may
      include some normative corrections.

The previous errata for this document, are also available.

See also translations.

This document is also available in these non-normative formats: XML
and XHTML with color-coded revision indicators.
"


2)

"This specification defines the syntax and semantics of a SPARQL 1.1
Query extension for executing distributed queries."
       
- better? ->

"This specification defines the syntax and semantics of a SPARQL 1.1
Query extension for executing queries distributed over different endpoints."
       

3) We should have this either in all or in none of our documents:

"The documents produced by this Working Group are:

    * SPARQL 1.1 Query
    * SPARQL 1.1 Federation Extensions (this document)
    * SPARQL 1.1 Update
    * SPARQL 1.1 Uniform HTTP Protocol for Managing RDF Graphs
    * SPARQL 1.1 Protocol for RDF
    * SPARQL 1.1 Service Description
    * SPARQL 1.1 Entailment Regimes
    * SPARQL 1.1 Property Paths
    * SPARQL 1.1 Conformance Tests
"

4)
"This publication includes the extension SERVICE to the SPARQL 1.1
Query specification. The structure of this document will change to fully integrate the new features."

-->

"This publication describes the SERVICE extension to the SPARQL 1.1
Query specification."


5) Remove:
"The design of the features presented here is work-in-progress and does not represent
      the final decisions of the working group.  Implementers and application writers should
      not assume that the designs in this document will not change.
"


6)

"This document will be presented to the SPARQL Working Group, which is
part of the W3C Semantic Web Activity."
-->
"This document was produced by the SPARQL Working Group, which is part
of the W3C Semantic Web Activity."

7) 
Add:

"Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress."

8) Section 1

"The growing suite of SPARQL query services offer consumers an opportunity to merge data distributed across the web. A small number of extensions to SPARQL 1.1 enable expression of the merging queries. In particular, a SERVICE allows one to direct a portion of a query to a particular SPARQL query service, just as a GRAPH directs queries to particular named graphs. This specification defines the syntax and semantics of these extensions.
"

-->

"The growing number of SPARQL query services offer consumers an opportunity to merge data distributed across the web. The SERVICE extension allows one to direct a portion of a query to a particular SPARQL query service, similar to a GRAPH graph pattern, which "directs" queries to particular named graphs in the (local) dataset. This specification defines the syntax and semantics of this extension."


9) Meta-remark across all documents: we should have consistent capitalization of "Web" vs "web", "Semantic Web" vs "semantic web", etc.


10)
Remove:

"The SPARQL query language is closely related to the following specifications:

    * The SPARQL Query 
        for RDF [SQRY] specification defines a language for matching and reporting on RDF data.
    * The SPARQL Protocol 
        for RDF [SPROT] specification defines the remote protocol for issuing SPARQL queries and receiving the results.
    * The SPARQL Query 
        Results XML Format [RESULTS] specification defines an XML document format for representing the results of SPARQL SELECT and ASK queries."

11) Section 1.1

You refer to fn: and rdfs:, neither of which is used in the document...
In general, I suggest you just say:

"This document uses the same conventions and terminology as the SPARQL 1.1 Query document [Ref]."

12) Editorial note in the beginning of the doc:

"Editorial note	 
The BINDINGS section will be moved to the SPARQL query main document: SPARQL 1.1 Query . All references to BINDINGS in this document will be removed."


Not sure, but wouldn't we actually want to leave the BINDINGS *example* in the document? The example in the query doc is not about the combination of SERVICE with BINDINGS. I think the example at least makes sense.


13) SECTION 2 

Given that BINDINGS is now defined in Query, this should be renamed to 

"SPARQL 1.1 Basic Federation Extension"

and I'd change

"Queries over distributed data often entail querying one source and using the acquired information to constrain queries of the next source. This section covers the SERVICE operator giving examples of how to use it and its behavior."

to

"Queries over distributed SPARQL endpoints often involve querying one source and using the acquired information to constrain queries of the next source. This section illustrates by examples how this can be achieved using SPARQL 1.1's SERVICE graph patterns."

I'd then remove the subsection heading 2.1 and promote the subsubsections as follows:

2.1.1 -> 2.1
2.1.2 -> 2.2
2.1.3 -> 2.3
2.1.4 -> 2.4
2.1.5 -> 2.5
2.2 BINDINGS -> 2.6  Using SERVICE in combination with BINDINGS

(in the following comments I will still use the old section numbers)



14) 2.1.1
"For instance, an endpoint which contains information about people working:

 Data in <http://people.example/sparql> endpoint:"


not a sentence... 

Next, I'm not sure about the names. Are these names of real people?
I would rather use fictitious ones.

Also, I don't find it very useful for the example to just query a remote endpoint without joining the data with any local data (in that case I could query the endpoint directly, so why would I want to use SERVICE here?). So I suggest rewriting the whole example as follows:

==============================
For instance, let us assume a SPARQL service endpoint available at <http://people.example/sparql> that contains the following data in its default graph:

   <http://example.org/people/people15>  <http://xmlns.com/foaf/0.1/name>     "Alice" .
   <http://example.org/people/people16>  <http://xmlns.com/foaf/0.1/name>     "Bob" .
   <http://example.org/people/people17>  <http://xmlns.com/foaf/0.1/name>     "Charles" .
   <http://example.org/people/people18>  <http://xmlns.com/foaf/0.1/name>     "Daisy" .

which I want to combine with my local FOAF file at <http://example.org/myfoaf.rdf> that contains the single triple:

    <http://example.org/myfoaf/I> <http://xmlns.com/foaf/0.1/knows>  <http://example.org/people/people15> .
 
The following query retrieves the names of the persons I know from the remote SPARQL service.

Query:

SELECT ?name
FROM <http://example.org/myfoaf.rdf>
WHERE
{
  <http://example.org/myfoaf/I> <http://xmlns.com/foaf/0.1/knows> ?person .
  SERVICE <http://people.example/sparql> { 
    ?person <http://xmlns.com/foaf/0.1/name>   ?name . } 
}

This query, on the data above, has one solution.

Query Result:

  name
  "Alice"
==============================

15) Section 2.1.2

 Again, I'd change the name to "Alice"

Is this example illustrating something that the first example doesn't illustrate? Is it really that different to have two SERVICE queries? It would be good to have a sentence at the beginning of each example that explains what it is meant to show.

"For instance, an endpoint which contains information about people working:"
-->
"Several SERVICE patterns can be combined in the same query to join results from different SPARQL service endpoints. For example, let us now assume two service endpoints which contain information about people and projects as follows."

16) Section 2.1.3

Again, there is no statement of what this example is meant to illustrate. I assume something like
"SERVICE patterns can be nested and used within other complex patterns, e.g. within OPTIONAL patterns. We again assume two SPARQL endpoints containing information about people and projects."

I don't think the example is correct as it stands, BTW... I think that, as you wrote it, it would only
return the first three results.

Isn't what you want to write rather the following?


PREFIX people:  <http://people.example/ns#> 
PREFIX project:  <http://project.example/ns#> 
PREFIX foaf:   <http://xmlns.com/foaf/0.1/>
SELECT ?name ?projectName
WHERE
{
  SERVICE <http://people.example/sparql> {
    ?people foaf:name   ?name .
    OPTIONAL { ?people people:worksIn   ?project .
      SERVICE <http://project.example/sparql> {
        ?project project:hasTitle   ?projectName . } }
  }
}

That would IMO return the results you put, and also illustrate nested SERVICE patterns.

17) Section 2.1.4

The use of dcterms:subject for a numeric id is a bit awkward; dcterms:subject is meant to point at a subject/topic.
I suggest changing the example to something like the following:

==================================
We assume the following data in the default graph, describing SPARQL endpoints that hold information about projects in certain subject categories:

@prefix void:    <http://rdfs.org/ns/void#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix doap: <http://usefulinc.com/ns/doap#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

[] dc:subject "Querying RDF" ;
   void:sparqlEndpoint <http://projects1.example/SPARQL> .
[] dc:subject "Querying RDF remotely" ;
   void:sparqlEndpoint <http://projects2.example/SPARQL> .
[] dc:subject "Updating RDF remotely"  ;
   void:sparqlEndpoint <http://projects3.example/SPARQL> .

Data in default graph at SPARQL service endpoint http://projects2.example/SPARQL: 

_:project1  doap:name    "Querying remote RDF Data" .
_:project1  doap:created "2011-02-12"^^xsd:date .
_:project2  doap:name    "Querying multiple SPARQL endpoints" .
_:project2  doap:created "2011-02-13"^^xsd:date .

Data in default graph at SPARQL service endpoint http://projects3.example/SPARQL: 


_:project3  doap:name    "Update remote RDF Data" .
_:project3  doap:created "2011-02-14"^^xsd:date .

We now want to query the names of projects whose subject contains "remote":


Query:

PREFIX  void: <http://rdfs.org/ns/void#>
PREFIX  dc:   <http://purl.org/dc/elements/1.1/>
PREFIX  doap: <http://usefulinc.com/ns/doap#> 

SELECT ?service ?projectName
WHERE {
  # Find the service with subject "remote".
  ?p dc:subject ?projectSubject ;
     void:sparqlEndpoint ?service  
     FILTER regex(?projectSubject, "remote")

  # Query that service for projects.
  SERVICE ?service {
     ?project  doap:name ?projectName . } 
}
 

The bindings of ?service provide the location of the service to query, yielding:

Query result:

service	projectName
<http://projects2.example/SPARQL>	"Querying remote RDF Data"
<http://projects2.example/SPARQL>	"Querying multiple SPARQL endpoints"
<http://projects3.example/SPARQL>	"Update remote RDF Data"

=====================================

18)
"Editorial note	 
When having variables for specifying the address of a SPARQL endpoint in a SERVICE operation this variable must be bounded. In order to clearly define what "must be bounded" mean we point to a boundedness definition. This is still an issue for the SPARQL Working Group, as it the question of having variables in SERVICE calls at all. Feedback from the community is encouraged."

Is this Ed note still appropriate here?

19) 2.1.5
*
"SERVICE execution may fail due to several reasons: server down, wrong endpoint IRI, or there may be no results from the query. In order to allow users to continue with the other parts of t he query we propose to use a service silent operation Service(IRI,G,P,SilentOpt) which is false by default."

-->
"The execution of a SERVICE pattern may fail for several reasons: the remote service may be down, the service IRI may not be dereferenceable, or the endpoint may return an error to the query. Normally, under such circumstances the invoking query containing a SERVICE pattern fails as a whole. However, SPARQL 1.1 allows failed SERVICE requests to be tolerated explicitly by means of the keyword 'SILENT'."

*
Again, I'd prefer "Alice"

*
"Query result if an error happens when querying the remote SPARQL endpoint::"
-->
"Query result if an error happens when querying the remote SPARQL endpoint:"

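To make the SILENT wording in 19) a bit more concrete, a query could look roughly as follows. This is only a sketch on my side: it reuses the endpoint IRI and the local FOAF file that I assumed for the rewritten 2.1.1 example, and it is not taken from the draft.

```sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?person ?name
FROM <http://example.org/myfoaf.rdf>
WHERE
{
  # Local part: evaluated against my own FOAF file.
  <http://example.org/myfoaf/I> foaf:knows ?person .
  # Remote part: with SILENT, an error at the remote endpoint
  # does not make the query fail as a whole.
  SERVICE SILENT <http://people.example/sparql> {
    ?person foaf:name ?name . }
}
```

Without the SILENT keyword, the same query would fail as a whole if <http://people.example/sparql> cannot be reached.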

20) Section 2.1.6 is obscure to me... It talks about two results when there is one, and it talks about a query when there is no query. I suggest simply removing that section.


21) Section 2.2 BINDINGS

"In order to efficiently communicate constraints to sparql endpoints, the queryier may follow the WHERE clause with BINDINGS. In order to efficiently address the constraints, the query on http://people.example/data could be expressed as follows:"

I don't entirely understand this: if BINDINGS doesn't appear in the SERVICE clause, the "constraints" don't even reach the remote endpoint... Shouldn't we reformulate the example to actually have the BINDINGS *within* the SERVICE pattern?

That would make more sense to me.

Accordingly, I would suggest rephrasing:

"In order to efficiently communicate constraints to SPARQL endpoints, the requester may use SERVICE in combination with a BINDINGS clause (see [SPARQL 1.1 Query], Section 18.2.5.6 BINDINGS). In order to efficiently address the constraints, the query on http://people.example/data could be expressed as follows:"

Also, note that the advantage of BINDINGS only comes across if you use several bindings, since a single binding can be written directly into the query. So, I would suggest thinking of a better example or dropping the BINDINGS section altogether.


22) Section 3 on syntax can be dropped. The syntax is clear from the grammar and is already illustrated by the examples; I don't think the schematic syntax adds anything.


to be continued... at section 4.
Received on Monday, 14 February 2011 02:00:51 UTC