Federated query tests - experiences from Andy Seaborne on 2012-01-27 (public-rdf-dawg@w3.org from January to March 2012)

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Fri, 27 Jan 2012 20:18:22 +0000
To: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <4F23068E.4040703@epimorphics.com>

(First attempt to send this was from the wrong address - please ignore)

I've managed to run the federated query tests - in case it helps anyone 
else, here's what I did.

Two changes to the tests were needed:

== service06.srx

The query says:
SELECT ?s ?o1 ?o2
but ?o2 (which is never bound) wasn't listed in the <head> of the 
results.

I've corrected this in CVS.
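
As an illustration of the rule behind this fix, here is a minimal sketch using a hypothetical query (not the actual service06.rq; the ex: prefix and both predicates are made up for the example). A projected variable that is never bound must still be declared in the <head> of the XML results:

```sparql
# Hypothetical query for illustration only; this is not service06.rq.
PREFIX ex: <http://example.org/>

SELECT ?s ?o1 ?o2
WHERE {
   ?s ex:p ?o1 .
   # Assumption: no triple in the data uses ex:missing, so ?o2 is never bound.
   OPTIONAL { ?s ex:missing ?o2 }
}

# The <head> of the .srx result still declares all three projected variables:
#   <variable name="s"/> <variable name="o1"/> <variable name="o2"/>
# Each <result> then simply has no <binding name="o2"> element.
```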

== service05.rq

This change is needed because the test isn't quite what it seems.

SELECT ?service ?title
WHERE {
   # Find the service with subject "remote".
   ?p dc:subject ?projectSubject ;
      void:sparqlEndpoint ?service
      FILTER regex(?projectSubject, "remote")

   # Query that service projects.
   SERVICE ?service {
      ?project  doap:name ?title . }
}

Written like that, strict SPARQL says the FILTER applies to the whole 
block, and that includes the SERVICE.  It's as if the FILTER were written 
after the SERVICE (an optimizer can move it around - I'm sure many do).  
That leaves ?service unfiltered at the point where a naive execution 
calls the SERVICE.
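
To make the scoping concrete, here is a sketch of an equivalent form of the original query, with the FILTER written at the end of the group. The prefix IRIs are the usual ones for these vocabularies and are an assumption here; the test file declares its own.

```sparql
# Prefix IRIs are assumed; check them against the declarations in service05.rq.
PREFIX dc:   <http://purl.org/dc/elements/1.1/>
PREFIX void: <http://rdfs.org/ns/void#>
PREFIX doap: <http://usefulinc.com/ns/doap#>

SELECT ?service ?title
WHERE {
   ?p dc:subject ?projectSubject ;
      void:sparqlEndpoint ?service .

   SERVICE ?service {
      ?project  doap:name ?title . }

   # A FILTER applies to its whole enclosing group, wherever it is written,
   # so this placement means the same as putting it before the SERVICE.
   FILTER regex(?projectSubject, "remote")
}
```

Evaluated top to bottom without reordering, this form calls every endpoint found by the first pattern and only then discards the rows whose subject does not match "remote".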

But we want the FILTER to apply only to the local part of the query; 
adding a pair of {} does that:

{
   {
      # Find the service with subject "remote".
      ?p dc:subject ?projectSubject ;
         void:sparqlEndpoint ?service
         FILTER regex(?projectSubject, "remote")
   }

   # Query that service projects.
   SERVICE ?service {
      ?project  doap:name ?title . }
}

After talking to Greg about this on IRC, I have changed the test in CVS, 
as that seems to be the intention.  My EARL report assumes the change.

== Setup

The tests can't be executed as-is because they refer to various 
endpoints that need to be queried.  I was aiming to run the queries and 
manually check the results against the specified results.  Changing the 
ARQ test suite code looked like it would take longer than running the 
tests and doing some manual work.

The data for the same endpoint differs between tests, so instead I gave 
every endpoint/data combination its own endpoint.  I ran a copy of 
Fuseki with 9 different endpoints (some tests go to two places; #7 does 
not need a real endpoint).

e.g.
service02.rq:

SELECT ?s ?o1 ?o2
{
   SERVICE <http://localhost:3030/ds2a/sparql> {
   ?s ?p ?o1 . }
   OPTIONAL {
     SERVICE <http://localhost:3030/ds2b/sparql> {
     ?s ?p2 ?o2 }
   }
}

Then I ran the test queries in a single run, and checked that the 
results matched the srx files.

The final state of my test area is available for inspection:

http://people.apache.org/~andy/service-tests-2012-01-25.zip

     Andy

Received on Friday, 27 January 2012 20:18:50 UTC