Proposal for a new recommendation: SPARQL 1.1 test framework

Hello

Six months ago, I started an experiment. We proposed a test framework
named TFT (Tests for Triple stores) to test the interoperability of the
SPARQL endpoints of RDF database systems. This framework can execute the
W3C's SPARQL 1.1 test suite as well as its own interoperability tests. To
help the developers and end users of RDF databases, we run the tests daily.
From these tests, we have built a scoring system named SPARQLScore, and we
share our results on the website http://sparqlscore.com
(These systems are described in more detail in our submission to the
Semantic Web Challenge:
http://challenge.semanticweb.org/2014/submissions/swc2014_submission_4.pdf )

It's time to share my conclusions. I have good news and bad news.

The bad news is that the SPARQL Update protocol is not exactly the same
across databases. It is very hard to develop a web agent with SPARQL
without knowing the specifics of the software behind the endpoint.
In the SPARQL recommendation, the protocol for update queries is fuzzy,
and each database implements a different flavor of it.
TFT can test five RDF databases only because I have implemented the
specifics of each of your databases in order to execute the same queries.

The good news is that an open benchmark is possible and can help the
implementations converge.
Very soon after the launch of the website sparqlscore.com, four vendors
contacted us to include their software in our tests,
and three agreed to make their results public. Three vendors have set up a
SPARQL endpoint specifically for our tests.
Very quickly, the vendors started to discuss how to interpret
the recommendation, and several of them fixed interoperability problems
in their databases.

My first conclusion is simple.
SPARQL 1.1 is a recommendation, but its tests are not. The official test suite
is a great piece of work, but it is only a starting point, and each difference
in the result of a SPARQL query between your products is an obstacle to the
deployment of Linked Data in public institutions or in ordinary companies.
I know that the first challenges of Big Data are performance and the
volume of data, but interoperability is not optional for Linked Data.

I propose that the W3C re-open the "SPARQL test suites" work, where a light
version of TFT could execute all future official SPARQL tests with
one single protocol. The service can test your endpoints remotely every
day. The name of your software can be kept private or made public, and we can
push the report (for Jenkins or as EARL).
The aim is clearly to make the implementations converge and to create a real
ecosystem that is interoperable for our customers.

SPARQL 1.1 must have an official "test framework" to which vendors and
customers can push their test suites.

What do you think of this idea?

Best regards
Karima Rafes

---------------------------------------------------------------------------------
You can see a sample of the latest remarks/issues on GitHub:
https://github.com/BorderCloud/TFT-tests (Test suites)
https://github.com/BorderCloud/TFT (Test framework)

====> Omission in the SPARQL 1.1 recommendation and in the tests: the
operators on dateTime values
https://github.com/BorderCloud/TFT-tests/issues/6

>**Date subtraction not defined by SPARQL 1.1**
>The test for date subtraction in test Grid Observatory does not conform to
>the SPARQL 1.1 specification.
>The operator "-" is only defined for numeric operands.
>http://www.w3.org/TR/sparql11-query/#OperatorMapping


In the recommendation, we can read "The following table associates each of
these grammatical productions with the appropriate operands and an operator
function defined by either XQuery 1.0 and XPath 2.0 Functions and Operators
[FUNCOP] or the SPARQL operators specified in section 17.4.".
In the "operator function defined by either XQuery 1.0 and XPath 2.0", we
can see ["Backs up the subtract, "-", operator on xs:dateTime values"](http://www.w3.org/TR/xpath-functions/#func-subtract-dateTimes)

Proposed new test for the subtraction operator on dateTime values:
```
PREFIX xsd:<http://www.w3.org/2001/XMLSchema#>
SELECT   ?date1 ?date2 ( ?date1-?date2 AS ?funcSubtractDateTimes)
{
    BIND (xsd:dateTime("2005-01-02T00:00:01Z") AS ?date1)
    BIND (xsd:dateTime("2005-01-01T00:00:00Z") AS ?date2)
}
```
Expected result:
```
-----------------------------------------------------------------------------------------------------------------------
| date1                                | date2                                | funcSubtractDateTimes                 |
=======================================================================================================================
| "2005-01-02T00:00:01Z"^^xsd:dateTime | "2005-01-01T00:00:00Z"^^xsd:dateTime | "P1DT0H0M1.000S"^^xsd:dayTimeDuration |
-----------------------------------------------------------------------------------------------------------------------
```
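
To make the expected value easy to check outside any triple store, here is a minimal Python sketch that recomputes the duration. The function name `subtract_datetimes` is hypothetical, the sketch only handles UTC timestamps with whole seconds and non-negative results, and the `.000` fraction is hard-coded to mirror the lexical form used in the proposed test.

```python
from datetime import datetime, timezone


def subtract_datetimes(a: str, b: str) -> str:
    """Return a - b as an xsd:dayTimeDuration lexical form.

    Sketch only: expects 'YYYY-MM-DDThh:mm:ssZ' inputs and a >= b.
    """
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    d1 = datetime.strptime(a, fmt).replace(tzinfo=timezone.utc)
    d2 = datetime.strptime(b, fmt).replace(tzinfo=timezone.utc)
    delta = d1 - d2
    if delta.days < 0:
        raise ValueError("negative durations are not handled in this sketch")
    # timedelta normalises to days + seconds; split the seconds part.
    hours, rest = divmod(delta.seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"P{delta.days}DT{hours}H{minutes}M{seconds}.000S"


print(subtract_datetimes("2005-01-02T00:00:01Z", "2005-01-01T00:00:00Z"))
# prints P1DT0H0M1.000S
```

An endpoint may serialize the same duration with a different lexical form (for example `P1DT1S`), so a test harness would probably be safer comparing the duration value rather than the string.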

====> JSON Result Tests: type "typed-literal" is invalid
>**JSON Result Tests: type "typed-literal" is invalid**
>The JSON tests expect typed results to have a field "type: 'typed-literal'".
>According to the SPARQL JSON Protocol, such a value does not exist; it
>should be "type: 'literal'" instead:
>http://www.w3.org/TR/2013/REC-sparql11-results-json-20130321/#select-encode-terms

That is correct.

The following JSON result files need to be fixed:
* https://github.com/BorderCloud/TFT-tests/blob/master/sparql11-test-suite/json-res/jsonres01.srj
* https://github.com/BorderCloud/TFT-tests/blob/master/sparql11-test-suite/json-res/jsonres02.srj
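
A quick way to spot such files is to scan a result document for term types other than the three that the SPARQL 1.1 JSON results format defines ("uri", "literal", "bnode"). The sketch below is only an illustration: the function name `invalid_term_types` and the sample document are hypothetical and are not taken from TFT or from the test suite.

```python
import json

# Term types defined by the SPARQL 1.1 Query Results JSON Format.
VALID_TERM_TYPES = {"uri", "literal", "bnode"}


def invalid_term_types(srj_text: str) -> list:
    """Return (variable, type) pairs whose "type" is not a valid term type."""
    doc = json.loads(srj_text)
    found = []
    for binding in doc.get("results", {}).get("bindings", []):
        for var, term in binding.items():
            if term.get("type") not in VALID_TERM_TYPES:
                found.append((var, term.get("type")))
    return found


# Hypothetical sample using the outdated "typed-literal" type.
sample = """
{ "head": { "vars": ["s", "n"] },
  "results": { "bindings": [
    { "s": { "type": "uri", "value": "http://example.org/s1" },
      "n": { "type": "typed-literal",
             "datatype": "http://www.w3.org/2001/XMLSchema#integer",
             "value": "1" } } ] } }
"""
print(invalid_term_types(sample))
# prints [('n', 'typed-literal')]
```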

====> SPARQL Protocol : Status Code 400 for error should be honoured

>**SPARQL Syntax Tests: Status Code 400 for error should be honoured**
>For several of the SPARQL 1.1 syntax tests, the SPARQL testsuite simply
>contains a dummy result with a variable named "error", indicating that
>parsing is expected to fail. According to the SPARQL protocol, parse errors
>should return an HTTP result with status code 400 (Bad Request):
>http://www.w3.org/TR/2013/REC-sparql11-protocol-20130321/#query-failure
>The SPARQLScore tests need to be changed accordingly.

Good point. For the moment, I have not seriously tested the protocol. I have
added your "enhancement" to my to-do list.
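
As a rough idea of what such a check could look like, the sketch below classifies the HTTP status returned for a query that is expected not to parse. The function name and the three verdict labels are hypothetical; this is not how TFT currently works.

```python
def negative_syntax_verdict(status_code: int) -> str:
    """Classify the HTTP status returned for a query that must not parse."""
    if status_code == 400:
        return "pass"        # malformed query rejected as the protocol describes
    if 200 <= status_code < 300:
        return "fail"        # the endpoint accepted a query it should reject
    return "unexpected"      # rejected, but not with 400; needs a closer look


for code in (400, 200, 500):
    print(code, negative_syntax_verdict(code))
```

Keeping a separate "unexpected" verdict, rather than a plain pass/fail, would let the report distinguish endpoints that reject the query with another status from endpoints that silently accept it.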

====> Fuzzy requirement : returned datatype

>**ROUND, CEIL, FLOOR should not require returned datatype to be the same
>as argument datatype**
>The functions ROUND, CEIL and FLOOR return by their definition an integer
>number without fractional part. The SPARQL 1.1 specification only requires
>the return value to be "numeric" and not that it matches the type of the
>arguments. The tests should therefore in this case only check if the
>results are numeric.

Can you show an example of a test that fails because of this?
The recommendation may be fuzzy on this point, but a unit test cannot be
fuzzy.
It would be better for the recommendation to mandate the return type in all
circumstances, with a unit test for each possibility.

====> Fuzzy requirement : AVG

>**AVG should accept all numeric return types**
>The tests currently require that the result of AVG always is the same type
>as the input arguments. However, the SPARQL 1.1 specification defines AVG
>as to return any numeric value. The tests should therefore only check if
>the return type is numeric.
>This also makes sense when you take e.g. the example of the average of the
>integers 1 and 2 - which is the double or decimal value 1.5.

For me, the requirements are fuzzy because it is not possible to write unit
tests for all the cases. The AVG tests are logical, but more unit tests are
needed for functions like AVG, and the requirements need to be clarified.
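
The two positions in this discussion (and in the ROUND/CEIL/FLOOR one) can be made concrete with a small sketch that compares an expected and an actual literal under a strict policy and under a "numeric only" policy. Everything here is an assumption for illustration: terms are modelled as `(lexical form, datatype IRI)` pairs, the function names are hypothetical, and NaN is not handled.

```python
from decimal import Decimal

XSD = "http://www.w3.org/2001/XMLSchema#"

# Datatypes treated as numeric by SPARQL: the four core types plus the
# XSD types derived from xsd:integer.
NUMERIC = {XSD + name for name in (
    "integer", "decimal", "float", "double",
    "nonPositiveInteger", "negativeInteger", "long", "int", "short", "byte",
    "nonNegativeInteger", "unsignedLong", "unsignedInt", "unsignedShort",
    "unsignedByte", "positiveInteger",
)}


def strict_match(expected, actual) -> bool:
    """Same lexical form and same datatype (what a strict test requires)."""
    return expected == actual


def lenient_match(expected, actual) -> bool:
    """Both terms numeric and equal in value, whatever their datatypes."""
    (exp_value, exp_type), (act_value, act_type) = expected, actual
    if exp_type not in NUMERIC or act_type not in NUMERIC:
        return False
    return Decimal(exp_value) == Decimal(act_value)


# AVG(1, 2) reported as a decimal by one store and as a double by another.
expected = ("1.5", XSD + "decimal")
actual = ("1.5E0", XSD + "double")
print(strict_match(expected, actual), lenient_match(expected, actual))
# prints False True
```

A strict check gives a deterministic unit test but fails stores that pick another numeric type; the lenient check accepts them but no longer says which return type an application can rely on.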

====> The queries need a BASE (Details :
https://github.com/BorderCloud/TFT-tests/issues/4 )

TFT now inserts a base URI into each query: BASE <http://...>
You can see the effect in your new score: http://sparqlscore.com/
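
The idea is simply to prepend a BASE declaration to the query text before sending it. TFT itself is written in PHP; the Python sketch below only illustrates the principle. The function name `with_base` and the base URI `http://example.org/tests/` are hypothetical, and the sketch assumes the query does not already declare its own BASE.

```python
def with_base(query: str, base_uri: str) -> str:
    """Prepend a BASE declaration so relative IRIs resolve against base_uri."""
    return f"BASE <{base_uri}>\n{query}"


query = "SELECT * WHERE { <data.ttl> ?p ?o }"
print(with_base(query, "http://example.org/tests/"))
```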

====> Result comparison should allow for extra "system" triples

You are not alone with this problem, but in general the vendors have
implemented a specific solution.
For example, I can turn security OFF in the configuration and then use "CLEAR
ALL" to delete all the system triples.

If you give me a method to clear all the system triples, I can add a
preprocessing step before each test for your software. You could also add a
new security flag to your configuration file: "CLEAR_SYSTEM", with the command
"CLEAR ALL".

You can see the function that handles this preprocessing before the tests for
the other products: [clearAllTriples()](https://github.com/BorderCloud/TFT/blob/master/Test.php)

(follow the discussion : https://github.com/BorderCloud/TFT-tests/issues/5 )

Received on Tuesday, 3 March 2015 08:34:37 UTC