Re: CSV/TSV results test cases and suggested adaption of JSON results test cases from Gregory Williams on 2011-08-07 (public-rdf-dawg@w3.org from July to September 2011)

From: Gregory Williams <greg@evilfunhouse.com>
Date: Sun, 7 Aug 2011 15:39:18 -0400
To: Axel Polleres <axel.polleres@deri.org>
Cc: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-Id: <106E33F0-BD22-47C3-9605-4D3D36AFD659@evilfunhouse.com>

On Aug 7, 2011, at 2:21 PM, Axel Polleres wrote:

> 1) I added (my understanding of) what CSV and TSV test cases should return in 
> 
>  http://www.w3.org/2009/sparql/docs/tests/data-sparql11/csv-tsv-res/
> 
> the test cases are:
> 
>   http://www.w3.org/2009/sparql/docs/tests/data-sparql11/csv-tsv-res/manifest#csv01

I'm finding these cases particularly hard to test due to potential canonicalization of datatyped literals on import. The data file contains this triple:

:s5 :p5 "5"^^xsd:decimal.

I believe (?) this may be transformed into a canonical representation during import into the underlying store (as "5.0"^^xsd:decimal). If it is canonicalized on import and makes its way into the output serialization in the canonical form, though, it's now difficult to compare with the lossy csv results:

http://example.org/s5,http://example.org/p5,5,,

which has "5" as the corresponding csv value. In non-lossy result formats this wasn't a problem because the result record had the xsd:decimal type attached to the "5" value, and the comparison could be done using a D-entailment corresponding to the canonicalization process. Without that datatype information, though, it's impossible to know if "5" and "5.0" should compare as equal because "5" might have started out as an xsd:decimal (true), an xsd:string (false), or anything else that could produce that lexical form in the CSV results.
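To make the ambiguity concrete, here is a minimal Python sketch of the comparison a test harness would have to do. The helper name values_equal and the way the datatype is passed in are assumptions for illustration only, not part of any existing test harness; value-based comparison is shown for xsd:decimal alone.

```python
from decimal import Decimal, InvalidOperation

XSD = "http://www.w3.org/2001/XMLSchema#"


def values_equal(lex_a, lex_b, datatype=None):
    """Hypothetical helper: compare two lexical forms from a result set.

    With a known xsd:decimal datatype the comparison is done in value
    space; with any other (or no) datatype only the lexical forms are
    compared, because nothing more is known about the values.
    """
    if datatype == XSD + "decimal":
        try:
            return Decimal(lex_a) == Decimal(lex_b)
        except InvalidOperation:
            # Not a valid decimal lexical form, so treat as not equal.
            return False
    return lex_a == lex_b


# Non-lossy formats carry the datatype, so "5" and "5.0" can be matched.
print(values_equal("5", "5.0", XSD + "decimal"))  # True
# If the value was really an xsd:string, the two must not match.
print(values_equal("5", "5.0", XSD + "string"))   # False
# CSV gives no datatype at all, so only the lexical comparison is safe.
print(values_equal("5", "5.0"))                   # False
```

The last call is the CSV situation: the harness cannot tell which of the first two cases applies, so a store that canonicalizes on import fails the test even though its answer is correct.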

My questions are:

* Have I understood the issue correctly?
* If so, is this just something I'm going to have to work around?
* Could the tests be annotated in such a way as to indicate that this might be an issue (a la mf:feature)?
* Could we add csv/tsv tests that don't have this canonicalization problem for the common xsd datatypes?

thanks,
.greg

Received on Sunday, 7 August 2011 19:40:13 UTC