- From: Jeen Broekstra <jeen.broekstra@gmail.com>
- Date: Fri, 17 Feb 2012 09:51:57 +1300
- To: public-rdf-dawg-comments@w3.org
Dear WG,

I am currently looking into Sesame's conformance testing framework with respect to the tests at http://www.w3.org/2009/sparql/docs/tests/data-sparql11/ and am hitting a problem with the tests involving CSV result formats. The problem stems from the CSV format's lack of support for recording datatypes. I am aware that the introduction specifies this lack of support as "by design"; however, I want to urge the WG to reconsider this design choice.

Test case csv-tsv-res/csv03 demonstrates the problem. In this test case, the input data contains several typed literals, for example "a7"^^xsd:hexBinary. The test query is expected to return a result row containing this literal; however, the expected CSV result records this row as:

  http://example.org/s7,http://example.org/p7,a7

As you can see, nothing in the format tells the parser what datatype 'a7' should have. Consequently, the test currently fails in Sesame, because the framework cannot reconcile this result with the (typed) result from the query engine.

IMO, this is a major flaw in the specification of the CSV format. Not just because Sesame's testing framework happens to fail on this, but more generally because allowing CSV to throw away a significant part of the information it is supposed to record is inappropriate. I simply don't think it is up to a recording format to make that kind of decision.

I appreciate that the CSV format should be simple and that it is primarily aimed at importing results into spreadsheet tools (and not at driving testing frameworks :)). However, like any data recording format, it should at minimum be expressive enough to make round-trips possible for any valid input. Like TSV, CSV should have provisions for recording datatyped literals in a way that lets a parser reconstruct the exact value being recorded.
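
To make the round-trip argument concrete, here is a minimal Python sketch. The `IRI` and `Literal` classes and all helper functions are hypothetical illustrations, not part of Sesame or any SPARQL library. `csv_cell` mimics the CSV behaviour described above (a literal is written as its bare lexical form), while `turtle_cell` writes terms in Turtle syntax, in the spirit of TSV. It assumes lexical forms without quotes, backslashes or newlines and ignores language tags and blank nodes.

```python
import csv
import io
from dataclasses import dataclass
from typing import Callable, List, Union

XSD = "http://www.w3.org/2001/XMLSchema#"


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: str = XSD + "string"


Term = Union[IRI, Literal]


def csv_cell(term: Term) -> str:
    # CSV rule as described above: IRIs and literals become bare strings,
    # so the datatype IRI of a literal is discarded.
    return term.value if isinstance(term, IRI) else term.lexical


def turtle_cell(term: Term) -> str:
    # Hypothetical alternative: write each term in Turtle syntax, as TSV does.
    # Assumes lexical forms contain no quotes, backslashes or newlines.
    if isinstance(term, IRI):
        return f"<{term.value}>"
    return f'"{term.lexical}"^^<{term.datatype}>'


def parse_turtle_cell(cell: str) -> Term:
    # Inverse of turtle_cell; handles only the two forms it produces.
    if cell.startswith("<") and cell.endswith(">"):
        return IRI(cell[1:-1])
    quoted, _, datatype = cell.rpartition("^^")
    return Literal(quoted[1:-1], datatype[1:-1])


def round_trip(row: List[Term], to_cell: Callable[[Term], str],
               from_cell: Callable[[str], object]) -> list:
    # Write one result row with the csv module, then read it back.
    buf = io.StringIO()
    csv.writer(buf).writerow([to_cell(t) for t in row])
    buf.seek(0)
    return [from_cell(cell) for cell in next(csv.reader(buf))]


# The csv03 row discussed above: s7, p7 and "a7"^^xsd:hexBinary.
row = [IRI("http://example.org/s7"), IRI("http://example.org/p7"),
       Literal("a7", XSD + "hexBinary")]

plain = round_trip(row, csv_cell, str)
typed = round_trip(row, turtle_cell, parse_turtle_cell)

print(plain)         # ['http://example.org/s7', 'http://example.org/p7', 'a7']
print(typed == row)  # True
# Two different literals collapse to the same CSV cell:
print(csv_cell(Literal("a7", XSD + "hexBinary")) == csv_cell(Literal("a7")))  # True
```

With the bare-string rule, "a7"^^xsd:hexBinary and a plain "a7" produce the same cell, so no parser can tell them apart. The Turtle-style cell survives the csv module's quoting and reads back as the original term.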

If some client application requires only plain literals, then this can be expressed in the SPARQL query that produces the result (using STR() in the SELECT). That is the appropriate delegation of responsibility IMO.

I suggest a simple fix: amend the section on serializing RDF terms (http://www.w3.org/2009/sparql/docs/csv-tsv-results/results-csv-tsv.html#csv-terms) to state that terms (or at the very least, datatyped literals) are recorded in Turtle format. This would also be more in line with how TSV behaves.

Regards,
Jeen Broekstra
Received on Thursday, 16 February 2012 20:52:31 UTC