Re: comment on SPARQL 1.1 CSV result format: datatyped literals from Andy Seaborne on 2012-03-06 (public-rdf-dawg-comments@w3.org from March 2012)

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Tue, 06 Mar 2012 16:43:19 +0000
To: public-rdf-dawg-comments@w3.org
CC: Jeen Broekstra <jeen.broekstra@gmail.com>
Message-ID: <4F563EA7.4090305@epimorphics.com>

Jeen,

Thank you for your comments on the use of CSV for SPARQL results.

The goals for the TSV and CSV formats are slightly different.

The TSV format is a faithful representation of the data, including RDF 
details such as datatypes and distinguishing IRIs from literals. To work 
with the TSV format requires some level of RDF parsing to extract the 
information.

The CSV format aims at delivering data to applications, such as 
spreadsheets, without the need for parsing RDF details and leaving it to 
the application to decide on the data format. For example, fields in the 
CSV file that look like numbers are often treated as numbers, including 
formatting and alignment in a spreadsheet. This duck-typing approach is 
a different style to RDF.
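
To make the difference concrete, here is a minimal sketch in Python of how one result row might be written in each format. It uses the row from test csv03 quoted later in this message. The IRI and Literal classes and the two helper functions are hypothetical names for illustration only; the sketch ignores language tags, blank nodes, escaping and CSV quoting, so it is not a complete serializer.

```python
from dataclasses import dataclass
from typing import Optional

XSD = "http://www.w3.org/2001/XMLSchema#"


@dataclass(frozen=True)
class IRI:
    # Hypothetical minimal term class, for illustration only.
    value: str


@dataclass(frozen=True)
class Literal:
    # Hypothetical minimal term class, for illustration only.
    lexical: str
    datatype: Optional[str] = None  # datatype IRI, if any


def to_tsv_field(term):
    """Turtle-style field: keeps the term kind and the datatype."""
    if isinstance(term, IRI):
        return f"<{term.value}>"
    if term.datatype is None:
        return f'"{term.lexical}"'
    return f'"{term.lexical}"^^<{term.datatype}>'


def to_csv_field(term):
    """Plain-string field: the IRI text or the lexical form only."""
    return term.value if isinstance(term, IRI) else term.lexical


row = [
    IRI("http://example.org/s7"),
    IRI("http://example.org/p7"),
    Literal("a7", XSD + "hexBinary"),
]

tsv_line = "\t".join(to_tsv_field(t) for t in row)
csv_line = ",".join(to_csv_field(t) for t in row)

print(tsv_line)
print(csv_line)
```

The TSV line still says that "a7" is an xsd:hexBinary literal; the CSV line carries only the string a7, which is what a spreadsheet wants to see.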

As file formats, aside from their use for SPARQL, TSV and CSV are very 
similar and tools often support both. There is less value in making 
both cover the same use case of recording RDF-specific details in 
SPARQL results when one can be used for that and the other for 
presentation in non-RDF-aware applications.

For testing the CSV format, the WG has arranged that the tests use 
canonical forms for data and results. This is in the hope that simply 
converting values to strings in the test suite will enable the rest of 
the test harness to handle the results equality testing.
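
As a rough sketch of what that might look like in a harness (Python here; the function names and the (lexical form, datatype) pair representation of engine results are assumptions for illustration, not part of any test framework):

```python
import csv
import io


def parse_csv_results(text):
    """Read an expected-results CSV file into a header and a list of rows."""
    reader = csv.reader(io.StringIO(text, newline=""))
    header = next(reader)
    return header, [tuple(r) for r in reader]


def stringify_rows(typed_rows):
    """Drop datatypes: keep only the lexical form (or IRI text) of each value.

    Assumes each value is a (lexical_form, datatype_or_None) pair.
    """
    return [tuple(lexical for lexical, _datatype in row) for row in typed_rows]


def results_equal(expected_csv, typed_rows):
    """Compare as strings, ignoring row order."""
    _header, expected = parse_csv_results(expected_csv)
    return sorted(expected) == sorted(stringify_rows(typed_rows))


expected_csv = "s,p,o\r\nhttp://example.org/s7,http://example.org/p7,a7\r\n"

engine_rows = [
    (
        ("http://example.org/s7", None),
        ("http://example.org/p7", None),
        ("a7", "http://www.w3.org/2001/XMLSchema#hexBinary"),
    )
]

print(results_equal(expected_csv, engine_rows))
```

This only works because the lexical forms in the test data are canonical: an engine that returned a different but equivalent lexical form for the same value would not match as a string.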

We would be grateful if you would acknowledge that your comment has been 
answered by sending a reply to this mailing list.

Andy
On behalf of SPARQL-WG

On 16/02/12 20:51, Jeen Broekstra wrote:
> Dear WG,
>
> I am currently looking into Sesame's conformance testing framework wrt
> the tests at http://www.w3.org/2009/sparql/docs/tests/data-sparql11/ and
> am hitting a problem with the tests involving CSV result formats. This
> problem has to do with the CSV format's lack of support for recording
> datatypes.
>
> I am aware that this lack of support is specified in the introduction as
> "by design", however, I want to urge the WG to reconsider this design
> choice.
>
> As an example, the problem is demonstrated by test case
> csv-tsv-res/csv03. In this test case, the input data contains several
> typed literals. To pick one example: "a7"^^xsd:hexBinary. The test query
> is expected to return a result row containing this literal, however, the
> CSV result format records this expected result row as:
>
>   http://example.org/s7,http://example.org/p7,a7
>
> As you can see, there is no hint in the format that tells the parser
> what datatype 'a7' should be. In consequence, the current test fails in
> Sesame because the framework can not reconcile this result with the
> (typed) result from the query engine.
>
> IMO, this is a major flaw in the specification of the CSV format. Not
> just because Sesame's testing framework happens to fail on this, but
> more generally to allow CSV to throw away a significant part of the
> information it is supposed to record is inappropriate. I simply don't
> think that it is up to a recording format to make that kind of decision.
>
> I appreciate that the CSV format should be simple and that it is
> primarily aimed at importing results into spreadsheet tools (and not at
> driving testing frameworks :)). However, like any data recording format,
> minimally it should be expressive enough to ensure that round-trips for
> any valid input are possible.
>
> Like TSV, CSV should have provisions for recording datatyped literals in
> a way that a parser can reconstruct the exact value being recorded.
>
> If some client application requires only plain literals, then this can
> be expressed in the SPARQL query that produces the result (using STR()
> in the SELECT). That is the appropriate delegation of responsibility IMO.
>
> I suggest that a simple modification to CSV to fix this is to amend the
> section on serializing RDF terms
> (http://www.w3.org/2009/sparql/docs/csv-tsv-results/results-csv-tsv.html#csv-terms)
> to state that terms (or at the very least, datatyped literals) are
> recorded in Turtle format. This would be more in line with how TSV
> behaves, as well.
>
>
> Regards,
>
> Jeen Broekstra
>
Received on Tuesday, 6 March 2012 16:43:47 UTC