Re: CSV/TSV comments from Andy Seaborne on 2011-07-28 (public-rdf-dawg@w3.org from July to September 2011)

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Thu, 28 Jul 2011 09:35:58 +0100
To: public-rdf-dawg@w3.org
Message-ID: <4E311F6E.1000802@epimorphics.com>
On 27/07/11 20:40, Steve Harris wrote:
> Conversely, we are big fans of the TSV format, as written.
>
> We've used a very similar format inside Garlik for 4-5 years, as it's
> very efficient for Javascript/Perl/Python to process, without losing
> any typing information, and also easy for humans to read.
>
> The format has been supported in 4store since its public release, and
> it's reasonably widely used.
>
> The way I look at it is: CSV is for loading into spreadsheets, TSV is
> for processing by bespoke software.
>
> - Steve
>
> On 2011-07-27, at 19:22, Lee Feigenbaum wrote:
>
>> Danny Kahn, a colleague of mine at Cambridge Semantics,

My thanks to Danny for the time spent reviewing the document.

>> looked over
>> http://www.w3.org/2009/sparql/docs/csv-tsv-results/results-csv-tsv.html
>> . He compared it with how we currently implement CSV and TSV
>> results to SPARQL in Anzo.
>>
>> Here are the differences:
>>
>> 1. Both our CSV and TSV formats do not serialize the details of RDF
>> terms.
>>
>> 2. Our implementation optionally includes headers for CSV. We don't
>> use the header=absent content type parameter to indicate this.

How is it optionally controlled?  Simply by whatever the sending code 
decides?

Just to be clear: "header=absent" is part of RFC 4180, section 3, "MIME 
Type Registration of text/csv", not a feature of the SPARQL CSV result 
format.

The only thing SPARQL CSV adds is that if the field row is absent, then 
"header=absent" must be present, which is not required by text/csv.
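
As a rough illustration of that rule (a sketch only: the function name 
csv_content_type is made up here and is not taken from the document or 
from any implementation), the sending side could pick the media type 
like this:

```python
def csv_content_type(has_header_row: bool) -> str:
    """Pick a Content-Type value for a CSV result body.

    Hypothetical helper. RFC 4180 defines an optional "header"
    parameter for text/csv; the rule sketched here is that the
    parameter must be sent whenever the field row is left out.
    """
    if has_header_row:
        # With a field row present, plain text/csv is enough.
        return "text/csv"
    # No field row: say so explicitly in the media type.
    return "text/csv; header=absent"
```

A receiver that sees no "header" parameter would then assume that the 
first row names the fields.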

>> 3. Our TSV implementation makes the header line optional, just as
>> with CSV.

http://www.iana.org/assignments/media-types/text/tab-separated-values
says:
"""
The first line of this encoding is special, it contains the name of
each field, separated by tabs.
"""
which I read as saying the header line is not optional.  That said, 
general compliance with TSV and (more so) CSV "specs" is fairly loose 
in the wild.

>> I have not been that engaged in this discussion yet, but I'm
>> surprised to see these significant differences between CSV and TSV,
>> whereas I normally view these as basically the same format but with
>> a different separating character. I'm not a big fan of the TSV
>> format as currently specified.
>>
>> Looking briefly over the document, I think the section on
>> serializing CSV needs a bit of work -- it seems to specify the
>> order that solution bindings should be emitted in terms of the header
>> row, but the header row is optional.

Without a header line, the CSV format just presents a table of values, 
with no variable bindings.

Steve and Greg have argued that it should be mandatory and, absent 
further comments, I plan to change the doc to make the header field line 
mandatory.  A mandatory header line strengthens the 
table-of-variable-bindings view.

I've now made this change; the revised text is at

http://www.w3.org/2009/sparql/docs/csv-tsv-results/results-csv-tsv.html#csv-table

Let me know if you have any comments on the revised text.

"needs a bit of work" --> do you have other comments in this area?

>> Even in cases where the header
>> row is omitted, rows need to emit variables in a consistent order,
>> right?

In CSV, if there is no field row, then it is just a table of strings to 
be processed by the client application.  There's no required 
relationship to variables, in particular, no relationship to the query 
SELECT line (SELECT *).

So without a header line, the results are just "some stuff" -- with no 
real constraints, though a client application / query processor pair can 
agree on further constraints.
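
To make the difference concrete, here is a minimal client-side sketch 
using Python's standard csv module.  The function name read_results and 
the has_header flag are assumptions for illustration; in practice the 
flag would come from the media type or from a prior agreement between 
client and query processor.

```python
import csv
import io


def read_results(text, has_header):
    """Parse a CSV result body (illustrative helper, not from the doc).

    With a header row, each data row is paired with the field names.
    Without one, the rows stay plain lists of strings and nothing
    ties a column to a query variable.
    """
    rows = list(csv.reader(io.StringIO(text, newline="")))
    if not has_header:
        # No field row: just a table of strings.
        return None, rows
    if not rows:
        return [], []
    names, data = rows[0], rows[1:]
    # Field row present: each row becomes a name -> value mapping.
    return names, [dict(zip(names, row)) for row in data]
```

With has_header=False the client gets back exactly what was sent and 
has to decide for itself what each column means.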

>>
>> Lee

	Andy
Received on Thursday, 28 July 2011 08:36:31 UTC