
New software to convert between CSV and tab-separated

From: Bert Bos <bert@w3.org>
Date: Wed, 14 May 2014 21:28:07 +0200
Message-Id: <C122CFD9-6DBC-4AAF-8C5F-CCDEFCBFBD10@w3.org>
To: public-csv-wg-comments@w3.org
Short version: I published a pair of C programs, csvtotab and tabtocsv, that convert between CSV and tab-separated values:

    http://dev.w3.org/cvsweb/csvtotab-vv/

Long version:

I often use tab-separated values, either as an intermediate step in some processing pipeline or as a format to store data. I'm on Unix, so tab-separated is the ideal format for a lot of things.

Occasionally, I get data in CSV format as output from some program, so for many years I've had a very simple program to convert CSV to tab-separated. Recently, I decided to make that program "complete" by adding the syntax features described in RFC 4180 that I had never needed, and by adding a conversion in the opposite direction. Right at that time, the first draft of http://www.w3.org/TR/tabular-data-model/ came out, so I added that syntax, too.

So now I have two programs that are intended to be compliant with that draft, with one exception: neither program checks or ensures that each record has the same number of fields. (So you'd better not change the draft, otherwise I'll have to change my software. :-) )
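
For anyone who does need that check, here is a rough sketch of how it could be done on tab-separated input. This is not code from csvtotab or tabtocsv; the function name tsv_uniform_field_count is made up for illustration. It assumes the whole input is in memory, that records end with a single LF, and that tabs and newlines inside fields are escaped, so that every raw tab is a field separator. An empty line is counted as one (empty) field, which is one possible reading among others.

```c
#include <stddef.h>

/* Hypothetical helper, not part of csvtotab/tabtocsv.
 * Returns 1 if every LF-terminated record in buf has the same number
 * of fields, 0 otherwise. The field count of a record is taken to be
 * the number of raw tabs plus one. Bytes after the last LF (an
 * unterminated record) are ignored. An input without any record is
 * considered consistent. */
int tsv_uniform_field_count(const char *buf, size_t len)
{
    size_t expected = 0;    /* field count of the first record */
    size_t fields = 1;      /* field count of the current record */
    int have_expected = 0;  /* set once the first record is complete */
    size_t i;

    for (i = 0; i < len; i++) {
        if (buf[i] == '\t') {
            fields++;
        } else if (buf[i] == '\n') {
            if (!have_expected) {
                expected = fields;
                have_expected = 1;
            } else if (fields != expected) {
                return 0;
            }
            fields = 1;     /* start counting the next record */
        }
    }
    return 1;
}
```

A caller would read the file into a buffer and pass the buffer and its length. Handling the other line endings would need a slightly larger scanner.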

There is no formal definition of "tab-separated," as far as I know, but it is an even simpler and easier-to-parse format than CSV:

    file ::= (record newline)*
    record ::= (field (#x09 field)*)?
    field ::= ([^#x5C#x09#x0A#x0D] | '\\' | '\n' | '\r' | '\t')*
    newline ::= #x0A | #x0D | #x0D #x0A

(My tabtocsv program actually accepts a slight superset of this.)
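
To illustrate how little is needed to read a field in this format, here is a minimal sketch in C of decoding the four escapes from the grammar above. It is not taken from tabtocsv; the name tsv_unescape and its interface are assumptions made for the example. It implements the strict grammar only, so it rejects an unknown escape, a backslash at the end of the field, and a raw tab, LF or CR inside a field.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of tabtocsv.
 * Decodes one tab-separated field of len bytes from src into dst,
 * turning the escapes \\ \n \r \t into backslash, LF, CR and tab.
 * dst has room for cap bytes, including the terminating NUL.
 * Returns the decoded length, or -1 if the field does not match the
 * grammar or does not fit in dst. */
long tsv_unescape(const char *src, size_t len, char *dst, size_t cap)
{
    size_t i = 0, o = 0;

    assert(src != NULL && dst != NULL);

    while (i < len) {
        char c = src[i++];

        if (c == '\t' || c == '\n' || c == '\r')
            return -1;              /* raw separator inside a field */

        if (c == '\\') {
            if (i == len)
                return -1;          /* lone backslash at the end */
            switch (src[i++]) {
            case '\\': c = '\\'; break;
            case 'n':  c = '\n'; break;
            case 'r':  c = '\r'; break;
            case 't':  c = '\t'; break;
            default:   return -1;   /* escape not in the grammar */
            }
        }

        if (o + 1 >= cap)
            return -1;              /* no room for c plus the NUL */
        dst[o++] = c;
    }

    if (o >= cap)
        return -1;                  /* no room for the NUL */
    dst[o] = '\0';
    return (long)o;
}
```

Splitting a record into fields is then just a matter of cutting the line at every raw tab and calling this function on each piece.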



Bert
-- 
  Bert Bos                                ( W 3 C ) http://www.w3.org/
  http://www.w3.org/people/bos                               W3C/ERCIM
  bert@w3.org                             2004 Rt des Lucioles / BP 93
  +33 (0)4 92 38 76 92            06902 Sophia Antipolis Cedex, France
Received on Wednesday, 14 May 2014 19:28:36 UTC
