Re: additional tool testing of CSV files in Syntax document from Dan Brickley on 2014-02-19 (public-csv-wg@w3.org from February 2014)

From: Dan Brickley <danbri@google.com>
Date: Wed, 19 Feb 2014 14:58:37 +0000
To: "Tandy, Jeremy" <jeremy.tandy@metoffice.gov.uk>
Cc: "public-csv-wg@w3.org" <public-csv-wg@w3.org>
Message-ID: <CAK-qy=5wo=7oSLf2Aa3V+-9PvgBbkURVLpf2X36PBd5UeNrafw@mail.gmail.com>

On 19 February 2014 10:58, Tandy, Jeremy <jeremy.tandy@metoffice.gov.uk> wrote:
> All – I’ve checked how Excel 2007 on Win 7 Enterprise renders the files
> referenced in the Syntax document. I’ve created a wiki page (linked from
> “Tools”) with the results:
> https://www.w3.org/2013/csvw/wiki/MS_Excel_compatibility_tests

This (and the tests in git,
https://github.com/w3c/csvw/tree/gh-pages/syntax ) are great.

Slight aside but, ...

I was thinking of ways in which we might express tests against CSV
files. Would this kind of structure make sense?

For each CSVish file, e.g.
https://github.com/w3c/csvw/blob/gh-pages/syntax/test-utf8.csv

Have a corresponding tests directory, e.g. test-utf8-expected/

and then within that, one file per cell, so filenames and contents
something like this:

cell_labels_2.txt: test number

cell_0_0.txt: Я могу есть стекло, оно мне не вредит.

cell_1_2.txt: 2014-02-11

cell_2_0.txt: ""never again""

we said

In other words, the bytes (assumed UTF-8?) in each text file would
correspond to the value of the cell indicated in the filename. An
alternative would be to have some other canonical, easy-to-parse tabular
data notation (much as we used N-Triples in the old RDF WG).
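
As a rough illustration of the one-file-per-cell layout, here is a minimal
Python sketch. The function name write_expected_cells is hypothetical, and it
assumes the two numbers in a filename are row then column, with header labels
stored as cell_labels_<column>.txt; the naming above does not pin that down.

```python
from pathlib import Path


def write_expected_cells(header, rows, out_dir):
    """Write one UTF-8 file per cell and return the file names written.

    Assumption: cell_<row>_<col>.txt for data cells and
    cell_labels_<col>.txt for header labels.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []

    # Header labels, one file per column.
    for col, label in enumerate(header):
        name = f"cell_labels_{col}.txt"
        (out / name).write_bytes(label.encode("utf-8"))
        written.append(name)

    # Data cells, one file per (row, column) pair.
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            name = f"cell_{r}_{c}.txt"
            (out / name).write_bytes(value.encode("utf-8"))
            written.append(name)

    return written
```

Writing raw bytes rather than text keeps newline translation out of the
picture, which matters if a cell value itself contains line breaks.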

We could then use different CSV parsers and check whether the expected
contents match the parsed results.
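
A sketch of what such a check might look like, using Python's standard csv
module as the parser under test. The name check_parser, the row-then-column
reading of the filenames, and treating the first CSV row as the labels are all
assumptions for illustration; another parser can be plugged in via the parse
argument.

```python
import csv
from pathlib import Path


def check_parser(csv_path, expected_dir, parse=None):
    """Return a list of (filename, expected_bytes, actual_bytes) mismatches.

    Assumption: the first parsed row holds the labels (cell_labels_<col>.txt)
    and later rows map to cell_<row>_<col>.txt, counting rows from 0.
    """
    if parse is None:
        # Default parser under test: the standard library csv reader.
        def parse(path):
            with open(path, newline="", encoding="utf-8") as f:
                return list(csv.reader(f))

    rows = parse(csv_path)
    header, body = (rows[0], rows[1:]) if rows else ([], [])

    # Map each expected filename to the value the parser produced.
    actual = {f"cell_labels_{c}.txt": v for c, v in enumerate(header)}
    for r, row in enumerate(body):
        for c, v in enumerate(row):
            actual[f"cell_{r}_{c}.txt"] = v

    mismatches = []
    for path in sorted(Path(expected_dir).glob("cell_*.txt")):
        expected = path.read_bytes()
        got = actual.get(path.name)
        got_bytes = None if got is None else got.encode("utf-8")
        if got_bytes != expected:
            mismatches.append((path.name, expected, got_bytes))
    return mismatches
```

An empty list means every expected cell file matched; a missing cell shows up
with None as its actual value. Note this only checks cells that have an
expected file, so extra cells produced by the parser go unreported.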

</thinking out loud>

Dan

Received on Wednesday, 19 February 2014 14:59:05 UTC