- From: Alf Eaton <eaton.alf@gmail.com>
- Date: Wed, 12 Feb 2014 12:54:04 +0000
- To: "public-csv-wg@w3.org" <public-csv-wg@w3.org>
One problem with sampling existing CSV files is that they’re quite likely to already be well structured, and limited in what they do by the existing constraints of the CSV format. What’s arguably more useful is to sample the range of Excel files that have been published, to see if there's more that needs to be supported.

To start with, I’ve produced a list of URLs of Excel files that have been published as supporting information for articles on nature.com: https://gist.github.com/hubgit/8954821/ (Run `wget -i https://gist.github.com/hubgit/8954821/raw/nature-xls.txt` to fetch them all).

These files show a wide range of structure that authors actually add to tabular data, many of which are possible in HTML tables but not in CSV files. Perhaps a JSON file accompanying a CSV file may be able to cover some of these features?

Examples of features found in Excel spreadsheets published as supporting data for journal articles:

* Table description and comment rows (sometimes starting with #) at the start of the sheet
* Multiple tables in the same sheet, with a title row for each table
* Merged cells, spanning multiple rows or columns
* Text formatting (bold, italic), e.g. species names, or to show significance
* Cell formatting (background colours), to highlight grouping or patterns
* Caption (description), footer, footnotes
* Subheadings/subsections within a single table, often with indented headings

Alf

On 11 February 2014 16:21, Dan Brickley <danbri@google.com> wrote:
> On 11 February 2014 16:03, Jeni Tennison <jeni@jenitennison.com> wrote:
>> Of interest to this group, this work from Max Ogden on putting together a set of test cases for CSV parsers:
>>
>> https://github.com/maxogden/csv-spectrum
>
> Oh, that's great. I went through the Open Office source tree last week
> looking for similar, but didn't find anything suitable.
>
> Dan
>
>> Jeni
>> --
>> Jeni Tennison
>> http://www.jenitennison.com/
>>
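
As a rough illustration of the idea of a JSON file accompanying a CSV file, here is a minimal Python sketch. Every key name in the sidecar (`caption`, `comment_prefix`, `merged_cells`, `footnotes`) and the `load_table` helper are hypothetical, invented for this example only; they are not taken from any specification or from the files in the gist. The sketch covers just three of the features listed above (comment rows, merged cells, caption/footnotes), and it assumes merged-cell coordinates are 0-based and counted from the first data row after the header.

```python
import csv
import io
import json

# Hypothetical sidecar metadata: all key names are illustrative assumptions.
SIDECAR_JSON = """
{
  "caption": "Example counts per species",
  "comment_prefix": "#",
  "merged_cells": [{"row": 0, "column": 1, "rowspan": 2, "colspan": 1}],
  "footnotes": ["Values are made up for illustration."]
}
"""

# A CSV file with a leading comment row and an empty cell where the
# spreadsheet had a vertically merged "group" cell.
CSV_TEXT = (
    "# Supplementary Table 1\n"
    "species,group,count\n"
    "Homo sapiens,primates,3\n"
    "Pan troglodytes,,5\n"
)


def load_table(csv_text, sidecar_text):
    """Read a CSV string and apply the (hypothetical) sidecar metadata."""
    meta = json.loads(sidecar_text)
    prefix = meta.get("comment_prefix")

    # Separate comment rows from data rows (naive: checks line starts only).
    comments, data_lines = [], []
    for line in csv_text.splitlines():
        if prefix and line.startswith(prefix):
            comments.append(line[len(prefix):].strip())
        else:
            data_lines.append(line)

    rows = list(csv.reader(io.StringIO("\n".join(data_lines))))
    header, body = rows[0], rows[1:]

    # Copy the top-left value of each merged region into every cell it spans.
    for span in meta.get("merged_cells", []):
        anchor = body[span["row"]][span["column"]]
        for r in range(span["row"], span["row"] + span["rowspan"]):
            for c in range(span["column"], span["column"] + span["colspan"]):
                body[r][c] = anchor

    return {
        "caption": meta.get("caption"),
        "comments": comments,
        "header": header,
        "rows": body,
        "footnotes": meta.get("footnotes", []),
    }


table = load_table(CSV_TEXT, SIDECAR_JSON)
print(table["caption"])
for row in table["rows"]:
    print(row)
```

Text and cell formatting, multiple tables per sheet, and subheadings are not handled here; they would need further (equally hypothetical) keys in the sidecar.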
Received on Wednesday, 12 February 2014 12:54:55 UTC