Re: Syntax for Tabular Data on the Web from Andy Seaborne on 2014-02-12 (public-csv-wg@w3.org from February 2014)

From: Andy Seaborne <andy@apache.org>
Date: Wed, 12 Feb 2014 14:13:05 +0000
To: Leigh Dodds <leigh@ldodds.com>
CC: public-csv-wg@w3.org
Message-ID: <52FB8171.5040409@apache.org>

On 12/02/14 13:49, Leigh Dodds wrote:
> Hi,
>
> On Wed, Feb 12, 2014 at 11:57 AM, Andy Seaborne <andy@apache.org> wrote:
>> ...
>> Comment on whitespace and appearance padding, triggered by:
>>
>> """
>> The first line of a CSV+ file MUST contain a comma-separated list of names
>> of columns.
>> """
>>
>> Data from a spreadsheet that itself is used for showing data can involve
>> additional padding: a title, blank lines, blank columns.
>>
>> Maybe we need to focus on this more data-centric style.
>>
>> ----------------------
>> ,,,
>> ,,TITLE,
>> ,,,
>> ,Sales region ,Quarter,Sales
>> ,North,Q1,15
>> ,,Q2,25
>> ,,Q3,16
>> ,,Q4,180
>> ,South,Q1,18
>> ,,Q2,25
>> ,,Q3,13
>> ,,Q4,99
>> ----------------------
>
> I don't think that is a data-centric style, I think it's human-centric.
> Titles, padding, blank lines are there for readers of the spreadsheet,
> rather than consumers of the tabular data.

It's a question of scope.

1/ A subset of the file is a data table - do we want a way to say that?
2/ The repeat/non-repeat of cells like "North"

In machine-generated files, it is more likely to have "North" repeated in
the 3 cells below it, but an empty cell meaning "repeat" does occur.  Gregg
described the de-normalization of foreign key relationships.
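
To make the two points concrete, here is a rough sketch of what a consumer has to do with the sample above to get a plain data table out of it. Nothing here is proposed syntax: the function name extract_table, the "first row with at least two non-empty cells is the header" heuristic, and the fill_columns parameter are all assumptions made for illustration. Filling is restricted to named columns because an empty cell could just as well mean "no value" as "same as above".

```python
import csv
import io

# The sample from this thread: padding rows, a padding column, and
# blank cells standing in for "same as the cell above".
SAMPLE = """\
,,,
,,TITLE,
,,,
,Sales region ,Quarter,Sales
,North,Q1,15
,,Q2,25
,,Q3,16
,,Q4,180
,South,Q1,18
,,Q2,25
,,Q3,13
,,Q4,99
"""


def extract_table(text, fill_columns=("Sales region",)):
    """Return (column names, list of row dicts) for the embedded data table.

    Hypothetical helper, not part of any specification.
    """
    rows = [[cell.strip() for cell in row]
            for row in csv.reader(io.StringIO(text))]

    # Assumption: the header is the first row with two or more non-empty
    # cells, which skips blank rows and a lone title cell.
    header_index = next(i for i, row in enumerate(rows)
                        if sum(1 for cell in row if cell) >= 2)
    header = rows[header_index]

    # Drop padding columns, i.e. columns with no header name.
    keep = [i for i, name in enumerate(header) if name]
    names = [header[i] for i in keep]

    records = []
    previous = {}
    for row in rows[header_index + 1:]:
        if not any(row):
            continue  # blank padding row
        record = {}
        for i, name in zip(keep, names):
            value = row[i] if i < len(row) else ""
            # Treat an empty cell as "repeat" only in the named columns.
            if not value and name in fill_columns:
                value = previous.get(name, "")
            record[name] = value
        previous = record
        records.append(record)
    return names, records


names, records = extract_table(SAMPLE)
for record in records:
    print(record)
```

The point of the sketch is that both the header heuristic and the choice of fill columns are guesses the consumer has to make, because the file itself does not say where the table starts or which blanks mean "repeat".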

I think the WG could just consider "data tables" - fully populated 
cells, header row - and be a success but I'm asking whether it should do so.

One thing we could do is provide narrative in best practices that 
describes the issues and the consequences.

	Andy

Received on Wednesday, 12 February 2014 14:13:34 UTC