Re: Model / Syntax Updates

On 24/02/14 17:15, Jeni Tennison wrote:
> Hi Yakov,
>
> I take your point about the header and the names: this is a design
> decision that we have to make as a group. The main reasons I included
> it were:
>
> (a) as designed, the tabular data model (based on SQL) requires
> column names, and the header is the only place to get them.

Couldn't the package description provide them?

I'm thinking of pointing into a larger table (e.g. a totals row, which is
a useful view), but I also see that mysqldump writes CSV files without
headers (having tried it, they are actually TSV files with a .txt
extension).
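A minimal sketch of the package-description route, in Python. The
`package` structure is an assumption that only loosely imitates a Data
Package style `schema`/`fields` list, and the file name and rows are made
up. The point is that a headerless tab-separated file, like the mysqldump
output above, can still be given column names from outside the file:

```python
import csv
import io

# Hypothetical package description: the column names live outside the data
# file, loosely modelled on a Data Package "schema"/"fields" layout (assumption).
package = {
    "resources": [
        {
            "path": "totals.txt",
            "schema": {"fields": [{"name": "region"}, {"name": "total"}]},
        }
    ]
}

# Made-up headerless tab-separated content (no header line).
headerless_tsv = "north\t120\nsouth\t95\n"


def read_with_external_names(text, resource):
    """Read headerless TSV rows, taking column names from the description."""
    names = [field["name"] for field in resource["schema"]["fields"]]
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    rows = []
    for values in reader:
        # Only the field count can be checked; a reordering goes unnoticed.
        if len(values) != len(names):
            raise ValueError(f"expected {len(names)} fields, got {len(values)}")
        rows.append(dict(zip(names, values)))
    return rows


rows = read_with_external_names(headerless_tsv, package["resources"][0])
print(rows)
# [{'region': 'north', 'total': '120'}, {'region': 'south', 'total': '95'}]
```

The names now depend on the description and the file staying in step: if
the column order changes but the count does not, rows are silently
mislabelled.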

>
> (b) because existing good practices around CSV publication (i.e. from
> the Simple Data Format) require headers in the CSV file.
>
> (c) because it’s a lot easier to write tools that can assume there
> are always headers than to write tools that handle an optional
> header line; it’s usually impossible to detect automatically whether
> a CSV file has a header line or not (people don’t publish CSV with
> the correct "Content-Type: text/csv; header=present" header).
>
> An alternative design would be to have the data model say nothing
> about column names, and to treat column names as annotations (like
> types).
>
> Anyone else have any views on this?

I think we should prefer (or even require) "with header" to make the
individual files self-contained.
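For comparison, a minimal Python sketch (the helper `read_with_header` is
hypothetical) of what a tool can do when every file must carry its own
header: no external description is needed, and the draft's uniqueness
rule can be checked directly. The `case_sensitive` switch is an
illustrative assumption, included only to show how Yakov's
case-sensitivity question would change the check:

```python
import csv
import io


def read_with_header(text, case_sensitive=True):
    """Treat the first line as the header and key each row by column name.

    Raises ValueError if the header is missing, or if column names are not
    unique (compared exactly, or case-insensitively when case_sensitive=False).
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if not header:
        raise ValueError("no header line")
    keys = header if case_sensitive else [name.casefold() for name in header]
    if len(set(keys)) != len(keys):
        raise ValueError(f"duplicate column names: {header}")
    return [dict(zip(header, values)) for values in reader]


data = "region,total\nnorth,120\nsouth,95\n"
print(read_with_header(data))
# [{'region': 'north', 'total': '120'}, {'region': 'south', 'total': '95'}]
```

With `case_sensitive=False`, a header such as `Total,total` is rejected;
with the default it is accepted as two distinct columns.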

 Andy

>
> Jeni
>
> ------------------------------------------------------
> From: Yakov Shafranovich yakov-ietf@shaftek.org
> Reply: Yakov Shafranovich yakov-ietf@shaftek.org
> Date: 24 February 2014 at 11:06:26
> To: Jeni Tennison jeni@jenitennison.com
> Subject: Re: Model / Syntax Updates
>
>>
>> Is there a particular reason why a header is always required and why
>> column names must be unique and case-sensitive? The draft says it is
>> because of SQL compatibility, but it may be important to elaborate on
>> why.
>>
>> Specifically:
>>
>> - regarding case sensitivity, I am not sure that all SQL
>>   implementations are in fact case-sensitive;
>> - regarding column name uniqueness, it sounds like we are assuming
>>   that the column name is the unique index to the data. However, I
>>   have often seen CSV files where the assumption may be that the
>>   column *number*, not the *name*, serves as the index. This may also
>>   explain cases where the header is missing but the two systems
>>   communicating via CSV know the order of the columns in the file and
>>   their significance.
>>
>> Also, regarding the Unicode and end of line issues with RFC 4180 -
>> those can be fixed via an updated RFC.
>>
>> Thanks, Yakov
>>
>> On Sun, Feb 23, 2014 at 1:23 PM, Jeni Tennison wrote:
>>> Hi,
>>>
>>> Following the call last week, I have made some updates to the
>>> "Syntax for Tabular Data on the Web" document at
>>>
>>> http://w3c.github.io/csvw/syntax/
>>>
>>> Namely:
>>>
>>> * I have separated out three levels of data model:
>>>   * a core data model, which is just tables/columns/rows/fields
>>>   * an annotated data model, in which each of these can be annotated
>>>   * a grouped data model, in which there are multiple tables in a
>>>     group
>>>
>>> * I have stated that the ordering of columns is significant in the
>>>   core data model
>>>
>>> I have defined the annotated data model extremely loosely: it just
>>> says that tables, columns, rows, fields and regions can be
>>> annotated, but it doesn't say anything about what those annotations
>>> might look like (eg that one of the annotations might be the *type*
>>> of a value). I think the direction I'd like to take that is to
>>> retain this very loose definition and then state that there are
>>> certain annotations (eg 'type', 'unique') that are understood by
>>> particular types of applications (eg validators, converters) in
>>> particular ways. Does that seem like a reasonable approach?
>>>
>>> I haven't made any attempt to tackle the syntax for annotated or
>>> grouped tables as yet.
>>>
>>> Jeni
>>> --
>>> Jeni Tennison
>>> http://www.jenitennison.com/
>>>
>>
>
> --
> Jeni Tennison
> http://www.jenitennison.com/
>

Received on Monday, 24 February 2014 18:25:05 UTC