Re: Model / Syntax Updates from Yakov Shafranovich on 2014-02-24 (public-csv-wg@w3.org from February 2014)

From: Yakov Shafranovich <yakov-ietf@shaftek.org>
Date: Sun, 23 Feb 2014 21:45:26 -0500
To: Jeni Tennison <jeni@jenitennison.com>
Cc: public-csv-wg@w3.org
Message-ID: <CAPQd5oRUkT6KUshZTM9kTQ0ze6tyzxY-iYRO6cQO0+RxQso0bg@mail.gmail.com>

Is there a particular reason why a header is always required, and why
column names must be unique and case sensitive? The draft says it is
for SQL compatibility, but it would be worth elaborating on why.

Specifically:
- regarding case sensitivity, I am not sure if all SQL implementations
are in fact case sensitive
- regarding column name uniqueness - it sounds like we are assuming
that the column name is the unique index to the data. However, I have
often seen CSV files where the assumption may be that the column
*number*, not the *name*, serves as the index. This may also explain
cases where the header is missing but the two systems communicating
via CSV know the order of columns in the file and their significance
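
To illustrate the column-number point, here is a minimal Python sketch
using the standard csv module. The sample data and variable names are
made up for illustration and are not taken from the draft. It shows a
headerless file read by position, and a file with a repeated column
name where access by name loses a field but access by number does not.

```python
import csv
import io

# Hypothetical headerless file: both systems are assumed to agree that
# column 0 is an id, column 1 a name, and column 2 an email address.
headerless = "1,alice,alice@example.com\n2,bob,bob@example.com\n"
rows = list(csv.reader(io.StringIO(headerless)))
emails_by_position = [row[2] for row in rows]

# Hypothetical file whose header repeats the name "value".
duplicated = "id,value,value\n1,a,b\n"

# Indexing by name: DictReader builds a dict per row, so the second
# "value" field overwrites the first one.
by_name = list(csv.DictReader(io.StringIO(duplicated)))

# Indexing by number: every field stays addressable.
by_position = list(csv.reader(io.StringIO(duplicated)))

print(emails_by_position)
print(by_name[0]["value"])
print(by_position[1])
```

Whether a parser should reject, rename, or tolerate repeated names is
exactly the kind of choice the draft would need to justify.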

Also, regarding the Unicode and end-of-line issues with RFC 4180 -
those can be fixed via an updated RFC.

Thanks,
Yakov



On Sun, Feb 23, 2014 at 1:23 PM, Jeni Tennison <jeni@jenitennison.com> wrote:
> Hi,
>
> Following the call last week, I have made some updates to the "Syntax for Tabular Data on the Web" document at
>
>   http://w3c.github.io/csvw/syntax/
>
> Namely:
>
>   * I have separated out three levels of data model:
>     * a core data model which is just tables/columns/rows/fields
>     * an annotated data model in which each of these can be annotated
>     * a grouped data model in which there are multiple tables in a group
>
>   * I have stated that the ordering of columns is significant in the core data model
>
> I have defined the annotated data model extremely loosely: it just says that tables, columns, rows, fields and regions can be annotated, but it doesn't say anything about what those annotations might look like (eg that one of the annotations might be the *type* of a value). I think the direction I'd like to take that is to retain this very loose definition and then state that there are certain annotations (eg 'type', 'unique') that are understood by particular types of applications (eg validators, converters) in particular ways. Does that seem like a reasonable approach?
>
> I haven't made any attempt to tackle the syntax for annotated or grouped tables as yet.
>
> Jeni
> --
> Jeni Tennison
> http://www.jenitennison.com/
>

Received on Monday, 24 February 2014 11:01:53 UTC