Re: Model / Syntax Updates from Alfredo Serafini on 2014-02-24 (public-csv-wg@w3.org from February 2014)

From: Alfredo Serafini <seralf@gmail.com>
Date: Mon, 24 Feb 2014 12:13:52 +0100
To: Yakov Shafranovich <yakov-ietf@shaftek.org>
Cc: Jeni Tennison <jeni@jenitennison.com>, "public-csv-wg@w3.org" <public-csv-wg@w3.org>
Message-ID: <CADawF4NXvLVsNAKtbwx9XDaQxLm+F1xz9FQ3cEgSgbnPcSw1KQ@mail.gmail.com>

Hi, I am following this interesting discussion and would like to contribute
a few small ideas, if possible.

Treating column names as unique does not prevent them from also being used
by position, as with the ordered associative arrays in languages such as
PHP. So for example:
name, email_1, address, email_2

could be used to detect that the 'email_*' columns refer to the same
property, for parsers that are aware of the names; a parser that wants to
use the ordering can simply ignore the names and reference those columns as
1 and 3 (counting from zero).
I mean:
1) a unique name is needed, from my point of view, in both directions
(parsing from CSV and writing to CSV: they should be consistent and without
loss of information)
2) ordering should be preserved so that people can address columns directly
by position, if there are many use cases for that
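
To make the two points concrete, here is a rough Python sketch. It is only
an illustration under my own assumptions: the sample row, the function names
and the "base_N" naming pattern are hypothetical and are not taken from the
draft.

```python
import csv
import io
import re

# Hypothetical sample data reusing the header from the example above.
SAMPLE = (
    "name,email_1,address,email_2\n"
    "Ada,ada@example.org,1 Main St,ada@example.com\n"
)

def read_table(text):
    """Return (header, rows); the header list keeps the file's column order."""
    reader = csv.reader(io.StringIO(text))
    header = [name.strip() for name in next(reader)]
    if len(set(header)) != len(header):
        raise ValueError("column names must be unique")
    return header, list(reader)

def grouped_columns(header):
    """Map a base name such as 'email' to the positions of 'email_1', 'email_2', ..."""
    groups = {}
    for position, name in enumerate(header):
        match = re.fullmatch(r"(.+)_(\d+)", name)
        base = match.group(1) if match else name
        groups.setdefault(base, []).append(position)
    return groups

header, rows = read_table(SAMPLE)
groups = grouped_columns(header)

# A name-aware consumer collects every 'email_*' value of the first row ...
by_name = [rows[0][i] for i in groups["email"]]
# ... while a position-based consumer ignores the names and reads columns 1 and 3.
by_position = [rows[0][1], rows[0][3]]
```

Both consumers see the same values, because the unique names and the
preserved ordering describe the same columns.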

The question of case is crucial: most of the widely used SQL
implementations are case sensitive, so from my point of view this should be
the default. Ignoring case seems to me more like a configuration option for
the parser than something that belongs in the specification.
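
As a rough sketch of what I mean by a parser option (the function and its
`ignore_case` parameter are hypothetical, not something from the draft):
lookup is case sensitive by default, and case-insensitive matching is
opt-in and refuses to guess when it is ambiguous.

```python
def find_column(header, name, ignore_case=False):
    """Return the position of the column called `name` in `header`.

    Matching is case sensitive unless the caller opts in with ignore_case=True.
    """
    if ignore_case:
        wanted = name.casefold()
        matches = [i for i, h in enumerate(header) if h.casefold() == wanted]
    else:
        matches = [i for i, h in enumerate(header) if h == name]
    if len(matches) != 1:
        # Either no such column, or several columns differ only by case.
        raise KeyError(name)
    return matches[0]
```

With the header `Name, name, email`, the default lookup of `name` finds the
second column, while the case-insensitive lookup fails because two columns
would match.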

Alfredo

2014-02-24 3:45 GMT+01:00 Yakov Shafranovich <yakov-ietf@shaftek.org>:

> Is there a particular reason why a header is always required, column
> names must be unique, and case sensitive? The draft says it is because
> of SQL compatibility, but it may be important to elaborate as to why.
>
> Specifically:
> - regarding case sensitivity, I am not sure if all SQL implementations
> are in fact case sensitive
> - regarding column name uniqueness - it sounds like we are assuming
> that the column name is the unique index to the data. However, I have
> seen often that the assumption in CSV files may be that the column
> *number*, not the *name* serves as the index. This may also explain
> cases where the header is missing but the two systems communicating
> via CSV know the order of columns in the file and their significance
>
> Also, regarding the Unicode and end of line issues with RFC 4180 -
> those can be fixed via an updated RFC.
>
> Thanks,
> Yakov
>
>
>
> On Sun, Feb 23, 2014 at 1:23 PM, Jeni Tennison <jeni@jenitennison.com>
> wrote:
> > Hi,
> >
> > Following the call last week, I have made some updates to the "Syntax
> > for Tabular Data on the Web" document at
> >
> >   http://w3c.github.io/csvw/syntax/
> >
> > Namely:
> >
> >   * I have separated out three levels of data model:
> >     * a core data model which is just tables/columns/rows/fields
> >     * an annotated data model in which each of these can be annotated
> >     * a grouped data model in which there are multiple tables in a group
> >
> >   * I have stated that the ordering of columns is significant in the
> >     core data model
> >
> > I have defined the annotated data model extremely loosely: it just says
> > that tables, columns, rows, fields and regions can be annotated, but it
> > doesn't say anything about what those annotations might look like (eg that
> > one of the annotations might be the *type* of a value). I think the
> > direction I'd like to take that is to retain this very loose definition and
> > then state that there are certain annotations (eg 'type', 'unique') that
> > are understood by particular types of applications (eg validators,
> > converters) in particular ways. Does that seem like a reasonable approach?
> >
> > I haven't made any attempt to tackle the syntax for annotated or grouped
> > tables as yet.
> >
> > Jeni
> > --
> > Jeni Tennison
> > http://www.jenitennison.com/
> >
>
>
>

Received on Monday, 24 February 2014 11:14:20 UTC