- From: Andy Seaborne <andy@apache.org>
- Date: Mon, 24 Feb 2014 18:24:35 +0000
- To: public-csv-wg@w3.org
On 24/02/14 17:15, Jeni Tennison wrote:
> Hi Yakov,
>
> I take your point about the header and the names: this is a design
> decision that we have to make as a group. The main reasons I included
> it were:
>
> (a) that as designed the tabular data model (based on SQL) requires
> column names, and the header is the only place to get those

The package description may provide them?

I'm thinking of pointing into a larger table (e.g. totals row - a
useful view), but I see that mysqldump writes CSV files without headers
(trying it -- they are TSV files with .txt ext).

>
> (b) because existing good practices around CSV publication (ie from
> the Simple Data Format) require headers in the CSV file
>
> (c) because it’s a lot easier to write tools that can make the
> assumption that there are always headers than it is to write tools
> for an optional header line; it’s usually impossible to automatically
> detect whether a CSV file has a header line or not (people don’t
> publish CSV with the correct "Content-Type: text/csv;header=present"
> header).
>
> An alternative design would be to have the data model say nothing
> about column names, and to treat column names as annotations (like
> types).
>
> Anyone else have any views on this?

I think we should prefer (or stronger) "with header" to make the
individual files self-contained.

	Andy

>
> Jeni
>
> ------------------------------------------------------
> From: Yakov Shafranovich yakov-ietf@shaftek.org
> Reply: Yakov Shafranovich yakov-ietf@shaftek.org
> Date: 24 February 2014 at 11:06:26
> To: Jeni Tennison jeni@jenitennison.com
> Subject: Re: Model / Syntax Updates
>
>>
>> Is there a particular reason why a header is always required,
>> column names must be unique, and case sensitive? The draft says it
>> is because of SQL compatibility, but it may be important to
>> elaborate as to why.
>>
>> Specifically:
>> - regarding case sensitivity, I am not sure if all SQL
>>   implementations are in fact case sensitive
>> - regarding column name uniqueness - it sounds like we are assuming
>>   that the column name is the unique index to the data. However, I
>>   have often seen that the assumption in CSV files may be that the
>>   column *number*, not the *name*, serves as the index. This may
>>   also explain cases where the header is missing but the two systems
>>   communicating via CSV know the order of columns in the file and
>>   their significance
>>
>> Also, regarding the Unicode and end of line issues with RFC 4180 -
>> those can be fixed via an updated RFC.
>>
>> Thanks,
>> Yakov
>>
>> On Sun, Feb 23, 2014 at 1:23 PM, Jeni Tennison wrote:
>>> Hi,
>>>
>>> Following the call last week, I have made some updates to the
>>> "Syntax for Tabular Data on the Web" document at
>>>
>>> http://w3c.github.io/csvw/syntax/
>>>
>>> Namely:
>>>
>>> * I have separated out three levels of data model:
>>>   * a core data model which is just tables/columns/rows/fields
>>>   * an annotated data model in which each of these can be annotated
>>>   * a grouped data model in which there are multiple tables in a
>>>     group
>>>
>>> * I have stated that the ordering of columns is significant in
>>>   the core data model
>>>
>>> I have defined the annotated data model extremely loosely: it just
>>> says that tables, columns, rows, fields and regions can be
>>> annotated, but it doesn't say anything about what those
>>> annotations might look like (eg that one of the annotations might
>>> be the *type* of a value). I think the direction I'd like to take
>>> that is to retain this very loose definition and then state that
>>> there are certain annotations (eg 'type', 'unique') that are
>>> understood by particular types of applications (eg validators,
>>> converters) in particular ways. Does that seem like a reasonable
>>> approach?
>>>
>>> I haven't made any attempt to tackle the syntax for annotated
>>> or grouped tables as yet.
>>>
>>> Jeni
>>> --
>>> Jeni Tennison
>>> http://www.jenitennison.com/
>>>
>>
>
> --
> Jeni Tennison
> http://www.jenitennison.com/
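
For illustration only, the header question discussed in this thread could be sketched roughly as follows. This is not taken from the draft or from any tool mentioned above: the function names `header_param` and `read_table` are hypothetical, and the sketch assumes the only reliable signal for a header row is the `header` media type parameter that RFC 4180 defines (values "present" or "absent"). When that parameter is missing, it makes no attempt to guess and falls back to addressing columns purely by position, with column names left unset.

```python
import csv
import io
from email.message import Message
from typing import List, Optional, Tuple


def header_param(content_type: str) -> Optional[str]:
    """Return the 'header' media type parameter, lower-cased, or None if not given."""
    msg = Message()
    msg["Content-Type"] = content_type
    value = msg.get_param("header")
    return value.lower() if isinstance(value, str) else None


def read_table(
    text: str, content_type: str = "text/csv"
) -> Tuple[List[Optional[str]], List[List[str]]]:
    """Parse CSV text into (column names, data rows).

    Hypothetical helper: names come from the first row only when the
    media type explicitly declares header=present.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if header_param(content_type) == "present" and rows:
        return list(rows[0]), rows[1:]
    # No declared header: every row is data, and columns are
    # identified by position only (names are left as None).
    width = max((len(r) for r in rows), default=0)
    return [None] * width, rows


names, data = read_table("id,total\n1,20\n", "text/csv; header=present")
positional_names, positional_data = read_table("id,total\n1,20\n")
```

In the second call the first line is kept as data because nothing declares it to be a header, which is the ambiguity point (c) describes.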
Received on Monday, 24 February 2014 18:25:05 UTC