Re: Finding Metadata for CSV Files from Jeni Tennison on 2014-03-09 (public-csv-wg@w3.org from March 2014)

From: Jeni Tennison <jeni@jenitennison.com>
Date: Sun, 9 Mar 2014 15:17:22 +0000
To: Ivan Herman <ivan@w3.org>
Cc: W3C CSV on the Web Working Group <public-csv-wg@w3.org>
Message-ID: <etPan.531c8602.643c9869.3e04@jenit.local>

From: Ivan Herman ivan@w3.org Date: 9 March 2014 at 11:14:03:
> _Personally_ I am a little bit wary of an approach that requires a modification of the
> file itself. If we think (do use cases say that?) that the data is often produced by other
> tools (Excel or any other data dump), then a later modification of a possibly big CSV
> file seems to be problematic. The HTTP header and the naming convention approaches have
> the merit of leaving the file intact…

Your argument holds for pre-existing files, but not for files newly created in Excel or for files written by new application code that dumps data.

If a file is generated from Excel, then it is generated by someone editing the spreadsheet in Excel. So long as the syntax doesn’t require characters that Excel interprets specially (e.g. a leading =, which Excel treats as the start of a formula), it doesn’t seem unreasonable to think that people can include extra things when they are generating the file. In fact our use cases show that they do this a lot, and I would argue that they are much more likely to add metadata while editing the spreadsheet in Excel than to fire up a text editor and write a separate document.
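To make the "characters Excel interprets specially" constraint concrete, here is a minimal sketch that flags cells in a proposed metadata line that begin with a formula-like character. The "=" case is the one mentioned above; treating "+", "-" and "@" the same way is an assumption drawn from common CSV-injection guidance, and the function name `excel_unsafe_cells` is hypothetical, not part of any agreed syntax.

```python
import csv
import io

# Leading characters Excel may read as the start of a formula.
# "=" is the case discussed above; "+", "-" and "@" are an assumption
# based on common CSV-injection guidance, not an agreed rule.
FORMULA_PREFIXES = ("=", "+", "-", "@")


def excel_unsafe_cells(line: str) -> list[str]:
    """Return the cells of one CSV line that start with a formula-like character."""
    # Parse the line as CSV so quoted cells are handled correctly;
    # an empty line yields no row, so fall back to an empty list.
    row = next(csv.reader(io.StringIO(line)), [])
    return [cell for cell in row if cell.startswith(FORMULA_PREFIXES)]


# Two hypothetical embedded-metadata lines: the first is harmless,
# the second would likely be turned into a formula by Excel.
print(excel_unsafe_cells("#title,Population by district"))
print(excel_unsafe_cells("=title,Population by district"))
```

A "#"-style marker passes the check; anything starting with "=" would not survive a round trip through Excel.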

Similarly, if new code is being written to create a dump, then including metadata within that dump is going to be easier than creating and writing to a separate file to hold that metadata.
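As a rough illustration of the point about dump code, the sketch below writes metadata rows, the header and the data through a single CSV writer on a single stream. The "#key,value" layout and the function name `write_dump_with_metadata` are hypothetical, used only for illustration; no embedding syntax has been agreed.

```python
import csv
import io
from typing import Iterable, Mapping, Sequence, TextIO


def write_dump_with_metadata(
    out: TextIO,
    metadata: Mapping[str, str],
    header: Sequence[str],
    rows: Iterable[Sequence[object]],
) -> None:
    """Write metadata rows, then the header, then the data, all to one stream.

    The "#key,value" layout is a hypothetical convention for illustration only.
    """
    writer = csv.writer(out, lineterminator="\n")
    for key, value in metadata.items():
        # One metadata row per key, marked with a leading "#" (hypothetical).
        writer.writerow([f"#{key}", value])
    writer.writerow(header)
    writer.writerows(rows)


# Usage with made-up example data.
buf = io.StringIO()
write_dump_with_metadata(
    buf,
    {"title": "Population by district", "source": "Example survey"},
    ["district", "population"],
    [["North", 1200], ["South", 950]],
)
print(buf.getvalue())
```

The metadata goes out through the same writer that is already producing the dump, so there is no second file to name, create and keep in sync with the data.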

So don’t think of the embedding option as being about modifying existing files. Instead, think of it as an option for when new tabular data files are being created.

Jeni
--  
Jeni Tennison
http://www.jenitennison.com/

Received on Sunday, 9 March 2014 15:17:46 UTC