Re: Finding Metadata for CSV Files from Ivan Herman on 2014-03-09 (public-csv-wg@w3.org from March 2014)

From: Ivan Herman <ivan@w3.org>
Date: Sun, 9 Mar 2014 16:52:47 +0100
To: Jeni Tennison <jeni@jenitennison.com>
Cc: W3C CSV on the Web Working Group <public-csv-wg@w3.org>
Message-Id: <EE3F9965-F080-4DB6-A869-556D614EDD2E@w3.org>

True for Excel and similar cases. But there are lots of other tools. I run, for example, simple database systems on my machine (Bento, TapForm). I can export my data as CSV without any problem, but the generated CSV is a straight mirror of my database records, and I have no way of adding any extra information to it. I do not know whether the CSV dump of data in, say, FileMaker or Access gives you more flexibility; but there are also hundreds of applications out there that can be used to dump CSV files without necessarily giving access to the tables themselves.

I understand that, in some cases, this approach is fine. But the reason I think we must indeed include several alternatives is that I would be wary of this being the only option...

(I do not think we really disagree:-)

Ivan



On 09 Mar 2014, at 16:17, Jeni Tennison <jeni@jenitennison.com> wrote:

> From: Ivan Herman ivan@w3.org Date: 9 March 2014 at 11:14:03:
>> _Personally_ I am a little bit wary of an approach that requires a modification of the 
>> file itself. If we think (do the use cases say that?) that the data is often produced by other 
>> tools (Excel or any other data dump), then a subsequent modification of a possibly big CSV 
>> file seems problematic. The HTTP header and the naming convention approaches have 
>> the merit of leaving the file intact…
> 
> Your argument holds for pre-existing files, but not for files newly created in Excel, or new application code created for dumping data.
> 
> If a file is generated from Excel then it is generated by someone editing the spreadsheet in Excel. So long as the syntax doesn’t require characters that are interpreted weirdly by Excel (e.g. starting with = or something similar) then it doesn’t seem unreasonable to think that people can include extra things when they are generating the file. In fact our use cases show that they do this a lot, and I would argue that they are a lot more likely to add metadata while they are editing the spreadsheet in Excel than to fire up a text editor and write a separate document.
> 
> Similarly, if new code is being written to create a dump then including metadata within that dump is going to be easier than creating and writing to a separate file to hold that metadata.
> 
> So don’t think of the embedding option as being about modifying existing files. Instead, think of it for when new tabular data files are being created.
> 
> Jeni
> --  
> Jeni Tennison
> http://www.jenitennison.com/
> 


----
Ivan Herman, W3C 
Digital Publishing Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
GPG: 0x343F1A3D
FOAF: http://www.ivan-herman.net/foaf

Received on Sunday, 9 March 2014 15:53:18 UTC