Re: CSVs and provenance from Ivan Herman on 2014-03-03 (public-csv-wg@w3.org from March 2014)

From: Ivan Herman <ivan@w3.org>
Date: Mon, 3 Mar 2014 10:19:34 +0100
To: "Ceolin, D." <d.ceolin@vu.nl>
Cc: Jeni Tennison <jeni@jenitennison.com>, W3C CSV on the Web Working Group <public-csv-wg@w3.org>
Message-Id: <B3B44532-97D5-47D1-A8D5-67D37EC87BB4@w3.org>

On 02 Mar 2014, at 19:17, Ceolin, D. <d.ceolin@vu.nl> wrote:

> Hi Jeni,
> 
> I had the impression that provenance is a preeminent class of information compared with other possible annotation types, since it allows modeling spreadsheet formulas (a few use cases mention spreadsheets or Excel files), etc., and therefore needs particular care (e.g. to uniformly represent provenance metadata that recurs in many CSV files). But maybe I'm just biased towards this.
> Cheers,

While I agree that provenance is preeminent, I am with Jeni that we should avoid reinventing the wheel. PROV, or simplified sub-vocabularies thereof like PAV[1], and others are out there to describe provenance, and we all know that these descriptions require specialized knowledge. The metadata provided for a CSV file should define only terms that are specific to CSV and, just as importantly, should provide placeholders, 'hooks', to refer to existing vocabularies. This is not only the case for provenance; we could probably get into a huge email thread on defining the various legal/copyright/not-copyright/etc. terms for CSV data, but we should probably not do that and should instead rely on existing terms when possible...
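
To make the 'hooks' idea concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the function build_metadata and the keys "csv", "url", "delimiter" and "annotations" are hypothetical names, not anything the group has defined, and the example.org URLs are placeholders. Only the PROV and PAV namespace IRIs and the two property names (wasDerivedFrom, createdBy) come from those existing vocabularies.

```python
import json

# Namespace IRIs of two existing vocabularies.
PROV = "http://www.w3.org/ns/prov#"
PAV = "http://purl.org/pav/"


def build_metadata(csv_url, delimiter, annotations):
    """Combine CSV-specific terms with terms borrowed from other vocabularies.

    All key names in the returned dict are hypothetical.
    """
    return {
        # Terms that are specific to CSV itself (hypothetical names).
        "csv": {"url": csv_url, "delimiter": delimiter},
        # The 'hook': keys are property IRIs from existing vocabularies,
        # so nothing about provenance is redefined here.
        "annotations": dict(annotations),
    }


metadata = build_metadata(
    "http://example.org/data.csv",
    ",",
    {
        PROV + "wasDerivedFrom": "http://example.org/source.xlsx",
        PAV + "createdBy": "http://example.org/people/alice",
    },
)

print(json.dumps(metadata, indent=2))
```

The point of the split is that the "csv" part stays small, while anything else (provenance, licensing, ...) is expressed with terms that are defined and maintained elsewhere.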

We can go as far as listing some vocabularies, in areas that we believe are important for CSV, that we _advise_ users to use. I emphasize 'advise': i.e., it should not be a recommended list (i.e., it should not be part of a 'normative' section of any document), just informative.

Ivan



[1] http://pav-ontology.googlecode.com/svn/trunk/pav.html

> 
> Davide
> 
> 
> Il giorno 01/mar/2014, alle ore 23.59, Jeni Tennison ha scritto:
> 
>> Hi Davide,
>> 
>> I’d suggest that the provenance of a particular table/row/column/field is just one of the many kinds of annotations that you could have. If you did have a provenance annotation then it should use PROV [1]. I can’t think of anything that marks provenance as different from any other type of annotation (or in need of special handling), but perhaps you have something in mind?
>> 
>> Jeni
>> 
>> [1] http://www.w3.org/TR/prov-primer/
>> 
>> ------------------------------------------------------
>> From: Ceolin, D. d.ceolin@vu.nl
>> Reply: Ceolin, D. d.ceolin@vu.nl
>> Date: 27 February 2014 at 09:06:58
>> To: W3C CSV on the Web Working Group public-csv-wg@w3.org
>> Subject:  CSVs and provenance
>> 
>>> 
>>> Hi all,
>>> 
>>> I've seen some hints of provenance around, but I'd like to look at
>>> the problem a little more deeply.
>>> I believe that there are at least two provenance issues that
>>> are related to each other and that probably need standardized
>>> handling:
>>> - if a CSV file is obtained from a spreadsheet, it's likely that  
>>> one or more 'cells' result from formulas applied to other cells  
>>> in the same CSV. Probably (a simplified version of) PROV is a good  
>>> candidate to represent such relations? If I'm not wrong, there  
>>> was some related discussion floating around in the chat two telcos  
>>> ago (about "sum" cells?).
>>> - also, the whole CSV file may be the result of a specific process,  
>>> especially if it represents a DB dump and/or the result of a computation.  
>>> It would be useful to be able to annotate these files with their  
>>> provenance.
>>> 
>>> I'm not sure if this is in the scope of the working group, but I believe  
>>> that at least part of it is.
>>> Cheers,
>>> 
>>> Davide
>>> 
>>> 
>>> 
>>> 
>> 
>> --  
>> Jeni Tennison
>> http://www.jenitennison.com/
> 
> 


----
Ivan Herman, W3C 
Digital Publishing Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
GPG: 0x343F1A3D
FOAF: http://www.ivan-herman.net/foaf
Received on Monday, 3 March 2014 09:20:06 UTC