Re: CSVs and provenance from Ceolin, D. on 2014-02-28 (public-csv-wg@w3.org from February 2014)

From: Ceolin, D. <d.ceolin@vu.nl>
Date: Fri, 28 Feb 2014 14:07:54 +0000
To: Yakov Shafranovich <yakov-ietf@shaftek.org>
CC: Eric Stephan <ericphb@gmail.com>, W3C CSV on the Web Working Group <public-csv-wg@w3.org>
Message-ID: <37B8334E-95B5-4116-90C3-279143568530@vu.nl>

I think an HTTP user agent handles the request and delivery of the CSV, but not necessarily its creation.
If I understand correctly, we can identify at least two activities (prov:Activity):

- delivery: the activity of delivering the CSV, which is attributed to an agent (the HTTP user agent by default?). It can affect the rendering, etc.
- generation: the activity of generating the CSV. It determines the values contained in the file, etc.

The two may share one or more elements (e.g. the agent controlling them), but this is not mandatory.
Did I miss anything?
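
As a rough sketch of the two activities above, here is how they might be written down as PROV-style triples. This is plain Python, not a real PROV library: the Agent and Activity classes, the describe helper, and all ex: names are hypothetical and only for illustration. The prov: terms (prov:Entity, prov:Activity, prov:wasGeneratedBy, prov:used, prov:wasAssociatedWith) are taken from PROV-O.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    """Stands in for a prov:Agent (software or person)."""
    name: str


@dataclass(frozen=True)
class Activity:
    """Stands in for a prov:Activity controlled by one agent."""
    name: str
    agent: Agent


def describe(csv_name, generation, delivery):
    """Return PROV-style (subject, predicate, object) triples for one CSV."""
    return [
        (csv_name, "rdf:type", "prov:Entity"),
        (generation.name, "rdf:type", "prov:Activity"),
        (delivery.name, "rdf:type", "prov:Activity"),
        # The values in the file come from the generation activity.
        (csv_name, "prov:wasGeneratedBy", generation.name),
        # Delivery only consumes the already generated file.
        (delivery.name, "prov:used", csv_name),
        (generation.name, "prov:wasAssociatedWith", generation.agent.name),
        (delivery.name, "prov:wasAssociatedWith", delivery.agent.name),
    ]


# Hypothetical agents: a spreadsheet exporter and an HTTP user agent.
exporter = Agent("ex:spreadsheet-exporter")
http_agent = Agent("ex:http-user-agent")

triples = describe(
    "ex:data.csv",
    generation=Activity("ex:generation", exporter),
    delivery=Activity("ex:delivery", http_agent),
)

for triple in triples:
    print(" ".join(triple))

# The two activities may or may not share an agent; in this example they do not.
shared_agent = exporter == http_agent
```

Passing the same Agent to both activities would model the case where one agent controls generation and delivery.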

Davide


On 28 Feb 2014, at 14:21, Yakov Shafranovich wrote:

> Here is HTTP's definition in RFC 2616, section 14.43:
> 
> https://www.ietf.org/rfc/rfc2616.txt
> 
> Yakov
> 
> On Fri, Feb 28, 2014 at 1:07 AM, Eric Stephan <ericphb@gmail.com> wrote:
>> Yakov,
>> 
>> Yes it does fit within the concept of provenance, and yes I think it
>> would be good to capture.
>> 
>> Cheers,
>> 
>> Eric
>> 
>> On Thu, Feb 27, 2014 at 8:23 PM, Yakov Shafranovich
>> <yakov-ietf@shaftek.org> wrote:
>>> I think something similar to the concept of "User-Agent" in HTTP or
>>> email would be helpful. Knowing what software and version generated a
>>> given CSV file would help to interpret it.
>>> 
>>> Not sure if this fits within the concept of provenance.
>>> 
>>> Yakov
>>> 
>>> On Thu, Feb 27, 2014 at 2:39 PM, Ceolin, D. <d.ceolin@vu.nl> wrote:
>>>> Hi Eric,
>>>> 
>>>> I should have something, but not much. So yes please, that would be very helpful.
>>>> Thanks,
>>>> 
>>>> Davide
>>>> 
>>>> On 27 Feb 2014, at 15:48, Eric Stephan wrote:
>>>> 
>>>>> Davide,
>>>>> 
>>>>> Great idea, I feel this is very important and a huge problem for
>>>>> anyone who has to maintain a CSV and track changes.  I'd love to see a
>>>>> use case on this.  If you need any help with a real world use case let
>>>>> me know, there are plenty in the science arena.
>>>>> 
>>>>> 
>>>>> Eric
>>>>> 
>>>>> On Thu, Feb 27, 2014 at 1:01 AM, Ceolin, D. <d.ceolin@vu.nl> wrote:
>>>>>> Hi all,
>>>>>> 
>>>>>> I've seen some hints of provenance around, but I'd like to dig a little deeper into the problem.
>>>>>> I believe there are at least two provenance issues that are related to each other and that probably need standardized handling:
>>>>>> - if a CSV file is obtained from a spreadsheet, it's likely that one or more 'cells' result from formulas applied to other cells in the same CSV. Perhaps (a simplified version of) PROV is a good candidate to represent such relations? If I'm not mistaken, there was some related discussion in the chat two telcos ago (about "sum" cells?).
>>>>>> - also, the whole CSV file may be the result of a specific process, especially if it represents a DB dump and/or the result of a computation. It would be useful to be able to annotate these files with their provenance.
>>>>>> 
>>>>>> I'm not sure if this is in the scope of the working group, but I believe that at least part of it is.
>>>>>> Cheers,
>>>>>> 
>>>>>> Davide
>>>>>> 
>>>>>> 
>>>> 
>>>>

Received on Friday, 28 February 2014 14:08:24 UTC