Re: Provenance

OKdokes. There are several possible issues, as it's easy to confuse a URI 
for the dataset with a URI or URL for the actual document, e.g. you can 
have an RDF dataset expressed as .rdf or .ttl or .ntriples etc. Let's 
just assume it's all URLs for now.

Source CSV: http://example.org/input.csv
Output RDF: http://example.org/output.rdf
CSV Metadata: http://example.org/myformat.metadata

@prefix time: <http://www.w3.org/2006/time#>.
@prefix prov: <http://www.w3.org/ns/prov#>.
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#>.

<http://example.org/output.rdf#provenance> a prov:Activity ;
     prov:endedAtTime "2014-05-21T09:50:01+01:00"^^xsd:dateTime ;
     prov:startedAtTime "2014-05-21T09:50:01+01:00"^^xsd:dateTime ;
     prov:generated <http://example.org/output.rdf> ;
     prov:used <http://example.org/input.csv>, <http://example.org/myformat.metadata> .

What would make this far more useful is a single additional triple which 
indicates that the process used was a specific standard process, e.g. W3C 
csv->rdf v1.0. Also possibly a way to indicate that one "used" document 
describes the other. A more complex example would be this (I've just 
busked it and invented some csv2rdf properties; I'm not recommending 
them as-is):

# csv2rdf: is an invented prefix; this namespace URI is only a placeholder.
@prefix csv2rdf: <http://example.org/csv2rdf#> .

<http://example.org/output.rdf#provenance> a prov:Activity ;
     prov:endedAtTime "2014-05-21T09:50:01+01:00"^^xsd:dateTime ;
     prov:startedAtTime "2014-05-21T09:50:01+01:00"^^xsd:dateTime ;
     prov:generated <http://example.org/output.rdf> ;

     prov:qualifiedUsage [
        a prov:Usage ;
        prov:entity    <http://example.org/input.csv> ;
        prov:hadRole   csv2rdf:tabularDataToConvert
     ] ;

     prov:qualifiedUsage [
        a prov:Usage ;
        prov:entity    <http://example.org/myformat.metadata> ;
        prov:hadRole   csv2rdf:tabularMetadata
     ] .

<http://example.org/myformat.metadata> a csv2rdf:TabularDataMetadataDocument ;
     csv2rdf:describes <http://example.org/input.csv> .
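
For the "specific standard process" and the agent making the conversion, 
one possible sketch is below. It uses only existing PROV terms 
(prov:wasAssociatedWith, prov:qualifiedAssociation, prov:hadPlan), so it 
takes a few triples rather than one. Both the plan URI and the agent URI 
are made up for illustration; nothing is published at either address.

```turtle
@prefix prov: <http://www.w3.org/ns/prov#> .

# Hypothetical URIs: the agent and the plan are placeholders only.
<http://example.org/output.rdf#provenance>
     prov:wasAssociatedWith <http://example.org/agents/converter> ;
     prov:qualifiedAssociation [
        a prov:Association ;
        prov:agent   <http://example.org/agents/converter> ;
        prov:hadPlan <http://example.org/csv2rdf/v1.0>
     ] .

# The plan stands in for the documented conversion algorithm.
<http://example.org/csv2rdf/v1.0> a prov:Plan .

# The agent is whoever or whatever ran the conversion.
<http://example.org/agents/converter> a prov:SoftwareAgent .
```

If a stable URI were minted for each version of the conversion algorithm, 
the prov:hadPlan object is where it would go, and consumers could then 
tell which process produced a given output.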
  




On 21/05/14 10:35, Ivan Herman wrote:
> Christopher,
>
> I think it is a good idea to add some provenance information to the output. Do you think you can write down, at least in a sketch, what triples you think should be generated using the metadata information we have in the metadata document?
>
> Thanks
>
> Ivan
>
> P.S. You probably know the saying: no good deed goes unpunished:-)
>
>
>
> On 21 May 2014, at 11:02 , Christopher Gutteridge <cjg@ecs.soton.ac.uk> wrote:
>
>> While it's not a top priority, I see an exciting use for some of the recent provenance vocab. work. For the Tabular(CSV)->Graph(RDF) route anyhow, as it's possible to add extra triples. We may well know the URI of the source table, and the URI of the metadata document. That's provenance right there. I would suggest (not as a high priority) that a recommended RDF way to express this relationship could be included in this work. eg. The triples in the output RDF saying it was generated from source document(s) X, using metadata Y and process Z at a given time & date by an agent (the organisation/person/system making the conversion).
>>
>> It should be just a handful of extra triples, and optional, but it would be good to give people a standard to follow. And also URIs to reference for the process followed (the algorithms being discussed now).
>>
>> You can see an example of what I mean at the top of this TTL file:
>> http://data.southampton.ac.uk/dumps/jargon/2014-05-08/jargon.ttl
>> (ignore the http://purl.org/void/provenance/ns/ triples, that was the previous vocab we used and are now transitioning to http://www.w3.org/ns/prov#)
>> -- 
>> Christopher Gutteridge --
>> http://users.ecs.soton.ac.uk/cjg
>>
>>
>> University of Southampton Open Data Service:
>> http://data.southampton.ac.uk/
>>
>> You should read the ECS Web Team blog:
>> http://blogs.ecs.soton.ac.uk/webteam/
>>
>>
>>
>
> ----
> Ivan Herman, W3C
> Digital Publishing Activity Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> GPG: 0x343F1A3D
> WebID: http://www.ivan-herman.net/foaf#me
>
>
>
>
>

Received on Wednesday, 21 May 2014 10:37:06 UTC