- From: Christopher Gutteridge <cjg@ecs.soton.ac.uk>
- Date: Wed, 21 May 2014 11:38:15 +0100
- To: Ivan Herman <ivan@w3.org>
- CC: W3C CSV on the Web Working Group <public-csv-wg@w3.org>
OKdokes. There are several possible issues, as it's easy to confuse a URI for the dataset with a URI or URL for the actual document, e.g. you can have an RDF dataset expressed as .rdf or .ttl or .ntriples etc. Let's just assume it's all URLs for now.

Source CSV: http://example.org/input.csv
Output RDF: http://example.org/output.rdf
CSV Metadata: http://example.org/myformat.metadata

@prefix time: <http://www.w3.org/2006/time#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://example.org/output.rdf#provenance>
    a prov:Activity ;
    prov:endedAtTime "2014-05-21T09:50:01+01:00"^^xsd:dateTime ;
    prov:startedAtTime "2014-05-21T09:50:01+01:00"^^xsd:dateTime ;
    prov:generated <http://example.org/output.rdf> ;
    prov:used <http://example.org/input.csv>, <http://example.org/myformat.metadata> .

What would make this far more useful is a single additional triple which indicates that the process used was a specific standard process, e.g. w3c csv->rdf v1.0. Also possibly a way to distinguish that one "used" document describes the other.

A more complex example would be this (I've just busked it and invented some csv2rdf properties, I'm not recommending them as-is):

<http://example.org/output.rdf#provenance>
    a prov:Activity ;
    prov:endedAtTime "2014-05-21T09:50:01+01:00"^^xsd:dateTime ;
    prov:startedAtTime "2014-05-21T09:50:01+01:00"^^xsd:dateTime ;
    prov:generated <http://example.org/output.rdf> ;
    prov:qualifiedUsage [
        a prov:Usage ;
        prov:entity <http://example.org/input.csv> ;
        prov:hadRole csv2rdf:tabularDataToConvert
    ] ;
    prov:qualifiedUsage [
        a prov:Usage ;
        prov:entity <http://example.org/myformat.metadata> ;
        prov:hadRole csv2rdf:tabularMetadata
    ] .

<http://example.org/myformat.metadata>
    a csv2rdf:TabularDataMetadataDocument ;
    csv2rdf:describes <http://example.org/input.csv> .

On 21/05/14 10:35, Ivan Herman wrote:
> Christopher,
>
> I think it is a good idea to add some provenance information to the output. Do you think you can write down, at least in a sketch, what triples you think should be generated using the metadata information we have in the metadata document?
>
> Thanks
>
> Ivan
>
> P.S. You probably know the saying: no good deed goes unpunished:-)
>
> On 21 May 2014, at 11:02 , Christopher Gutteridge <cjg@ecs.soton.ac.uk> wrote:
>
>> While it's not a top priority, I see an exciting use for some of the recent provenance vocab. work. For the Tabular(CSV)->Graph(RDF) route anyhow, as it's possible to add extra triples. We may well know the URI of the source table, and the URI of the metadata document. That's provenance right there. I would suggest (not as a high priority) that a recommended RDF way to express this relationship could be included in this work. eg. The triples in the output RDF saying it was generated from source document(s) X, using metadata Y and process Z at a given time & date by an agent (the organisation/person/system making the conversion).
>>
>> It should be just a handful of extra triples, and optional, but it would be good to give people a standard to follow. And also URIs to reference for the process followed (the algorithms being discussed now).
>>
>> You can see an example of what I mean at the top of this TTL file:
>> http://data.southampton.ac.uk/dumps/jargon/2014-05-08/jargon.ttl
>> (ignore the http://purl.org/void/provenance/ns/ triples, that was the previous vocab we used and are now transitioning to http://www.w3.org/ns/prov#)
>> --
>> Christopher Gutteridge --
>> http://users.ecs.soton.ac.uk/cjg
>>
>> University of Southampton Open Data Service:
>> http://data.southampton.ac.uk/
>>
>> You should read the ECS Web Team blog:
>> http://blogs.ecs.soton.ac.uk/webteam/
>
> ----
> Ivan Herman, W3C
> Digital Publishing Activity Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> GPG: 0x343F1A3D
> WebID: http://www.ivan-herman.net/foaf#me
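
For illustration only, the simple case (the first Turtle example in the reply above) could be emitted by a converter along the following lines. This is a minimal sketch, not part of any proposal in this thread: the function name provenance_turtle, its parameters, and the "#provenance" fragment convention are assumptions that simply mirror the hand-written example, and it uses only prov:used, not the invented csv2rdf properties.

```python
from datetime import datetime, timedelta, timezone

PREFIXES = (
    "@prefix prov: <http://www.w3.org/ns/prov#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
)


def provenance_turtle(source_url, metadata_url, output_url, started, ended):
    """Return Turtle describing one conversion run as a prov:Activity.

    Hypothetical helper: the name and the '#provenance' fragment are
    assumptions copied from the hand-written example, not a standard.
    """
    for url in (source_url, metadata_url, output_url):
        # Reject characters that cannot appear inside a Turtle <...> IRI.
        if any(ch in url for ch in '<>" '):
            raise ValueError(f"not usable as an IRI: {url!r}")
    for moment in (started, ended):
        # xsd:dateTime values are only unambiguous with a UTC offset.
        if moment.tzinfo is None:
            raise ValueError("timestamps must be timezone-aware")

    def literal(moment):
        # isoformat() on an aware datetime includes the offset.
        return f'"{moment.isoformat()}"^^xsd:dateTime'

    lines = [
        f"<{output_url}#provenance> a prov:Activity ;",
        f"    prov:startedAtTime {literal(started)} ;",
        f"    prov:endedAtTime {literal(ended)} ;",
        f"    prov:generated <{output_url}> ;",
        f"    prov:used <{source_url}>, <{metadata_url}> .",
    ]
    return PREFIXES + "\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Same URLs and timestamp as the example in the message.
    stamp = datetime(2014, 5, 21, 9, 50, 1, tzinfo=timezone(timedelta(hours=1)))
    print(provenance_turtle(
        "http://example.org/input.csv",
        "http://example.org/myformat.metadata",
        "http://example.org/output.rdf",
        stamp,
        stamp,
    ))
```

The strings are built by hand to keep the sketch dependency-free; a real converter would more likely build the graph with an RDF library and serialise it.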
Received on Wednesday, 21 May 2014 10:37:06 UTC