- From: Ivan Herman <ivan@w3.org>
- Date: Wed, 21 May 2014 13:53:59 +0200
- To: Christopher Gutteridge <cjg@ecs.soton.ac.uk>
- Cc: W3C CSV on the Web Working Group <public-csv-wg@w3.org>
- Message-Id: <5DCACCB2-AA34-4D99-90A8-03D8EEF16ADF@w3.org>
Thanks. I think this all looks reasonable (and I would probably go for the slightly longer version). I actually think that similar information should be put into the generated JSON, too, regardless of whether it is JSON-LD or not.

I would propose, so that we do not forget this, to add an open issue to our issue management on GitHub [1], with a reference to this thread. I presume that we have, as you say, more urgent things to solve first, so it is not yet a priority, but we should not forget to deal with it eventually; that is why issue handling was invented :-)

Thanks a lot, Christopher, for raising this!

Ivan

P.S. Christopher, I do not think I have added you to GitHub yet; this would be necessary for you to edit the document and add issues. Can you send me your GitHub handle? Thanks.

[1] https://github.com/w3c/csvw/issues

On 21 May 2014, at 12:38, Christopher Gutteridge <cjg@ecs.soton.ac.uk> wrote:

> OKdokes. There are several possible issues, as it's easy to confuse a URI for the dataset with a URI or URL for the actual document. E.g. you can have an RDF dataset expressed as .rdf or .ttl or .ntriples etc. Let's just assume it's all URLs for now.
>
> Source CSV: http://example.org/input.csv
> Output RDF: http://example.org/output.rdf
> CSV Metadata: http://example.org/myformat.metadata
>
> @prefix time: <http://www.w3.org/2006/time#>.
> @prefix prov: <http://www.w3.org/ns/prov#>.
> @prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
>
> <http://example.org/output.rdf#provenance> a prov:Activity ;
>     prov:endedAtTime "2014-05-21T09:50:01+01:00"^^xsd:dateTime ;
>     prov:startedAtTime "2014-05-21T09:50:01+01:00"^^xsd:dateTime ;
>     prov:generated <http://example.org/output.rdf> ;
>     prov:used <http://example.org/input.csv>, <http://example.org/myformat.metadata> .
>
> What would make this far more useful is a single additional triple which indicates that the process used was a specific standard process, e.g. w3c csv->rdf v1.0.
> Also possibly a way to distinguish that one "used" document describes the other. A more complex example would be this (I've just busked it and invented some csv2rdf properties; I'm not recommending them as-is):
>
> <http://example.org/output.rdf#provenance> a prov:Activity ;
>     prov:endedAtTime "2014-05-21T09:50:01+01:00"^^xsd:dateTime ;
>     prov:startedAtTime "2014-05-21T09:50:01+01:00"^^xsd:dateTime ;
>     prov:generated <http://example.org/output.rdf> ;
>
>     prov:qualifiedUsage [
>         a prov:Usage ;
>         prov:entity <http://example.org/input.csv> ;
>         prov:hadRole csv2rdf:tabularDataToConvert
>     ] ;
>
>     prov:qualifiedUsage [
>         a prov:Usage ;
>         prov:entity <http://example.org/myformat.metadata> ;
>         prov:hadRole csv2rdf:tabularMetadata
>     ] .
>
> <http://example.org/myformat.metadata> a csv2rdf:TabularDataMetadataDocument ;
>     csv2rdf:describes <http://example.org/input.csv> .
>
> On 21/05/14 10:35, Ivan Herman wrote:
>> Christopher,
>>
>> I think it is a good idea to add some provenance information to the output. Do you think you can write down, at least in a sketch, what triples you think should be generated using the metadata information we have in the metadata document?
>>
>> Thanks
>>
>> Ivan
>>
>> P.S. You probably know the saying: no good deed goes unpunished :-)
>>
>> On 21 May 2014, at 11:02, Christopher Gutteridge <cjg@ecs.soton.ac.uk> wrote:
>>
>>> While it's not a top priority, I see an exciting use for some of the recent provenance vocab. work. For the Tabular(CSV)->Graph(RDF) route anyhow, as it's possible to add extra triples. We may well know the URI of the source table, and the URI of the metadata document. That's provenance right there. I would suggest (not as a high priority) that a recommended RDF way to express this relationship could be included in this work. For example:
>>> The triples in the output RDF saying it was generated from source document(s) X, using metadata Y and process Z at a given time & date by an agent (the organisation/person/system making the conversion).
>>>
>>> It should be just a handful of extra triples, and optional, but it would be good to give people a standard to follow. And also URIs to reference for the process followed (the algorithms being discussed now).
>>>
>>> You can see an example of what I mean at the top of this TTL file:
>>> http://data.southampton.ac.uk/dumps/jargon/2014-05-08/jargon.ttl
>>> (ignore the http://purl.org/void/provenance/ns/ triples, that was the previous vocab we used and are now transitioning to http://www.w3.org/ns/prov#)
>>> --
>>> Christopher Gutteridge -- http://users.ecs.soton.ac.uk/cjg
>>>
>>> University of Southampton Open Data Service: http://data.southampton.ac.uk/
>>>
>>> You should read the ECS Web Team blog: http://blogs.ecs.soton.ac.uk/webteam/
>>
>> ----
>> Ivan Herman, W3C
>> Digital Publishing Activity Lead
>> Home: http://www.w3.org/People/Ivan/
>> mobile: +31-641044153
>> GPG: 0x343F1A3D
>> WebID: http://www.ivan-herman.net/foaf#me

----
Ivan Herman, W3C
Digital Publishing Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
GPG: 0x343F1A3D
WebID: http://www.ivan-herman.net/foaf#me
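
As a rough illustration of the simpler proposal in this thread (one prov:Activity that records what was used and what was generated), a converter might emit the extra triples along the following lines. This is only a sketch under stated assumptions: the function name `provenance_turtle` and its parameters are hypothetical, it uses plain string formatting rather than an RDF library, and it assumes the URLs passed in are already valid IRIs. It is not part of any CSV on the Web draft.

```python
from datetime import datetime, timedelta, timezone

# Prefixes taken from the example in the thread (only those actually used).
PREFIXES = (
    "@prefix prov: <http://www.w3.org/ns/prov#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
)


def provenance_turtle(source_csv, metadata, output, started, ended):
    """Return Turtle for one prov:Activity describing a CSV-to-RDF run.

    Hypothetical helper for illustration; assumes the three URLs are
    valid IRIs (no escaping is performed).
    """
    # xsd:dateTime values in the thread carry an offset, so require one here.
    if started.tzinfo is None or ended.tzinfo is None:
        raise ValueError("timestamps must be timezone-aware")
    lines = [
        f"<{output}#provenance> a prov:Activity ;",
        f'    prov:startedAtTime "{started.isoformat()}"^^xsd:dateTime ;',
        f'    prov:endedAtTime "{ended.isoformat()}"^^xsd:dateTime ;',
        f"    prov:generated <{output}> ;",
        f"    prov:used <{source_csv}>, <{metadata}> .",
    ]
    return PREFIXES + "\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Same URLs and timestamp as the example in the thread.
    stamp = datetime(2014, 5, 21, 9, 50, 1, tzinfo=timezone(timedelta(hours=1)))
    print(provenance_turtle(
        "http://example.org/input.csv",
        "http://example.org/myformat.metadata",
        "http://example.org/output.rdf",
        stamp,
        stamp,
    ))
```

The qualified-usage variant with roles is left out on purpose, since the csv2rdf properties in the thread were explicitly invented for discussion and have no agreed namespace.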
Received on Wednesday, 21 May 2014 11:54:33 UTC