Graph metadata example [was: Fwd: RDF data archives] from Guus Schreiber on 2013-12-06 (public-rdf-wg@w3.org from December 2013)

From: Guus Schreiber <guus.schreiber@vu.nl>
Date: Fri, 6 Dec 2013 02:13:21 +0100
To: RDF WG <public-rdf-wg@w3.org>
Message-ID: <52A124B1.1020206@vu.nl>

Just to put the discussion about the one graph metadata triple in 
perspective:

* Check out reference 3 in the message below [1] (sent today by a W3C 
chair).

* Also, check the Prov bundle example in Sec. 4.2.3 of the provenance
book written by the Prov chairs [2]. BTW, our response to the Prov WG a
year ago is here [3].

This is simply to point out that ignoring practice is not going to help. 
An example *with proper caveats* might help a little bit.  Proper 
references to the Dataset Note will also likely be helpful.

Guus

[1] https://github.com/bio2rdf/bio2rdf-scripts/wiki/Bio2RDF-Dataset-Provenance
[2] http://books.google.nl/books?id=8aBeAQAAQBAJ&pg=PT72&lpg=PT72&dq=provenance+trig&source=bl&ots=eardCUYmGt&sig=4EA4vZlWoXR-2o3CbEJwzDEPC4U&hl=en&sa=X&ei=CvqgUub1JOqf0QXSxYGACw&ved=0CFAQ6AEwBA#v=onepage&q=provenance%20trig&f=false
[3] http://lists.w3.org/Archives/Public/public-rdf-wg/2012Oct/0208.html

-------- Original Message --------
Subject: 	RDF data archives
Resent-Date: 	Fri, 6 Dec 2013 00:06:33 +0000
Resent-From: 	<semantic-web@w3.org>
Date: 	Thu, 5 Dec 2013 16:05:38 -0800
From: 	Michel Dumontier <michel.dumontier@gmail.com>
To: 	w3c semweb hcls <public-semweb-lifesci@w3.org>, "public-lod@w3.org"
<public-lod@w3.org>, SWIG Web <semantic-web@w3.org>, bio2rdf
<bio2rdf@googlegroups.com>



Hi all,
As you may know, Bio2RDF produces RDF dumps of its datasets [1,2].
For each dataset, we generate a dataset description file (as per [3];
example [4]) in N-Triples format, while the dataset itself comprises
one or more *gzipped* N-Triples files. I just noticed that LODStats
did not correctly parse [5] these files to generate the dataset
statistics, perhaps owing to the assignment of
"application/x-ntriples" in the relevant datahub.io resource metadata.
I'd like to know what MIME type we should specify for zipped or
gzipped RDF data.
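For context on the parsing problem above, here is a minimal sketch of how a consumer might read a gzipped N-Triples dump without first decompressing it to disk. It assumes UTF-8 content and relies on N-Triples being a line-based format (one triple per line). The function names are hypothetical; the sketch says nothing about how LODStats itself works, and it does not settle the MIME type question.

```python
import gzip


def count_ntriples(lines):
    """Count statements in an iterable of N-Triples lines.

    N-Triples puts one triple per line, so every line that is neither
    blank nor a whole-line comment is counted as one triple.
    """
    count = 0
    for raw in lines:
        line = raw.strip()
        if line and not line.startswith("#"):
            count += 1
    return count


def count_gzipped_ntriples(source):
    """Count triples in a gzipped N-Triples file (path or binary file object).

    gzip.open decompresses on the fly, so the dump is never written out
    uncompressed. UTF-8 encoding is assumed.
    """
    with gzip.open(source, "rt", encoding="utf-8") as f:
        return count_ntriples(f)
```

Called as `count_gzipped_ntriples("dataset.nt.gz")` (a hypothetical file name), this streams the archive line by line, so memory use stays flat regardless of dump size.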

As we prepare for our next release, we're planning to generate N-Quads
for the datasets, thereby linking versioned datasets with their
metadata. We are wondering whether there will be sufficient support for
this format. We are also wondering whether it would be problematic to
provide single-file downloads in tar.gz format.
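One way such N-Quads could be produced from existing N-Triples dumps, sketched under stated assumptions: every triple of a given release is placed in one named graph (the optional fourth term of an N-Quads statement), and the dataset description could then use that same graph IRI as its subject. The graph IRI below is a hypothetical placeholder, not a Bio2RDF naming scheme. The string-rewriting shortcut also assumes no trailing comments on triple lines; a production pipeline would be safer with a real RDF parser.

```python
def triple_to_quad(line, graph_iri):
    """Rewrite one N-Triples line as an N-Quads line in the named graph graph_iri.

    Assumes the line holds a single triple terminated by '.', with no
    trailing comment. Returns None for blank lines and whole-line comments.
    graph_iri is whatever IRI the publisher chooses (hypothetical here).
    """
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    if not stripped.endswith("."):
        raise ValueError(f"not a terminated N-Triples statement: {line!r}")
    # Drop the final '.' terminator, then add the graph label before a new one.
    body = stripped[:-1].rstrip()
    return f"{body} <{graph_iri}> ."


def ntriples_to_nquads(lines, graph_iri):
    """Yield N-Quads lines, placing every triple from `lines` in graph_iri."""
    for line in lines:
        quad = triple_to_quad(line, graph_iri)
        if quad is not None:
            yield quad
```

For example, `triple_to_quad('<http://ex/s> <http://ex/p> "o" .', "http://example.org/g")` yields `<http://ex/s> <http://ex/p> "o" <http://example.org/g> .`. Because N-Quads is a superset of N-Triples line syntax, the output stays line-based and can be gzipped the same way as the current dumps.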

Comments and suggestions most welcome,

m.


[1] http://bio2rdf.org/datasets
[2] http://download.bio2rdf.org/
[3] https://github.com/bio2rdf/bio2rdf-scripts/wiki/Bio2RDF-Dataset-Provenance
[4] http://download.bio2rdf.org/current/affymetrix/bio2rdf-affymetrix-20121004.nt
[5] http://stats.lod2.eu/rdfdocs?search=bio2rdf

-- 
Michel Dumontier
Associate Professor of Medicine (Biomedical Informatics), Stanford University
Chair, W3C Semantic Web for Health Care and the Life Sciences Interest Group
http://dumontierlab.com

Received on Friday, 6 December 2013 01:13:54 UTC