Re: RDF data archives from David Booth on 2013-12-06 (semantic-web@w3.org from December 2013)

From: David Booth <david@dbooth.org>
Date: Fri, 06 Dec 2013 01:19:37 -0500
To: Michel Dumontier <michel.dumontier@gmail.com>, w3c semweb hcls <public-semweb-lifesci@w3.org>, "public-lod@w3.org" <public-lod@w3.org>, SWIG Web <semantic-web@w3.org>, bio2rdf <bio2rdf@googlegroups.com>
Message-ID: <52A16C79.6090003@dbooth.org>

Hi Michel,

On 12/05/2013 07:05 PM, Michel Dumontier wrote:
> Hi all,
>   As you may know, Bio2RDF produces RDF dumps of its RDF datasets [1,2].
> For each dataset, we generate a dataset description file (as per [3];
> example [4]) that is in n-triples format, while the dataset is comprised
> of one or more *gzipped* n-triple files. I just noticed that LODStats
> did not correctly parse [5] these files to generate the dataset
> statistics, owing, perhaps, to the assignment of
> "application/x-ntriples" in the relevant datahub.io
> resource metadata.
> I'd like to know what MIME type we should specify for zipped or gzipped
> RDF data.

If you assume that the recipient will want to unzip the files before
parsing (as opposed to parsing *while* unzipping), then you could use a
normal RDF MIME type but specify a gzip HTTP Content-Encoding:
http://stackoverflow.com/questions/864448/how-to-set-content-encoding-with-gzip
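
As a rough illustration of that idea, here is a minimal Python sketch, not
a definitive implementation. The helper names (headers_for, decode_body),
the extension table, and the file name in the usage note are hypothetical.
The media types shown are the ones registered by the RDF 1.1
specifications; this thread itself used "application/x-ntriples", so treat
the exact strings as an assumption to check against what your consumers
expect.

```python
import gzip

# Assumed mapping from file extension to RDF media type (illustrative only).
RDF_MEDIA_TYPES = {
    ".nt": "application/n-triples",
    ".nq": "application/n-quads",
    ".ttl": "text/turtle",
}


def headers_for(filename):
    """Build HTTP response headers for an RDF dump that may be gzipped.

    The Content-Type describes the RDF serialization itself; the gzip
    compression is reported separately through Content-Encoding.
    """
    headers = {}
    name = filename
    if name.endswith(".gz"):
        headers["Content-Encoding"] = "gzip"
        name = name[: -len(".gz")]  # look at the extension underneath
    for ext, media_type in RDF_MEDIA_TYPES.items():
        if name.endswith(ext):
            headers["Content-Type"] = media_type
            break
    else:
        # Unknown serialization: fall back to a generic binary type.
        headers["Content-Type"] = "application/octet-stream"
    return headers


def decode_body(body, headers):
    """Undo the content encoding on the client side, then decode as UTF-8."""
    if headers.get("Content-Encoding") == "gzip":
        body = gzip.decompress(body)
    return body.decode("utf-8")
```

For example, headers_for("example-dataset.nt.gz") yields a Content-Type of
application/n-triples together with a Content-Encoding of gzip, and a
client that honours Content-Encoding can pass the raw bytes through
decode_body before handing them to its N-Triples parser.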

David
>
> As we prepare for our next release, we're planning to generate n-quads
> for the datasets, thereby linking versioned datasets with their
> metadata. We are wondering whether there will be sufficient support for
> this format. Also, we are wondering whether it would be problematic to
> provide single-file downloads in tar.gz format.
>
> comments and suggestions most welcome,
>
> m.
>
>
> [1] http://bio2rdf.org/datasets
> [2] http://download.bio2rdf.org/
> [3] https://github.com/bio2rdf/bio2rdf-scripts/wiki/Bio2RDF-Dataset-Provenance
> [4] http://download.bio2rdf.org/current/affymetrix/bio2rdf-affymetrix-20121004.nt
> [5] http://stats.lod2.eu/rdfdocs?search=bio2rdf
>
> --
> Michel Dumontier
> Associate Professor of Medicine (Biomedical Informatics), Stanford
> University
> Chair, W3C Semantic Web for Health Care and the Life Sciences Interest Group
> http://dumontierlab.com

Received on Friday, 6 December 2013 06:20:06 UTC