Re: Billion Triples Challenge Crawl 2014 from Tim Berners-Lee on 2014-02-16 (semantic-web@w3.org from February 2014)

From: Tim Berners-Lee <timbl@w3.org>
Date: Sun, 16 Feb 2014 08:31:45 +0200
To: Michel Dumontier <michel.dumontier@gmail.com>
Cc: Andreas Harth <andreas@harth.org>, SWIG Web <semantic-web@w3.org>
Message-Id: <B05AFB67-3AA2-46E6-AA47-34A0E30B59E7@w3.org>

On 2014-02-14, at 09:46, Michel Dumontier wrote:

> Andreas,
> 
>  I'd like to help by getting bio2rdf data into the crawl, really. but we gzip all of our files, and they are in n-quads format.
> 
> http://download.bio2rdf.org/release/3/
> 
> think you can add gzip/bzip2 support ?
> 
> m.
> 
> Michel Dumontier
> Associate Professor of Medicine (Biomedical Informatics), Stanford University
> Chair, W3C Semantic Web for Health Care and the Life Sciences Interest Group
> http://dumontierlab.com
> 


And on 2014-02-15, at 18:00, Hugh Glaser wrote:

> Hi Andreas and Tobias.
> Good luck!
> Actually, I think essentially ignoring dumps and doing a “real” crawl, is a feature, rather than a bug.


Michel, 

Agree with Hugh. I would encourage you to unzip the data files on your own servers 
so the URIs will work and your data is really Linked Data.
Being compatible has lots of advantages for the community.

Tim

Received on Sunday, 16 February 2014 09:05:45 UTC