Re: Billion Triples Challenge Crawl 2014

Hi Tim,
  That folder contains 350GB of compressed RDF. I'm not about to unzip it
all just because a crawler can't decompress it on the fly.  Honestly, it
worries me that people aren't considering the practicalities of storing,
indexing, and presenting all this data.
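  Just to sketch what on-the-fly handling could look like (a rough Python
sketch only, not anything the BTC crawler actually does, and the file path
under our release directory is made up for illustration):

    import gzip
    import urllib.request

    # Hypothetical dump path under the release directory linked below;
    # substitute any of the real .nq.gz files.
    DUMP_URL = "http://download.bio2rdf.org/release/3/drugbank/drugbank.nq.gz"

    with urllib.request.urlopen(DUMP_URL) as resp:
        # GzipFile decompresses the HTTP byte stream as it is read,
        # so nothing ever has to be unpacked onto disk.
        with gzip.GzipFile(fileobj=resp) as stream:
            for line in stream:
                quad = line.decode("utf-8").strip()
                if quad:
                    pass  # hand each N-Quads statement to the parser here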
  Nevertheless, Bio2RDF does provide VoID descriptions, URI resolution, and
access to SPARQL endpoints.  I can only hope our data gets discovered.
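  If anyone wants to poke at what's exposed, the endpoints answer standard
SPARQL protocol requests; a minimal sketch (the drugbank URL is just one
example of the usual <dataset>.bio2rdf.org/sparql pattern, so treat it as
illustrative):

    import json
    import urllib.parse
    import urllib.request

    # Illustrative endpoint following the <dataset>.bio2rdf.org/sparql pattern.
    ENDPOINT = "http://drugbank.bio2rdf.org/sparql"
    QUERY = "SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }"

    req = urllib.request.Request(
        ENDPOINT + "?" + urllib.parse.urlencode({"query": QUERY}),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(req) as resp:
        results = json.load(resp)

    # Print the triple count reported by the endpoint.
    print(results["results"]["bindings"][0]["n"]["value"])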

m.

Michel Dumontier
Associate Professor of Medicine (Biomedical Informatics), Stanford
University
Chair, W3C Semantic Web for Health Care and the Life Sciences Interest Group
http://dumontierlab.com


On Sat, Feb 15, 2014 at 10:31 PM, Tim Berners-Lee <timbl@w3.org> wrote:

> On 2014-02-14, at 09:46, Michel Dumontier wrote:
>
> Andreas,
>
>  I'd like to help by getting Bio2RDF data into the crawl, really, but we
> gzip all of our files, and they are in N-Quads format.
>
> http://download.bio2rdf.org/release/3/
>
> think you can add gzip/bzip2 support?
>
> m.
>
> Michel Dumontier
> Associate Professor of Medicine (Biomedical Informatics), Stanford
> University
> Chair, W3C Semantic Web for Health Care and the Life Sciences Interest
> Group
> http://dumontierlab.com
>
>
> And on 2014-02-15, at 18:00, Hugh Glaser wrote:
>
> Hi Andreas and Tobias.
> Good luck!
> Actually, I think essentially ignoring dumps and doing a "real" crawl is a
> feature rather than a bug.
>
>
>
> Michel,
>
> Agree with Hugh. I would encourage you to unzip the data files on your own
> servers so the URIs will work and your data is really Linked Data.
> There are lots of advantages to the community in being compatible.
>
> Tim
>
>
>

Received on Tuesday, 18 February 2014 04:43:30 UTC