- From: Michel Dumontier <michel.dumontier@gmail.com>
- Date: Mon, 17 Feb 2014 20:42:42 -0800
- To: Tim Berners-Lee <timbl@w3.org>
- Cc: Andreas Harth <andreas@harth.org>, SWIG Web <semantic-web@w3.org>
- Message-ID: <CALcEXf534ROArAspeGGSV6Dny3uuweOc05QBPJY8jDpfar226w@mail.gmail.com>
Hi Tim,

That folder contains 350GB of compressed RDF. I'm not about to unzip it
just because a crawler can't decompress it on the fly. Honestly, it worries
me that people aren't considering the practicalities of storing, indexing,
and presenting all this data. Nevertheless, Bio2RDF does provide VoID
definitions, URI resolution, and access to SPARQL endpoints. I can only
hope our data gets discovered.

m.

Michel Dumontier
Associate Professor of Medicine (Biomedical Informatics), Stanford University
Chair, W3C Semantic Web for Health Care and the Life Sciences Interest Group
http://dumontierlab.com

On Sat, Feb 15, 2014 at 10:31 PM, Tim Berners-Lee <timbl@w3.org> wrote:

> On 2014-02-14, at 09:46, Michel Dumontier wrote:
>
> Andreas,
>
> I'd like to help by getting Bio2RDF data into the crawl, really, but we
> gzip all of our files, and they are in N-Quads format.
>
> http://download.bio2rdf.org/release/3/
>
> Think you can add gzip/bzip2 support?
>
> m.
>
> Michel Dumontier
> Associate Professor of Medicine (Biomedical Informatics), Stanford University
> Chair, W3C Semantic Web for Health Care and the Life Sciences Interest Group
> http://dumontierlab.com
>
>
> And on 2014-02-15, at 18:00, Hugh Glaser wrote:
>
> Hi Andreas and Tobias,
> Good luck!
> Actually, I think essentially ignoring dumps and doing a "real" crawl is
> a feature rather than a bug.
>
>
> Michel,
>
> Agree with Hugh. I would encourage you to unzip the data files on your own
> servers so the URIs will work and your data is really Linked Data.
> There are lots of advantages to the community in being compatible.
>
> Tim
Received on Tuesday, 18 February 2014 04:43:30 UTC
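
The exchange turns on whether a crawler can handle gzipped dumps on the fly, as Michel asks. A minimal sketch of what that support could look like, assuming only Python's standard library: the HTTP response is streamed through gzip.GzipFile so the decompressed N-Quads are consumed incrementally, and the 350GB of uncompressed data is never materialized on disk. The specific file name under the Bio2RDF release folder is hypothetical, chosen for illustration.

    import gzip
    import urllib.request

    # Hypothetical file under the Bio2RDF release folder; the exact
    # name is illustrative, not taken from the thread.
    URL = "http://download.bio2rdf.org/release/3/example.nq.gz"

    def crawl_gzipped_nquads(url, max_lines=10):
        """Stream a gzipped N-Quads dump, decompressing on the fly.

        urllib's response object is file-like, so gzip.GzipFile can wrap
        it directly and yield decompressed bytes as they arrive -- no
        temporary copy of the uncompressed dump is written anywhere.
        """
        with urllib.request.urlopen(url) as response:
            with gzip.GzipFile(fileobj=response) as stream:
                for i, line in enumerate(stream):
                    if i >= max_lines:
                        break
                    # Each line is one N-Quads statement:
                    # <subject> <predicate> <object> <graph> .
                    print(line.decode("utf-8").rstrip())

    if __name__ == "__main__":
        crawl_gzipped_nquads(URL)

The bzip2 case Michel mentions is symmetric: bz2.BZ2File also accepts a file object, so wrapping the response in bz2.BZ2File(response) instead would cover .bz2 dumps the same way.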