- From: Ross Horne <ross.horne@gmail.com>
- Date: Tue, 18 Feb 2014 14:53:23 +0600
- To: Michel Dumontier <michel.dumontier@gmail.com>
- Cc: Tim Berners-Lee <timbl@w3.org>, Andreas Harth <andreas@harth.org>, SWIG Web <semantic-web@w3.org>
Hi Michel,

I think your point is worth considering. Google make heavy use of Zippy, rather than gzip, simply to reduce the latency of reading and sending large amounts of data; see [1]. (Of course, storage is not a limitation for them.) Could Zippy have a role in Linked Data protocols?

Regards,
Ross

[1] Dean, Jeff. "Designs, lessons and advice from building large distributed systems." Keynote from LADIS (2009). http://www.lamsade.dauphine.fr/~litwin/cours98/CoursBD/doc/dean-keynote-ladis2009_scalable_distributed_google_system.pdf

On 18 February 2014 10:42, Michel Dumontier <michel.dumontier@gmail.com> wrote:
> Hi Tim,
> That folder contains 350GB of compressed RDF. I'm not about to unzip it
> because a crawler can't decompress it on the fly. Honestly, it worries me
> that people aren't considering the practicalities of storing, indexing, and
> presenting all this data.
> Nevertheless, Bio2RDF does provide VoID definitions, URI resolution, and
> access to SPARQL endpoints. I can only hope our data gets discovered.
>
> m.
>
> Michel Dumontier
> Associate Professor of Medicine (Biomedical Informatics), Stanford University
> Chair, W3C Semantic Web for Health Care and the Life Sciences Interest Group
> http://dumontierlab.com
>
>
> On Sat, Feb 15, 2014 at 10:31 PM, Tim Berners-Lee <timbl@w3.org> wrote:
>>
>> On 2014-02-14, at 09:46, Michel Dumontier wrote:
>>
>> Andreas,
>>
>> I'd like to help by getting Bio2RDF data into the crawl, really, but we
>> gzip all of our files, and they are in n-quads format.
>>
>> http://download.bio2rdf.org/release/3/
>>
>> Think you can add gzip/bzip2 support?
>>
>> m.
>>
>> Michel Dumontier
>> Associate Professor of Medicine (Biomedical Informatics), Stanford University
>> Chair, W3C Semantic Web for Health Care and the Life Sciences Interest Group
>> http://dumontierlab.com
>>
>>
>> And on 2014-02-15, at 18:00, Hugh Glaser wrote:
>>
>> Hi Andreas and Tobias.
>> Good luck!
>> Actually, I think essentially ignoring dumps and doing a "real" crawl is
>> a feature rather than a bug.
>>
>>
>> Michel,
>>
>> Agree with Hugh. I would encourage you to unzip the data files on your own
>> servers so the URIs will work and your data is really Linked Data.
>> There are lots of advantages to the community in being compatible.
>>
>> Tim
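[Editor's note: the on-the-fly decompression Michel asks for is straightforward with standard tooling. The sketch below is a hypothetical helper (`iter_nquads` is not part of any crawler discussed in this thread) showing how a crawler could stream n-quad lines out of a gzip- or bzip2-compressed dump without ever materialising the uncompressed file, using only the Python standard library.]

```python
import bz2
import gzip
import io

def iter_nquads(stream, compression=None):
    """Yield n-quad lines from a (possibly compressed) byte stream,
    decompressing incrementally rather than unzipping the whole file.
    `stream` is any binary file-like object (e.g. an open file or an
    HTTP response body); `compression` is None, "gzip", or "bzip2"."""
    if compression == "gzip":
        stream = gzip.GzipFile(fileobj=stream)
    elif compression == "bzip2":
        stream = bz2.BZ2File(stream)
    for raw in stream:
        line = raw.decode("utf-8").strip()
        # Skip blank lines and comments; everything else is a quad.
        if line and not line.startswith("#"):
            yield line

# Demo: compress a tiny n-quads document in memory, then stream it back.
quads = (
    '<http://ex.org/s> <http://ex.org/p> "o" <http://ex.org/g> .\n'
    '<http://ex.org/s2> <http://ex.org/p2> "o2" <http://ex.org/g> .\n'
)
compressed = io.BytesIO(gzip.compress(quads.encode("utf-8")))
for quad in iter_nquads(compressed, compression="gzip"):
    print(quad)
```

The same generator would wrap an HTTP response body, so memory use stays constant regardless of how large the gzipped dump is.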
Received on Tuesday, 18 February 2014 08:53:51 UTC