- From: Ross Horne <ross.horne@gmail.com>
- Date: Tue, 18 Feb 2014 14:53:23 +0600
- To: Michel Dumontier <michel.dumontier@gmail.com>
- Cc: Tim Berners-Lee <timbl@w3.org>, Andreas Harth <andreas@harth.org>, SWIG Web <semantic-web@w3.org>
Hi Michel,

I think your point is worth considering. Google make heavy use of Zippy, rather than gzip, simply to reduce the latency of reading and sending large amounts of data; see [1]. (Of course, storage is not a limitation for them.) Could Zippy have a role in Linked Data protocols?

Regards,
Ross

[1] Dean, Jeff. "Designs, lessons and advice from building large distributed systems." Keynote from LADIS (2009). http://www.lamsade.dauphine.fr/~litwin/cours98/CoursBD/doc/dean-keynote-ladis2009_scalable_distributed_google_system.pdf

On 18 February 2014 10:42, Michel Dumontier <michel.dumontier@gmail.com> wrote:
> Hi Tim,
> That folder contains 350GB of compressed RDF. I'm not about to unzip it
> because a crawler can't decompress it on the fly. Honestly, it worries me
> that people aren't considering the practicalities of storing, indexing, and
> presenting all this data.
> Nevertheless, Bio2RDF does provide VoID definitions, URI resolution, and
> access to SPARQL endpoints. I can only hope our data gets discovered.
>
> m.
>
> Michel Dumontier
> Associate Professor of Medicine (Biomedical Informatics), Stanford University
> Chair, W3C Semantic Web for Health Care and the Life Sciences Interest Group
> http://dumontierlab.com
>
>
> On Sat, Feb 15, 2014 at 10:31 PM, Tim Berners-Lee <timbl@w3.org> wrote:
>>
>> On 2014-02-14, at 09:46, Michel Dumontier wrote:
>>
>> Andreas,
>>
>> I'd like to help by getting Bio2RDF data into the crawl, really, but we
>> gzip all of our files, and they are in n-quads format.
>>
>> http://download.bio2rdf.org/release/3/
>>
>> Think you can add gzip/bzip2 support?
>>
>> m.
>>
>> Michel Dumontier
>> Associate Professor of Medicine (Biomedical Informatics), Stanford University
>> Chair, W3C Semantic Web for Health Care and the Life Sciences Interest Group
>> http://dumontierlab.com
>>
>>
>> And on 2014-02-15, at 18:00, Hugh Glaser wrote:
>>
>> Hi Andreas and Tobias.
>> Good luck!
>> Actually, I think essentially ignoring dumps and doing a "real" crawl is
>> a feature rather than a bug.
>>
>>
>> Michel,
>>
>> Agree with Hugh. I would encourage you to unzip the data files on your own
>> servers so the URIs will work and your data is really Linked Data.
>> There are lots of advantages to the community in being compatible.
>>
>> Tim
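[Editor's note: the on-the-fly decompression Michel asks for is straightforward with standard tooling. The sketch below is a hypothetical helper (`iter_nquads` is not part of any crawler discussed in this thread) showing how a crawler could stream n-quad lines out of a gzip- or bzip2-compressed dump without ever materialising the uncompressed file, using only the Python standard library.]

```python
import bz2
import gzip
import io

def iter_nquads(stream, compression=None):
    """Yield n-quad lines from a (possibly compressed) byte stream,
    decompressing incrementally rather than unzipping the whole file.
    `stream` is any binary file-like object (e.g. an open file or an
    HTTP response body); `compression` is None, "gzip", or "bzip2"."""
    if compression == "gzip":
        stream = gzip.GzipFile(fileobj=stream)
    elif compression == "bzip2":
        stream = bz2.BZ2File(stream)
    for raw in stream:
        line = raw.decode("utf-8").strip()
        # Skip blank lines and comments; everything else is a quad.
        if line and not line.startswith("#"):
            yield line

# Demo: compress a tiny n-quads document in memory, then stream it back.
quads = (
    '<http://ex.org/s> <http://ex.org/p> "o" <http://ex.org/g> .\n'
    '<http://ex.org/s2> <http://ex.org/p2> "o2" <http://ex.org/g> .\n'
)
compressed = io.BytesIO(gzip.compress(quads.encode("utf-8")))
for quad in iter_nquads(compressed, compression="gzip"):
    print(quad)
```

The same generator would wrap an HTTP response body, so memory use stays constant regardless of how large the gzipped dump is.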
Received on Tuesday, 18 February 2014 08:53:51 UTC