- From: Yoav Weiss <yoav@yoav.ws>
- Date: Tue, 5 Nov 2013 17:06:37 +0100
- To: Robin Berjon <robin@w3.org>
- Cc: Marcos Caceres <w3c@marcosc.com>, public-webdevdata@w3.org
- Message-ID: <CACj=BEi6uj7+61zOZ-91zR9p0Le_KtKoVrpW4GqHFn_snnnATA@mail.gmail.com>
On Tue, Nov 5, 2013 at 4:59 PM, Robin Berjon <robin@w3.org> wrote:

> On 05/11/2013 16:25 , Marcos Caceres wrote:
>
>> I wonder if we should start hosting the dataset on the W3C’s HG
>> server. Trying to d/l the latest data set has been really slow for me
>> (~1h today, but it was going to take 9h to d/l yesterday - and it’s
>> only 700mb). Also, having the data sets on HG means we can keep a
>> nice version history.
>>
>
> Not speaking on behalf of the systeam or anything but...
>
> While W3C does have a nice infrastructure, I'm not sure that it's
> necessarily up to the task here. Also, please note that the HG server is
> often down.
>
> Also, I don't know if it's such a good idea to hold the snapshot zip in
> HG. I don't know how HG does its internal storage, but if it's anything
> like Git then *every* single zip snapshot will be kept. At 700MB a piece,
> that could increase pretty fast. (Plus all the unzipped content too.)
>

My personal experience is that HG is actually worse than Git with binaries.
So +1 to not storing 1GB binaries in source control.

> This strikes me as the sort of thing that could get some form of corporate
> sponsorship. You know, hosting on Google, Akamai, Amazon, or whatever.
>

I'm at Velocity next week. If you guys are cool with that, I can try to
talk to Steve to see if we could join forces with the HTTPArchive, at least
regarding the hosting aspects.
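[Editor's note: for a sense of scale on the storage-growth concern above, a rough back-of-the-envelope sketch. The ~700 MB figure comes from the thread; the snapshot counts are purely hypothetical and nothing here reflects how the dataset is actually published.]

```python
# Rough illustration only: if every crawl adds a full ~700 MB zip snapshot
# and the VCS keeps every revision of the binary (Git delta-compresses zips
# poorly, so each snapshot is stored essentially whole), the repository
# grows roughly linearly with the number of snapshots.

SNAPSHOT_MB = 700  # approximate size of one zipped data set (from the thread)

def repo_size_gb(num_snapshots: int, snapshot_mb: int = SNAPSHOT_MB) -> float:
    """Estimated repository size in GB when every snapshot is retained."""
    return num_snapshots * snapshot_mb / 1024

for n in (1, 6, 12, 24):  # hypothetical snapshot counts
    print(f"{n:>2} snapshots -> ~{repo_size_gb(n):.1f} GB")
```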
Received on Tuesday, 5 November 2013 16:07:06 UTC