Re: [Xmldatadumps-l] Availability of Wikidata JSON dumps after Feb, 2019

Thanks Gerhard, I will be touching base off-list.

I am looking for precisely those JSON dumps. We have been developing a 
toolkit that can process them in about 12 hours (at least in the tests I 
have done with the 2020 dumps). I will be happy to share more information 
with you (or anyone who is interested).
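For context, the `wikidata-*-all.json.gz` dumps mentioned below are a single top-level JSON array with one entity per line, so they can be processed as a stream without loading the whole file. A minimal Python sketch (the file name is taken from the thread; adjust it to your local copy):

```python
import gzip
import json

def iter_entities(path):
    """Stream entities from a Wikidata JSON dump (one JSON entity
    per line, wrapped in a single top-level array)."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            # Each entity line ends with a comma; the array
            # brackets sit on their own lines and are skipped.
            line = line.strip().rstrip(",")
            if line in ("", "[", "]"):
                continue
            yield json.loads(line)

# Usage sketch (file name assumed from the thread):
# for entity in iter_entities("wikidata-20201116-all.json.gz"):
#     print(entity["id"], entity["type"])
```

Since the generator never holds more than one entity in memory, throughput is bounded mainly by decompression and JSON parsing speed.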

Best,

Daniel

On 11/25/2020 7:13 AM, Gerhard Gonter wrote:
> On Wed, Nov 25, 2020 at 1:22 PM Daniel Garijo <dgarijo@isi.edu> wrote:
>> Hello,
>>
>> I am writing this message because I am analyzing the Wikidata JSON dumps
>> available in the Internet Archive and I have found there are no dumps
>> available after Feb 8th, 2019 (see
>> https://archive.org/details/wikimediadownloads?and%5B%5D=%22Wikidata%20entity%20dumps%22).
>> I know the latest dumps are available at
>> https://dumps.wikimedia.org/wikidatawiki/entities/, but unfortunately
>> they only cover the last few months.
> Which dump files exactly are you looking for?  Dumps like
>
> https://dumps.wikimedia.org/wikidatawiki/entities/20201116/wikidata-20201116-all.json.gz
>
> which can also be found on https://dumps.wikimedia.org/other/wikidata/
> as 20201116.json.gz ?
>
>> [...]
>> Does anyone on this list know where some of these missing Wikidata dumps
>> may be found? If anyone has pointers to a server where they can be
>> downloaded, I would highly appreciate it.
> If you are looking for these dumps, I have about 8 TB stored on
> external disks.  Transferring these over the network might be
> difficult, however.  Please contact me off-list if you need any
> of these dumps; maybe we can arrange something.
>
> I'm curious, what are you trying to do with all of these files?
> Processing all of them must take months.  My processor usually picks
> up the dump on Wednesday and takes 80 hours to comb through it.  But
> my processor is written in Perl, something in C or Rust might be a lot
> faster...
>
> regards, Gerhard Gonter

Received on Thursday, 26 November 2020 03:26:06 UTC