- From: Daniel Garijo <dgarijo@isi.edu>
- Date: Wed, 25 Nov 2020 19:25:45 -0800
- To: Gerhard Gonter <ggonter@gmail.com>
- Cc: List - XMLDataDumps <xmldatadumps-l@lists.wikimedia.org>, "Discussion list for the Wikidata project." <wikidata@lists.wikimedia.org>, semantic-web@w3.org
Thanks Gerhard,

I will be touching base off-list. I am looking for precisely those JSON dumps. We have been developing a toolkit that can process them in 12 hours (at least for the tests I have done with 2020 dumps). I will be happy to share more information with you (or anyone who is interested).

Best,
Daniel

On 11/25/2020 7:13 AM, Gerhard Gonter wrote:
> On Wed, Nov 25, 2020 at 1:22 PM Daniel Garijo <dgarijo@isi.edu> wrote:
>> Hello,
>>
>> I am writing this message because I am analyzing the Wikidata JSON dumps
>> available in the Internet Archive and I have found there are no dumps
>> available after Feb 8th, 2019 (see
>> https://archive.org/details/wikimediadownloads?and%5B%5D=%22Wikidata%20entity%20dumps%22).
>> I know the latest dumps are available at
>> https://dumps.wikimedia.org/wikidatawiki/entities/, but unfortunately
>> they only cover the last few months.
> Which dump files exactly are you looking for? Dumps like
>
> https://dumps.wikimedia.org/wikidatawiki/entities/20201116/wikidata-20201116-all.json.gz
>
> which can also be found on https://dumps.wikimedia.org/other/wikidata/
> as 20201116.json.gz ?
>
>> [...]
>> Does anyone on this list know where some of these missing Wikidata dumps
>> may be found? If anyone has pointers to a server where they can be
>> downloaded, I would highly appreciate it.
> If you are looking for these dumps, I have about 8 TB stored on
> external disks. Transferring these over the network might be
> difficult, however. Please contact me off-list if you need any
> of these dumps; maybe we can arrange something.
>
> I'm curious, what are you trying to do with all of these files?
> Processing all of them must take months. My processor usually picks
> up the dump on Wednesday and takes 80 hours to comb through it. But
> my processor is written in Perl, something in C or Rust might be a lot
> faster...
>
> regards, Gerhard Gonter
Received on Thursday, 26 November 2020 03:26:06 UTC