Re: [Wikidata] [Xmldatadumps-l] Availability of Wikidata JSON dumps after Feb, 2019

Gerhard,

I'm curious what you mean by "processing" and "comb through".
Can you describe how your processing works and what system or database the
output gets loaded into?
Perhaps you have your scripts publicly available on something like GitHub?

It would be nice to know a bit more about what you are doing as well.  Thanks
in advance!

Thad
https://www.linkedin.com/in/thadguidry/


On Wed, Nov 25, 2020 at 9:14 AM Gerhard Gonter <ggonter@gmail.com> wrote:

> On Wed, Nov 25, 2020 at 1:22 PM Daniel Garijo <dgarijo@isi.edu> wrote:
> >
> > Hello,
> >
> > I am writing this message because I am analyzing the Wikidata JSON dumps
> > available in the Internet Archive and I have found there are no dumps
> > available after Feb 8th, 2019 (see
> > https://archive.org/details/wikimediadownloads?and%5B%5D=%22Wikidata%20entity%20dumps%22
> > ).
> > I know the latest dumps are available at
> > https://dumps.wikimedia.org/wikidatawiki/entities/, but unfortunately
> > they only cover the last few months.
>
> Which dump files exactly are you looking for?  Dumps like
>
>
> https://dumps.wikimedia.org/wikidatawiki/entities/20201116/wikidata-20201116-all.json.gz
>
> which can also be found on https://dumps.wikimedia.org/other/wikidata/
> as 20201116.json.gz ?
>
> > [...]
> > Does anyone on this list know where some of these missing Wikidata dumps
> > may be found? If anyone has pointers to a server where they can be
> > downloaded, I would highly appreciate it.
>
> If you are looking for these dumps, I have about 8 TB stored on
> external disks.  Transferring these over the network might be
> difficult, however.  Please contact me off-list if you need any
> of these dumps; maybe we can arrange something.
>
> I'm curious, what are you trying to do with all of these files?
> Processing all of them must take months.  My processor usually picks
> up the dump on Wednesday and takes 80 hours to comb through it.  But
> my processor is written in Perl, something in C or Rust might be a lot
> faster...
>
> regards, Gerhard Gonter

Received on Wednesday, 25 November 2020 15:40:40 UTC