Wikidata SPARQL query logs available

From: Markus Kroetzsch <markus.kroetzsch@tu-dresden.de>
Date: Tue, 7 Aug 2018 17:50:52 +0200
To: SPARQL <public-sparql-dev@w3.org>
Message-ID: <d38be805-5eef-f6ae-46fa-d32816a98dea@tu-dresden.de>

Dear SPARQLers,

Together with Wikimedia, we have just released a dataset of >200M SPARQL 
queries that have been answered by the Wikidata SPARQL live query 
service [1]. You can find details and download links on the following page:

https://iccl.inf.tu-dresden.de/web/Wikidata_SPARQL_Logs/en

The data so far includes all queries answered in June-August 2017. There 
is also an accompanying publication (to be presented at ISWC) that 
describes the workings of and practical experiences with the SPARQL 
query service [2]. This paper also points to the software used to run 
the whole service (including live updates), in case you want to have 
your own instance.

The queries have been pre-processed to avoid user identification (see 
above page for details), but they are still complete (i.e., not 
sampled), with precise time information and with some amount of user 
agent classification (in particular, we have made an attempt to separate 
"organic" query traffic from the vast majority of bot queries).

We hope that the data can be useful to any of you interested in 
practical SPARQL usage. From our analyses so far, the metrics for the 
queries seem rather different from the metrics reported for other 
datasets (e.g., we found that path queries are very important, and also 
saw a lot of use of other SPARQL 1.1 features, such as VALUES and 
BIND). We would be interested in any further insights that one might get 
from this data.
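
For readers who want to check such feature counts on their own query 
collections, here is a minimal sketch of how one might flag property 
paths, VALUES and BIND in a query string. It is a crude regular-expression 
heuristic and not the method used in our analyses; the function name 
sparql_features and the example query are made up for illustration, and a 
proper SPARQL parser would be more reliable.

```python
import re

# Hypothetical example query in the style of the Wikidata query service;
# it is not taken from the released logs.
EXAMPLE_QUERY = """
SELECT ?item ?label WHERE {
  VALUES ?class { wd:Q5 wd:Q146 }
  ?item wdt:P31/wdt:P279* ?class .
  ?item rdfs:label ?l .
  BIND(STR(?l) AS ?label)
}
"""

# Single-line string literals, full IRIs and comments are neutralised first
# so that their contents cannot trigger false matches.
_STRING = re.compile(r'"(?:[^"\\]|\\.)*"|\'(?:[^\'\\]|\\.)*\'')
_IRI = re.compile(r"<[^<>\s]*>")
_COMMENT = re.compile(r"#[^\n]*")

# A prefixed name followed by a path operator: '*' or '+', or '/' or '|'
# that is itself followed by the start of another path element.
_PNAME = r"[\w-]*:[\w-]+"
_PATH = re.compile(_PNAME + r"\s*(?:[*+]|[/|]\s*[\w:^!(])")


def sparql_features(query: str) -> dict:
    """Return rough flags for a few SPARQL 1.1 features (heuristic only)."""
    text = _STRING.sub('""', query)
    text = _IRI.sub("iri:x", text)   # keep IRIs visible as pseudo prefixed names
    text = _COMMENT.sub("", text)
    return {
        "values": bool(re.search(r"\bVALUES\b", text, re.IGNORECASE)),
        "bind": bool(re.search(r"\bBIND\s*\(", text, re.IGNORECASE)),
        "property_path": bool(_PATH.search(text)),
    }


if __name__ == "__main__":
    print(sparql_features(EXAMPLE_QUERY))
```

The heuristic will miss some path forms (for example inverse paths 
written with a leading ^) and can be fooled by unusual queries, so it is 
only meant as a starting point for exploring the data.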

I'd like to highlight that this is also part of a success story of 
SPARQL as a whole. The use of SPARQL for Wikidata was not pre-determined 
and other technologies were seriously considered. The massive use of the 
service and the wealth of (non-research) applications show that SPARQL 
really works for this community (there is a continued significant increase 
in usage in 2018, but we don't have permission to publish more recent 
logs so far). So kudos to everybody who has been working on the 
standard, the tooling, and the documentation around SPARQL over the 
years -- well done!

Cheers,

Markus

[1] https://query.wikidata.org/ (or rather the web service that powers 
this UI and many other applications).
[2] Stanislav Malyshev, Markus Krötzsch, Larry González, Julius Gonsior, 
Adrian Bielefeldt: Getting the Most out of Wikidata: Semantic Technology 
Usage in Wikipedia’s Knowledge Graph. In Proceedings of the 17th 
International Semantic Web Conference (ISWC-18), Springer 2018. 
https://iccl.inf.tu-dresden.de/web/Inproceedings3044/en

-- 
Prof. Dr. Markus Kroetzsch
Knowledge-Based Systems Group
Center for Advancing Electronics Dresden (cfaed)
Faculty of Computer Science
TU Dresden
+49 351 463 38486
https://kbs.inf.tu-dresden.de/



Received on Tuesday, 7 August 2018 15:51:16 UTC
