- From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
- Date: Tue, 12 Nov 2024 13:45:48 -0500
- To: RDF-star Working Group <public-rdf-star-wg@w3.org>
TL;DR: Please contact me if you want your SPARQL implementation evaluated on Wikidata.

I have started work on a small funded project whose goal is to help determine how well QLever could work as a SPARQL query service for Wikidata as Wikidata grows. The current methodology of the project is to:

- gather queries from both existing benchmarks and users of Wikidata query services;
- run queries against the current full Wikidata RDF dump, concentrating on hard queries and queries that involve parts of the graph of varying size;
- create synthetic extensions of the current full Wikidata RDF dump and run queries on them (a rough sketch of one way to build such extensions appears below); and
- analyze the results to determine how well QLever would work as Wikidata grows.

The project will run queries against both the public QLever Wikidata service and a local version of the service running on high-end hardware with looser resource limits. The current hardware for the local service is a Ryzen 9 9950X with 192 GB of memory, 10 TB of fast SSDs, and an 8 TB hard drive.

Information on the progress of the project is kept at
https://www.wikidata.org/wiki/Wikidata:Scaling_Wikidata/Benchmarking

I want to expand this project to other SPARQL implementations, so I am soliciting information on suitable implementations. A suitable implementation must either be open source or be used in a public service that runs queries against the full Wikidata RDF dump. For any public service there must be information available on the hardware the service runs on and on any resource limitations it imposes. If there is no public service there must be good instructions for building the system and loading the current full Wikidata RDF dump into it in under 3 days on a high-end desktop.

For both public services and local builds, there should be a way to run queries without interference from other queries (including a way to clear any caches) and a way to extract the resource consumption of evaluating a query (both compute time and memory needed); a minimal timing harness along these lines is sketched below. There should also be information on the best parameters to use for graphs of roughly 20 billion triples, or on the best parameters to use for Wikidata specifically.

I believe that Blazegraph, QLever, Virtuoso Open Source, and MillenniumDB currently satisfy (or nearly satisfy) these requirements.

If you are interested in having your system evaluated in this expanded project, I would like to hear from you. I will collaborate with you to find out how to set up your system to make it as capable as possible on data of the size and scope of Wikidata. I also welcome input on setting up the benchmarks and on other useful activities that would fit within the expanded project.

Peter F. Patel-Schneider
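To make the synthetic-extension step concrete: one simple way to grow an N-Triples dump while preserving its shape is to emit remapped copies of it, shifting every entity IRI so that each copy is disjoint from the original. The sketch below is purely illustrative, not necessarily how the project builds its extensions; the Q-id offset rule is an assumption, and a real extension would also have to handle property IRIs, blank nodes, and Wikidata's statement and reference nodes.

    import re
    import sys

    # Matches Wikidata entity IRIs such as <http://www.wikidata.org/entity/Q42>.
    ENTITY = re.compile(r"<http://www\.wikidata\.org/entity/Q(\d+)>")
    OFFSET = 10_000_000_000  # hypothetical offset that keeps copies disjoint

    def remap(line: str, copy: int) -> str:
        """Shift every Q-id by copy * OFFSET so copies 1, 2, ... never collide."""
        return ENTITY.sub(
            lambda m: f"<http://www.wikidata.org/entity/Q{int(m.group(1)) + copy * OFFSET}>",
            line,
        )

    def extend(path: str, copies: int) -> None:
        """Stream the dump once per copy; copy 0 is the unmodified original."""
        for copy in range(copies + 1):
            with open(path, encoding="utf-8") as dump:
                for line in dump:
                    sys.stdout.write(line if copy == 0 else remap(line, copy))

    if __name__ == "__main__":
        # e.g.  python extend.py wikidata.nt > wikidata-2x.nt
        extend(sys.argv[1], copies=1)  # original plus one remapped copy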
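And to make the measurement requirement concrete, here is a minimal cold-cache timing harness. The endpoint URL is a placeholder, and dropping the Linux page cache via /proc/sys/vm/drop_caches (which needs root) covers only part of a cold start; each implementation has its own internal caches and its own way of reporting memory use. The HTTP request itself follows the standard SPARQL 1.1 Protocol.

    import subprocess
    import time

    import requests  # pip install requests

    # Placeholder endpoint; substitute the service under test.
    ENDPOINT = "http://localhost:7001/sparql"

    QUERY = "SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }"

    def drop_os_caches() -> None:
        """Flush the Linux page cache so each run starts cold (needs root)."""
        subprocess.run(["sync"], check=True)
        subprocess.run(
            ["sudo", "tee", "/proc/sys/vm/drop_caches"],
            input=b"3", stdout=subprocess.DEVNULL, check=True,
        )

    def timed_query(query: str) -> tuple[float, int]:
        """Run one query; return (wall-clock seconds, number of result rows)."""
        start = time.perf_counter()
        resp = requests.get(
            ENDPOINT,
            params={"query": query},
            headers={"Accept": "application/sparql-results+json"},
            timeout=600,
        )
        elapsed = time.perf_counter() - start
        resp.raise_for_status()
        rows = len(resp.json()["results"]["bindings"])
        return elapsed, rows

    if __name__ == "__main__":
        drop_os_caches()
        seconds, rows = timed_query(QUERY)
        print(f"{seconds:.2f}s, {rows} rows")

Note that a protocol client only sees wall-clock time; per-query memory consumption generally has to come from the server side, for example from the implementation's own statistics or resource accounting.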
Received on Tuesday, 12 November 2024 18:45:53 UTC