- From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
- Date: Mon, 19 Aug 2024 18:02:17 -0400
- To: public-rdf-star-wg@w3.org
On 8/19/24 17:44, James Anderson wrote:

> good evening;
>
>> On 19. Aug 2024, at 18:06, Peter F. Patel-Schneider <pfpschneider@gmail.com> wrote:
>>
>> It is indeed true that many SPARQL implementations do a poor job of optimizing queries that use the ontology facilities of RDFS. It should be possible to run the query you provide at essentially the same speed as the version of the query that does not include subproperties on RDF graphs that have no relevant subproperty statements, and with not much loss in speed on a graph that has only a few relevant subproperty statements (compared to running the simpler query on an RDF graph that has materialized the consequences of the subproperty statements).
>
> i am curious why one would make a broad claim of this order.
> sparql formulations of the sort which combine those sorts of patterns freely in large queries targeting graphs with large subject and object cardinalities would appear to constitute a significant challenge to an optimizer.
> this, even given the expressed content restrictions.
>
> do you have any references to discussions about how one might in general optimize such a query type, and/or benchmarks which demonstrate results on that topic?
>
> best regards, from berlin

I don't view this as a broad claim. It is essentially just a claim that keeping statistics is a good idea and that there is a way to exploit these statistics in the above situations. More detail follows.

As far as I know, it is generally useful for a query optimizer to keep statistics on each property. These statistics are useful for a number of optimizations, in particular for working on intermediate results that are likely to be small before working on those that are likely to be large.

If these statistics show that determining the subproperties of a property is likely to be very cheap, a good query optimizer would generally do that part of the query first. If the results of these subqueries show that there are no non-trivial subproperties, the properties can be substituted directly into the rest of the query. So the overhead is the overhead of keeping the statistics (a good idea in general), of consulting the statistics (quick), of running the subproperty query (very quick in this case), and of doing the substitution (extremely quick here). So there is no significant overhead.

If the results of the query show only one or two subproperties, then the overhead is the difference between running a query for a small number of properties and combining the results, versus running a single query for one property with the correspondingly larger result.

So it is possible to construct a query optimizer that handles these sorts of queries without much overhead.

peter
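
As a rough sketch of the two-step strategy described above: evaluate the cheap rdfs:subPropertyOf subquery first, then substitute the resulting properties into the main pattern. The function names, the example namespace, and the use of Python/rdflib below are illustrative assumptions for the sake of a self-contained example, not part of any particular implementation or of the discussion above.

    from rdflib import Graph, URIRef

    def subproperties(graph, prop):
        # Step 1: the subproperty lookup.  On a graph with no (or few)
        # rdfs:subPropertyOf triples this is very cheap, which a real
        # optimizer would already know from its per-property statistics.
        q = """
            PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
            SELECT DISTINCT ?p WHERE { ?p rdfs:subPropertyOf* <%s> }
        """ % prop
        return [row.p for row in graph.query(q)]

    def query_with_subproperties(graph, prop):
        # Step 2: substitute the concrete properties into the main query.
        # A single property gives back the plain triple pattern; a few
        # properties become a small VALUES enumeration (a small union).
        props = subproperties(graph, prop)
        main = """
            SELECT ?s ?o WHERE { VALUES ?p { %s } ?s ?p ?o }
        """ % " ".join("<%s>" % p for p in props)
        return list(graph.query(main))

    if __name__ == "__main__":
        g = Graph()
        g.parse(format="turtle", data="""
            @prefix ex: <http://example.org/> .
            @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
            ex:parentOf rdfs:subPropertyOf ex:relatedTo .
            ex:alice ex:parentOf ex:bob .
            ex:carol ex:relatedTo ex:dave .
        """)
        for s, o in query_with_subproperties(g, URIRef("http://example.org/relatedTo")):
            print(s, o)

On this toy data the second query enumerates both ex:relatedTo and its one subproperty ex:parentOf, so it returns the pair asserted directly with ex:relatedTo as well as the pair implied by the subproperty statement, without materializing anything in the graph.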
Received on Monday, 19 August 2024 22:02:24 UTC