- From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
- Date: Thu, 9 Jun 2016 04:07:33 -0700
- To: Holger Knublauch <holger@topquadrant.com>, public-data-shapes-wg <public-data-shapes-wg@w3.org>
On 06/08/2016 08:03 PM, Holger Knublauch wrote:

[...]

>> So in the end the difference, even in the unoptimized case, is likely to be
>> quite modest, certainly nowhere near a factor of n / log n, for n the size
>> of the triple store.
>
> To be clear, I did not mean N to be the size of the triple store. It would
> be the size of the set of value nodes, e.g. the number of objects of a
> subject/predicate combination.
>
> But basically you are saying there are lots of unknowns - it depends on the
> data, on the available indices etc. I maintain my position that the
> difference is most likely N to log(N), because I assume most databases will
> have SP->O indices.

So this is N vs log(N) for an N that is very likely to be a small number,
often 1. Here the difference between N and log(N) is not significant, so the
constants are going to dominate. Which constants those are depends on how the
triple store is designed, but there is likely to be a large constant that is
the same for both, so the difference is likely to be small.

And this is for the case where the SPARQL implementation does not perform an
easy optimization. (I don't have much faith that most SPARQL implementations
actually do this easy optimization, but implementing the current design of
SHACL is going to require significant additions to SPARQL, so adding this
optimization to a query optimizer is likely going to be easy in comparison.)

> Anyway, you are the one who is suggesting that we should delete the ability
> to specify optimized queries for each of the three cases. So the burden is
> upon you to provide convincing evidence that for every possible constraint
> component, the difference between a generic query and a specialized query is
> negligible. I cannot see how you can possibly provide such evidence. There
> are far more complicating use cases "out there" than the trivial sh:hasValue
> scenario.

What other cases?
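One way to picture the N vs log(N) point is the following toy Python sketch. It is not any real triple store API; the sorted-list SPO index, the data, and all function names are illustrative stand-ins. It contrasts a sh:hasValue check that iterates the N value nodes against one that probes the index directly:

```python
import bisect

# Toy triple store: a sorted list standing in for a database SPO index.
triples = sorted([
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:alice", "ex:knows", "ex:carol"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:bob", "ex:knows", "ex:alice"),
])

def objects_of(s, p):
    """The N value nodes of a subject/predicate pair: one
    O(log |store|) seek into the index, then a scan of the N hits."""
    i = bisect.bisect_left(triples, (s, p, ""))
    out = []
    while i < len(triples) and triples[i][:2] == (s, p):
        out.append(triples[i][2])
        i += 1
    return out

def has_value_linear(s, p, v):
    """sh:hasValue by iterating over the value nodes: O(N) after the seek."""
    return v in objects_of(s, p)

def has_value_probe(s, p, v):
    """sh:hasValue as a direct index probe: O(log |store|),
    with no iteration over the value nodes at all."""
    i = bisect.bisect_left(triples, (s, p, v))
    return i < len(triples) and triples[i] == (s, p, v)
```

With N typically 1 or 2, both functions do essentially the same one index seek plus a constant amount of work, which is why the shared constants, not N vs log(N), dominate the cost.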
Any constraint component that requires iterating over the value nodes is going
to have the same cost for this iteration. The only difference is going to be
whether the SPARQL implementation can optimize the boilerplate so that it
doesn't do both the forward and inverse lookups. This is another easy
optimization.

The only other core constraint components that don't need to iterate over the
value nodes are sh:not, sh:and, and sh:or, but these don't themselves do any
data access. That leaves sh:hasValue as the only core constraint component
where the actual data access is different. I also don't know of any non-core
constraint component that doesn't need to iterate over the value nodes.

> Holger

peter
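The shared-iteration argument above can be sketched in a few lines of toy Python. This is a hypothetical illustration, not SHACL's or any engine's actual implementation: every function name and the pattern test are made up. The point is that components like sh:minCount and sh:pattern pay the same `value_nodes` data-access cost and differ only in the per-node test applied afterwards:

```python
# Toy graph: a list of (subject, predicate, object) triples.
graph = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:alice", "ex:knows", "ex:carol"),
    ("ex:alice", "ex:age", "42"),
]

def value_nodes(graph, focus, path):
    """Shared boilerplate: materialize the value nodes of a focus node.
    This forward lookup is the data access every iterating component pays;
    a generic SPARQL template might also attempt the inverse direction,
    which an optimizer can skip when the path is a plain predicate."""
    return [o for (s, p, o) in graph if s == focus and p == path]

def check_min_count(graph, focus, path, n):
    """sh:minCount-style check: same iteration, count-based test."""
    return len(value_nodes(graph, focus, path)) >= n

def check_pattern(graph, focus, path, prefix):
    """sh:pattern-style check (crude prefix match for illustration):
    same iteration, per-node string test."""
    return all(v.startswith(prefix) for v in value_nodes(graph, focus, path))
```

Only a component like sh:hasValue could replace the call to `value_nodes` with a direct existence probe, which is why it is the lone core component whose data access actually differs from the generic query.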
Received on Thursday, 9 June 2016 11:08:04 UTC