- From: Holger Knublauch <holger@topquadrant.com>
- Date: Tue, 7 Jun 2016 16:24:24 +1000
- To: public-data-shapes-wg <public-data-shapes-wg@w3.org>
- Message-ID: <e042542f-0e10-05c2-b675-a78c28100414@topquadrant.com>
On 7/06/2016 16:02, Dimitris Kontokostas wrote:
> On Tue, Jun 7, 2016 at 2:45 AM, Holger Knublauch
> <holger@topquadrant.com> wrote:
>
> > On 6/06/2016 22:14, Peter F. Patel-Schneider wrote:
> >
> > > As far as I can tell, there are not going to be any significant
> > > inefficiencies in a single-implementation setup. Even if the
> > > boilerplate solution is the only possibility implementations of
> > > constraint components come down to starting out with the
> > > boilerplate and adding to it the code that implements the
> > > constraint component for property constraints.
> > >
> > > There are, admittedly, some potential inefficiencies in the
> > > boilerplate solution as the boilerplate is not modifiable. For
> > > example, sh:hasValue will look something like
> > >
> > >     SELECT $this ...
> > >     WHERE { FILTER NOT EXISTS { [boilerplate]
> > >                 FILTER ( sameTerm($this,$hasValue) ) } }
> > >
> > > If the SPARQL implementation cannot optimize out the query
> > > followed by a simple filter then the above query will run slower
> > > than
> > >
> > >     SELECT $this ...
> > >     WHERE { FILTER NOT EXISTS { $this $predicate $hasValue } }
> >
> > I think you have contradicted yourself in this email. Yes, these
> > inefficiencies do exist and they are significant. The boilerplate
> > solution would first need to iterate over all potential values of
> > the property, i.e. have O(n) performance plus the overhead of a
> > FILTER clause, while the direct query has O(1) or O(log(N))
> > performance via a direct database lookup. A crippled SHACL that
> > doesn't allow users to benefit from database optimizations will
> > fail on the marketplace, and vendors will provide all kinds of
> > native extensions to work around the limits of the standard.
> >
> > Even if there was a mechanism for defining a single query for
> > every case and every constraint component (which I doubt), then we
> > still require a mechanism to overload them for these
> > optimizations. So, I would be OK to having sh:defaultValidator as
> > long as sh:propertyValidator remains in place.
>
> Personally I would take it a small step further to achieve further
> optimizations, i.e. have a sh:defaultValidator and then zero or more
> sh:filteredValidators.
> A filtered validator would override the default validator based on
> 1) context (as we do already)
> 2) parameter values (e.g. for sh:minCount = 1)
> 3) platform specific information (e.g. sparql engine, sparql version etc)
>
> This is already supported in RDFUnit (mainly #2 now) and it is defined
> with an ASK query like "ASK { FILTER ($minCount = 1)}" /
> "ASK { FILTER ($minCount > 1)}"

Sounds very good to me. I guess what we would need is a new property at
the SPARQL validators to point at zero or one such precondition. As you
state they could be ASK queries, assuming we combine them with some
integer for the ordering (otherwise they would all need to be completely
disjoint). It would be a generic solution to things like vendor-specific
optimizations. So maybe

    ex:MyValidator
        a sh:SPARQLAskValidator ;
        sh:ask "... the actual query ..." ;
        sh:order 3 ;
        sh:filter "... return true if applicable ..." .

(using sh:order would allow an engine to start with the most likely
match first, and would make the filter logic simpler).

Do you have experience as to how complex these pre-conditions would
become? And would they need to operate on the data graph or shapes
graph? The latter may make a performance difference as the selection
would just need to be executed once per shapes graph. Yet I believe
access to the data graph may be needed. In some cases a query may simply
be of the form

    ASK { FILTER bound(?productX) }

which does not even require a lookup on any graph. I also expect queries
to look slightly different depending on the type of database.

Holger
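The selection logic discussed above (try the filtered validators in sh:order, take the first whose filter holds, otherwise fall back to the default validator) could be sketched roughly as follows. This is only an illustration in Python, not SHACL or RDFUnit code: `FilteredValidator`, `select_validator` and the validator names are hypothetical, and each filter is modelled as a plain predicate over the parameter values, standing in for the proposed ASK precondition.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Sequence

Params = Mapping[str, object]


@dataclass(frozen=True)
class FilteredValidator:
    """Hypothetical stand-in for a validator carrying sh:order and sh:filter."""
    name: str
    order: int                         # assumed analogue of the proposed sh:order
    applies: Callable[[Params], bool]  # assumed analogue of the proposed sh:filter


def select_validator(candidates: Sequence[FilteredValidator],
                     params: Params,
                     default: str) -> str:
    """Return the first applicable validator by ascending order, else the default."""
    for validator in sorted(candidates, key=lambda v: v.order):
        if validator.applies(params):
            return validator.name
    return default


def _min_count(params: Params) -> int:
    """Read minCount as an int, treating a missing or non-int value as 0."""
    value = params.get("minCount")
    return value if isinstance(value, int) else 0


# Two filters mirroring the quoted examples ($minCount = 1, $minCount > 1),
# plus a deliberately overlapping catch-all to show why an ordering helps.
candidates = [
    FilteredValidator("minCountGeneral", 2, lambda p: _min_count(p) > 1),
    FilteredValidator("minCountExactlyOne", 1, lambda p: _min_count(p) == 1),
    FilteredValidator("anyMinCount", 3, lambda p: "minCount" in p),
]
```

With an ordering in place the filters no longer need to be disjoint: for `{"minCount": 1}` both "minCountExactlyOne" and "anyMinCount" apply, and the lower order value wins. When nothing applies, the caller gets the default validator back.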
Received on Tuesday, 7 June 2016 06:24:57 UTC