- From: Dimitris Kontokostas <kontokostas@informatik.uni-leipzig.de>
- Date: Tue, 7 Jun 2016 10:54:02 +0300
- To: Holger Knublauch <holger@topquadrant.com>
- Cc: public-data-shapes-wg <public-data-shapes-wg@w3.org>
- Message-ID: <CA+u4+a0G6cEsN73mWbv4T0DjPeWEeQEwpEPu3opxJJQ2APv+HQ@mail.gmail.com>
On Tue, Jun 7, 2016 at 9:24 AM, Holger Knublauch <holger@topquadrant.com> wrote:

> On 7/06/2016 16:02, Dimitris Kontokostas wrote:
>
>> On Tue, Jun 7, 2016 at 2:45 AM, Holger Knublauch <holger@topquadrant.com> wrote:
>>
>>> On 6/06/2016 22:14, Peter F. Patel-Schneider wrote:
>>>
>>>> As far as I can tell, there are not going to be any significant inefficiencies in a single-implementation setup. Even if the boilerplate solution is the only possibility, implementations of constraint components come down to starting out with the boilerplate and adding to it the code that implements the constraint component for property constraints.
>>>>
>>>> There are, admittedly, some potential inefficiencies in the boilerplate solution, as the boilerplate is not modifiable. For example, sh:hasValue will look something like
>>>>
>>>>     SELECT $this ...
>>>>     WHERE { FILTER NOT EXISTS { [boilerplate]
>>>>             FILTER ( sameTerm($this, $hasValue) ) } }
>>>>
>>>> If the SPARQL implementation cannot optimize out the query followed by a simple filter, then the above query will run slower than
>>>>
>>>>     SELECT $this ...
>>>>     WHERE { FILTER NOT EXISTS { $this $predicate $hasValue } }
>>>
>>> I think you have contradicted yourself in this email. Yes, these inefficiencies do exist and they are significant. The boilerplate solution would first need to iterate over all potential values of the property, i.e. have O(n) performance plus the overhead of a FILTER clause, while the direct query has O(1) or O(log n) performance via a direct database lookup. A crippled SHACL that doesn't allow users to benefit from database optimizations will fail in the marketplace, and vendors will provide all kinds of native extensions to work around the limits of the standard.
>>>
>>> Even if there were a mechanism for defining a single query for every case and every constraint component (which I doubt), we would still require a mechanism to overload them for these optimizations. So I would be OK with having sh:defaultValidator as long as sh:propertyValidator remains in place.
>>
>> Personally I would take it a small step further to achieve further optimizations, i.e. have a sh:defaultValidator and then zero or more sh:filteredValidators.
>> A filtered validator would override the default validator based on
>> 1) context (as we do already)
>> 2) parameter values (e.g. for sh:minCount = 1)
>> 3) platform-specific information (e.g. SPARQL engine, SPARQL version, etc.)
>>
>> This is already supported in RDFUnit (mainly #2 now) and it is defined with an ASK query like "ASK { FILTER ($minCount = 1) }" / "ASK { FILTER ($minCount > 1) }".
>
> Sounds very good to me. I guess what we would need is a new property at the SPARQL validators to point at zero or one such precondition. As you state, they could be ASK queries, assuming we combine them with some integer for the ordering (otherwise they would all need to be completely disjoint). It would be a generic solution to things like vendor-specific optimizations. So maybe
>
>     ex:MyValidator
>         a sh:SPARQLAskValidator ;
>         sh:ask "... the actual query ..." ;
>         sh:order 3 ;
>         sh:filter "... return true if applicable ..." .
>
> (Using sh:order would allow an engine to start with the most likely match first, and would make the filter logic simpler.)

Sounds good to me.
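To make the proposal concrete, a constraint component for sh:minCount could then look roughly like the sketch below. This is only an illustration: sh:defaultValidator and sh:filteredValidator are the names floated in this thread (not settled vocabulary), ex:MinCountConstraintComponent is invented, and the surrounding declaration (sh:ConstraintComponent, sh:parameter, sh:SPARQLSelectValidator) approximates the draft rather than quoting it.

    # A generic validator plus a specialized one that is only applicable
    # when its sh:filter precondition evaluates to true.
    ex:MinCountConstraintComponent
        a sh:ConstraintComponent ;
        sh:parameter [ sh:predicate sh:minCount ] ;
        # Default: count the values of the property and compare with $minCount.
        sh:defaultValidator [
            a sh:SPARQLSelectValidator ;
            sh:select """
                SELECT $this
                WHERE { OPTIONAL { $this $predicate ?value } }
                GROUP BY $this
                HAVING ( COUNT(?value) < $minCount )
                """ ;
        ] ;
        # Overrides the default only when sh:minCount = 1:
        # a direct NOT EXISTS lookup that avoids the aggregation entirely.
        sh:filteredValidator [
            a sh:SPARQLSelectValidator ;
            sh:filter "ASK { FILTER ($minCount = 1) }" ;
            sh:order 1 ;
            sh:select """
                SELECT $this
                WHERE { FILTER NOT EXISTS { $this $predicate ?value } }
                """ ;
        ] .

An engine would evaluate the sh:filter ASK queries of the filtered validators first, in sh:order, with the parameter values pre-bound, and fall back to the default validator when none of them returns true.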
> Do you have experience as to how complex these pre-conditions would become? And would they need to operate on the data graph or the shapes graph? The latter may make a performance difference, as the selection would just need to be executed once per shapes graph. Yet I believe access to the data graph may be needed. In some cases a query may simply be of the form ASK { FILTER bound(?productX) }, which does not even require a lookup on any graph. I also expect queries to look slightly different depending on the type of database.

In my implementation the preconditions are very simple, similar to the ones I posted, and I use only the shapes graph. Actually, using pre-binding they evaluate to a static SPARQL query, and in the end no graph is needed at all in any of my cases. For example, for "sh:minCount 2" and the filter "ASK { FILTER ($minCount = 1) }", the filter is translated to ASK { FILTER (2 = 1) }, which can be evaluated even in an empty model. So, for my use cases, access to the shapes graph is needed only to get the pre-bound variables.

In general I would say OK for access to the shapes graph from filters, but I am not sure about access to the data graph.
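Spelled out with a made-up shape (ex:ProductShape and ex:name are invented; sh:predicate follows the draft style used elsewhere in this thread), the substitution works like this:

    ex:ProductShape
        a sh:Shape ;
        sh:property [
            sh:predicate ex:name ;
            sh:minCount 2 ;
        ] .

    # The precondition of the specialized minCount validator is
    #     ASK { FILTER ($minCount = 1) }
    # Pre-binding $minCount to 2 from the shapes graph yields
    #     ASK { FILTER (2 = 1) }
    # which is false against any graph, including an empty model, so the
    # filtered validator is skipped and the default validator is used.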
Dimitris

--
Dimitris Kontokostas
Department of Computer Science, University of Leipzig & DBpedia Association
Projects: http://dbpedia.org, http://rdfunit.aksw.org, http://aligned-project.eu
Homepage: http://aksw.org/DimitrisKontokostas
Research Group: AKSW/KILT http://aksw.org/Groups/KILT

Received on Tuesday, 7 June 2016 07:54:59 UTC