Re: ISSUE-139: uniform descriptions and implementations of constraint components from Holger Knublauch on 2016-06-08 (public-data-shapes-wg@w3.org from June 2016)

From: Holger Knublauch <holger@topquadrant.com>
Date: Wed, 8 Jun 2016 13:29:46 +1000
Cc: public-data-shapes-wg <public-data-shapes-wg@w3.org>
Message-ID: <eba3d9b9-e6f6-b0e0-f84b-7b05231d39af@topquadrant.com>
Dimitris,

for the record, a variation of the approach you suggest would simplify 
the SELECT query at the end of

https://lists.w3.org/Archives/Public/public-data-shapes-wg/2016Jun/0034.html

Instead of a UNION with three branches and three sh:messages, it could 
be rewritten as three validators, one per branch: one for the case of 
zero values, one for exactly one value, and one for more than one 
value. The filter queries would require access to the data graph 
though, and then I guess the only approach with realistic performance 
would be to insert the ASK queries into the SELECT (otherwise there 
would be too many queries).
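
A minimal sketch of that per-branch split, written in plain Python rather 
than SPARQL so that it is self-contained. Everything in it is an assumption 
made for illustration: the Validator class, the values() helper, the 
VALIDATORS list and the toy DATA graph are hypothetical names, and the 
"order" and "applies" fields only stand in for the sh:order and sh:filter 
properties proposed further down in this thread. It is not SHACL syntax or 
any engine's API.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Toy data graph: a set of (subject, predicate, object) triples (hypothetical data).
DATA = {
    ("ex:a", "ex:p", "1"),
    ("ex:a", "ex:p", "2"),
    ("ex:b", "ex:p", "1"),
}


def values(graph, focus, predicate):
    """Look up the values of `predicate` at `focus`; this is the data-graph access."""
    return sorted(o for s, p, o in graph if s == focus and p == predicate)


@dataclass
class Validator:
    name: str
    order: int                            # stands in for the proposed sh:order
    applies: Callable[[list], bool]       # stands in for the proposed sh:filter
    message: str                          # one message per branch


# One validator per former UNION branch, deliberately listed out of order.
VALIDATORS = [
    Validator("more-than-one", 3, lambda vs: len(vs) > 1, "more than one value"),
    Validator("zero", 1, lambda vs: len(vs) == 0, "no value"),
    Validator("exactly-one", 2, lambda vs: len(vs) == 1, "exactly one value"),
]


def select_validator(graph, focus, predicate) -> Optional[Validator]:
    """Return the first validator, by ascending order, whose filter applies."""
    vs = values(graph, focus, predicate)
    for validator in sorted(VALIDATORS, key=lambda v: v.order):
        if validator.applies(vs):
            return validator
    return None
```

Note that every filter here inspects the values found in the data graph, 
which is why filters of this kind could not be evaluated once per shapes 
graph.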

Holger


On 7/06/2016 17:54, Dimitris Kontokostas wrote:
>
>
> On Tue, Jun 7, 2016 at 9:24 AM, Holger Knublauch 
> <holger@topquadrant.com> wrote:
>
>     On 7/06/2016 16:02, Dimitris Kontokostas wrote:
>>
>>
>>     On Tue, Jun 7, 2016 at 2:45 AM, Holger Knublauch
>>     <holger@topquadrant.com> wrote:
>>
>>         On 6/06/2016 22:14, Peter F. Patel-Schneider wrote:
>>
>>             As far as I can tell, there are not going to be any
>>             significant inefficiencies
>>             in a single-implementation setup. Even if the boilerplate
>>             solution is the
>>             only possibility, implementations of constraint components
>>             come down to
>>             starting out with the boilerplate and adding to it the
>>             code that implements
>>             the constraint component for property constraints.
>>
>>             There are, admittedly, some potential inefficiencies in
>>             the boilerplate
>>             solution as the boilerplate is not modifiable.  For
>>             example, sh:hasValue will
>>             look something like
>>
>>             SELECT $this ...
>>             WHERE { FILTER NOT EXISTS { [boilerplate]
>>                                           FILTER (
>>             sameTerm($this,$hasValue) ) } }
>>
>>             If the SPARQL implementation cannot optimize out the
>>             query followed by a
>>             simple filter then the above query will run slower than
>>
>>             SELECT $this ...
>>             WHERE { FILTER NOT EXISTS { $this $predicate $hasValue } }
>>
>>
>>         I think you have contradicted yourself in this email. Yes,
>>         these inefficiencies do exist and they are significant. The
>>         boilerplate solution would first need to iterate over all
>>         potential values of the property, i.e. have O(n) performance
>>         plus the overhead of a FILTER clause, while the direct query
>>         has O(1) or O(log(n)) performance via a direct database
>>         lookup. A crippled SHACL that doesn't allow users to benefit
>>         from database optimizations will fail in the marketplace, and
>>         vendors will provide all kinds of native extensions to work
>>         around the limits of the standard.
>>
>>         Even if there was a mechanism for defining a single query for
>>         every case and every constraint component (which I doubt),
>>         then we still require a mechanism to overload them for these
>>         optimizations. So, I would be OK with having
>>         sh:defaultValidator as long as sh:propertyValidator remains
>>         in place.
>>
>>
>>     Personally I would take it a small step further to achieve
>>     further optimizations. i.e. have a sh:defaultValidator and then
>>     zero or more sh:filteredValidators.
>>     A filtered validator would override the default validator based on
>>     1) context (as we do already)
>>     2) parameter values (e.g. for sh:minCount = 1)
>>     3) platform specific information (e.g. sparql engine, sparql
>>     version etc)
>>
>>     This is already supported in RDFUnit (mainly #2 now) and it is
>>     defined with an ASK query like "ASK { FILTER ($minCount = 1)}" /
>>     "ASK { FILTER ($minCount > 1)}"
>
>     Sounds very good to me. I guess what we would need is a new
>     property at the SPARQL validators to point at zero or one such
>     preconditions. As you state they could be ASK queries, assuming we
>     combine it with some integer for the ordering (otherwise they
>     would all need to be completely disjoint). It would be a generic
>     solution to things like vendor-specific optimizations. So maybe
>
>     ex:MyValidator
>         a sh:SPARQLAskValidator ;
>         sh:ask "... the actual query ..." ;
>         sh:order 3 ;
>         sh:filter "... return true if applicable ..." .
>
>     (using sh:order would allow an engine to start with the most
>     likely match first, and would make the filter logic simpler).
>
>
> sounds good to me
>
>
>     Do you have experience as to how complex these pre-conditions
>     would become? And would they need to operate on the data graph or
>     shapes graph? The latter may make a performance difference as the
>     selection would just need to be executed once per shapes graph.
>     Yet I believe access to the data graph may be needed. In some
>     cases a query may simply be of the form ASK { FILTER
>     bound(?productX) } which does not even require a look up on any
>     graph. I also expect queries to look slightly different depending
>     on the type of database.
>
>
> In my implementation the preconditions are very simple, similar to the 
> ones I posted, and I use only the shapes graph.
> Actually, using pre-binding they evaluate to a static SPARQL query and 
> no graph is actually needed in the end in all of my cases,
> e.g. for "sh:minCount 2" and filter "ASK { FILTER ($minCount = 1)}" it 
> is translated to ASK { FILTER (2 = 1)}, which can be evaluated even in 
> an empty model.
>
> So, for my use cases, access to the shapes graph is needed only to get 
> the pre-bound variables.
> In general I would say OK for access to the shapes graph from filters, 
> but I am not sure about access to the data graph.
>
> Dimitris
>
> -- 
> Dimitris Kontokostas
> Department of Computer Science, University of Leipzig & DBpedia 
> Association
> Projects: http://dbpedia.org, http://rdfunit.aksw.org, 
> http://aligned-project.eu
> Homepage: http://aksw.org/DimitrisKontokostas
> Research Group: AKSW/KILT http://aksw.org/Groups/KILT
>
Received on Wednesday, 8 June 2016 03:30:22 UTC