- From: Dimitris Kontokostas <kontokostas@informatik.uni-leipzig.de>
- Date: Tue, 7 Jun 2016 10:54:02 +0300
- To: Holger Knublauch <holger@topquadrant.com>
- Cc: public-data-shapes-wg <public-data-shapes-wg@w3.org>
- Message-ID: <CA+u4+a0G6cEsN73mWbv4T0DjPeWEeQEwpEPu3opxJJQ2APv+HQ@mail.gmail.com>
On Tue, Jun 7, 2016 at 9:24 AM, Holger Knublauch <holger@topquadrant.com> wrote:

> On 7/06/2016 16:02, Dimitris Kontokostas wrote:
>
>> On Tue, Jun 7, 2016 at 2:45 AM, Holger Knublauch <holger@topquadrant.com> wrote:
>>
>>> On 6/06/2016 22:14, Peter F. Patel-Schneider wrote:
>>>
>>>> As far as I can tell, there are not going to be any significant inefficiencies in a single-implementation setup. Even if the boilerplate solution is the only possibility, implementations of constraint components come down to starting out with the boilerplate and adding to it the code that implements the constraint component for property constraints.
>>>>
>>>> There are, admittedly, some potential inefficiencies in the boilerplate solution, as the boilerplate is not modifiable. For example, sh:hasValue will look something like
>>>>
>>>>     SELECT $this ...
>>>>     WHERE { FILTER NOT EXISTS { [boilerplate]
>>>>             FILTER ( sameTerm($this, $hasValue) ) } }
>>>>
>>>> If the SPARQL implementation cannot optimize out the query followed by a simple filter, then the above query will run slower than
>>>>
>>>>     SELECT $this ...
>>>>     WHERE { FILTER NOT EXISTS { $this $predicate $hasValue } }
>>>
>>> I think you have contradicted yourself in this email. Yes, these inefficiencies do exist and they are significant. The boilerplate solution would first need to iterate over all potential values of the property, i.e. have O(n) performance plus the overhead of a FILTER clause, while the direct query has O(1) or O(log n) performance via a direct database lookup. A crippled SHACL that doesn't allow users to benefit from database optimizations will fail in the marketplace, and vendors will provide all kinds of native extensions to work around the limits of the standard.
>>>
>>> Even if there were a mechanism for defining a single query for every case and every constraint component (which I doubt), we would still require a mechanism to overload them for these optimizations. So I would be OK with having sh:defaultValidator as long as sh:propertyValidator remains in place.
>>
>> Personally I would take it a small step further to achieve further optimizations, i.e. have a sh:defaultValidator and then zero or more sh:filteredValidators.
>> A filtered validator would override the default validator based on
>> 1) context (as we do already)
>> 2) parameter values (e.g. for sh:minCount = 1)
>> 3) platform-specific information (e.g. SPARQL engine, SPARQL version, etc.)
>>
>> This is already supported in RDFUnit (mainly #2 now) and it is defined with an ASK query like "ASK { FILTER ($minCount = 1) }" / "ASK { FILTER ($minCount > 1) }".
>
> Sounds very good to me. I guess what we would need is a new property at the SPARQL validators to point at zero or one such precondition. As you state, they could be ASK queries, assuming we combine them with some integer for the ordering (otherwise they would all need to be completely disjoint). It would be a generic solution to things like vendor-specific optimizations. So maybe
>
>     ex:MyValidator
>         a sh:SPARQLAskValidator ;
>         sh:ask "... the actual query ..." ;
>         sh:order 3 ;
>         sh:filter "... return true if applicable ..." .
>
> (Using sh:order would allow an engine to start with the most likely match first, and would make the filter logic simpler.)

Sounds good to me.
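To make the proposal concrete, a constraint component for sh:minCount could then look roughly like the sketch below. This is only an illustration: sh:defaultValidator and sh:filteredValidator are the names floated in this thread (not settled vocabulary), ex:MinCountConstraintComponent is invented, and the surrounding declaration (sh:ConstraintComponent, sh:parameter, sh:SPARQLSelectValidator) approximates the draft rather than quoting it.

    # A generic validator plus a specialized one that is only applicable
    # when its sh:filter precondition evaluates to true.
    ex:MinCountConstraintComponent
        a sh:ConstraintComponent ;
        sh:parameter [ sh:predicate sh:minCount ] ;
        # Default: count the values of the property and compare with $minCount.
        sh:defaultValidator [
            a sh:SPARQLSelectValidator ;
            sh:select """
                SELECT $this
                WHERE { OPTIONAL { $this $predicate ?value } }
                GROUP BY $this
                HAVING ( COUNT(?value) < $minCount )
                """ ;
        ] ;
        # Overrides the default only when sh:minCount = 1:
        # a direct NOT EXISTS lookup that avoids the aggregation entirely.
        sh:filteredValidator [
            a sh:SPARQLSelectValidator ;
            sh:filter "ASK { FILTER ($minCount = 1) }" ;
            sh:order 1 ;
            sh:select """
                SELECT $this
                WHERE { FILTER NOT EXISTS { $this $predicate ?value } }
                """ ;
        ] .

An engine would evaluate the sh:filter ASK queries of the filtered validators first, in sh:order, with the parameter values pre-bound, and fall back to the default validator when none of them returns true.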
> Do you have experience as to how complex these pre-conditions would become? And would they need to operate on the data graph or the shapes graph? The latter may make a performance difference, as the selection would just need to be executed once per shapes graph. Yet I believe access to the data graph may be needed. In some cases a query may simply be of the form ASK { FILTER bound(?productX) }, which does not even require a lookup on any graph. I also expect queries to look slightly different depending on the type of database.

In my implementation the preconditions are very simple, similar to the ones I posted, and I use only the shapes graph. Actually, using pre-binding they evaluate to a static SPARQL query, and in the end no graph is needed at all in any of my cases. For example, for "sh:minCount 2" and the filter "ASK { FILTER ($minCount = 1) }", the filter is translated to ASK { FILTER (2 = 1) }, which can be evaluated even in an empty model. So, for my use cases, access to the shapes graph is needed only to get the pre-bound variables.

In general I would say OK for access to the shapes graph from filters, but I am not sure about access to the data graph.
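Spelled out with a made-up shape (ex:ProductShape and ex:name are invented; sh:predicate follows the draft style used elsewhere in this thread), the substitution works like this:

    ex:ProductShape
        a sh:Shape ;
        sh:property [
            sh:predicate ex:name ;
            sh:minCount 2 ;
        ] .

    # The precondition of the specialized minCount validator is
    #     ASK { FILTER ($minCount = 1) }
    # Pre-binding $minCount to 2 from the shapes graph yields
    #     ASK { FILTER (2 = 1) }
    # which is false against any graph, including an empty model, so the
    # filtered validator is skipped and the default validator is used.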
Dimitris

--
Dimitris Kontokostas
Department of Computer Science, University of Leipzig & DBpedia Association
Projects: http://dbpedia.org, http://rdfunit.aksw.org, http://aligned-project.eu
Homepage: http://aksw.org/DimitrisKontokostas
Research Group: AKSW/KILT http://aksw.org/Groups/KILT

Received on Tuesday, 7 June 2016 07:54:59 UTC