Re: ISSUE-139: uniform descriptions and implementations of constraint components from Dimitris Kontokostas on 2016-06-07 (public-data-shapes-wg@w3.org from June 2016)

From: Dimitris Kontokostas <kontokostas@informatik.uni-leipzig.de>
Date: Tue, 7 Jun 2016 09:02:12 +0300
To: Holger Knublauch <holger@topquadrant.com>
Cc: public-data-shapes-wg <public-data-shapes-wg@w3.org>
Message-ID: <CA+u4+a0XawjZY08ogMyymofpdzW1d+iz5MXqWC+oFCki6icJxg@mail.gmail.com>

On Tue, Jun 7, 2016 at 2:45 AM, Holger Knublauch <holger@topquadrant.com>
wrote:

> On 6/06/2016 22:14, Peter F. Patel-Schneider wrote:
>
>> As far as I can tell, there are not going to be any significant
>> inefficiencies in a single-implementation setup.  Even if the boilerplate
>> solution is the only possibility, implementations of constraint components
>> come down to starting out with the boilerplate and adding to it the code
>> that implements the constraint component for property constraints.
>>
>> There are, admittedly, some potential inefficiencies in the boilerplate
>> solution, as the boilerplate is not modifiable.  For example, sh:hasValue
>> will look something like
>>
>> SELECT $this ...
>> WHERE { FILTER NOT EXISTS { [boilerplate]
>>                               FILTER ( sameTerm($this,$hasValue) ) } }
>>
>> If the SPARQL implementation cannot optimize out the query followed by a
>> simple filter then the above query will run slower than
>>
>> SELECT $this ...
>> WHERE { FILTER NOT EXISTS { $this $predicate $hasValue } }
>>
>
> I think you have contradicted yourself in this email. Yes, these
> inefficiencies do exist and they are significant. The boilerplate solution
> would first need to iterate over all potential values of the property, i.e.
> have O(n) performance plus the overhead of a FILTER clause, while the
> direct query has O(1) or O(log(n)) performance via a direct database
> lookup. A crippled SHACL that doesn't allow users to benefit from database
> optimizations will fail in the marketplace, and vendors will provide all
> kinds of native extensions to work around the limits of the standard.
>
> Even if there were a mechanism for defining a single query for every case
> and every constraint component (which I doubt), we would still require a
> mechanism to overload them for these optimizations. So, I would be OK with
> having sh:defaultValidator as long as sh:propertyValidator remains in place.
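To make the complexity argument in the quoted text concrete, here is a minimal Python sketch. It is not SHACL or SPARQL engine code: the in-memory index, the function names and the sample data are all assumptions made for illustration. It contrasts "bind every value of the property, then filter" with a single direct lookup.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]
Index = Dict[Tuple[str, str], Set[str]]


def build_index(triples: Iterable[Triple]) -> Index:
    """Group objects by (subject, predicate); a stand-in for a store index."""
    index: Index = defaultdict(set)
    for s, p, o in triples:
        index[(s, p)].add(o)
    return index


def has_value_boilerplate(index: Index, this: str, predicate: str, value: str) -> bool:
    # Boilerplate style: enumerate all values of the property (O(n)),
    # then apply the filter to each binding.
    values = list(index.get((this, predicate), ()))
    return any(v == value for v in values)


def has_value_direct(index: Index, this: str, predicate: str, value: str) -> bool:
    # Direct style: one membership test against the index
    # (average O(1) for a hash set).
    return value in index.get((this, predicate), ())


# Hypothetical sample data: one subject with 1000 values for ex:p.
triples = [("ex:a", "ex:p", str(i)) for i in range(1000)]
index = build_index(triples)
```

Both functions return the same answer; they differ only in how much work is done when the engine cannot rewrite the first form into the second.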


Personally, I would take this a small step further to allow more
optimizations, i.e. have one sh:defaultValidator and then zero or more
sh:filteredValidators.
A filtered validator would override the default validator based on
1) context (as we do already)
2) parameter values (e.g. for sh:minCount = 1)
3) platform-specific information (e.g. SPARQL engine, SPARQL version, etc.)

This is already supported in RDFUnit (mainly #2 for now), where each filter
is defined with an ASK query such as "ASK { FILTER ($minCount = 1)}" or
"ASK { FILTER ($minCount > 1)}".
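The selection logic can be sketched roughly as follows. This is a minimal Python illustration, not RDFUnit's or SHACL's actual API: the class names, the validator labels and the use of a Python predicate in place of the ASK query are all assumptions. The first filtered validator whose filter matches wins; otherwise the default validator is used.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Params = Dict[str, object]


@dataclass
class FilteredValidator:
    # Hypothetical label for the optimized validator (e.g. a query name).
    name: str
    # Stand-in for the ASK query: returns True if this validator applies.
    applies: Callable[[Params], bool]


@dataclass
class ConstraintComponent:
    default_validator: str
    filtered_validators: List[FilteredValidator] = field(default_factory=list)

    def select(self, params: Params) -> str:
        """Return the first matching filtered validator, else the default."""
        for candidate in self.filtered_validators:
            if candidate.applies(params):
                return candidate.name
        return self.default_validator


# Illustrative component for sh:minCount with two parameter-based filters,
# mirroring "ASK { FILTER ($minCount = 1)}" and "ASK { FILTER ($minCount > 1)}".
min_count = ConstraintComponent(
    default_validator="generic-count-query",
    filtered_validators=[
        FilteredValidator("exists-query", lambda p: p.get("minCount") == 1),
        FilteredValidator("count-query", lambda p: p.get("minCount", 0) > 1),
    ],
)
```

With these assumed definitions, `min_count.select({"minCount": 1})` picks the existence check, while a value that matches no filter falls back to the default validator. Context or platform information could be passed as extra keys in the same dictionary.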

Dimitris

-- 
Dimitris Kontokostas
Department of Computer Science, University of Leipzig & DBpedia Association
Projects: http://dbpedia.org, http://rdfunit.aksw.org,
http://aligned-project.eu
Homepage: http://aksw.org/DimitrisKontokostas
Research Group: AKSW/KILT http://aksw.org/Groups/KILT

Received on Tuesday, 7 June 2016 06:03:08 UTC