Re: ISSUE-139: uniform descriptions and implementations of constraint components

On 6/06/2016 22:14, Peter F. Patel-Schneider wrote:
> As far as I can tell, there are not going to be any significant inefficiencies
> in a single-implementation setup.  Even if the boilerplate solution is the
> only possibility, implementations of constraint components come down to
> starting out with the boilerplate and adding to it the code that implements
> the constraint component for property constraints.
>
> There are, admittedly, some potential inefficiencies in the boilerplate
> solution as the boilerplate is not modifiable.  For example, sh:hasValue will
> look something like
>
> SELECT $this ...
> WHERE { FILTER NOT EXISTS { [boilerplate]
>                               FILTER ( sameTerm($this,$hasValue) ) } }
>
> If the SPARQL implementation cannot optimize out the query followed by a
> simple filter then the above query will run slower than
>
> SELECT $this ...
> WHERE { FILTER NOT EXISTS { $this $predicate $hasValue } }

I think you have contradicted yourself in this email. Yes, these 
inefficiencies do exist, and they are significant. The boilerplate 
solution would first need to iterate over all potential values of the 
property, i.e. it has O(n) performance plus the overhead of a FILTER 
clause, while the direct query has O(1) or O(log n) performance via a 
direct database lookup. A crippled SHACL that doesn't allow users to 
benefit from database optimizations will fail in the marketplace, and 
vendors will provide all kinds of native extensions to work around the 
limits of the standard.
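To make the contrast concrete, here is a sketch of the two shapes of 
query, assuming (as the SHACL drafts do) that the boilerplate for 
property constraints binds value nodes via $this $PATH ?value -- the 
exact boilerplate text is an assumption here:

SELECT $this
WHERE {
   FILTER NOT EXISTS {
      $this $PATH ?value .                    # enumerate every value: O(n)
      FILTER ( sameTerm(?value, $hasValue) )  # then discard all but one
   }
}

versus the direct form, which a store can answer with a single index 
lookup:

SELECT $this
WHERE { FILTER NOT EXISTS { $this $PATH $hasValue } }

Unless the optimizer can rewrite the first query into the second, the 
boilerplate version pays for materializing every value of the property.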

Even if there were a mechanism for defining a single query for every 
case and every constraint component (which I doubt), we would still 
require a mechanism to overload it for these optimizations. So I would 
be OK with having sh:defaultValidator as long as sh:propertyValidator 
remains in place.
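In other words, a component declaration could carry both -- the 
following Turtle is only a sketch, with ex:HasValueComponent and the 
generic validator resource being hypothetical names, and 
sh:defaultValidator being the property proposed above rather than an 
agreed term:

ex:HasValueComponent
     a sh:ConstraintComponent ;
     sh:parameter [ sh:path sh:hasValue ] ;
     # fallback used in any context, built on the boilerplate
     sh:defaultValidator ex:genericHasValueValidator ;
     # optimized overload for the property-constraint context
     sh:propertyValidator [
         a sh:SPARQLSelectValidator ;
         sh:select """
             SELECT $this
             WHERE { FILTER NOT EXISTS { $this $PATH $hasValue } }
             """ ;
     ] .

An engine would pick the most specific validator for the context and 
fall back to the default otherwise, so portability and performance do 
not have to exclude each other.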

Holger


>
> peter
>
>
> On 06/05/2016 11:31 PM, Dimitris Kontokostas wrote:
>>
>> On Sun, Jun 5, 2016 at 11:36 PM, Peter F. Patel-Schneider
>> <pfpschneider@gmail.com> wrote:
>>
>>      Yes, each constraint component should not need more than one implementation,
>>      whether it is in the core or otherwise.  Otherwise there are just that many
>>      more ways of introducing an error.
>>
>>      Yes, in the current setup each constraint component should be usable in node
>>      constraints, in property constraints, and in inverse property constraints.
>>      Otherwise there is an extra cognitive load on users to figure out when a
>>      constraint component can be used.  The idea is to not have errors result from
>>      these extra uses, though.  Just as sh:minLength does not cause an error when a
>>      value node is a blank node neither should sh:datatype cause an error when used
>>      in an inverse property constraint.  Of course, an sh:datatype in an inverse
>>      property constraint will always be false on a data graph that is not an
>>      extended RDF graph.
>>
>>
>> I would argue that all these cases should throw an error; otherwise it would
>> again require extra cognitive load to remember when a constraint is
>> actually enforced and when it is silently ignored.
>>
>> One other case is optimization: if we require "no more than one"
>> implementation, then we may end up with very inefficiently defined constraints.
>> e.g. for a particular context (and a particular value of the constraint) I can
>> probably create a very efficient SPARQL query that is many times faster than
>> the general one; with your approach we lose that advantage.
>> When we test small / in-memory graphs the delay might not be so noticeable, but
>> on big SPARQL endpoints it may result in very big delays or even failing to
>> run the query.

Received on Monday, 6 June 2016 23:45:44 UTC