eliminating the need for three SPARQL queries for constraint components from Peter F. Patel-Schneider on 2016-06-02 (public-data-shapes-wg@w3.org from June 2016)

From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
Date: Thu, 2 Jun 2016 16:54:10 -0700
To: public-data-shapes-wg@w3.org
Message-ID: <b57debba-c957-9f9b-6b17-edda7e92f0ad@gmail.com>
Instead of having potentially three different SPARQL queries for a
constraint component, is it possible to have only one?  There are at least
three problems to overcome.  First, SPARQL does not have all the facilities
that one is used to having in a programming language, like compound values
and subroutines.  Second, SHACL validation reports have a setup that is
different for node, property, and inverse property constraints.  Third,
SHACL messages in validation reports are sensitive to whether the constraint
component is in a node, a property, or an inverse property constraint.

One way of overcoming the first problem is just to have some boilerplate
that has to be included in each SPARQL body.  This boilerplate is
responsible for setting up the correct environment for the code that
implements the actual constraint component.

A piece of boilerplate that does this is
 
 { $this $predicate ?value .
   FILTER ( sameTerm(?context,sh:PropertyConstraint) )
 } UNION {
   ?value $predicate $this .
   FILTER ( sameTerm(?context,sh:InversePropertyConstraint) )
 } UNION {
   BIND ( $this AS ?value )
   FILTER ( sameTerm(?context,sh:NodeConstraint) )
 }

Another way of overcoming this problem is to use the VALUES meaning of
pre-binding, so that the SPARQL bodies start with potentially multiple
values for $value, namely all the value nodes for a particular focus node.
However, this requires a particular kind of pre-binding, which may not be
available in all SPARQL implementations, so let's go with the first
solution.  Also, I don't think that this works for all constraint
components, in particular sh:equals.
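
As a rough sketch of what the VALUES reading would amount to for sh:class:
the engine would in effect place a VALUES block in front of the body, one
row per value node, and the body would need no boilerplate.  The ex: names
and the two rows are hypothetical, and the VALUES block is written out by
hand here only to make the query self-contained.

```sparql
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex:   <http://example.org/>

SELECT ?this ?value
WHERE {
  # Stand-in for what the engine would inject (hypothetical data):
  # one row per value node of the focus node ex:alice.
  VALUES ( ?this ?value ) {
    ( ex:alice ex:bob )
    ( ex:alice ex:carol )
  }
  # Constraint-specific part: report value nodes that are not
  # instances of the (hypothetical) class ex:Person.
  FILTER NOT EXISTS { ?value rdf:type/rdfs:subClassOf* ex:Person }
}
```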


One way of overcoming the second problem is to have the boilerplate also set
up the values for the validation reports, as in
 
 { $this $predicate ?value .
   BIND ( $this AS ?subject )
   BIND ( ?value AS ?object )
   FILTER ( sameTerm(?context,sh:PropertyConstraint) )
 } UNION {
   ?value $predicate $this .
   BIND ( ?value AS ?subject )
   BIND ( $this AS ?object )
   FILTER ( sameTerm(?context,sh:InversePropertyConstraint) )
 } UNION {
   BIND ( $this AS ?value )
   FILTER ( sameTerm(?context,sh:NodeConstraint) )
 }

This generally works, but would require a change in validation reports for
sh:closed.

Another way of overcoming this problem is to change the validation reports,
to eliminate sh:subject and sh:object and include instead sh:value for the
value node involved and sh:context for the kind of constraint.  This appears
to be better to me as it reduces the amount of work that has to be done in
the SPARQL code.
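
To make the shape of such a report concrete, here is a minimal sketch as a
CONSTRUCT query.  sh:value and sh:context are the properties proposed above;
sh:ValidationResult and sh:focusNode are assumed from the current draft
vocabulary, and the ex: names and the VALUES row are made up for
illustration.  The query performs no actual check; it just emits one result
per value node of an inverse property constraint.

```sparql
PREFIX sh: <http://www.w3.org/ns/shacl#>
PREFIX ex: <http://example.org/>

CONSTRUCT {
  # One result per solution; the same properties would be used
  # whatever the kind of constraint.
  [] a sh:ValidationResult ;
     sh:focusNode ?this ;
     sh:value     ?value ;
     sh:context   ?context .
}
WHERE {
  # Stand-in for pre-bound values (hypothetical data).
  VALUES ( ?this ?predicate ?context ) {
    ( ex:alice ex:knows sh:InversePropertyConstraint )
  }
  # Value nodes of an inverse property constraint.
  ?value ?predicate ?this .
}
```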


The third problem can be overcome by a more sophisticated way of generating
the messages, such as having a macro that would expand to either "a value
of <p> for <f>", or "a value of the inverse of <p> for <f>", or "<f>" for
<f> a focus node and <p> a property.
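
A minimal sketch of such a macro, written directly in SPARQL with nested IF
and CONCAT.  The variable names ?f, ?p and ?description and the ex: data are
hypothetical, and STR assumes the focus node and property are IRIs.

```sparql
PREFIX sh: <http://www.w3.org/ns/shacl#>
PREFIX ex: <http://example.org/>

SELECT ?context ?description
WHERE {
  # Stand-ins for pre-bound values (hypothetical data).
  VALUES ( ?this ?predicate ) { ( ex:alice ex:knows ) }
  # Try all three kinds of constraint to see each expansion.
  VALUES ?context {
    sh:PropertyConstraint sh:InversePropertyConstraint sh:NodeConstraint
  }
  BIND ( CONCAT("<", STR(?this), ">") AS ?f )
  BIND ( CONCAT("<", STR(?predicate), ">") AS ?p )
  # The "macro": choose the wording from the kind of constraint.
  BIND (
    IF ( sameTerm(?context, sh:PropertyConstraint),
         CONCAT("a value of ", ?p, " for ", ?f),
         IF ( sameTerm(?context, sh:InversePropertyConstraint),
              CONCAT("a value of the inverse of ", ?p, " for ", ?f),
              ?f ) )
    AS ?description )
}
```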


So what then would the SPARQL code for constraint components look like?

Simple constraint components like sh:class are dominated by the boilerplate

SELECT $this $predicate ?value ?context $class
WHERE {
 { $this $predicate ?value .
   FILTER ( sameTerm($context,sh:PropertyConstraint) )
 } UNION {
   ?value $predicate $this .
   FILTER ( sameTerm($context,sh:InversePropertyConstraint) )
 } UNION {
   BIND ( $this AS ?value )
   FILTER ( sameTerm($context,sh:NodeConstraint) )
 }
 FILTER NOT EXISTS { ?value rdf:type/rdfs:subClassOf* $class }
}

This isn't as nice as the current ask validators but doesn't need that extra
capability.

Some constraint components that cannot be handled by ask validators are
nearly as simple.  The SPARQL code for sh:minCount would be

SELECT $this $predicate ?context $minCount
WHERE {
 { $this $predicate ?value .
   FILTER ( sameTerm($context,sh:PropertyConstraint) )
 } UNION {
   ?value $predicate $this .
   FILTER ( sameTerm($context,sh:InversePropertyConstraint) )
 } UNION {
   BIND ( $this AS ?value )
   FILTER ( sameTerm($context,sh:NodeConstraint) )
 }
}
HAVING ( COUNT (DISTINCT ?value) < $minCount )

The SPARQL code for sh:disjoint would be

SELECT $this $predicate ?value ?context
WHERE {
 { $this $predicate ?value .
   FILTER ( sameTerm($context,sh:PropertyConstraint) )
 } UNION {
   ?value $predicate $this .
   FILTER ( sameTerm($context,sh:InversePropertyConstraint) )
 } UNION {
   BIND ( $this AS ?value )
   FILTER ( sameTerm($context,sh:NodeConstraint) )
 }
 $this $disjoint ?value .
}

The SPARQL code for sh:equals requires the boilerplate twice

SELECT $this $predicate ?value ?context
WHERE {
 {
   { $this $predicate ?value .
     FILTER ( sameTerm($context,sh:PropertyConstraint) )
   } UNION {
     ?value $predicate $this .
     FILTER ( sameTerm($context,sh:InversePropertyConstraint) )
   } UNION {
     BIND ( $this AS ?value )
     FILTER ( sameTerm($context,sh:NodeConstraint) )
   }
   FILTER NOT EXISTS { $this $equals ?value }
 } UNION {
   $this $equals ?value .
   FILTER NOT EXISTS {
     { $this $predicate ?value .
       FILTER ( sameTerm($context,sh:PropertyConstraint) )
     } UNION {
       ?value $predicate $this .
       FILTER ( sameTerm($context,sh:InversePropertyConstraint) )
     } UNION {
       BIND ( $this AS ?value )
       FILTER ( sameTerm($context,sh:NodeConstraint) )
     }
   }
 }
}


peter
Received on Thursday, 2 June 2016 23:54:40 UTC