ISSUE-139: The primary keys use case

On 7/06/2016 0:54, Peter F. Patel-Schneider wrote:
> Are you proposing that there should be a constraint component for primary
> keys?  I don't see any description of how this would work in property
> constraints so how can anyone determine how it would work in other
> constraints.  If you are not proposing that there be a constraint component
> for primary keys then I don't see any relevance to the discussion here.

I would like to elaborate on this use case a bit because it has 
implications for ISSUE-139.

The "Primary Keys" feature has been mentioned in our Use Cases 
deliverable as UC25 [1] and in the wiki

https://www.w3.org/2014/data-shapes/wiki/Primary_Keys_with_URI_Pattern

This is a feature that has been in successful use based on SPIN in 
TopBraid products for a couple of years now. I have since ported it to 
SHACL as part of the DASH namespace. I have pasted a source code snippet 
at the bottom of this email.

An example in SHACL would be

ex:CountryShape
     a sh:Shape ;
     sh:scopeClass ex:Country ;
     sh:property [
         sh:predicate ex:countryCode ;
         dash:uriStart "http://example.org/Country-" ;
     ] .

A valid instance would be

ex:Country-de
   rdf:type ex:Country ;
   ex:countryCode "de" .

An invalid instance would be

ex:Country-incorrect
   rdf:type ex:Country ;
   ex:countryCode "en" .

The rule is that if a property has a dash:uriStart then that property 
serves as a "primary key", which means that it must have exactly one 
value and the URI of the subject must be the uriStart followed by the 
(URL-encoded) value of the primary key, e.g. "ex:Country-de" for a 
primary key value of "de" and a uriStart of "ex:Country-".
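
To make the rule concrete, here is a rough Python sketch of the check. 
It is only an illustration and not part of DASH or TopBraid: the function 
name check_primary_key and the plain-list data layout are made up, I am 
assuming that ex: expands to http://example.org/, and quote(..., safe="") 
only approximates what ENCODE_FOR_URI does in SPARQL.

```python
from urllib.parse import quote

# Value of dash:uriStart in the example shape above.
URI_START = "http://example.org/Country-"


def check_primary_key(subject_uri, key_values, uri_start=URI_START):
    """Return a list of violation messages (empty if the subject conforms).

    Hypothetical helper: key_values is the list of values that the
    subject has for the primary key property.
    """
    if len(key_values) == 0:
        return ["Missing value for primary key property"]
    if len(key_values) > 1:
        return ["Multiple values of primary key property"]
    # safe="" percent-encodes everything except unreserved characters,
    # which approximates SPARQL's ENCODE_FOR_URI.
    expected = uri_start + quote(key_values[0], safe="")
    if subject_uri != expected:
        return [
            f"Primary key value {key_values[0]} does not align "
            f"with the expected URI {expected}"
        ]
    return []


# The valid and the invalid instance from the examples above.
print(check_primary_key("http://example.org/Country-de", ["de"]))
print(check_primary_key("http://example.org/Country-incorrect", ["en"]))
```

The first call yields no messages, while the second one reports that 
"en" would require the URI http://example.org/Country-en.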

This constraint component highlights an important strength of SHACL: It 
provides machine-readable definitions of constraints that can be used 
for validation purposes but also many other use cases. In our particular 
case, we are using the primary key to produce well-formed URIs from a 
primary key value when a new instance is created. I have attached a 
screenshot of TopBraid showing a dialog in which the user just enters 
the name of a Country and its country code, and the URI gets produced 
automatically.

The fact that this constraint can also be used for constraint checking 
is a great way of locking in a contract in the model, but the 
dash:uriStart triples can be queried by user interface tools too, and 
that use case is far more important than constraint validation here. For 
example, once we know that a primary key exists, looking up the URI for 
a given country is trivial and doesn't even require a query against the 
database.
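
As a rough illustration of that non-validation use, the following 
Python sketch builds a URI from a primary key value. The dictionary 
URI_STARTS is a hypothetical stand-in for the dash:uriStart triples that 
a tool would read from the shapes graph, mint_uri is a made-up name, and 
quote(..., safe="") only approximates ENCODE_FOR_URI.

```python
from urllib.parse import quote

# Hypothetical stand-in for the dash:uriStart triples in the shapes graph:
# class URI -> URI start declared for its primary key property.
URI_STARTS = {
    "http://example.org/Country": "http://example.org/Country-",
}


def mint_uri(class_uri, key_value, uri_starts=URI_STARTS):
    """Build the URI of an instance from its primary key value."""
    uri_start = uri_starts[class_uri]
    # safe="" percent-encodes reserved characters such as "/" and " ",
    # approximating SPARQL's ENCODE_FOR_URI.
    return uri_start + quote(key_value, safe="")


print(mint_uri("http://example.org/Country", "de"))
# http://example.org/Country-de
```

The same function serves both for creating a new instance and for 
finding an existing one, since no lookup against the data is involved.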

Clearly, this constraint component makes no sense for either inverse 
properties or node constraints. For inverse property constraints it 
makes no sense because the values could never be literals, making them 
unsuitable to build new URIs. For node constraints, it would be very 
hard to even come up with an explanation of what it could possibly 
mean. In a node constraint, we would have $this = $value, i.e. the 
subject that is supposed to have a certain URI is the same as the value 
of the primary key! Such a constraint is impossible to fulfill because 
the URI would have to include itself recursively. This is just to show 
how silly such examples can become with the strict policy suggested here 
in this ticket.

By having a sh:context triple (see below), the creator of such a 
constraint component can clearly communicate how this component is 
supposed to be used and where it should not even be offered as a choice.

And the query of the validator (below) is quite irregular and would not 
fit into any of the proposed "boilerplate" generalizations.

This example demonstrates that the proposals to have only one validator 
per constraint component, and to always allow every constraint 
component in every kind of constraint, would make SHACL fail to address 
real-world requirements.

Holger

[1] https://www.w3.org/TR/shacl-ucr/#uc25-primary-keys-with-uri-patterns



dash:PrimaryKeyConstraintComponent
   rdf:type sh:ConstraintComponent ;
   rdfs:comment """Enforces a constraint that the given property 
(sh:predicate) serves as primary key for all resources in the scope of 
the shape. If a property has been declared to be the primary key then 
each resource must have exactly one value for that property. 
Furthermore, the URIs of those resources must start with a given string 
(dash:uriStart), followed by the URL-encoded primary key value. For 
example if dash:uriStart is \"http://example.org/country-\" and the 
primary key for an instance is \"de\" then the URI must be 
\"http://example.org/country-de\". Finally, as a result of the URI 
policy, there cannot be any other resource with the same value under 
the same primary key policy.""" ;
   rdfs:label "Primary key constraint component" ;
   sh:context sh:PropertyConstraint ;
   sh:labelTemplate "The property {?predicate} is the primary key and URIs start with {?uriStart}" ;
   sh:parameter [
       sh:predicate dash:uriStart ;
       sh:datatype xsd:string ;
       sh:description "The start of the URIs of well-formed resources." ;
       sh:name "URI start" ;
     ] ;
   sh:propertyValidator [
       rdf:type sh:SPARQLSelectValidator ;
       sh:select """SELECT $this ($this AS ?subject) $predicate (?value AS ?object) ?message
WHERE {
     {
         FILTER NOT EXISTS {
             ?this $predicate ?any .
         } .
         BIND (\"Missing value for primary key property\" AS ?message) .
     }
     UNION
     {
         FILTER (dash:valueCount(?this, $predicate) > 1) .
         BIND (\"Multiple values of primary key property\" AS ?message) .
     }
     UNION
     {
         FILTER (dash:valueCount(?this, $predicate) = 1) .
         ?this $predicate ?value .
         BIND (CONCAT($uriStart, ENCODE_FOR_URI(str(?value))) AS ?uri) .
         FILTER (str(?this) != ?uri) .
         BIND (CONCAT(\"Primary key value \", str(?value),
                      \" does not align with the expected URI \", ?uri) AS ?message) .
     } .
}""" ;
     ] ;
.

Received on Wednesday, 8 June 2016 02:12:02 UTC