Re: ISSUE-139: The primary keys use case

This is not about primary keys.  A primary key has the special characteristic
that there is only one primary key for each database table (which could be
read as an RDFS class in the RDF case).  There is no requirement here that there
can be only one primary key property for a class.  The use case in the UCR
document and accompanying documentation should be changed to fix this error.


What will happen if this constraint component is used in an inverse property
constraint?  Well, in non-extended RDF the inverse values of properties cannot
be strings so the check for the inverse property value being a string is going
to be false.  What will happen if this constraint component is used in a node
constraint?  Well, the node is already an IRI, so it can't be a string
literal.  So both of these situations end up with a constraint that is
uniformly violated.  A good-style checker for SHACL might flag these as being
questionable, but there is no problem in allowing these as valid SHACL as they
are perfectly well behaved.

But what makes this constraint component useful?  It is precisely that SHACL
can validate that the instances of a class in an RDF graph have property
values that determine their IRIs.  If this constraint component couldn't be
used for this purpose then there would not be any reason to have it.

Can this constraint component be used for other purposes?  Sure, it could be
used as input to a DB-extraction tool to tell it how to construct the IRIs for
nodes that it creates.  That certainly adds to the utility of this constraint
component.  Does this extra potential use mean that the tool has to do a
little bit of extra work if this constraint component were allowed in node and
inverse property constraints?  Probably a little bit.  The tool needs to find
the appropriate constraint components in a shapes graph, so it has to
comprehend SHACL shapes graphs.  The only extra work that it might need to do
is to check that the constraint component occurs only in property constraints
because it can't do anything useful if it doesn't.
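
To make the extra work concrete, here is a rough sketch of what such a tool
might do once it has pulled the relevant declarations out of a shapes graph.
The tuple layout, the "property"/"node" context labels, and the mint_iri
helper are all made up for illustration, and Python's quote() is only assumed
to be a close stand-in for SPARQL's ENCODE_FOR_URI.

```python
from urllib.parse import quote


def mint_iri(uri_start: str, key_value: str) -> str:
    """Build an IRI from a uriStart and a key value (hypothetical helper)."""
    # safe="" percent-encodes everything except unreserved characters,
    # which is assumed here to approximate ENCODE_FOR_URI.
    return uri_start + quote(key_value, safe="")


# Hypothetical, already-extracted declarations: (context, predicate, uriStart).
constraints = [
    ("property", "ex:countryCode", "http://example.org/Country-"),
    ("node", None, "http://example.org/Thing-"),
]

# The one extra check: keep only declarations found in property constraints.
usable = {pred: start for ctx, pred, start in constraints if ctx == "property"}

# A hypothetical database row, keyed by predicate.
row = {"ex:countryCode": "de"}
iris = [mint_iri(start, row[pred]) for pred, start in usable.items() if pred in row]
print(iris)
```

The declaration found in a node constraint is simply skipped, which is the
whole of the additional effort for the tool in this sketch.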


As far as implementation goes, this constraint component is slightly unusual
because it does two things.  It checks that there is a single property value
and then checks that the focus node's string value is suitable.

A boilerplate solution could look something like

SELECT $this WHERE {
 { SELECT $this WHERE {
     [boilerplate]
   } GROUP BY $this
     HAVING ( COUNT ( DISTINCT ?value ) != 1 )
 } UNION {
   [boilerplate]
   BIND (CONCAT($uriStart, ENCODE_FOR_URI(str(?value))) AS ?uri) .
   FILTER (str(?this) != ?uri)
 }
}
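
The same two-step logic can be sketched outside SPARQL.  The following is
only an illustration under assumptions: the violations function and its
messages are invented here, the ex: prefix is taken to expand to
http://example.org/, and quote() is assumed to approximate ENCODE_FOR_URI.

```python
from urllib.parse import quote


def violations(focus_iri, values, uri_start):
    """Return violation messages for one focus node (hypothetical helper)."""
    distinct = set(values)
    # Step 1: there must be exactly one distinct property value.
    if len(distinct) != 1:
        return ["expected exactly one value, found %d" % len(distinct)]
    (value,) = distinct
    # Step 2: the focus node's IRI must be uriStart + encoded value.
    expected = uri_start + quote(str(value), safe="")
    if focus_iri != expected:
        return ["IRI does not match " + expected]
    return []


START = "http://example.org/Country-"
print(violations("http://example.org/Country-de", ["de"], START))
print(violations("http://example.org/Country-incorrect", ["en"], START))
```

The first call reports nothing and the second reports a mismatch against
http://example.org/Country-en, matching the valid and invalid instances in
the quoted message below.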

peter



On 06/07/2016 07:11 PM, Holger Knublauch wrote:
> On 7/06/2016 0:54, Peter F. Patel-Schneider wrote:
>> Are you proposing that there should be a constraint component for primary
>> keys?  I don't see any description of how this would work in property
>> constraints so how can anyone determine how it would work in other
>> constraints.  If you are not proposing that there be a constraint component
>> for primary keys then I don't see any relevance to the discussion here.
> 
> I would like to elaborate this use case a bit because it has implications on
> ISSUE-139.
> 
> The "Primary Keys" feature has been mentioned in our Use Cases deliverable as
> UC25 [1] and in the wiki
> 
> https://www.w3.org/2014/data-shapes/wiki/Primary_Keys_with_URI_Pattern
> 
> This is a feature that has been in successful use based on SPIN in TopBraid
> products for a couple of years now. I have since ported it to SHACL as part of
> the DASH namespace. I have pasted a source code snippet to the bottom of this
> email.
> 
> An example in SHACL would be
> 
> ex:CountryShape
>     a sh:Shape ;
>     sh:scopeClass ex:Country ;
>     sh:property [
>         sh:predicate ex:countryCode ;
>         dash:uriStart "http://example.org/Country-" ;
>     ] .
> 
> A valid instance would be
> 
> ex:Country-de
>   rdf:type ex:Country ;
>   ex:countryCode "de" .
> 
> An invalid instance would be
> 
> ex:Country-incorrect
>   rdf:type ex:Country ;
>   ex:countryCode "en" .
> 
> The rule is that if a property has a dash:uriStart then that property serves
> as "primary key" which means that it must have exactly one value and the URI
> of the subject must be the uriStart + the value of the primary key, e.g.
> "ex:Country-de" for a primary key value of "de" and a uriStart of "ex:Country-".
> 
> This constraint component highlights an important strength of SHACL: It
> provides machine-readable definitions of constraints that can be used for
> validation purposes but also many other use cases. In our particular case, we
> are using the primary key to produce well-formed URIs from a primary key value
> when a new instance is created. I have attached a screenshot of TopBraid
> showing a dialog in which the user just enters the name of a Country and its
> country code, and the URI gets produced automatically.
> 
> The fact that this constraint can also be used for constraint checking is a
> great way of locking in a contract in the model, but the dash:uriStart triples
> can be queried by user interface tools too, and that use case is far more
> important than constraint validation here. For example, once we know that a
> primary key exists, looking up the URI for a given country is trivial and
> doesn't even require a query against the database.
> 
> Clearly, this constraint component makes no sense for either inverse
> properties or node constraints. For inverse property constraints it makes no
> sense because the values could never be literals, making them unsuitable to
> build new URIs. For node constraints, it would be very hard to even come
> up with an explanation of what it could possibly mean. In a node constraint,
> we would have $this = $value, i.e. the subject that is supposed to have a
> certain URI is the same as the value of the primary key! Such a constraint is
> impossible to fulfill because the URI would have to include itself
> recursively. This is just to show how silly such examples can become with the
> strict policy suggested here in this ticket.
> 
> By having a sh:context triple (see below), the creator of such a constraint
> component can clearly communicate how this component is supposed to be used
> and where it should not even be offered as a choice.
> 
> And the query of the validator (below) is quite irregular and would not fit
> into any of the proposed "boilerplate" generalizations.
> 
> This example demonstrates that the proposals to have only one validator per
> constraint component, and to always allow every constraint component, make
> SHACL fail to address real-world requirements.
> 
> Holger
> 
> [1] https://www.w3.org/TR/shacl-ucr/#uc25-primary-keys-with-uri-patterns
> 
> 
> 
> dash:PrimaryKeyConstraintComponent
>   rdf:type sh:ConstraintComponent ;
>   rdfs:comment "Enforces a constraint that the given property (sh:predicate)
> serves as primary key for all resources in the scope of the shape. If a
> property has been declared to be the primary key then each resource must have
> exactly one value for that property. Furthermore, the URIs of those resources
> must start with a given string (dash:uriStart), followed by the URL-encoded
> primary key value. For example if dash:uriStart is
> \"http://example.org/country-\" and the primary key for an instance is \"de\"
> then the URI must be \"http://example.org/country-de\". Finally, as a result
> of the URI policy, there can not be any other resource with the same value
> under the same primary key policy." ;
>   rdfs:label "Primary key constraint component" ;
>   sh:context sh:PropertyConstraint ;
>   sh:labelTemplate "The property {?predicate} is the primary key and URIs
> start with {?uriStart}" ;
>   sh:parameter [
>       sh:predicate dash:uriStart ;
>       sh:datatype xsd:string ;
>       sh:description "The start of the URIs of well-formed resources." ;
>       sh:name "URI start" ;
>     ] ;
>   sh:propertyValidator [
>       rdf:type sh:SPARQLSelectValidator ;
>       sh:select """SELECT $this ($this AS ?subject) $predicate (?value AS
> ?object) ?message
> WHERE {
>     {
>         FILTER NOT EXISTS {
>             ?this $predicate ?any .
>         } .
>         BIND (\"Missing value for primary key property\" AS ?message) .
>     }
>     UNION
>     {
>         FILTER (dash:valueCount(?this, $predicate) > 1) .
>         BIND (\"Multiple values of primary key property\" AS ?message) .
>     }
>     UNION
>     {
>         FILTER (dash:valueCount(?this, $predicate) = 1) .
>         ?this $predicate ?value .
>         BIND (CONCAT($uriStart, ENCODE_FOR_URI(str(?value))) AS ?uri) .
>         FILTER (str(?this) != ?uri) .
>         BIND (CONCAT(\"Primary key value \", str(?value), \" does not align
> with the expected URI \", ?uri) AS ?message) .
>     } .
> }""" ;
>     ] ;
> .
> 

Received on Thursday, 9 June 2016 14:13:48 UTC