Re: Shapes Constraint Language (SHACL) Working Draft of 2017-02-02 from Peter F. Patel-Schneider on 2017-02-08 (public-rdf-shapes@w3.org from February 2017)

From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
Date: Wed, 8 Feb 2017 07:47:42 -0800
To: Holger Knublauch <holger@topquadrant.com>, public-rdf-shapes@w3.org
Message-ID: <cd7db8ea-5d15-dad8-2505-639c6788e711@gmail.com>
OK, thanks for the interim response.

peter


On 02/07/2017 10:51 PM, Holger Knublauch wrote:
> This is primarily a housekeeping email.
> 
> On 4/02/2017 14:10, Peter F. Patel-Schneider wrote:
>> I took a quick look at the recent working draft of
>> https://www.w3.org/TR/shacl/ dated 02 February 2017.
>>
>> The document says that the next version of the document is planned to be a
>> Candidate Recommendation but does not provide a schedule for comments for
>> this version of the document.  Nor does the document state a schedule for
>> responses to comments on previous working drafts of this document that have
>> not yet received substantive responses from the working group.
>>
>> In this quick look I examined the document to see if some of the major
>> problems with the document have been solved.  What I found is that the three
>> major problems I first looked at remain unsolved.  Each of them still needs
>> significant work.  Each of them prevents reviewers of the document from
>> providing fully informed reviews of the definition of SHACL.  Given that
>> there are at least these three major, pervasive problems in the document, I
>> don't see that detailed comments on the rest of the document will be very
>> worthwhile at this time.
>>
>>
>> Pre-binding:
>>
>> There has never been a definition of pre-binding that meets the needs of
>> SHACL.  The definition of pre-binding in this version of the document is no
>> different.  Pre-binding is only defined for a solution mapping and a graph
>> pattern.  However, all uses of pre-binding in SHACL are for a solution
>> mapping and a query so, in effect, there is no definition of pre-binding at
>> all in this document.
>>
>> As well, there is no demonstration that the current definition of
>> pre-binding is well-behaved even where it is defined.
>>
>> The document that is stated to be the source of the definition of
>> pre-binding for SHACL is a document that has not been accepted by anyone
>> other than the author of the document as far as I can tell.  Saying that it
>> is the draft of a WG CG report is giving a false impression of its effective
>> status.
> 
> I have removed that reference - it was there in the hope that the EXISTS CG
> may actually converge on something in time.
> 
>>
>> The unsuitability of this definition of pre-binding has already been reported
>> in https://lists.w3.org/Archives/Public/public-rdf-shapes/2017Jan/0010.html
>> but there is no indication in the working draft that there are any problems
>> with pre-binding.  The lack of such an indication in the document means that
>> reviewers may miss the fact that much of the document has fundamental
>> problems.
> 
> I have meanwhile added this reference in.
> 
>>
>> As pre-binding is a central part of SPARQL-SHACL and is also used to
>> describe much of SHACL Core it is not possible for reviewers to provide
>> fully informed comments on large parts of SHACL at this time.  As there is
>> as of yet no suitable definition provided for pre-binding even though the
>> problems with it have been known since at least June of 2015 it will be
>> better at this late stage to simply remove all parts of SHACL and the SHACL
>> document that depend on pre-binding.
> 
> We have already raised ISSUE-222 on this topic. I also notice ongoing
> discussion between you and Andy on the
> https://lists.w3.org/Archives/Public/public-sparql-exists/2017Feb/ EXISTS
> mailing list.
> 
>>
>>
>> Shapes:
>>
>> The way that shapes are formed and used in SHACL remains a severe problem.
>>
>> There are shapes, node shapes, and property shapes.  There are also three
>> RDF terms that are related to shapes: sh:Shape, sh:NodeShape, and
>> sh:PropertyShape.
>>
>> There is much confusing wording on how these all work together.
>>
>> First, there is "sh:NodeShape and sh:PropertyShape can be used to represent
>> node and property shapes".  How do these RDF terms represent anything?
>>
>> Second, there are what appear to be the main definitions of node shapes and
>> property shapes.
>> "A node shape is a shape in the shapes graph that is not the subject of a
>> triple with sh:path as its predicate."
>> "A property shape is a shape in the shapes graph that is the subject of a
>> triple that has sh:path as its predicate."
>> What is the role of sh:NodeShape and sh:PropertyShape if the definition
>> of node shapes and property shapes doesn't even refer to them?
>> This is only reinforced by
>> "However, the presence of any rdf:type triple does not determine whether a
>> node is treated as a node shape or not."
>> "However, the presence of any rdf:type triple does not determine whether a
>> node is treated as a property shape or not."
>>
>> Third, there are what appear to be alternative definitions of node shapes and
>> property shapes.
>> "sh:NodeShape is the class of node shapes and should be declared as a type
>> for shapes that are IRIs."
>> "sh:PropertyShape is the class of property shapes and should be declared as a
>> type for shapes that are IRIs."
>> There are multiple problems with these alternative definitions.  For
>> starters, there is no description in SHACL of what it means to be the class
>> of anything.  Next, there is no description in SHACL of how to declare a
>> type for anything.  Further, there is the strong suggestion here that shapes
>> that are IRIs should somehow have both sh:NodeShape and sh:PropertyShape
>> declared as their type, which doesn't make sense at all.
>>
>> Fourth, the conditions to be a shape include being a SHACL instance of
>> sh:NodeShape or sh:PropertyShape, but not sh:Shape.  This contradicts the
>> normative statements that rdf:type triples are irrelevant for determining
>> whether a node is a node or property shape.  It is also exceedingly weird as
>> sh:Shape is previously indicated to be somehow related to shapes, but being
>> a SHACL instance of sh:Shape in an RDF graph doesn't make a node a shape in
>> the graph.  As sh:Shape is the natural RDF term for the type of shapes,
>> users will use it over sh:NodeShape and sh:PropertyShape.
>>
>> Aside from these problems with node shapes and property shapes, there are
>> problems with the definitions that shapes depend on.  For example, shapes
>> graphs are defined too narrowly.  SHACL validation processes don't always
>> validate a data graph against the shapes in another graph, but shapes graphs
>> are not defined for these other situations.
>>
>> All this ends up with a big mess.  It appears that it is possible to use
>> sh:NodeShape and sh:PropertyShape in ways counter to what appears to be
>> their intended meaning.  For example,
>>    ex:s1 rdf:type sh:NodeShape ;
>>      sh:targetClass ex:Person ;
>>      sh:path ex:child ;
>>      sh:nodeKind sh:IRI .
>> appears to form a constraint on the children of people even though the
>> type of the shape is sh:NodeShape.
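A minimal sketch of the two quoted definitions applied to the example above,
assuming triples are plain Python tuples of prefixed-name strings and that
ex:s1 is already known to be a shape. The helpers classify_shape and
declared_types are hypothetical names for illustration, not part of SHACL or
of any library.

```python
# Triples from the example above, as (subject, predicate, object) tuples.
TRIPLES = [
    ("ex:s1", "rdf:type", "sh:NodeShape"),
    ("ex:s1", "sh:targetClass", "ex:Person"),
    ("ex:s1", "sh:path", "ex:child"),
    ("ex:s1", "sh:nodeKind", "sh:IRI"),
]


def classify_shape(node, triples):
    """Classify a shape using only the quoted definitions.

    A shape that is the subject of a triple with sh:path as predicate is a
    property shape; otherwise it is a node shape. rdf:type is ignored.
    """
    has_path = any(s == node and p == "sh:path" for s, p, _ in triples)
    return "property shape" if has_path else "node shape"


def declared_types(node, triples):
    """Return the set of rdf:type values asserted for the node."""
    return {o for s, p, o in triples if s == node and p == "rdf:type"}


print(classify_shape("ex:s1", TRIPLES))   # classification by sh:path
print(declared_types("ex:s1", TRIPLES))   # what the graph declares
```

Under these assumptions the classification and the declared type disagree for
ex:s1, which is the mismatch described above.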
>>
>> What needs to be done is to get rid of sh:NodeShape and sh:PropertyShape.
>> They serve no useful purpose.  They will only produce confusion.  Then the
>> definitions underlying shapes need to be corrected.  Because of these
>> significant and pervasive problems with shapes in SHACL, reviewers cannot
>> provide fully informed comments on the SHACL document at this time.
> 
> I have raised ISSUEs 223 and 224 for this topic.
> 
>>
>>
>> Validation results and reports:
>>
>> A validation report is the result of validation.  It is an RDF graph where
>> some nodes are validation results reporting on constraints that were not
>> satisfied.  There are serious problems in how validation reports are
>> generated and the form of validation reports.
>>
>> The first problem is the generation of validation results.  Throughout the
>> definitions of SHACL Core constraint components there is wording like "For
>> each value node [...], a validation result MUST be produced with the value
>> node as sh:value." and "If [...], a validation result MUST be produced."
>> This means that each SHACL processor must produce these validation results
>> to be a conforming implementation of SHACL.
>>
>> The processor must produce these validation results no matter whether they
>> are going to show up in the final validation report or not.  The processor
>> must produce these validation results even if it is not going to return a
>> validation report at all.  This mixing of conformance requirements into the
>> definition of validation introduces an unnecessary and problematic
>> procedural aspect into the underlying definitions of SHACL.
>>
>> Although it is mandated that a SHACL processor must produce these validation
>> results it is completely unclear how many must be produced.  A SHACL
>> processor may end up checking whether a particular node satisfies a
>> particular constraint numerous times.  Must it produce a validation result
>> for each of these times?  Must it only produce one validation result for all
>> of these times?  Or is the number of times it produces a validation result
>> undetermined?  This multiplicity problem can show up at top-level due to
>> converging sh:property chains.
>>
>> The second problem is the form of a validation report.  There is
>> insufficient guidance on how multiple validation results are to be
>> produced.  For example, can a single validation result have multiple values
>> for sh:value, making it a validation report for multiple violations?
>> Similarly, if a shape has two sh:ClassConstraintComponent constraints, can
>> a single validation report be used for violations from both of them?
>> Without better guidance on these issues it will be very difficult to
>> determine just what violations occurred from a validation report.
>>
>> The third problem is just what validation results are to be included in a
>> validation report and which of these are to be values of sh:result for the
>> single node in the graph that is a SHACL instance of sh:ValidationReport.
>> There is "Only the validation results that are not object of any sh:details
>> triple in the results graph are top-level results." and "The property
>> sh:detail may link a (parent) result with one or more other (child) results
>> that provide further details about the cause of the (parent) result."
>> So a validation process has to produce validation results which then end up
>> in the validation report if they are not values for sh:details triples.
>> What happens if a validation result comes from violation of a constraint
>> that is both directly at top level (e.g., from a property shape that is a
>> value of sh:property for a shape that has targets) and not at top level
>> (e.g., from the same property shape as before that is linked to the shape
>> with targets via a combination of sh:node and sh:property triples)?  Can a
>> SHACL processor use sh:detail to collect what otherwise might be top-level
>> validation results?
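A minimal sketch of the quoted top-level rule, assuming a results graph given
as plain Python tuples and assuming the linking property is sh:detail (the
quoted text uses both sh:details and sh:detail). The result nodes ex:r1 to
ex:r3 and the helper names are hypothetical.

```python
# Hypothetical results graph: ex:r1 has ex:r2 as a child via sh:detail,
# and ex:r3 has no parent.
TRIPLES = [
    ("ex:r1", "rdf:type", "sh:ValidationResult"),
    ("ex:r2", "rdf:type", "sh:ValidationResult"),
    ("ex:r3", "rdf:type", "sh:ValidationResult"),
    ("ex:r1", "sh:detail", "ex:r2"),
]


def validation_results(triples):
    """Return all nodes typed as sh:ValidationResult, in sorted order."""
    return sorted(
        s for s, p, o in triples
        if p == "rdf:type" and o == "sh:ValidationResult"
    )


def top_level_results(triples):
    """Return results that are not the object of any sh:detail triple."""
    children = {o for _, p, o in triples if p == "sh:detail"}
    return [r for r in validation_results(triples) if r not in children]


print(top_level_results(TRIPLES))  # prints ['ex:r1', 'ex:r3']
```

The rule only filters whatever result nodes the processor chose to create.
It does not say whether a violation reached by both routes yields one result
node or two, which is the question raised above.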
>>
>> There are also some other minor problems with validation reports.  For
>> example, there is the requirement that "A validation report has exactly one
>> value for the property sh:conforms that is of datatype xsd:boolean."
>> However, the result of validation is an RDF graph, and RDF graphs do not
>> have property values, so this requirement doesn't make sense.  The
>> definitions underlying validation reports need to be carefully examined to
>> eliminate problems like these.
>>
>> Much of the description of how validation reports are generated and what
>> they contain needs to be rewritten to remove any procedural aspects and to
>> suitably describe the contents of validation reports.  As this will change
>> large portions of the document, reviewers cannot provide fully informed
>> comments on it at this time.
>>
> 
> I have raised ISSUE-225 for this topic.
> 
> Holger
> 
>
Received on Wednesday, 8 February 2017 17:26:35 UTC