- From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
- Date: Fri, 3 Feb 2017 20:10:13 -0800
- To: public-rdf-shapes@w3.org
I took a quick look at the recent working draft of https://www.w3.org/TR/shacl/ dated 02 February 2017. The document says that the next version of the document is planned to be a Candidate Recommendation but does not provide a schedule for comments for this version of the document. Nor does the document state a schedule for responses to comments on previous working drafts of this document that have not yet received substantive responses from the working group.

In this quick look I examined the document to see if some of the major problems with the document have been solved. What I found is that the three major problems I first looked at remain unsolved. Each of them still needs significant work. Each of them prevents reviewers of the document from providing fully informed reviews of the definition of SHACL. Given that there are at least these three major, pervasive problems in the document, I don't see that detailed comments on the rest of the document will be very worthwhile at this time.

Pre-binding:

There has never been a definition of pre-binding that meets the needs of SHACL. The definition of pre-binding in this version of the document is no different. Pre-binding is only defined for a solution mapping and a graph pattern. However, all uses of pre-binding in SHACL are for a solution mapping and a query so, in effect, there is no definition of pre-binding at all in this document. As well, there is no demonstration that the current definition of pre-binding is well-behaved even where it is defined.

The document that is stated to be the source of the definition of pre-binding for SHACL is a document that has not been accepted by anyone other than the author of the document as far as I can tell. Saying that it is the draft of a WG CG report is giving a false impression of its effective status.
The unsuitability of this definition of pre-binding has already been reported in https://lists.w3.org/Archives/Public/public-rdf-shapes/2017Jan/0010.html but there is no indication in the working draft that there are any problems with pre-binding. The lack of such an indication in the document means that reviewers may miss the fact that much of the document has fundamental problems. As pre-binding is a central part of SPARQL-SHACL and is also used to describe much of SHACL Core it is not possible for reviewers to provide fully informed comments on large parts of SHACL at this time.

As there is as of yet no suitable definition provided for pre-binding even though the problems with it have been known since at least June of 2015 it will be better at this late stage to simply remove all parts of SHACL and the SHACL document that depend on pre-binding.

Shapes:

The way that shapes are formed and used in SHACL remains a severe problem. There are shapes, node shapes, and property shapes. There are also three RDF terms that are related to shapes: sh:Shape, sh:NodeShape, and sh:PropertyShape. There is much confusing wording on how these all work together.

First, there is "sh:NodeShape and sh:PropertyShape can be used to represent node and property shapes". How do these RDF terms represent anything?

Second, there are what appear to be the main definitions of node shapes and property shapes. "A node shape is a shape in the shapes graph that is not the subject of a triple with sh:path as its predicate." "A property shape is a shape in the shapes graph that is the subject of a triple that has sh:path as its predicate." What is the role of sh:NodeShape and sh:PropertyShape if the definition of node shapes and property shapes doesn't even refer to them? This is only reinforced by "However, the presence of any rdf:type triple does not determine whether a node is treated as a node shape or not." "However, the presence of any rdf:type triple does not determine whether a node is treated as a property shape or not."

Third, there are what appear to be alternative definitions of node shapes and property shapes. "sh:NodeShape is the class of node shapes and should be declared as a type for shapes that are IRIs." "sh:PropertyShape is the class of property shapes and should be declared as a type for shapes that are IRIs." There are multiple problems with these alternative definitions. For starters, there is no description in SHACL of what it means to be the class of anything. Next, there is no description in SHACL of how to declare a type for anything. Further, there is the strong suggestion here that shapes that are IRIs should somehow have both sh:NodeShape and sh:PropertyShape declared as their type, which doesn't make sense at all.

Fourth, the conditions to be a shape include being a SHACL instance of sh:NodeShape or sh:PropertyShape, but not sh:Shape. This contradicts the normative statements that rdf:type triples are irrelevant for determining whether a node is a node or property shape. It is also exceedingly weird as sh:Shape is previously indicated to be somehow related to shapes, but being a SHACL instance of sh:Shape in an RDF graph doesn't make a node a shape in the graph. As sh:Shape is the natural RDF term for the type of shapes, users will use it over sh:NodeShape and sh:PropertyShape.

Aside from these problems with node shapes and property shapes, there are problems with the definitions that shapes depend on. For example, shapes graphs are defined too narrowly. SHACL validation processes don't always validate a data graph against the shapes in another graph, but shapes graphs are not defined for these other situations.

All this ends up with a big mess. It appears that it is possible to use sh:NodeShape and sh:PropertyShape in ways counter to what appears to be their intended meaning.
For example,

  ex:s1 rdf:type sh:NodeShape ;
     sh:targetClass ex:Person ;
     sh:path ex:child ;
     sh:nodeKind sh:IRI .

appears to form a constraint on the children of people even though the type of the shape is sh:NodeShape.

What needs to be done is to get rid of sh:NodeShape and sh:PropertyShape. They serve no useful purpose. They will only produce confusion. Then the definitions underlying shapes need to be corrected.

Because of these significant and pervasive problems with shapes in SHACL, reviewers cannot provide fully informed comments on the SHACL document at this time.

Validation results and reports:

A validation report is the result of validation. It is an RDF graph where some nodes are validation results reporting on constraints that were not satisfied. There are serious problems in how validation reports are generated and the form of validation reports.

The first problem is the generation of validation results. Throughout the definitions of SHACL Core constraint components there is wording like "For each value node [...], a validation result MUST be produced with the value node as sh:value." and "If [...], a validation result MUST be produced." This means that each SHACL processor must produce these validation results to be a conforming implementation of SHACL. The processor must produce these validation results no matter whether they are going to show up in the final validation report or not. The processor must produce these validation results even if it is not going to return a validation report at all. This mixing of conformance requirements into the definition of validation introduces an unnecessary and problematic procedural aspect into the underlying definitions of SHACL.

Although it is mandated that a SHACL processor must produce these validation results it is completely unclear how many must be produced. A SHACL processor may end up checking whether a particular node satisfies a particular constraint numerous times.
Must it produce a validation result for each of these times? Must it only produce one validation result for all of these times? Or is the number of times it produces a validation result undetermined? This multiplicity problem can show up at top-level due to converging sh:property chains.

The second problem is the form of a validation report. There is insufficient guidance on how multiple validation results are to be produced. For example, can a single validation result have multiple values for sh:value, making it a validation report for multiple violations? Similarly, if a shape has two sh:ClassConstraintComponent constraints, can a single validation report be used for violations from both of them? Without better guidance on these issues it will be very difficult to determine just what violations occurred from a validation report.

The third problem is just what validation results are to be included in a validation report and which of these are to be values of sh:result for the single node in the graph that is a SHACL instance of sh:ValidationReport. There is "Only the validation results that are not object of any sh:details triple in the results graph are top-level results." and "The property sh:detail may link a (parent) result with one or more other (child) results that provide further details about the cause of the (parent) result." So a validation process has to produce validation results which then end up in the validation report if they are not values for sh:details triples. What happens if a validation result comes from violation of a constraint that is both directly at top level (e.g., from a property shape that is value of sh:property for a shape that has targets) and not at top level (e.g., from the same property shape as before that is linked to the shape with targets via a combination of sh:node and sh:property triples)? Can a SHACL processor use sh:detail to collect what otherwise might be top-level validation results?
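
The top-level question can be made concrete with a small sketch. This is only an illustration under stated assumptions, not anything taken from the SHACL document: a results graph is modelled as a plain Python set of (subject, predicate, object) string triples, the linking predicate is taken to be sh:detail (the quoted text uses both sh:details and sh:detail), and the blank node labels _:r1, _:r2, _:r3 and the helper names are hypothetical.

```python
# Simplified, hypothetical model: an RDF graph is a set of
# (subject, predicate, object) triples written as plain strings.
RDF_TYPE = "rdf:type"
SH_VALIDATION_RESULT = "sh:ValidationResult"
SH_DETAIL = "sh:detail"  # assumption: the quoted text also spells this sh:details


def validation_results(graph):
    """All nodes typed as sh:ValidationResult in the results graph."""
    return {s for (s, p, o) in graph
            if p == RDF_TYPE and o == SH_VALIDATION_RESULT}


def top_level_results(graph):
    """Results that are not the object of any sh:detail triple."""
    detailed = {o for (s, p, o) in graph if p == SH_DETAIL}
    return validation_results(graph) - detailed


# Case A: the same violation is reported by two distinct result nodes,
# _:r1 (reached directly) and _:r3 (reached as a detail of _:r2).
separate_nodes = {
    ("_:r1", RDF_TYPE, SH_VALIDATION_RESULT),
    ("_:r2", RDF_TYPE, SH_VALIDATION_RESULT),
    ("_:r3", RDF_TYPE, SH_VALIDATION_RESULT),
    ("_:r2", SH_DETAIL, "_:r3"),
}

# Case B: the processor reuses one node, _:r1, for both routes.
shared_node = {
    ("_:r1", RDF_TYPE, SH_VALIDATION_RESULT),
    ("_:r2", RDF_TYPE, SH_VALIDATION_RESULT),
    ("_:r2", SH_DETAIL, "_:r1"),
}

print(sorted(top_level_results(separate_nodes)))  # ['_:r1', '_:r2']
print(sorted(top_level_results(shared_node)))     # ['_:r2']
```

Under this reading, a result that is reached both directly and through sh:detail stops being top-level as soon as the processor represents it with a single node, which is the kind of ambiguity the questions above are pointing at.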
There are also some other minor problems with validation reports. For example, there is the requirement that "A validation report has exactly one value for the property sh:conforms that is of datatype xsd:boolean." However, the result of validation is an RDF graph and RDF graphs do not have property values, so this requirement doesn't make sense. The definitions underlying validation reports need to be carefully examined to eliminate problems like these.

Much of the description of how validation reports are generated and what they contain needs to be rewritten to remove any procedural aspects and to suitably describe the contents of validation reports. As this will change large portions of the document, reviewers cannot provide fully informed comments on it at this time.
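
As a supplementary illustration of the shapes problem described earlier, the following minimal sketch applies the two quoted definitions (a property shape is the subject of a triple with sh:path as predicate, a node shape is not) to the ex:s1 example. It assumes that ex:s1 is already known to be a shape and models the shapes graph as a set of plain string triples; the helper names classify_shape and declared_types are hypothetical and not part of SHACL.

```python
# Simplified, hypothetical model: the shapes graph is a set of
# (subject, predicate, object) triples written as plain strings.
shapes_graph = {
    ("ex:s1", "rdf:type", "sh:NodeShape"),
    ("ex:s1", "sh:targetClass", "ex:Person"),
    ("ex:s1", "sh:path", "ex:child"),
    ("ex:s1", "sh:nodeKind", "sh:IRI"),
}


def classify_shape(graph, node):
    """Apply the quoted definitions: only sh:path matters, rdf:type is ignored.

    Assumes `node` is already known to be a shape in `graph`.
    """
    has_path = any(s == node and p == "sh:path" for (s, p, o) in graph)
    return "property shape" if has_path else "node shape"


def declared_types(graph, node):
    """The rdf:type values asserted for `node`."""
    return {o for (s, p, o) in graph if s == node and p == "rdf:type"}


print(declared_types(shapes_graph, "ex:s1"))  # {'sh:NodeShape'}
print(classify_shape(shapes_graph, "ex:s1"))  # property shape
```

The declared type and the classification disagree for ex:s1, which is the situation the example in the message describes.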
Received on Saturday, 4 February 2017 04:10:53 UTC