- From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
- Date: Tue, 1 Sep 2015 17:40:08 -0700
- To: RDF Data Shapes Working Group <public-data-shapes-wg@w3.org>
Review of Shapes Constraint Language (SHACL) W3C Editor's Draft 10 August 2015 Summary: There are many problems with this document. It has multiple technical flaws. It has undefined terms. It is misleading. It does not accurately reflect the state of working group deliberations. I deem it to not be suitable for publication as a first public work draft of the working group. Changing the document to make it suitable would probably touch every subsection and almost every paragraph of the initial sections. Overall Comments: This document is likely to be the first document about SHACL that interested parties will read, so I am reviewing it with this in mind. I'm assuming that the reader is quite familiar with RDF and somewhat familiar with SPARQL. I find it very difficult to get a notion of what SHACL is from reading the first portions of the document. Not only are there many incorrect or misleading statements, there is no overall description of what SHACL is for and the example inappropriately centrally includes shapes as classes. The initial section of the document appears to be trying to be like a mini-primer but it doesn't work for this purpose because of the lack of an overall description. Compare this document with Core SHACL Semantics at http://w3c.github.io/data-shapes/semantics/ which provides a short but clear notion of what that version of SHACL is for. I have detailed comments on the Abstract and Section 1 below. I have somewhat less detailed comments on later sections. I fear that releasing the current document as a FPWD will create a large amount of confusion. In my opinion someone needs to do a full rewrite of the document before it is suitable for publication. Abstract "SHACL (Shapes Constraint Language) is an RDF vocabulary for describing RDF graph structures." This is not the purpose of SHACL, at least as far as I am concerned. Further, SHACL is not an RDF vocabulary. Something along the lines of 'SHACL (...) is a language for checking constraints against RDF graphs.' Subsequent sentences in the abstract should be changed to match this. "Additional constraints can be associated with shapes using SPARQL and similar executable languages. These executable languages can also be used to define new high-level vocabulary terms." Given that there are no provisions for languages other than SPARQL at the present time, this feature should not be mentioned in the abstract. Instead just use 'Additional constraints can be associated with shapes using SPARQL.' 1. Introduction "The SHACL vocabulary includes high-level concepts to represent restrictions on predicates used in triples." SHACL does not work this way, but instead can be used to check whether constraints on the information in an RDF graph are satisfied or not. SHACL does not have high-level concepts. Replace with "The SHACL language includes high-level constructs to represent constraints on the information in RDF graphs." "Some users and implementors will be content with using the high-level shapes language only" The idea of partial implementations does not belong here. Replace with "Some users will be content with using the high-level language only". "express other restrictions in an executable language such as SPARQL" There is no need to say that SPARQL is executable. Replace with 'express other conditions as SPARQL queries' and adjust the wording in the rest of the paragraph accordingly. "SHACL definitions are represented in RDF and can be serialized in multiple RDF formats. The example snippets in this document use Turtle [turtle] and (not written yet:) JSON-LD [json-ld] notation." SHACL does not have definitions. If there is no JSON-LD yet, then don't mention it. Replace with 'SHACL is written in RDF and can be serialized in multiple RDF formats. The examples in this document use Turtle [turtle]." 1.2 Overview and Terminology of Core Features This initial example centrally uses a controversial feature - shapes as classes. It needs to be rewritten to not use this feature. I have not further commented on problems in this section related to shapes as classes. "Each ex:Issue must point to exactly one user via the property ex:submittedBy and may have one value for ex:assignedTo." SHACL doesn't do ``must''. Instead in SHACL one writes constraints and then checks to see whether the constraints are satisfied. Replace with 'One shape requires that each instance of ex:Issue has exactly one value for ex:submittedBy, that that value is an instance of schema:Person that satisfies the constraints on submitters, has at most one value for ex:assignedTo, and that that value is an instance of schema:Person.' "Users that have submitted an issue must also have a schema:email address, so that they can be notified when the issue has been updated." SHACL does not have an facilities for email notification. Replace with 'The submitter shape requires at least one value for schema:email.' "Each constraint defines a condition that can be validated against a graph." Constraints do not work this way. There needs to be some high-level description of how SHACL works before constraints are described. "One of the operations that SHACL engines should support validates that a given RDF node matches a given shape." There should not be discussion of SHACL engines here. Instead the features of the SHACL language should be described. Replace with 'One of the operations of SHACL is validating a given RDF node against a given shape.' "This operation can be invoked based on any control logic, ...." As this section is describing the basic part of SHACL, it is not appropriate to talk about external control logic here. Just remove the entire sentence. "Another supported operation" Replace with "Another operation". I have previously commented on the problems with the description of validation and violations in this part of the document. "The following example code" I don't see any code here. Replace by "The following RDF graph" "a SHACL engine may follow" How does ``may'' come into this at all? This needs to be replaced by some description of how SHACL is invoked on the control graph and the data graph. 1.3 Overview and Terminology of Advanced Features "The validation of each constraint is formalized with one or more execution languages." I don't see how anyone who does not already know about SHACL can figure out what this is saying. Replace with something like 'Each constraint is SHACL is defined in terms of a SPARQL 1.1 query.' and then adjust the rest of the paragraph accordingly. 2. Shapes "A Constraint defines restrictions on the structure of an RDF graph." I find this rather misleading. "A Shape is a group of constraints that have the same focus nodes." There are lots more things that go into a shape. Replace the entire paragraph with a concise description of what shapes are for that includes scopes and filters. This entire section needs to be reordered and rewritten to talk about what shapes are for and how they work This description should describe how shapes use constraints, scopes, and filters. Talk about rdfs:label and rdfs:comment, if retained, should be moved to a much less prominent place. Talk about shapes and classes should be moved to the advanced topics sections. "all instances of the class are expected to have the shape" SHACL doesn't do expectations. All that it can do is validation. "shape declarations can specialize the shapes associated with superclasses" SHACL does not have any notion of specialization of shapes. These paragraphs need to be rewritten to conform with how SHACL works. UML diagrams should not be used as readers may not have adequate knowledge of UML. There are a number of SHACL vocabulary terms that show up in the section but that are not discussed. If a vocabulary term is important enough to show up here, then it is important enough that it needs to be discussed here. 3. Property Constraints I find the introduction to this section very confusing. Focus nodes don't have properties. Property constraints and inverse property constraints aren't on the objects or subjects of triples. If property constraints have to be instances of sh:PropertyConstraint, then blank node property constraints also have to be, so they can't be missing the rdf:type triple. This special treatment of blank nodes occurs in other places as well and should be eliminated. The use of rdfs:label for human-readable properties in context leaves the notion of context undefined. When should tools prefer the labels in property constraints? Similarly for rdfs:comment. There is also no guidance on even when sh:defaultValue might be used by user interface tools. The sentence including "is assumed to be an exhaustive" is not needed and only serves to confuse the matter. The wording "A violation must be reported" appears in several places, but it is not correct. Not all constraint violations end up being reported, for example, those in negated contexts. What are the instances of rdfs:Datatype? How is it determined whether something is an instance of rdfs:Class? sh:text should be defined as a datatype, so that it does not need any special treatment. The treatment of sh:valueClass just says "subclasses" in the description, and then provides a non-standard version of subclass. This discrepancy needs to be prominently mentioned. In several places properties have default value, e.g., the default value for sh:minLength is 0. There is no reason to even talk about such default values as the default value is the value that has no effect. In several places properties are mentioned as being optional, e.g., sh:minCount. There is no need for such discussion, as all the properties in the 3.1.x sections are optional. Mentioning that some of these properties are optional begs the question as to whether the other properties are mandatory. There is no discussion of whether properties can be repeated. Even if this information is presented later, it should also be presented here, particularly if some properties can be repeated and others cannot. "restrict the string length of triples" needs to be changed to instead talk about values. There is no need to create extra classes for syntactic categories. For example, sh:nodeKind is unnecessary. The textual definition for sh:valueClass is written the wrong way around. Instead of having a mismatched rdf:type value being the triggering condition it should be not having a matching rdf:type value. Untyped non-literals do not need special treatment. Matching a shape is not a defined notion. [I took a quick look at the newest version of the document, and the changes do not help.] There is no discussion on whether constraint violations inside embedded shapes are to be handled specially. Similar problems occur in several other places. Having a simple definition giving access to the correct value sets would permit the property constraint bits to be used in both directions. 3.1.12 from version current on 28 August "error-level constraint violations" is not defined. The textual definition is reversed - counting the number of values that do not validate instead of those that do validate. The syntax is problematic, as there might be multiple qualified value shapes. 4. Other Core Constraints on Shapes These are not constraints on shapes, but are instead just constraints. There is no indication that violations reported from within sh:NotConstraint constraints are to be treated in any special way. There is no defined notion of a node matching a shape. Evaluation of and and or is in order and short-circuiting. A note should be added that this does not conform to a recent working group resolution. There is no definition of "error-level constraint violations". There should be some discussion on the difference between top-level constraints and constraints inside an and. The general definition of XOR with multiple arguments is that it is true precisely when an odd number of the arguments are true. 4.5 from version current on 28 August What happens if a SHACL control file adds ignored properties to sh:ClosedShape? 5. Scopes and Filter Shapes The wording at the beginning of Section 5 reads like it is optional for SHACL processors to use scopes. There needs to be some introductory material that discusses how SHACL validation works with scopes and filters. In my opinion individual scope links should point from the shape to the node. This would align them with class-based scopes. It would also mean that the property would not need to be excluded when closing shapes. The discussion of class-based scopes needs to be clear that the definition of instance is different from that in RDFS or OWL. There should be a mention in 5.1.2 that there is an issue related to the interaction of classes and shapes. The discussion of rdf:type is incorrect. Even for classes that are shapes there might not be an rdf:type triple linking a resource with its shapes. In fact, there is no real notion of the shapes of a node defined in SHACL at all. This wording is another attempt to make SHACL a modelling language. Are multiple scopes conjunctive, disjunctive, or independent? The discussion of filter shapes on shapes ("applies to all constraints") does not match the discussion at the beginning of the section. 6. Constraint Violations Vocabulary There is no definition of constraint validation operation or even of constraint validation. [The glossary in the version current as of 28 August does not define constraint validation or constraint validation operation in any useful fashion.] What happens if a constraint violation node has an rdf:type link to rdfs:Resource? According to the document it will not be a constraint violation, but that's crazy. There is no discussion of how constraint severity works. [From here on in, I have done a less thorough examination.] 7. General Shape Constraints How does "MAY reference a focus node" interact with being a constraint? The discussion of general shape constraints based on a template does not make any sense, as templates have not yet been adequately introduced. In many cases there will not be a node that can be used to represent the entire graph. How then can graph-level constraints be defined? 8. Templates There is no notion of SHACL instantiation defined. If the entire core profile is templates, then say so. If not, say so. The wording concerning the relationship between templates and the core profile is extremely confusing as it stands. How are templates accessed? SHACL doesn't have rules or stored queries. Is rdfs:subClassOf important for templates? If so, how? If not, why the restriction related to template superclasses? [By this point, I was just trying to get to the end, and may have missed some problems.] 11. Supported Operations I find this section extremely difficult to understand. Some of the information given here needs to be mentioned much earlier. SHACL engines MUST support SHACL operations. All of the operations in this section are missing the control graph as an argument. The operations should have the data graph as an explicit argument. There is no need to have templates arranged in a class hierarchy. sh:NativeConstraint is not adequately defined. There are numerous parts of the pseudo-code that don't make sense, e.g., "is at least ?minSeverity". All of the interface arguments are missing the data and control graphs as arguments. There is nothing on the programming langauge types that are to be used. 12. Functions There is no method provided for calling a SHACL function from within SPARQL code. 14. SPARQL-based Execution There is an assumption that the SHACL control graph is accessible to the SPARQL code. There is a working group issue related to this, which needs to be referenced. There is no indication that SPARQL-based execution cannot be done using a standard SPARQL engine. The values of sh:sparql are not strings that are syntactically valid SPARQL queries. (See the beginning of 14.2 - they can be fragments. They also can be missing prefix declarations.) There is no notion of the defining graph in SHACL. It is not true that sh:sparql values need not declare any prefixes. There could be prefixes there that do not occur elsewhere, and such prefixes would have to be declared in the sh:sparql value. There is no indication that executing the SPARQL queries cannot be done in a standard SPARQL implementation. The execution of function (and other template, I expect) bodies is not a SHOULD relationship. Instead it is a MUST unless the alternative produces the same results. Wording problems that exist in multiple places: "SHACL RDF vocabulary" See above for why this is wrong. Replace by 'SHACL Language'. Variations of this also exist, e.g., "an RDF vocabulary", "SHACL vocabulary". "restriction" See above for why this is wrong. Each occurence needs to be examined for a suitable replacement. "subclass" and "instance" These words are used loosely and differently in different places. Each place they are used needs to be examined to ensure either that a standard meaning is being used or that the deviation from the standard is prominently described.
Received on Wednesday, 2 September 2015 00:40:43 UTC