Re: review of SHACL document (Second pass of responses) from Holger Knublauch on 2015-09-03 (public-data-shapes-wg@w3.org from September 2015)

From: Holger Knublauch <holger@topquadrant.com>
Date: Thu, 3 Sep 2015 15:53:36 +1000
To: RDF Data Shapes Working Group <public-data-shapes-wg@w3.org>
Message-ID: <55E7E060.1060309@topquadrant.com>

Hi Peter,

here is my second pass through your comments.

On 9/2/2015 10:40, Peter F. Patel-Schneider wrote:
> Abstract
>
>
> "Additional constraints can be associated with shapes using SPARQL and
> similar executable languages. These executable languages can also be used to
> define new high-level vocabulary terms."
> Given that there are no provisions for languages other than SPARQL at the
> present time, this feature should not be mentioned in the abstract.
> Instead just use 'Additional constraints can be associated with shapes using
> SPARQL.'

The resolution to ISSUE-60 was that SHACL shall support other languages
besides SPARQL. I know you voted against this, yet this is what the rest
of the WG currently prefers. The spec includes an API-like interface 
ExecutionLanguage in section 11.5. While that is a far from perfect 
formal description at this stage, the intention remains the same. I 
believe it remains important to keep the door open to non-SPARQL 
communities, and therefore also include it in the abstract.

>
> 1. Introduction
>
>
> "express other restrictions in an executable language such as SPARQL"
> There is no need to say that SPARQL is executable.
> Replace with 'express other conditions as SPARQL queries' and adjust the
> wording in the rest of the paragraph accordingly.

See above, but I have dropped the word "executable".

>
>
> 1.2 Overview and Terminology of Core Features
>
>
>
> "Each constraint defines a condition that can be validated against a graph."
> Constraints do not work this way.
> There needs to be some high-level description of how SHACL works before
> constraints are described.

I have added a paragraph directly after the example, before constraints 
are described.

SHACL can be used to define structural constraints that can be used to 
validate graphs. A SHACL validation process takes two graphs as input:

  * A data graph (or more generally, a dataset
    <http://www.w3.org/TR/rdf11-concepts/#section-dataset>), for example
    containing instances of ex:Issue
  * A shapes graph containing shape definitions, as shown in the example
    above.

The output of a SHACL validation process is a new graph, containing 
validation results.
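
The input/output contract described above can be sketched in a few lines of
Python. This is only an illustration, not the algorithm from the draft: graphs
are modelled as plain sets of (subject, predicate, object) string triples, and
the names validate(), "requires" and "violates" are hypothetical stand-ins
rather than SHACL vocabulary.

```python
# Hypothetical sketch: graphs are sets of (s, p, o) string triples.
# validate(), "requires" and "violates" are illustrative names only.

def validate(data_graph, shapes_graph):
    """Return a new results graph; neither input graph is modified."""
    results = set()
    # Each (shape, "requires", property) triple is a toy stand-in for a shape
    # definition: every ex:Issue instance must have a value for that property.
    for shape, _, prop in (t for t in shapes_graph if t[1] == "requires"):
        for node, p, cls in data_graph:
            if p == "rdf:type" and cls == "ex:Issue":
                has_value = any(s == node and q == prop for s, q, _ in data_graph)
                if not has_value:
                    results.add((node, "violates", shape))
    return results

data_graph = {
    ("ex:issue1", "rdf:type", "ex:Issue"),
    ("ex:issue1", "ex:status", "open"),
    ("ex:issue2", "rdf:type", "ex:Issue"),
}
shapes_graph = {("ex:IssueShape", "requires", "ex:status")}

# The output is a third graph that contains only validation results.
results_graph = validate(data_graph, shapes_graph)
```

The point of the sketch is the signature: two graphs go in, and a separate
graph of results comes out.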


This is hopefully a first step - I welcome additions/corrections to such 
a paragraph.

I have also made sure that we use the term "data graph" consistently.

>
>
> "a SHACL engine may follow"
> How does ``may'' come into this at all?  This needs to be replaced by some
> description of how SHACL is invoked on the control graph and the data graph.

See above, and I have replaced "may" with "will".

>
> 1.3 Overview and Terminology of Advanced Features
>
> "The validation of each constraint is formalized with one or more execution
> languages."
> I don't see how anyone who does not already know about SHACL can figure out
> what this is saying.
> Replace with something like 'Each constraint in SHACL is defined in terms of
> a SPARQL 1.1 query.' and then adjust the rest of the paragraph accordingly.

See above on the general topic of execution languages. Meanwhile changed to

"Each constraint is defined in terms of one or more execution languages."

>
>
> 2. Shapes
>
> "A Constraint defines restrictions on the structure of an RDF graph."
> I find this rather misleading.

Ok, sentence deleted. Constraints were already introduced elsewhere.

> "A Shape is a group of constraints that have the same focus nodes."
> There are lots more things that go into a shape.
> Replace the entire paragraph with a concise description of what shapes are
> for that includes scopes and filters.

Ok, changed to:

A Shape is a group of constraints that is to be validated against the same
focus nodes. Shapes may also have scopes that instruct a SHACL processor
on how to select those focus nodes, and may also have filter shapes that
narrow down the scope. For example, a shape can be used to state that
all instances of a class must have a certain number of values for a
given property. In that example, the instances of the class are the
focus nodes in the scope, and the restriction on property values is
expressed via a constraint.
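
The example in that paragraph (a class-based scope plus a count restriction)
might look roughly like this in Python. It is a sketch under simplifying
assumptions: the dictionary keys scope_class, property and min_count are
hypothetical, filter shapes are left out, and instances are found through
direct rdf:type triples only.

```python
# Hypothetical sketch; the dictionary keys below are illustrative, not SHACL terms.
RDF_TYPE = "rdf:type"

def focus_nodes(data_graph, scope_class):
    """Scope: select every node with an rdf:type link to scope_class."""
    return {s for s, p, o in data_graph if p == RDF_TYPE and o == scope_class}

def count_values(data_graph, node, prop):
    """Count the values that node has for prop."""
    return sum(1 for s, p, _ in data_graph if s == node and p == prop)

def check_shape(data_graph, shape):
    """Return the focus nodes that have fewer values than the shape demands."""
    return {
        node
        for node in focus_nodes(data_graph, shape["scope_class"])
        if count_values(data_graph, node, shape["property"]) < shape["min_count"]
    }

shape = {"scope_class": "ex:Person", "property": "ex:name", "min_count": 1}
data_graph = {
    ("ex:alice", RDF_TYPE, "ex:Person"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:bob", RDF_TYPE, "ex:Person"),
}
failing = check_shape(data_graph, shape)
```

Here the scope picks ex:alice and ex:bob as focus nodes, and the constraint is
then evaluated once per focus node.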


I also switched to shape in lower-case everywhere.

>
> This entire section needs to be reordered and rewritten to talk about what
> shapes are for and how they work.  This description should describe how
> shapes use constraints, scopes, and filters.  Talk about rdfs:label and
> rdfs:comment, if retained, should be moved to a much less prominent place.

Ok, I have completely restructured this to have the following subsections:

- Scopes
- Filter Shapes
- Shape Constraints
   (I have removed the UML diagram about constraint metaclasses - this
belongs in the ref document)

I picked that order so that the last section on constraints naturally 
flows into the next sections which have details about the various 
constraint types.

I have moved the content from the previous section 5 into the Scopes and 
Filter subsections here. This deleted some duplicate content.

>
>
>
> There are a number of SHACL vocabulary terms that show up in the section but
> that are not discussed.  If a vocabulary term is important enough to show
> up here, then it is important enough that it needs to be discussed here.

Details would be good, but maybe this has been resolved now.

>
>
> 3. Property Constraints
>
>
> The use of rdfs:label for human-readable properties in context leaves the
> notion of context undefined.  When should tools prefer the labels in
> property constraints?  Similarly for rdfs:comment.

Ok, replaced "context" with "scope" and clarified with an example.

>
> There is also no guidance on even when sh:defaultValue might be used by user
> interface tools.

I am surprised now. I thought you were against default values in 
general, so I had watered down their meaning to basically nothing but a 
faint recommendation. And I can live with the current wording.

>
> What are the instances of rdfs:Datatype?

Nodes that have rdf:type rdfs:Datatype, e.g. xsd:string. I had added the
most common ones to the SHACL Turtle file, to be sure they are known. Is
xsd:string not an instance of rdfs:Datatype?

>    How is it determined whether something is an instance of rdfs:Class?

With the usual interpretation - either it has an rdf:type link to 
rdfs:Class, or to a subclass thereof. Is there anything I need to fix?
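
To make "the usual interpretation" concrete, here is a small Python sketch of
the instance check: a node counts as an instance of a class if it has an
rdf:type link to that class or to one of its subclasses. The function names
are hypothetical, only triples actually present in the graph are considered
(no RDFS inference), and the ex: terms are made up for the example.

```python
# Hypothetical sketch: the graph is a set of (s, p, o) string triples.

def superclasses(graph, cls):
    """Return cls plus everything reachable from it via rdfs:subClassOf."""
    seen, todo = set(), [cls]
    while todo:
        current = todo.pop()
        if current in seen:
            continue  # guards against cycles in the subclass hierarchy
        seen.add(current)
        todo.extend(o for s, p, o in graph if s == current and p == "rdfs:subClassOf")
    return seen

def is_instance_of(graph, node, cls):
    """True if node has an rdf:type link to cls or to a subclass of cls."""
    return any(
        cls in superclasses(graph, t)
        for s, p, t in graph
        if s == node and p == "rdf:type"
    )

graph = {
    ("ex:MyMetaClass", "rdfs:subClassOf", "rdfs:Class"),
    ("ex:Person", "rdf:type", "ex:MyMetaClass"),
    ("xsd:string", "rdf:type", "rdfs:Datatype"),
}
person_is_class = is_instance_of(graph, "ex:Person", "rdfs:Class")
string_is_datatype = is_instance_of(graph, "xsd:string", "rdfs:Datatype")
```

Under this reading xsd:string is an instance of rdfs:Datatype because the
rdf:type triple is present in the graph being checked.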

>
> There is no discussion of whether properties can be repeated.  Even if this
> information is presented later, it should also be presented here,
> particularly if some properties can be repeated and others cannot.

This information is present in the context of templates - each 
sh:argument can have at most one value. But I have added two sentences 
to clarify this:

None of these properties can be repeated within the same
sh:PropertyConstraint. In order to define multiple constraints
using the same property, such as multiple sh:hasValue constraints, the
shape must use multiple sh:property definitions.

(this also applies to, for example, sh:qualifiedValueShape)
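
The rule can be checked mechanically. The Python sketch below is illustrative
only: repeated_properties() is a hypothetical helper, graphs are sets of
string triples, and the use of sh:predicate to name the constrained property
is an assumption about the draft vocabulary.

```python
from collections import Counter

# Hypothetical helper; not part of the SHACL draft.
def repeated_properties(graph, constraint_node):
    """Return the properties that occur more than once on constraint_node."""
    counts = Counter(p for s, p, _ in graph if s == constraint_node)
    return {p for p, n in counts.items() if n > 1}

# Not allowed: one property constraint carrying two sh:hasValue triples.
bad = {
    ("_:pc1", "sh:predicate", "ex:tag"),
    ("_:pc1", "sh:hasValue", "ex:urgent"),
    ("_:pc1", "sh:hasValue", "ex:bug"),
}

# Allowed: the shape uses two sh:property definitions, one sh:hasValue each.
good = {
    ("ex:Shape", "sh:property", "_:pc1"),
    ("_:pc1", "sh:predicate", "ex:tag"),
    ("_:pc1", "sh:hasValue", "ex:urgent"),
    ("ex:Shape", "sh:property", "_:pc2"),
    ("_:pc2", "sh:predicate", "ex:tag"),
    ("_:pc2", "sh:hasValue", "ex:bug"),
}

bad_repeats = repeated_properties(bad, "_:pc1")
good_repeats = repeated_properties(good, "_:pc1") | repeated_properties(good, "_:pc2")
```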

>
> Matching a shape is not a defined notion.  [I took a quick look at the
> newest version of the document, and the changes do not help.]

I had already removed all references to "matching" shapes. That verb is 
now only used for things like regular expressions.

>
> There is no discussion on whether constraint violations inside embedded
> shapes are to be handled specially.   Similar problems occur in several
> other places.

I will clarify this in the planned rewrite of the operations section. 
Basically all results of nested shape calls (sh:hasShape) are discarded.
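
A rough Python sketch of that intended behaviour, under the assumption that a
nested call only reports whether the node conforms: validate_node() and
has_shape() are hypothetical names, shapes are plain dictionaries, and the
operations section may end up specifying this differently.

```python
# Hypothetical sketch of discarding results from nested shape calls.

def validate_node(node, shape, results):
    """Append one result per failed check of `shape` to `results`."""
    for check in shape["checks"]:
        if not check(node):
            results.append((node, shape["name"]))

def has_shape(node, shape):
    """Nested call: collect results privately and report only a boolean."""
    nested_results = []
    validate_node(node, shape, nested_results)
    return not nested_results  # the nested results are dropped here

inner = {"name": "ex:Inner", "checks": [lambda n: n.startswith("ex:")]}
outer = {"name": "ex:Outer", "checks": [lambda n: has_shape(n, inner)]}

results = []
validate_node("urn:x", outer, results)
```

Only the outer shape shows up in the results, even though the failure
originated in the nested shape.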

>
>
> 3.1.12 from version current on 28 August
>
> "error-level constraint violations" is not defined.

Replaced with "a validation result with severity sh:Error
or a failure" in several places. Note that this anticipates a resolution 
to the open ISSUE on the types of results, dropping the current 
sh:FatalError.

>
>
> 4. Other Core Constraints on Shapes
>
> There is no indication that violations reported from within sh:NotConstraint
> constraints are to be treated in any special way.  There is no defined
> notion of a node matching a shape.

I believe this is already addressed.

>
> There is no definition of "error-level constraint violations".

See above.

>
> There should be some discussion on the difference between top-level
> constraints and constraints inside an and.

Sounds like the same topic.

>
>
> 5. Scopes and Filter Shapes
>
> The wording at the beginning of Section 5 reads like it is optional for
> SHACL processors to use scopes.  There needs to be some introductory
> material that discusses how SHACL validation works with scopes
> and filters.

I believe this is now better, as the Scope section was moved higher up, 
right underneath the graphical representation of the workflow.

>
> The discussion of class-based scopes needs to be clear that the definition
> of instance is different from that in RDFS or OWL.

(I don't think we need to reference OWL at all - just RDFS.) With the 
new prose, would you be able to point me at a statement that you would 
like to see?

>
> There should be a mention in 5.1.2 that there is an issue related to the
> interaction of classes and shapes.

Ok, ISSUE-23 is mentioned.

>
> The discussion of rdf:type is incorrect.   Even for classes that are shapes
> there might not be an rdf:type triple linking a resource with its shapes.
> In fact, there is no real notion of the shapes of a node defined in SHACL at
> all.  This wording is another attempt to make SHACL a modelling language.

I suggest we await the resolution on ISSUE-23 before discussing this 
further.

>
> Are multiple scopes conjunctive, disjunctive, or independent?

Ok, clarified to be the union.
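
Read as set union, this means a node becomes a focus node as soon as any one
of the shape's scopes selects it. A minimal Python sketch, with hypothetical
parameter names (scope_classes, scope_nodes) and direct rdf:type matching
only:

```python
# Hypothetical sketch: parameter names are illustrative, not SHACL vocabulary.
def nodes_in_scope(data_graph, scope_classes=(), scope_nodes=()):
    """Union of all scopes: a node is selected if any single scope selects it."""
    selected = set(scope_nodes)
    for cls in scope_classes:
        selected |= {s for s, p, o in data_graph if p == "rdf:type" and o == cls}
    return selected

data_graph = {
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:acme", "rdf:type", "ex:Company"),
    ("ex:rex", "rdf:type", "ex:Dog"),
}
selected = nodes_in_scope(
    data_graph,
    scope_classes=["ex:Person", "ex:Company"],
    scope_nodes=["ex:alice"],
)
```

A node selected by more than one scope (ex:alice here) still appears only once
in the resulting set.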

>
> The discussion of filter shapes on shapes ("applies to all constraints")
> does not match the discussion at the beginning of the section.

The new structure of the paragraph hopefully improved things.
(I have also added a reference to ISSUE-49).

>
>
> 6. Constraint Violations Vocabulary
>
> There is no definition of constraint validation operation or even of
> constraint validation.

Operations are introduced later, but I believe the overall changes have 
improved this terminology. Changed "validation operation" to "validation 
process" for now.

>     [The glossary in the version current as of 28
> August does not define constraint validation or constraint validation
> operation in any useful fashion.]
>
> There is no discussion of how constraint severity works.

I will clean this up after the resolution of the corresponding issue.

Since this completes the Core vocabulary chapters, I have committed
the changes that I have made so far to master.

I will try to look into the Advanced chapters tomorrow, in a separate 
email. For the record, below is a list of still-to-be-covered topics.

Thanks,
Holger


>
>
> [From here on in, I have done a less thorough examination.]
>
> 7. General Shape Constraints
>
> The discussion of general shape constraints based on a template does not
> make any sense, as templates have not yet been adequately introduced.
>
> 8. Templates
>
> There is no notion of SHACL instantiation defined.
>
> If the entire core profile is templates, then say so.  If not, say so.
> The wording concerning the relationship between templates and the core
> profile is extremely confusing as it stands.
>
> How are templates accessed?
>
> SHACL doesn't have rules or stored queries.
>
> Is rdfs:subClassOf important for templates?  If so, how?  If not, why the
> restriction related to template superclasses?
>
> 11. Supported Operations
>
> I find this section extremely difficult to understand.   Some of the
> information given here needs to be mentioned much earlier.
>
> SHACL engines MUST support SHACL operations.
>
> All of the operations in this section are missing the control graph as an
> argument.
>
> The operations should have the data graph as an explicit argument.
>
> There is no need to have templates arranged in a class hierarchy.
>
> sh:NativeConstraint is not adequately defined.
>
> There are numerous parts of the pseudo-code that don't make sense, e.g., "is
> at least ?minSeverity".
>
> All of the interface arguments are missing the data and control graphs as
> arguments.  There is nothing on the programming language types that are to
> be used.
>
>
> 14. SPARQL-based Execution
>
> There is no indication that SPARQL-based execution cannot be done using a
> standard SPARQL engine.
>
> The values of sh:sparql are not strings that are syntactically valid SPARQL
> queries.  (See the beginning of 14.2 - they can be fragments.  They also can
> be missing prefix declarations.)
>
> There is no notion of the defining graph in SHACL.
>
> There is no indication that executing the SPARQL queries cannot be done in a
> standard SPARQL implementation.
>
> The execution of function (and other template, I expect) bodies is not a
> SHOULD relationship.  Instead it is a MUST unless the alternative produces
> the same results.
>
>
>
> Wording problems that exist in multiple places:
>
> "SHACL RDF vocabulary"
> See above for why this is wrong.
> Replace by 'SHACL Language'.
> Variations of this also exist, e.g., "an RDF vocabulary", "SHACL
> vocabulary".
>
> "restriction"
> See above for why this is wrong.
> Each occurrence needs to be examined for a suitable replacement.
>
> "subclass" and "instance"
> These words are used loosely and differently in different places.  Each
> place they are used needs to be examined to ensure either that a standard
> meaning is being used or that the deviation from the standard is prominently
> described.
>
Received on Thursday, 3 September 2015 05:54:16 UTC