Re: comments on current version of SHACL document from Holger Knublauch on 2015-09-17 (public-data-shapes-wg@w3.org from September 2015)

From: Holger Knublauch <holger@topquadrant.com>
Date: Thu, 17 Sep 2015 12:38:06 +1000
To: public-data-shapes-wg@w3.org
Message-ID: <55FA278E.6040901@topquadrant.com>
On 9/17/2015 1:28, Peter F. Patel-Schneider wrote:
> I took a quick look at the version of the document current on 15 September,
> concentrating mostly on Sections 1-6.
>
> The document is looking better, but there are still several significant
> problems.

Thanks, see my responses below.

As an aside, I find it unhelpful to read hyperboles such as "significant 
problems" for things that are relatively easy to fix. We are currently 
just talking about a First Public Working Draft, not a 100% perfect 
specification.

>
> - With the new emphasis on SPARQL, there should be a part of Section 1 that
>    introduces the use of SPARQL as the definition of SHACL.  This would
>    include some of the stuff from the beginning of Section 3.

Done.

>    There needs to
>    be more information on how SPARQL is used in the definition of SHACL in
>    the discussion that is currently at the beginning of Section 3, such as
>    how the results of the queries are combined.  This would also discuss the
>    problem with blank nodes.

I have attempted to formulate a paragraph on blank nodes, but marked it 
with a red TODO because the wording may not use the correct terminology. 
I would appreciate input (I believe Eric may have the right references 
here).

>    As well, sh:hasShape needs to be better described.

I added a bit on that; not sure what else is missing. The key feature of 
that definition is the reference to the validateNodeAgainstShape 
operation, which would describe the details.

>    Several SPARQL queries provided require that the shapes graph
>    be accessible.  As this is not guaranteed, there needs to be an
>    explanation of what is going on.  It would also be better to have SPARQL
>    definitions for more of SHACL, such as scopes.  (This would require moving
>    the details of using SPARQL to define SHACL earlier in the document.)

All done, I believe.

>
> - The handling of results is poorly defined.  There is no discussion of how
>    the results of embedded constraints and shapes are to be handled.  This
>    needs to be cleaned up before FPWD publication.

I have added statements that clarify that these nested results are just 
temporary.

>
> - With the user-friendly syntax, shapes do not necessarily need to be in an
>    RDF graph.  I think that this means that the early part of the document
>    should not say "shapes graph", but instead something like "shapes".  Then
>    the document can say "SHACL shapes are often encoded in an RDF graph,
>    which is called the shapes graph."  Then later on it can say "shapes
>    encoded as an RDF graph" where necessary.  I don't know what should be
>    done for constructs that are not going to be part of the user-friendly
>    syntax.

This is not my understanding of how SHACL works. I believe the SHACL 
spec always assumes that the shapes are represented in RDF, and in a 
dedicated shapes graph, using exactly the specified vocabulary. If 
someone wants to use another (compact) syntax then these syntaxes need 
to be translated into RDF triples prior to execution.

>
> - SHACL is not an RDF vocabulary.  It is a language that can be encoded in
>    RDF, and that uses a particular vocabulary in the encoding.  Any time
>    SHACL shapes are discussed as being part of an RDF graph, careful
>    attention needs to be paid ot the wording used so as to not give incorrect
>    information.

My understanding of SHACL is that SHACL *defines* an RDF vocabulary. Our 
whole spec references IRIs from that vocabulary such as sh:minCount. I 
have tried to make it clear in previous edits that SHACL is not just a 
vocabulary, and would appreciate specific pointers if you believe this 
is still misleading.

>
> - What happens with cyclic shapes references that do not involve a property
>    constraint?  Are these handled the same as cyclic references that do
>    involve a property constraint?

Yes, the same way. The recursion stops if it encounters the same 
shape/focusNode combination. Having said this, the handling of 
sh:valueShape differs from the others such as sh:AndConstraint: they 
pass in another argument to the recursionIsError argument. The effect of 
this is that sh:valueShape will handle recursion as "true" while others 
will handle them as "failure".

>
> - All uses of RDFS notions, e.g., subclasses, should be qualified, e.g.,
>    SHACL subclasses.

To me this feels a bit pedantic. Isn't this handled by section 1.1 
(which you wrote), i.e. we do a broad statement in the beginning so that 
we don't need to repeat the same things over and over again?

>
> - The relationship between shapes and constraints is poorly explained.  A
>    shape has a bunch of constraints, which together serve to define the
>    shape.  Constraints are not just validated against the same focus nodes.

I have tried to improve the wording.

>
> - Most of the document is about the definition of SHACL.  There is little or
>    no need for MUST, etc., in this definition.  Where MUST, etc., should show
>    up is when describing the behaviour of SHACL implementations.  For
>    example, a good description of scope combination would be "The scopes of a
>    SHACL shape are considered additively, so, for example, in a shape with
>    two individual scopes both individuals are in the scope of the shape."
>    with no MUST, etc., needed.

Ok, I didn't know that (and wonder why MUST exists at all). I have now 
tried to limit MUST to sentences about the implementations.

>
> - The description of the various bits of property constraints obscures the
>    independence of some of the bits.  For example, splitting sh:minCount and
>    sh:maxCount would eliminate talk about defaults.

I mainly mentioned the default value of sh:minCount to clarify it, 
because we had discussed in the past that ShEx uses 1 as default. min 
and maxCount are grouped together because the computation of counts is a 
potentially expensive operation and shouldn't be done twice. You might 
say this is an implementation detail, but in order to be able to apply 
this optimization, they need to be in the same template and thus SPARQL 
query. I think there should be strong reasons if we want to deviate from 
the template mapping. Eliminating the talk about defaults doesn't strike 
me as a strong reason. (Another small difference is that the number of 
results may be different if we split them up. Right now there will 
always only be one constraint violation, even if both are violated, due 
to a modeling error).

>
> There are few things that would make the document better, but that are
> certainly not needed immediately.
>
> - It would be nice to have an option to hide the SPARQL definitions.

Yep, will do at some stage.

>
> - It might be better to turn Section 2.3 into the beginning of Section 3.

This doesn't make sense to me because 3 is only about the Core 
constraint types, while 2.3 is about the general mechanism (including a 
brief reference to the extension mechanism). Also, how can we have a 
section on Shapes without mentioning Constraints.

Details of my recent edits are here:

https://github.com/w3c/data-shapes/commit/607e758fbf6f633ab36fcdb4d59df01ccdad1699

Thanks again,
Holger
Received on Thursday, 17 September 2015 02:39:29 UTC