Re: comments on current version of SHACL document from Holger Knublauch on 2015-09-25 (public-data-shapes-wg@w3.org from September 2015)

From: Holger Knublauch <holger@topquadrant.com>
Date: Sat, 26 Sep 2015 09:13:22 +1000
To: "public-data-shapes-wg@w3.org" <public-data-shapes-wg@w3.org>
Message-ID: <5605D512.2070802@topquadrant.com>

I am fine with this change. Did I hear you volunteer for the editorial
changes? :)

Just one detail below.

Holger


On 9/26/15 7:25 AM, Arthur Ryman wrote:
> Holger,
>
> I'd like to discuss your comment:
>
> "I mainly mentioned the default value of sh:minCount to clarify it,
> because we had discussed in the past that ShEx uses 1 as default. min
> and maxCount are grouped together because the computation of counts is
> a potentially expensive operation and shouldn't be done twice. You
> might say this is an implementation detail, but in order to be able to
> apply this optimization, they need to be in the same template and thus
> SPARQL query. I think there should be strong reasons if we want to
> deviate from the template mapping. Eliminating the talk about defaults
> doesn't strike me as a strong reason. (Another small difference is
> that the number of results may be different if we split them up. Right
> now there will always only be one constraint violation, even if both
> are violated, due to a modeling error)."
>
> This is an example of how implementation considerations can affect the
> spec. We should avoid that for several reasons. For example, the
> meaning of a set of property constraints is that they all have to be
> satisfied (conjunction). All of the property constraints are optional.
> Therefore we do not need to define default values for missing
> property constraints. In the case of minCount, if it is absent, then
> there is no constraint. This happens to have the same effect as
> including minCount 0, but that doesn't mean we need to introduce a
> default value. Similarly, if maxCount is absent then there is no
> constraint. Of course, an implementation would optimize and only
> compute the actual count once. That is why it is better to not view
> the SPARQL semantics as the actual definitions of templates.
> Implementers must have the freedom to implement the spec however they
> want as long as they get the correct answers. Actual individual
> templates for minCount and maxCount would still be useful for the
> creation of a test oracle, or maybe even an acceptable reference
> implementation.
>
> The latest version of the spec talks about default values and
> interpretations and includes the following SPARQL which combines
> minCount and maxCount. The spec doesn't assign a default value to
> $maxCount so the SPARQL is not actually well-defined when maxCount is
> absent.

This is why I added bound($maxCount) as a guard clause.

>
> SELECT $this ($this AS ?subject) $predicate
> WHERE {
>     {
>         SELECT (COUNT(?value) AS ?count)
>         WHERE {
>             $this $predicate ?value .
>         }
>     }
>     FILTER (?count < $minCount || (bound($maxCount) && ?count > $maxCount))
> }
>
> I propose that the discussion about defaults should be eliminated and
> the SPARQL should be split into one for minCount and one for maxCount.
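
A minimal sketch of what such a split could look like, derived from the
combined query and assuming the same pre-bound $this, $predicate, $minCount
and $maxCount variables. It also assumes that each query is only evaluated
when its property is actually present on the constraint, which is why no
bound() guard appears. This is illustrative only, not text from the spec:

```sparql
# Sketch for sh:minCount alone.
# Assumption: evaluated only when $minCount is present on the constraint.
SELECT $this ($this AS ?subject) $predicate
WHERE {
    {
        SELECT (COUNT(?value) AS ?count)
        WHERE {
            $this $predicate ?value .
        }
    }
    FILTER (?count < $minCount)
}

# Sketch for sh:maxCount alone (a second, separate query).
# Assumption: evaluated only when $maxCount is present, so bound() is not needed.
SELECT $this ($this AS ?subject) $predicate
WHERE {
    {
        SELECT (COUNT(?value) AS ?count)
        WHERE {
            $this $predicate ?value .
        }
    }
    FILTER (?count > $maxCount)
}
```

With two separate queries, a shape whose minCount is larger than its
maxCount (a modeling error) could report two violations for the same focus
node, where the combined query reports one.
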
>
> Can we treat this as editorial? Thanks.
>
> -- Arthur
>
>
> On Fri, Sep 18, 2015 at 10:56 PM, Peter F. Patel-Schneider
> <pfpschneider@gmail.com> wrote:
>> On 9/17/2015 1:28, Peter F. Patel-Schneider wrote:
>>>> I took a quick look at the version of the document current on 15 September,
>>>> concentrating mostly on Sections 1-6.
>>>>
>>>> The document is looking better, but there are still several significant
>>>> problems.
>>> Thanks, see my responses below.
>>>
>>> As an aside, I find it unhelpful to read hyperboles such as "significant
>>> problems" for things that are relatively easy to fix. We are currently just
>>> talking about a First Public Working Draft, not a 100% perfect
>>> specification.
>> I view several of the problems that I found to be significant.  You can
>> protest all you want, but don't expect me to stop using words that are
>> neither hyperbolic nor inflammatory.
>>
>>>> - With the new emphasis on SPARQL, there should be a part of Section 1 that
>>>>     introduces the use of SPARQL as the definition of SHACL.  This would
>>>>     include some of the stuff from the beginning of Section 3.
>>> Done.
>>>
>>>>     There needs to
>>>>     be more information on how SPARQL is used in the definition of SHACL in
>>>>     the discussion that is currently at the beginning of Section 3, such as
>>>>     how the results of the queries are combined.  This would also discuss the
>>>>     problem with blank nodes.
>>> I have attempted to formulate a paragraph on blank nodes, but marked it
>>> with a red TODO because the wording may not use the correct
>>> terminology. I would appreciate input (I believe Eric may have the right
>>> references here).
>> Although more needs to be done here, I think that it is acceptable for now.
>>
>>>>     As well, sh:hasShape needs to be better described.
>>> I added a bit on that; not sure what else is missing. The key feature of
>>> that definition is the reference to the validateNodeAgainstShape
>>> operation, which would describe the details.
>> This doesn't work, as validateNodeAgainstShape depends on sh:hasShape.
>>
>> As well, the recursion as error handling doesn't appear to be specified in a
>> reasonable manner.   One of the calls doesn't even specify a value for this
>> variable.  The calls for and and or set it naively, without regard for
>> whether there is a negated loop.
>>
>>>>     Several SPARQL queries provided require that the shapes graph
>>>>     be accessible.  As this is not guaranteed, there needs to be an
>>>>     explanation of what is going on.  It would also be better to have SPARQL
>>>>     definitions for more of SHACL, such as scopes.  (This would require moving
>>>>     the details of using SPARQL to define SHACL earlier in the document.)
>>> All done, I believe.
>>>
>>>> - The handling of results is poorly defined.  There is no discussion of how
>>>>     the results of embedded constraints and shapes are to be handled.  This
>>>>     needs to be cleaned up before FPWD publication.
>>> I have added statements that clarify that these nested results are just
>>> temporary.
>> I think that much more needs to be done to introduce result handling.
>>
>>>> - With the user-friendly syntax, shapes do not necessarily need to be in an
>>>>     RDF graph.  I think that this means that the early part of the document
>>>>     should not say "shapes graph", but instead something like "shapes".  Then
>>>>     the document can say "SHACL shapes are often encoded in an RDF graph,
>>>>     which is called the shapes graph."  Then later on it can say "shapes
>>>>     encoded as an RDF graph" where necessary.  I don't know what should be
>>>>     done for constructs that are not going to be part of the user-friendly
>>>>     syntax.
>>> This is not my understanding of how SHACL works. I believe the SHACL spec
>>> always assumes that the shapes are represented in RDF, and in a
>>> dedicated shapes graph, using exactly the specified vocabulary. If
>>> someone wants to use another (compact) syntax then these syntaxes need
>>> to be translated into RDF triples prior to execution.
>> This is an unnecessary step.  It should not be required.
>>
>>>> - SHACL is not an RDF vocabulary.  It is a language that can be encoded in
>>>>     RDF, and that uses a particular vocabulary in the encoding.  Any time
>>>>     SHACL shapes are discussed as being part of an RDF graph, careful
>>>>     attention needs to be paid to the wording used so as to not give incorrect
>>>>     information.
>>> My understanding of SHACL is that SHACL *defines* an RDF vocabulary. Our
>>> whole spec references IRIs from that vocabulary such as sh:minCount. I
>>> have tried to make it clear in previous edits that SHACL is not just a
>>> vocabulary, and would appreciate specific pointers if you believe this is
>>> still misleading.
>> I will do another pass on this after FPWD publication.
>>
>>>> - What happens with cyclic shapes references that do not involve a property
>>>>     constraint?  Are these handled the same as cyclic references that do
>>>>     involve a property constraint?
>>> Yes, the same way. The recursion stops if it encounters the same
>>> shape/focusNode combination. Having said this, the handling of
>>> sh:valueShape differs from the others such as sh:AndConstraint: they pass
>>> in another argument to the recursionIsError argument. The effect of this
>>> is that sh:valueShape will handle recursion as "true" while others will
>>> handle them as "failure".
>> So recursion inside an and is a failure?  This does not seem to be
>> reasonable in general.
>>
>>>> - All uses of RDFS notions, e.g., subclasses, should be qualified, e.g.,
>>>>     SHACL subclasses.
>>> To me this feels a bit pedantic. Isn't this handled by section 1.1 (which
>>> you wrote), i.e. we do a broad statement in the beginning so that we don't
>>> need to repeat the same things over and over again?
>> There are going to be many readers of the document that are coming from an
>> RDF and RDFS background.  I think that the document needs to hammer in the
>> differences so that such people do not get the wrong idea now and complain
>> bitterly later.
>>
>>>> - The relationship between shapes and constraints is poorly explained.  A
>>>>     shape has a bunch of constraints, which together serve to define the
>>>>     shape.  Constraints are not just validated against the same focus nodes.
>>> I have tried to improve the wording.
>> I think that the problem is still that the way that SHACL is put together is
>> difficult to describe, and comprehend.  For example, shape scopes are used
>> sometimes when shapes are considered but not at other times.  A cleaner
>> design would be easier to describe.
>>
>>>> - Most of the document is about the definition of SHACL.  There is little or
>>>>     no need for MUST, etc., in this definition.  Where MUST, etc., should show
>>>>     up is when describing the behaviour of SHACL implementations.  For
>>>>     example, a good description of scope combination would be "The scopes of a
>>>>     SHACL shape are considered additively, so, for example, in a shape with
>>>>     two individual scopes both individuals are in the scope of the shape."
>>>>     with no MUST, etc., needed.
>>> Ok, I didn't know that (and wonder why MUST exists at all). I have now
>>> tried to limit MUST to sentences about the implementations.
>> Things are better now.  It is guaranteed that more wordsmithing will be
>> needed down the line.
>>
>>>> - The description of the various bits of property constraints obscures the
>>>>     independence of some of the bits.  For example, splitting sh:minCount and
>>>>     sh:maxCount would eliminate talk about defaults.
>>> I mainly mentioned the default value of sh:minCount to clarify it,
>>> because we had discussed in the past that ShEx uses 1 as default. min and
>>> maxCount are grouped together because the computation of counts is a
>>> potentially expensive operation and shouldn't be done twice. You might say
>>> this is an implementation detail, but in order to be able to apply this
>>> optimization, they need to be in the same template and thus SPARQL
>>> query. I think there should be strong reasons if we want to deviate from
>>> the template mapping. Eliminating the talk about defaults doesn't strike
>>> me as a strong reason. (Another small difference is that the number of
>>> results may be different if we split them up. Right now there will always
>>> only be one constraint violation, even if both are violated, due to a
>>> modeling error).
>>>
>>>> There are a few things that would make the document better, but that are
>>>> certainly not needed immediately.
>>>>
>>>> - It would be nice to have an option to hide the SPARQL definitions.
>>> Yep, will do at some stage.
>>>
>>>> - It might be better to turn Section 2.3 into the beginning of Section 3.
>>> This doesn't make sense to me because 3 is only about the Core constraint
>>> types, while 2.3 is about the general mechanism (including a brief
>>> reference to the extension mechanism). Also, how can we have a section on
>>> Shapes without mentioning Constraints?
>>> Details of my recent edits are here:
>>>
>>>
>>> https://github.com/w3c/data-shapes/commit/607e758fbf6f633ab36fcdb4d59df01ccdad1699
>>> Thanks again,
>>> Holger
Received on Friday, 25 September 2015 23:13:55 UTC