Re: shapes-ISSUE-182 (Validation report): [Editorial] Clarifications need to section 3.0 from Holger Knublauch on 2016-10-06 (public-data-shapes-wg@w3.org from October 2016)

From: Holger Knublauch <holger@topquadrant.com>
Date: Fri, 7 Oct 2016 09:36:52 +1000
To: public-data-shapes-wg@w3.org
Message-ID: <ecbfbe2c-2147-d888-e69c-849a35366054@topquadrant.com>

On 7/10/2016 2:12, Karen Coyle wrote:
>
>
> On 10/5/16 8:58 PM, Holger Knublauch wrote:
>> On 1/10/2016 11:23, Karen Coyle wrote:
>>>
>>>
>>> On 9/29/16 8:57 PM, Holger Knublauch wrote:
>>>>
>>>>
>>>> On 30/09/2016 1:40, RDF Data Shapes Working Group Issue Tracker wrote:
>>>>> shapes-ISSUE-182 (Validation report): [Editorial] Clarifications need
>>>>> to section 3.0
>>>>>
>>>>> http://www.w3.org/2014/data-shapes/track/issues/182
>>>>>
>>>>> Raised by: Karen Coyle
>>>>> On product:
>>>>>
>>>>> Section 3.0 on validation talks about the validation results, but
>>>>> doesn't explain clearly which properties are required and which are
>>>>> optional. It also should refer to the shapes graph as the source of
>>>>> the properties, not just to their appearance in the report. Some
>>>>> examples:
>>>>>
>>>>> "3.4.1.3 Value (sh:value)
>>>>>
>>>>> Validation results may have a value for the property sh:value
>>>>> pointing at a specific node that has caused the result."
>>>>>
>>>>> - it isn't clear if sh:value MUST be returned if sh:value is coded in
>>>>> the constraint, or if echoing back sh:value when it exists is itself
>>>>> optional.
>>>>
>>>> I have added some prose into 3.4.3 to clarify how this property is
>>>> populated. I hope this clarifies that sh:value is not coded in the
>>>> constraint but is dynamically populated from the data graph.
>>>
>>> Thanks, Holger. However, I'd change the wording from:
>>>
>>> "Validation results may have a value for the property sh:value
>>> pointing at a specific node that has caused the result. "
>>>
>>> to:
>>>
>>> "Validation results MAY include a property sh:value. The property
>>> takes as its object the specific node in the data graph that caused
>>> the result. This object can be any RDF term (IRI, literal, or blank
>>> node)."
>>>
>>> Then I'd leave off the "for example" part, but it doesn't hurt anything.
>>
>> A problem here is that the term "property" is overloaded.
>> 1) "property" referring to the rdf:Property itself (which is my default
>> understanding)
>> 2) "property" referring to a specific object.
>> But for the latter we already have the term "property value"
>> (abbreviated as "value"). In any case, a subject cannot include a property.
>
> Property needs to refer only to instances of rdf:Property. It can't 
> have two meanings. I have been looking into uses of "property value" 
> in other RDF specifications. Here it refers to the object of the 
> triple. I don't think that is what is used in other RDF 
> specifications, although I do find a few uses of that (e.g. one in 
> SKOS). In general, when speaking of the value of the object of a 
> predicate, RDF specifications use phrases like this:
>
> "This states that any resource that is the value of an rdfs:domain 
> property is an instance of rdfs:Class."
>
> They always refer to the resource that has value, which is not the 
> property itself but the object of the property.
>
> So I don't think that it is overloaded, but it could be worded as:
>
> "Validation results MAY include a property sh:value. sh:value has as 
> its object the node in the data graph that caused the result."

As I tried to say before, this use of "property" does not align with how 
we (and other specs) use that term. Validation results do not include a 
property. They may include *a value* for a property. Moreover, sh:value
itself cannot have an object; only a triple in which sh:value is the
predicate can have one.

So here is another attempt (suggested by Irene offlist):

Validation results may include, as a value of the property sh:value, a
specific node that has caused the result.


(where value is hyperlinked to the definition of what a property value 
is). I believe this is both precise and readable.
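
To illustrate the distinction between a property and a property value,
here is a minimal sketch in plain Python. It is not a SHACL API: the
tuple-based graph, the values_of helper and the node names are all made up
for illustration. It only shows that "the values of sh:value" are the
objects of the triples that have a validation result as subject and
sh:value as predicate, and that there may be none.

```python
# Hypothetical plain-Python model of RDF triples; not a real SHACL API.
SH = "http://www.w3.org/ns/shacl#"


def values_of(graph, subject, predicate):
    """Return the objects of all triples with the given subject and predicate."""
    return [o for (s, p, o) in graph if s == subject and p == predicate]


# Made-up results graph with two validation results.
# _:r1 stands for a result that names an offending node via sh:value.
# _:r2 stands for a result (e.g. from sh:minCount) with no individual node.
report = [
    ("_:r1", SH + "focusNode", "ex:Alice"),
    ("_:r1", SH + "value", '"42"'),
    ("_:r2", SH + "focusNode", "ex:Bob"),
]

print(values_of(report, "_:r1", SH + "value"))
print(values_of(report, "_:r2", SH + "value"))
```

The second call returns an empty list: the result simply has no value for
sh:value, which is different from a triple with "zero objects".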


>
>>
>> Furthermore, enumerating the three node kinds is redundant, because this
>> is implied by the term "node".
>
> Yes, but it is a nice reminder to the reader. I won't insist.
>
>>
>> Also, in the past when I had used MAY in all-caps, I received backlash
>> because this is apparently not according to the predefined meaning of
>> MAY in W3C specs. I have to confess I never understood when it's valid
>> and when not.
>>
>>
>>>
>>> "pointing to" has the same problem as the "linking to" comment that
>>> Peter first brought up. For this reason it may be best to use the
>>> terminology of triples when speaking of relationships between the
>>> components of triples, which are only defined positionally, not as
>>> links or pointers.
>>
>> Ok, using the triple-centric notation, I have tried to reformulate this to:
>>
>> Validation results may have zero or one values for the property
>> <code>sh:value</code>. The <a>object</a> of a <a>triple</a> that has
>> <code>sh:value</code> as its <a>predicate</a> and a validation result as
>> its <a>subject</a> is the specific <a>node</a> that has caused the
>> result. For example, validation results produced as a result of a
>> <code>sh:nodeKind</code> constraint use <code>sh:value</code> with the
>> <a>value node</a> that does not have the correct node kind as its
>> <a>object</a>, while results produced due to a <code>sh:minCount</code>
>> violation do not use <code>sh:value</code> because there is no
>> individual node that could be mentioned.
>>
>> (While this is more precise, I doubt that this contributes to
>> readability).
>
> A triple cannot have zero objects. If there is no object of sh:value 
> then there is no triple.

Yes, but I didn't say this anywhere. In any case, this paragraph has 
been rewritten.

>
> for the second sentence, this is all that is needed:
> The <a>object</a> of
> <code>sh:value</code> is the specific <a>node</a> that has caused the 
> result.
>
> And I would advise to drop the example as unnecessary.

I have dropped the example (not because I agree but in the interest of 
moving on).

>
>>
>>>
>>> I assume that an sh:value with a blank node as object may not always
>>> be informative, but it is a possibility.
>>
>> sh:value as blank node is fine assuming that the graph maintains bnode
>> ids (which almost all implementations do).
>>
>>>
>>>
>>>>
>>>>>
>>>>> 3.4.1.8 Declaring the Severity of a Constraint uses "can" not "MAY",
>>>>> and gives the default as sh:Violation (Does that mean T/F cannot have
>>>>> a default?). Better wording would be:
>>>>>
>>>>> "The severity level of a constraint violation MAY be coded in the
>>>>> constraint of a shapes graph using the property sh:severity, which
>>>>> takes as its value one of the SHACL pre-defined severities, or a
>>>>> locally defined severity." (followed by remaining sentences)
>>>>
>>>> I have applied similar wording to 3.4.8.
>>>
>>> I don't see changes in those sections - did the changes actually go in?
>>
>> I did not use your exact wording, but please verify whether you can live
>> with 3.4.8 and 3.4.9 now. I didn't use the term "locally defined
>> severity" because it will open more questions such as "in which graph".
>> So I went with that it can be any IRI.
>>
>>>
>>>>
>>>>>
>>>>> Also, the example given shows the shapes graph, but would be more
>>>>> informative if it also included the validation report that results.
>>>>
>>>> For this to happen, I would also need to create a data graph, then the
>>>> results graph. This would easily fill two pages. While I agree this
>>>> would be "informative", I am honestly not convinced whether this is
>>>> worth the effort. With every paragraph that we add, more stuff will
>>>> need to be reviewed (and no doubt someone will not like something
>>>> about them). The current example includes # comments that are IMHO
>>>> clear enough about what will happen. But if you feel strongly about
>>>> this, I can expand on the example.
>>>
>>> I do think that such an example would be good. It may be possible to
>>> show snippets rather than whole SHACL documents. Another option is to
>>> put examples at the end of the section. (The turtle syntax document
>>> does this.)
>>
>> I have extended the example as you have requested.
>
> Thanks.
>
>>
>>>
>>>>
>>>>>
>>>>> Note that examples throughout do not include sh:severity or sh:message
>>>>> in constraints, which requires some explanation, perhaps in the
>>>>> introductory area where examples are described. (I presume that it is
>>>>> expected that most or many constraints will include a severity, so it
>>>>> would be a normally occurring property, and that sh:message will also
>>>>> be common.)
>>>>
>>>> I have added two sentences enumerating the mandatory properties:
>>>>
>>>> The properties <code>sh:focusNode</code> and <code>sh:severity</code>
>>>> are the only mandatory properties of all validation results.
>>>> The property <code>sh:sourceConstraintComponent</code> is mandatory for
>>>> validation results produced by violations of
>>>> <a>constraint components</a>.
>>>>
>>>> I hope this addresses the role of mandatory vs optional properties?
>>>
>>> Yes, I believe it does. Thanks.
>>>
>>>>
>>>>>
>>>>> The Example validation report in section 2.2 (Filter shapes) has
>>>>> sh:severity and sh:message although those are not shown in the shapes
>>>>> graph.
>>>>
>>>> sh:severity is optional and therefore not shown (the default is
>>>> sh:Violation).
>>>> sh:message is automatically produced by the engine, although I have
>>>> recently opened a ticket to also allow it at individual constraints.
>>>>
>>>> So technically that's all OK. I could add an explanation about where
>>>> these properties are coming from, but that's kinda repetitive and
>>>> would require forward-references to later sections.
>>>
>>> You could comment in the example with something like "# See Violations
>>> section" - that way, people know that it's something that will be
>>> explained later, but it doesn't take up much space.
>>
>> Added a forward reference as a sentence right before the example.
>>
>> My latest round of edits is
>>
>> https://github.com/w3c/data-shapes/commit/71073d04b640b3d9fd646f0a977e4fb9d86cc00d 
>>
>>
>>
>>>
>>>>
>>>>
>>>> As always, it is possible to have different opinions about such
>>>> editorial changes. If you can live with the current state, let me know
>>>> so we can close this ticket. Otherwise, please respond with what else
>>>> needs to be changed.
>>>
>>> I discovered where it is said (although not quite directly) that a
>>> validation report is only created when the focus node is NOT valid,
>>> i.e. fails to meet the criteria of the constraints: it's in the
>>> terminology section in the box that begins "Data Graph, Shapes
>>> Graph...". It says there:
>>>
>>> " A node in a data graph is said to validate against a shape if
>>> validation of that node against the shape neither produces any
>>> validation results nor results in a failure."
>>>
>>> That's a bit subtle, and if nothing else it also needs to be said in
>>> the actual validation section. But I think it should be clearer and
>>> say that a validation report is only produced for focus nodes that
>>> *fail* to validate against the constraints. It needs to be very clear
>>> that the validation report is not a report of all of the results of
>>> the validation process, but only a report on the failures. Its name
>>> does not imply that; actually, it implies that it is reporting on the
>>> results of the act of validation, which has at least two possible
>>> outcomes: pass/fail (or T/F). So it is important to make this point.
>>
>> The validation report *is* a report of all results. Failures are
>> "exceptions" that basically stop the process and do not produce a report
>> at all. Failures are reported by different channels. With this, are you
>> able to point at specific changes that need to be made to resolve the
>> issue?
>>
>> Thanks,
>> Holger
>
> I didn't mean fail in the way that you are using it here, sorry, I 
> meant comparisons where the data graph is found not to conform to the 
> shape constraints. So validation is a report of non-conformance, but 
> does not report conformance, is that correct? If the data graph 
> conforms to the constraints in the shapes graph, no validation report 
> is generated, true? If that is the case, then that needs to be said in 
> the section on the validation report.

If validation produces an empty report then this can be regarded as 
conformance. So it always returns a report graph, but that graph may be 
empty.
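
As a rough sketch of that behaviour (plain Python, not an actual engine;
the validate_min_count and conforms helpers and the ex: names are
hypothetical), a check always hands back a report graph, and conformance is
just the case where that graph contains no results. The result below
carries only sh:focusNode and sh:severity, with sh:Violation as the default
severity.

```python
# Hypothetical sketch: a report graph is a list of (subject, predicate, object).
SH = "http://www.w3.org/ns/shacl#"


def validate_min_count(data_graph, focus_node, predicate, min_count):
    """Always return a report graph; it is empty when the node passes."""
    count = sum(1 for (s, p, o) in data_graph
                if s == focus_node and p == predicate)
    report = []
    if count < min_count:
        result = "_:result1"  # made-up blank node label
        report.append((result, SH + "focusNode", focus_node))
        report.append((result, SH + "severity", SH + "Violation"))
        # No sh:value here: there is no individual node to mention.
    return report


def conforms(report):
    """An empty report graph is regarded as conformance."""
    return len(report) == 0


data = [("ex:Alice", "ex:name", '"Alice"')]

ok = validate_min_count(data, "ex:Alice", "ex:name", 1)
bad = validate_min_count(data, "ex:Bob", "ex:name", 1)
print(conforms(ok), conforms(bad))
```

In both cases the caller receives a report graph; only its content differs.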

I have tried to come up with a sentence that could be inserted after the 
first sentence in section 3.4. But whatever I tried, it introduced new 
areas that will leave room for attacks by someone else, e.g. "the term 
conforms is not defined". Could you suggest a sentence that meets your 
need for clarification without introducing new issues? To me, I have to 
confess, the current wording is very clear: zero results obviously mean
that everything is OK.

Latest round of edits:

https://github.com/w3c/data-shapes/commit/89803a61f70c6dbe30a3315594e6cae26bc82a2b

Thanks
Holger
Received on Thursday, 6 October 2016 23:37:25 UTC