- From: Holger Knublauch <holger@topquadrant.com>
- Date: Fri, 3 Mar 2017 16:24:49 +1000
- To: public-rdf-shapes@w3.org
Hi Peter,
thank you for your continued efforts and level of detail in reviewing
the SHACL drafts. Here is our response.
> ** Problems in early definitions:
1)
> "A property path is a possible route in a graph between two graph nodes."
> Route is not defined. Possible route is not defined. (This wording is used
> once in the SPARQL document but is not defined there.)
Yes, the SPARQL document, in section 9 entitled "Property Paths", defines
property paths as follows:
"A property path is a possible route through a graph between two graph
nodes."
While it is true that the SPARQL document does not further explain or
define the term "possible route", this potential definitional deficiency
did not prevent SPARQL from reaching Rec status, nor has it caused any
implementation or usage issues. We believe it to be clear enough
for SHACL as well.
2)
> "A binding is a pair (variable, RDF term), consistent with the term's use in
> SPARQL."
> If binding is taken from SPARQL just link to its definition as is done for
> the other terms taken from SPARQL
This is purely an editorial preference. Here we have included both the
link to the SPARQL document
and a local definition so that readers are not forced to follow the link
to understand the meaning.
3)
> "A solution is a set of bindings, one row in the body of the result table of
> a SPARQL query."
> In SPARQL a solution is not one row in the body of the result table of a
> SPARQL query. That is just how it is shown in some places. The actual
> correct term is solution mapping.
The fact that it's a row in the result table is indeed not definitional
but has been added to make
it easier to understand. I have injected the term "informally" to mark
this sub-sentence off:
"... informally often understood as one row in the body of the result
table of a SPARQL query."
The SPARQL spec itself almost always uses "solution" instead of
"solution mapping", stating
"We use the term 'solution' where it is clear." (18.1.8). In SHACL we do
the same.
4)
> "The results table is a SolutionSequence, a list of solutions, possibly
> unordered."
> Results table is not used in SHACL.
Ok, redundant sentence deleted.
5)
> "A node in an RDF graph is a SHACL instance of a SHACL class in the graph if
> one of its SHACL types is the class."
> SHACL types require reference to a graph.
Ok, reference to G added.
6)
> "true denotes the RDF term "true"^^xsd:boolean. false denotes the RDF term
> "false"^^xsd:boolean."
> There is no notion of what denotes means here.
"denotes" is used in its common English meaning. Even without a formal
definition,
I cannot see how it could possibly be misunderstood. Should I delete
these two sentences to move on?
7)
> "Target declarations are values of certain properties (such as
> sh:targetClass) for a shape in a shapes graph."
> More than just a value is needed.
Editorial adjustments made to clarify that such values (of course) also
need a subject and a predicate (although this already follows from the
definition of "value").
8)
> "All bindings of the variable this from the solution become focus nodes."
> SPARQL queries do not return a solution as defined in this document.
This looks OK to me although I have switched from singular "solution" to
"solutions" in 3 of
the 4 definitions. Could you clarify where you see a contradiction?
> ** Unclear wording:
9)
> "(In this document, the verbs specify or declare are sometimes used to
> express the fact that an RDF term has property values in a graph.)"
> As opposed to an RDF term not having any values for any property in a graph?
Clarified to mean "sometimes used to express the fact that an RDF term
has values
for a given predicate in a graph"
10)
> "A constraint component is an IRI."
> I don't see how every IRI is a constraint component.
Why would that follow from the above sentence? It just says that if
something is a constraint component then it is an IRI, not vice versa.
"An apple is a fruit" also doesn't say that every fruit is an apple.
11)
> "The IRI is used, among others, in validation reports."
> I never would have imagined that validation reports could only use IRIs that
> are constraint components.
See 10. I don't understand this comment. In any case, this sentence is
not needed
so I have taken it out.
12)
> "SHACL-SPARQL can be used to declare additional constraint components based
> on SPARQL."
> What part of SHACL are these additional constraint components in?
Any constraint component defined with SHACL-SPARQL and not Core is not
in Core.
The sentence above includes a link to section 6.
> ** Strange links:
13)
> The definition of member has a link back to itself.
Ok, link removed.
> ** Normative wording in non-normative portions of the document:
14)
> Section 1.6 is labelled as non-normative but discusses how to treat other
> portions of the document and states conformance requirements for SHACL
> implementations.
I see nothing in 1.6 that is not repeated in normative sections 5 or 6.
Do you?
15)
> In many places there are what are labelled as SPARQL definitions of SHACL
> Core constraint components. These are in non-normative portions of the
> document. The document needs to not make the impression that these bits of
> SPARQL are definitions of portions of the SHACL Core.
Section 1.6 states:
For SHACL Core this specification uses parts of SPARQL 1.1 in
non-normative alternative definitions of the semantics of constraint
components and targets. While these may help some implementers, SPARQL
is not required for the implementation of the SHACL Core language.
I think this makes it clear that the SPARQL definitions are non-normative
already, but anyway:
Changed from "SPARQL DEFINITION" to "POTENTIAL DEFINITION IN SPARQL"
16)
> There are probably other places where definitional wording occurs in
> non-normative parts of the document. These should be removed or changed to
> not give any impression that they are normative.
Comment not actionable.
> ** SHACL Vocabulary:
17)
> What is the status of the SHACL vocabulary? SHACL is not an ontology that
> needs a RDF graph interpreted using the RDFS semantics to provide its
> vocabulary. All that SHACL needs is a set of IRIs that are used in its RDF
> syntax. What then is the status of the mentioned RDF graph? How is this
> graph to be interpreted? Is information entailed by the graph using the RDF
> or RDFS semantics have any effect on SHACL? Is any information in the graph
> part of the definition of SHACL? Of SHACL Core? Is all of the information
> in the graph part of the definition of SHACL? Of SHACL Core?
This topic has been discussed multiple times, see archives. It has no
normative status whatsoever.
The role of RDFS has also been extensively discussed within the WG,
especially during the period that you were still a WG member.
> ** Shapes:
18)
> If any node in a shapes graph has a sh:shape link back to itself then the
> shapes graph is recursive and behaviour of SHACL processors on the graph is
> undefined even if this node is completely disconnected from the rest of the
> graph. It would be better to have the behaviour of SHACL processors defined
> in cases like this.
Such an optimization could be handled by future working groups or
community groups.
Since the treatment of recursion is undefined, implementations already
have the option
to ignore such cases.
BTW, sh:shape has been renamed to sh:node.
19)
> It used to be that top-level shapes were conventionally indicated by an
> rdf:type link to sh:Shape. This has apparently changed to sh:NodeShape.
> There is no apparent reason for this change, which should be changed back.
This was a topic of separate emails and discussions and already
responded to.
20)
> sh:Shape does not appear to have any effect at all in SHACL. If this is the
> case then it should be removed.
This was a topic of separate emails and discussions and already
responded to.
21)
> The syntax rule for shapes doesn't appear to be a syntax rule at all.
> Instead it is defining what a shape is.
They are marked as syntax rules because other syntax rules depend on them.
22)
> "A shape in a shapes graph declares a constraint of kind c if c is a
> constraint component and the shape has values for all mandatory parameters
> of c. The constraint declaration consists of the values that the shape has
> for all mandatory and optional parameters of that component."
> The word "declares" is only harmful here.
> For constraint components that have more than one parameter this definition
> loses which value is connected to which parameter. For shapes that have two
> values for a single-parameter constraint component there is only one
> resultant constraint and that constraint has both of the values of the
> parameter.
The sentences that you quote above cannot be taken out of context. Other
sentences explain what happens in the multiple-values case. I don't see
how to keep the definition readable while cramming all this information
into it.
If you have specific suggestions, feel free to send them along.
23)
> "Note that the definition above does not include all of the syntax rules of
> well-formed shapes."
> There is no notion of well-formed shapes introduced in the document, even
> though it is used in several places. Similarly there is no notion of
> well-formed property shape or well-formed node shape introduced in the
> document even though both of these are used in the document.
Shapes are nodes, therefore the definition of well-formed nodes applies to
shapes too.
24)
> "Note that the definitions of well-formed property shapes and node shapes
> make these two sets of nodes disjoint."
> There is no definition of either well-formed property shape or well-formed
> node shape.
Shapes are nodes, therefore the definition of well-formed nodes applies to
shapes too.
> ** Property Paths:
25)
> "A node in an RDF graph is a well-formed SHACL property path p if it
> satisfies exactly one of the syntax rules in the following sub-sections. A
> node p is not a well-formed SHACL property path if p is a blank node and any
> path mappings of p directly or transitively reference p."
> It is possible that a node could both satisfy exactly one of the syntax
> rules and also refer back to itself. What happens then?
> Every path mapping of p references p so all blank nodes are not well-formed
> SHACL property paths.
The second sentence,
"A node p is not a well-formed SHACL property path if p is a blank node and
any path mappings of p directly or transitively reference p.",
IMHO excludes the scenario of self-reference.
I don't understand your last sentence. An example would help.
26)
> "A sequence path is a blank node that is a SHACL list with at least two
> members and each member is a well-formed SHACL property path."
> Sequence paths can have extra information associated them.
I don't understand this comment. An example would help.
Note that in the paragraph above the individual path syntax sections, we
state
"... if it satisfies exactly one of the syntax rules in the following
sub-sections."
and this IMHO excludes the case that a sequence path may have other
triples such as
a sh:inversePath.
27)
> "An alternative path is a blank node that is the subject of exactly one
> triple in G." "An inverse path is a blank node that is the subject of
> exactly one triple in G." And so on.
> These paths can't have extra information associated with them. Any blank
> node that is the subject of exactly one triple is lots of kinds of paths.
I don't understand this comment. An example would help.
> ** Non-validating information
28)
> What happens if the requirements in the non-normative 2.3.2 are violated?
This section does not contain formal syntax rules so there is no impact.
> ** Targets:
29)
> "If s is a SHACL instance of sh:NodeShape or sh:PropertyShape in a shapes
> graph SG and s is also a SHACL instance of rdfs:Class in SG then the set of
> SHACL instances of s in a data graph DG is a target from DG for s in SG."
> So a node that is a SHACL instance of sh:Shape and a SHACL instance of
> rdfs:Class will not produce an implicit class target. This is going to
> trip up a lot of people and needs to be changed.
I disagree that it "needs to be changed". This may be your (speculative)
personal opinion.
sh:Shape will barely show up in any SHACL file and does not show up in
our own examples.
While earlier versions of SHACL used the IRI sh:Shape to mean something
else, this doesn't imply that future readers will use it for its old
meaning. People will get used to sh:NodeShape just like they get used to
our renaming of sh:PropertyConstraint to sh:PropertyShape.
> ** Validation:
30)
> "Conformance checking is a simplified version of validation, producing a
> boolean result."
> There is no definition of which boolean value conformance checking produces.
The sentence you quote above has a hyperlink to the exact definition of
which values to produce.
31)
> "the validation process"
> What counts as part of the validation process? Does checking for
> ill-formedness? Does entailment? Does piecing together the shapes graph or
> the data graph?
The document already clarifies that checking for ill-formedness is not
required,
neither is entailment. Also the document already clarifies that shapes
and data graphs are
immutable, i.e. they are assembled before the validation process.
32)
> "For example, SHACL processors may support recursion scenarios or produce an
> error when they detect recursion."
> I expect that this should be failure instead of error.
Ok, replaced "error" with "failure".
33)
> "A shapes graph is an RDF graph containing zero or more shapes that is
> passed into a SHACL validation process so that a data graph can be validated
> against the shapes."
> Is the graph that is used when a node in a graph is validated against a
> shape in it a shapes graph?
Yes, "shapes graph" is a role.
34)
> "Every value of sh:shapesGraph is an IRI representing a graph that should be
> included into the shapes graph used to validate the data graph."
> Shouldn't this be SHOULD? Or should it be MAY as is used in the next
> sentence?
Ok, changed from should to SHOULD. "may" changed to SHOULD, too.
35)
> "Validating an RDF term against a shape involves validating the term against
> each of the components for which the shape has values for all mandatory
> parameters, using the validators associated with the respective component."
> This incorrectly uses constraint components instead of constraints.
Ok, clarified to refer to the constraint and then the component of the
constraint.
36)
> "The validation of a focus node in the data graph against a constraint in
> the shapes graph produces the top-level validation results that are produced
> by the validator of the constraint component, using as input the focus node,
> the specific values of the parameters in the constraint, and the value nodes
> of the shape in the data graph."
> SHACL constraint components have multiple validators.
> Validation needs more than just the information listed above.
> Shapes don't have value nodes at all.
> The output of a validation process is not completely defined, so there is no
> notion of *the* results of validation.
Has been changed to:
Validation of a focus node against a shape: Given a focus node in the
data graph and a shape in the shapes graph, the validation results are
the union of the results of the validation of the focus node against all
constraints declared by the shape, unless the shape has been
deactivated, in which case the validation results are empty.
Validation of a focus node against a constraint: Given a focus node in
the data graph and a constraint of kind C in the shapes graph, the
validation results are defined by the validators of the constraint
component C. These validators typically take as input the focus node,
the specific values of the parameters of C of the constraint in the
shapes graph, and the value nodes of the shape that declares the constraint.
Value nodes are defined as follows:
The validators of most constraint components use the concept of value
nodes, which is defined as follows:
For node shapes the value nodes are the individual focus nodes, forming
a set with exactly one member.
For property shapes with a value for sh:path p the value nodes are the
set of nodes in the data graph that can be reached from the focus node
with the path mapping of p. Unless stated otherwise, the value of
sh:resultPath of each validation result is a SHACL property path that
represents an equivalent path to the one provided in the shape.
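To make the two revised definitions more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the names Shape, Constraint, validate_against_shape and min_count_validator are hypothetical and not part of SHACL or of any library, each constraint is assumed to already carry a single selected validator, and validation results are modeled as plain strings.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical model: a validator maps (focus node, parameter values,
# value nodes) to a list of validation results (plain strings here).
Validator = Callable[[str, Dict[str, object], List[str]], List[str]]


@dataclass
class Constraint:
    params: Dict[str, object]   # parameter values of the constraint
    validator: Validator        # assumed to be selected already


@dataclass
class Shape:
    constraints: List[Constraint] = field(default_factory=list)
    deactivated: bool = False


def validate_against_shape(focus: str, shape: Shape,
                           value_nodes: List[str]) -> List[str]:
    """Union of the results of all constraints declared by the shape."""
    if shape.deactivated:
        return []  # a deactivated shape yields empty validation results
    results: List[str] = []
    for constraint in shape.constraints:
        results.extend(
            constraint.validator(focus, constraint.params, value_nodes))
    return results


def min_count_validator(focus, params, value_nodes):
    """Illustrative validator: too few value nodes gives one result."""
    if len(value_nodes) < params["minCount"]:
        return [f"{focus}: fewer than {params['minCount']} value nodes"]
    return []


shape = Shape([Constraint({"minCount": 1}, min_count_validator)])
violations = validate_against_shape("ex:Alice", shape, [])
```

In this model a focus node with no value nodes produces one result against the shape, and setting deactivated to True makes the result list empty regardless of the constraints.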
37)
> "A focus node conforms to a shape if and only if the validation of the shape
> does not produce any validation result or a failure."
> So no focus node will conform to a node shape that has a sh:not parameter.
The definition of validation has been modified to be a "mapping" instead
of a process that "produces" something. All usages of "a validation
result MUST be produced" have been replaced with "there is a validation
result". The usages of "nested" validation have been replaced with
"conformance checking". Prose has been added to make it clear that the
results of conformance checking do not end up in the surrounding report.
> ** Conformance:
38)
> "All SHACL implementations MUST at least cover the Core."
> Covering is not defined.
Ok, replaced with "implement".
39)
> "This specification describes conformance criteria for: SHACL Core [...],
> SHACL-SPARQL [...], SHACL Shapes Graphs [...], Validation of a data graph
> against a shapes graph [...], Validation of an RDF term from a data graph
> against a shape from the shapes graph [...], SHACL Core processors [...],
> SHACL-SPARQL processors [...]."
> Conformance is something that is required of implementations. It doesn't
> make sense for special kinds of graphs or processes to conform. Instead
> these are defined as something.
Ok, I have reorganized this by moving the first two bullet items into the
introductory paragraph and removing the 3 bullet items on processes. This
leaves only the two items about "processors" under conformance.
40)
> An RDF graph is a (mathematical) set of RDF triples. There are no
> operations that can modify an RDF graph defined in RDF.
What does this comment apply to?
> ** Core Constraint Components:
41)
> "This section defines the built-in SHACL Core constraint components that
> MUST be supported by all SHACL Core processors."
> Are SPARQL-SHACL processors required to support these components as well?
Yes, we already state that all SHACL processors must support Core.
42)
> "The SPARQL definitions in this section represent potential validators."
> The SPARQL queries are in informative parts of the document and can't be
> considered to be definitions. As many of the SPARQL
> definitions currently have problems, it would be better to just remove all
> the SPARQL query stuff from this section.
See 15)
You state that "many of the SPARQL definitions currently have problems".
Which?
43)
> "The following constraint components represent restrictions on the number of
> value nodes."
> Presumably the number of value nodes for a particular focus node, not the
> number of value nodes overall.
Sure, I assume readers can understand that. Anyway, added "for the given
focus node".
44)
> "If this parameter is omitted then there is no limit on the number of
> triples."
> Which triples? How can this parameter be omitted and there still be a
> constraint set up for it?
You may remember that this sentence was deemed important by certain WG
members
(I think Karen). In any case, I agree it doesn't add formal value
so I have taken it out to proceed.
45)
> "The values of sh:pattern are literals with datatype xsd:string that are
> valid pattern arguments for the SPARQL REGEX function."
> Having this as a syntax condition means that
> checking for correct operation of SHACL processors will need to be aware of
> whether a value for sh:pattern is a valid pattern argument for the SPARQL
> REGEX function.
SHACL Core implementations require a regular expression engine anyway.
In the worst case this test can thus be implemented by sending a dummy
regex to the processor and catching exceptions. Usually the underlying
libraries have better syntax checking APIs built in. Leaving this rule out
would raise follow-up issues such as what happens if the pattern has
invalid syntax.
Overall I think this is currently acceptable, although I agree that it's
not a rule that could be expressed, say, in SHACL Core itself.
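The "compile it and catch exceptions" approach could look roughly like the following Python sketch. Note the assumptions: is_valid_pattern is a hypothetical helper name, and Python's re module is used only as a stand-in engine. Its syntax differs from the regular expression syntax of the SPARQL REGEX function, so a real SHACL processor would use its own engine here.

```python
import re


def is_valid_pattern(pattern: str) -> bool:
    """Return True if the (stand-in) regex engine accepts the pattern."""
    try:
        re.compile(pattern)  # raises re.error on a syntax problem
    except re.error:
        return False
    return True


ok = is_valid_pattern("^B")          # a well-formed pattern
bad = is_valid_pattern("(unclosed")  # missing closing parenthesis
```

A processor could run such a check once when the shapes graph is loaded, so that ill-formed sh:pattern values are detected before any data is validated.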
46)
> "If $flags has a value then it MUST be interpreted according to the third
> argument of the SPARQL REGEX function."
> What does the mean? I'm guessing that the constraint component is supposed
> to act as if this value is the third argument, not that this value is
> interpreted as anything.
I have changed the wording, although I believe there was not really a
chance to
misinterpret the old wording either.
47)
> "For each pair of value nodes and the values of the property $lessThan at
> the given focus node where the first value is not less than the second value
> (based on SPARQL's < operator) or where the two values cannot be compared, a
> validation result MUST be produced with the value node as sh:value."
> How many different validation results need to be produced if the set of
> value nodes is the set { 1, 2 } and the set of property values is the set
> { "a", "b" }?
Four (one for each combination). While adding a test case for this, I
noticed a glitch
in the informal, potential SPARQL definition, which is now fixed.
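As a rough illustration of why the answer is four, the pairwise check can be sketched in Python. This is an assumption-laden model, not the SPARQL definition: less_than_violations is a hypothetical helper, and Python's < operator (which raises TypeError for int versus str) stands in for SPARQL's < operator and its "cannot be compared" case.

```python
from itertools import product


def less_than_violations(value_nodes, other_values):
    """Return every (value node, other value) pair that violates '<'."""
    violations = []
    for value, other in product(value_nodes, other_values):
        try:
            holds = value < other
        except TypeError:
            holds = False  # incomparable pairs also count as violations
        if not holds:
            violations.append((value, other))
    return violations


pairs = less_than_violations([1, 2], ["a", "b"])
print(len(pairs))  # prints 4: one result per combination
```

With comparable values the same function reports only the failing pairs, for example less_than_violations([1, 5], [3]) yields the single pair (5, 3).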
48)
> "For each value node that produces no validation results against the shape
> $not"
> So if there is a conjunction under the not, a validation result will be
> produced here if just one of the conjuncts produces a validation result.
Switched to using the term "conforms", which IMHO clarifies this issue.
49)
> "For each value node where the validation of the value node against any of
> the members of $and produces a validation result and no failure, a
> validation result MUST be produced with the value node as sh:value."
> So if there is a negation under any of the conjunctions a validation result
> will always be produced.
Also switched to "conforms", and similarly for sh:or, sh:xone, sh:node and
sh:qualifiedValueShape.
I believe this improves the situation regarding your parallel email thread
on whether nested validation processes MUST create validation results: it
is now more explicit that we just need conformance checking, not the full
report.
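The difference can be sketched in Python as follows. This is only a simplified model under assumptions: shapes are represented as hypothetical boolean "conforms" predicates, and the helper names and_violations and not_violations are invented for this illustration.

```python
def and_violations(value_nodes, members):
    """Value nodes that fail to conform to at least one member of $and."""
    return [v for v in value_nodes
            if not all(conforms(v) for conforms in members)]


def not_violations(value_nodes, negated):
    """Value nodes that do conform to the shape given as $not."""
    return [v for v in value_nodes if negated(v)]


# Hypothetical shapes, each reduced to a conformance check on integers.
def is_positive(v):
    return v > 0


def is_even(v):
    return v % 2 == 0


and_result = and_violations([1, 2, -2], [is_positive, is_even])
not_result = not_violations([1, 2], is_even)
```

Because only the boolean outcome of each nested check is consulted, a negation nested under a conjunction no longer produces a result merely because the inner shape reported something.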
50)
> "If a value node is violating the constraint, sh:shape will produce only a
> single validation result, with sh:ShapeConstraintComponent as its
> sh:sourceConstraintComponent."
> Not correct. The sh:shape could produce other validation results for other
> value nodes.
Clarified to be "for this value node".
51)
> "On the other hand side, sh:property may produce any number of validation
> results, and these will have the individual constraint components of the
> property shape as their values of sh:sourceConstraintComponent."
> Not correct. The validation results produced here may have other values for
> sh:sourceConstraintComponent.
Clarified to be about "the constraints in the property shape".
> ** Specify and Declare:
52)
> Specify and declare are use throughout the document in strange ways. It
> would be better to replace these with words that do not have so much
> baggage, e.g., "For example, shapes can state" or "A shape in a shapes graph
> has a constraint of".
I remember long discussions about this terminology within the WG last year.
I don't think it would be fruitful to reopen these discussions as we are
trying to stabilize the document. Making such general changes may introduce
new issues while invalidating previous reviews. We have to mind the March
CR deadline.
> ** SHACL-SPARQL:
53)
> All aspects of SHACL-SPARQL depend heavily on pre-binding. As pre-binding
> has never had a workable definition in SHACL there is no purpose in closely
> reviewing this part of the document at this time. Either this part of the
> document needs to be removed or a workable definition of SHACL provided and
> this phase of the W3C process repeated.
In its meeting on March 2, 2017, the RDF Data Shapes Working Group
decided to mark SHACL-SPARQL (i.e. sections 5 and 6) as features at risk.
The Working Group would like to collect implementation feedback on the
current definition of pre-binding, which is central to SHACL-SPARQL.
Marking this as a feature at risk provides the WG with more flexibility
with regard to the recommendation track process in case technical
issues with the current definition of pre-binding are reported.
54)
> It is not clearly stated what SHACL Core processors need to do when they
> encounter constructs from SHACL-SPARQL. It might be the case that SHACL
> Core processors can completely ignore SHACL-SPARQL constructs, because SHACL
> Core processors only "cover" SHACL Core, but it might also be the case that
> SHACL Core processors need to examine SHACL-SPARQL constructs, because these
> constructs might cause some nodes to be ill-formed.
What consequence would that have? Syntax checking is not required either
way.
55)
> Both sh:prefixes and sh:prefix are used. I expect that only one is needed.
They fulfill very different roles. sh:prefix is for an individual prefix
(pair), while sh:prefixes links a query with a set of prefixes.
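The two roles can be pictured with a small Python sketch. It rests on assumptions made for illustration only: prefix declarations are modeled as (prefix, namespace) pairs, the helper names prefix_header and expand_query are hypothetical, and the idea that a processor prepends PREFIX lines to the query text is a simplification of what the document specifies.

```python
def prefix_header(declarations):
    """Turn (prefix, namespace) pairs into SPARQL PREFIX lines."""
    return "\n".join(f"PREFIX {prefix}: <{namespace}>"
                     for prefix, namespace in declarations)


def expand_query(query_body, declarations):
    """Prepend the declared prefixes to a query that refers to them."""
    return prefix_header(declarations) + "\n" + query_body


# One individual prefix pair (the role played by sh:prefix) ...
declarations = [("ex", "http://example.com/ns#")]
# ... and a query linked to the whole set (the role played by sh:prefixes).
query = expand_query("SELECT $this WHERE { $this a ex:Person }", declarations)
```

The individual pairs can be shared: several queries may point at the same set of declarations without repeating them.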
The commits addressing these comments start at
https://github.com/w3c/data-shapes/commit/28f40d46efe714ebe9f1909f82e3fd84172dc447
Regards,
Holger
On 27/02/2017 12:13, Holger Knublauch wrote:
> I have raised https://www.w3.org/2014/data-shapes/track/issues/234 to
> track our response.
>
> We will likely respond after the next WG meeting this week.
>
> Holger
>
Received on Friday, 3 March 2017 07:26:10 UTC