- From: Holger Knublauch <holger@topquadrant.com>
- Date: Sat, 26 Sep 2015 09:13:22 +1000
- To: "public-data-shapes-wg@w3.org" <public-data-shapes-wg@w3.org>
I am fine with this change. Did I hear you volunteer for the editorial changes :) Just one detail below. Holger On 9/26/15 7:25 AM, Arthur Ryman wrote: > Holger, > > I'd like to discuss your comment: > > "I mainly mentioned the default value of sh:minCount to clarify it, > because we had discussed in the past that ShEx uses 1 as default. min > and maxCount are grouped together because the computation of counts is > a potentially expensive operation and shouldn't be done twice. You > might say this is an implementation detail, but in order to be able to > apply this optimization, they need to be in the same template and thus > SPARQL query. I think there should be strong reasons if we want to > deviate from the template mapping. Eliminating the talk about defaults > doesn't strike me as a strong reason. (Another small difference is > that the number of results may be different if we split them up. Right > now there will always only be one constraint violation, even if both > are violated, due to a modeling error)." > > This is an example of how implementation considerations ca affect the > spec. We should avoid that for several reasons. For example, the > meaning of a set of property constraints is that they all have to be > satisfied (conjunction). All of the property constraints are optional. > Therefore we do not need to define optional values for missing > property constraints. In the case of minCount, if it is absent, then > there is no constraint. This happens to have the the same effect as > including minCount 0, but that doesn't mean we need to introduce a > default value. Similarly, if maxCount is absent then there is no > constraint. Of course, an implementation would optimize and only > compute the actual count once. That is why is it better to not view > the SPARQL semantics as the actual definitions of templates. > Implementers must have the freedom to implement the spec however they > want as long as the get the correct answers. Actual individual > templates for minCount and maxCount would still be useful for the > creation of a test oracle, or maybe even an acceptable reference > implementation. > > The latest version of the spec talks about default values and > interpretations and includes the following SPARQL which combines > minCount and maxCount . The spec doesn't assign a default value to > $maxCount so the SPARQL is not actually well-defined when maxCount is > absent. This is why I added bound($maxCount) as a guard clause. > > SELECT $this ($this AS ?subject) $predicate > WHERE { > { > SELECT (COUNT(?value) AS ?count) > WHERE { > $this $predicate ?value . > } > } > FILTER (?count < $minCount || (bound($maxCount) && ?count > $maxCount)) > } > > I propose that the discussion about defaults should be eliminated and > the SPARQL should be split into one for minCount and one for maxCount. > > Can we treat this as editorial? Thanks. > > -- Arthur > > > On Fri, Sep 18, 2015 at 10:56 PM, Peter F. Patel-Schneider > <pfpschneider@gmail.com> wrote: >> On 9/17/2015 1:28, Peter F. Patel-Schneider wrote: >>>> I took a quick look at the version of the document current on 15 September, >>>> concentrating mostly on Sections 1-6. >>>> >>>> The document is looking better, but there are still several significant >>>> problems. >>> Thanks, see my responses below. >>> >>> As an aside, I find it unhelpful to read hyperboles such as "significant >>> problems" for things that are relatively easy to fix. We are currently just >>> talking about a First Public Working Draft, not a 100% perfect >>> specification. >> I view several of the problems that I found to be significant. You can >> protest all you want, but don't expect me to stop using words that are >> neither hyperbolic nor inflammatory. >> >>>> - With the new emphasis on SPARQL, there should be a part of Section 1 that >>>> introduces the use of SPARQL as the definition of SHACL. This would >>>> include some of the stuff from the beginning of Section 3. >>> Done. >>> >>>> There needs to >>>> be more information on how SPARQL is used in the definition of SHACL in >>>> the discussion that is currently at the beginning of Section 3, such as >>>> how the results of the queries are combined. This would also discuss the >>>> problem with blank nodes. >>> I have attempted to formulate a paragraph on blank nodes, but marked it >>>> with a red TODO because the wording may not use the correct >>>> terminology. I would appreciate input (I believe Eric may have the right >>>> references here). >> Although more needs to be done here, I think that it is acceptable for now. >> >>>> As well, sh:hasShape needs to be better described. >>> I added a bit on that; not sure what else is missing. The key feature of >>>> that definition is the reference to the validateNodeAgainstShape >>>> operation, which would describe the details. >> This doesn't work, as validateNodeAgainstShape depends on sh:hasShape. >> >> As well, the recursion as error handling doesn't appear to be specified in a >> reasonable manner. One of the calls doesn't even specify a value for this >> variable. The calls for and and or set it naively, without regards for >> whether there is a negated loop. >> >>>> Several SPARQL queries provided require that the shapes graph >>>> be accessible. As this is not guaranteed, there needs to be an >>>> explanation of what is going on. It would also be better to have SPARQL >>>> definitions for more of SHACL, such as scopes. (This would require moving >>>> the details of using SPARQL to define SHACL earlier in the document.) >>> All done, I believe. >>> >>>> - The handling of results is poorly defined. There is no discussion of how >>>> the results of embedded constraints and shapes are to be handled. This >>>> needs to be cleaned up before FPWD publication. >>> I have added statements that clarify that these nested results are just >>> temporary. >> I think that much more needs to be done to introduce result handling. >> >>>> - With the user-friendly syntax, shapes do not necessarily need to be in an >>>> RDF graph. I think that this means that the early part of the document >>>> should not say "shapes graph", but instead something like "shapes". Then >>>> the document can say "SHACL shapes are often encoded in an RDF graph, >>>> which is called the shapes graph." Then later on it can say "shapes >>>> encoded as an RDF graph" where necessary. I don't know what should be >>>> done for constructs that are not going to be part of the user-friendly >>>> syntax. >>> This is not my understanding of how SHACL works. I believe the SHACL spec >>> always assumes that the shapes are represented in RDF, and in a >>> dedicated shapes graph, using exactly the specified vocabulary. If >>> someone wants to use another (compact) syntax then these syntaxes need >>> to be translated into RDF triples prior to execution. >> This is an unnecessary step. It should not be required. >> >>>> - SHACL is not an RDF vocabulary. It is a language that can be encoded in >>>> RDF, and that uses a particular vocabulary in the encoding. Any time >>>> SHACL shapes are discussed as being part of an RDF graph, careful >>>> attention needs to be paid ot the wording used so as to not give incorrect >>>> information. >>> My understanding of SHACL is that SHACL *defines* an RDF vocabulary. Our >>> whole spec references IRIs from that vocabulary such as sh:minCount. I >>> have tried to make it clear in previous edits that SHACL is not just a >>> vocabulary, and would appreciate specific pointers if you believe this is >>> still misleading. >> I will do another pass on this after FPWD publication. >> >>>> - What happens with cyclic shapes references that do not involve a property >>>> constraint? Are these handled the same as cyclic references that do >>>> involve a property constraint? >>> Yes, the same way. The recursion stops if it encounters the same >>> shape/focusNode combination. Having said this, the handling of >>> sh:valueShape differs from the others such as sh:AndConstraint: they pass >>> in another argument to the recursionIsError argument. The effect of this >>> is that sh:valueShape will handle recursion as "true" while others will >>> handle them as "failure". >> So recursion inside an and is a failure? This does not seem to be >> reasonable in general. >> >>>> - All uses of RDFS notions, e.g., subclasses, should be qualified, e.g., >>>> SHACL subclasses. >>> To me this feels a bit pedantic. Isn't this handled by section 1.1 (which >>> you wrote), i.e. we do a broad statement in the beginning so that we don't >>> need to repeat the same things over and over again? >> There are going to be many readers of the document that are coming from an >> RDF and RDFS background. I think that the document needs to hammer in the >> differences so that such people do not get the wrong idea now and complain >> bitterly later. >> >>>> - The relationship between shapes and constraints is poorly explained. A >>>> shape has a bunch of constraints, which together serve to define the >>>> shape. Constraints are not just validated against the same focus nodes. >>> I have tried to improve the wording. >> I think that the problem is still that the way that SHACL is put together is >> difficult to describe, and comprehend. For example, shape scopes are used >> sometimes when shapes are considered but not at other times. A cleaner >> design would be easier to describe. >> >>>> - Most of the document is about the definition of SHACL. There is little or >>>> no need for MUST, etc., in this definition. Where MUST, etc., should show >>>> up is when describing the behaviour of SHACL implementations. For >>>> example, a good description of scope combination would be "The scopes of a >>>> SHACL shape are considered additively, so, for example, in a shape with >>>> two individual scopes both individuals are in the scope of the shape." >>>> with no MUST, etc., needed. >>> Ok, I didn't know that (and wonder why MUST exists at all). I have now >>> tried to limit MUST to sentences about the implementations. >> Things are better now. It is guaranteed that more wordsmithing will be >> needed down the line. >> >>>> - The description of the various bits of property constraints obscures the >>>> independence of some of the bits. For example, splitting sh:minCount and >>>> sh:maxCount would eliminate talk about defaults. >>> I mainly mentioned the default value of sh:minCount to clarify it, >>> because we had discussed in the past that ShEx uses 1 as default. min and >>> maxCount are grouped together because the computation of counts is a >>> potentially expensive operation and shouldn't be done twice. You might say >>> this is an implementation detail, but in order to be able to apply this >>> optimization, they need to be in the same template and thus SPARQL >>> query. I think there should be strong reasons if we want to deviate from >>> the template mapping. Eliminating the talk about defaults doesn't strike >>> me as a strong reason. (Another small difference is that the number of >>> results may be different if we split them up. Right now there will always >>> only be one constraint violation, even if both are violated, due to a >>> modeling error). >>>> There are few things that would make the document better, but that are >>>> certainly not needed immediately. >>>> >>>> - It would be nice to have an option to hide the SPARQL definitions. >>> Yep, will do at some stage. >>> >>>> - It might be better to turn Section 2.3 into the beginning of Section 3. >>> This doesn't make sense to me because 3 is only about the Core constraint >>> types, while 2.3 is about the general mechanism (including a brief >>> reference to the extension mechanism). Also, how can we have a section on >>> Shapes without mentioning Constraints. >>> Details of my recent edits are here: >>> >>> >> https://github.com/w3c/data-shapes/commit/607e758fbf6f633ab36fcdb4d59df01ccdad1699 >>> Thanks again, >>> Holger
Received on Friday, 25 September 2015 23:13:55 UTC