- From: Arthur Ryman <arthur.ryman@gmail.com>
- Date: Fri, 25 Sep 2015 17:25:59 -0400
- To: "Peter F. Patel-Schneider" <pfpschneider@gmail.com>
- Cc: Holger Knublauch <holger@topquadrant.com>, "public-data-shapes-wg@w3.org" <public-data-shapes-wg@w3.org>
Holger, I'd like to discuss your comment:

"I mainly mentioned the default value of sh:minCount to clarify it, because we had discussed in the past that ShEx uses 1 as default. min and maxCount are grouped together because the computation of counts is a potentially expensive operation and shouldn't be done twice. You might say this is an implementation detail, but in order to be able to apply this optimization, they need to be in the same template and thus SPARQL query. I think there should be strong reasons if we want to deviate from the template mapping. Eliminating the talk about defaults doesn't strike me as a strong reason. (Another small difference is that the number of results may be different if we split them up. Right now there will always only be one constraint violation, even if both are violated, due to a modeling error)."

This is an example of how implementation considerations can affect the spec. We should avoid that for several reasons. For example, the meaning of a set of property constraints is that they all have to be satisfied (conjunction). All of the property constraints are optional. Therefore we do not need to define default values for missing property constraints. In the case of minCount, if it is absent, then there is no constraint. This happens to have the same effect as including minCount 0, but that doesn't mean we need to introduce a default value. Similarly, if maxCount is absent then there is no constraint.

Of course, an implementation would optimize and only compute the actual count once. That is why it is better not to view the SPARQL semantics as the actual definitions of templates. Implementers must have the freedom to implement the spec however they want as long as they get the correct answers. Actual individual templates for minCount and maxCount would still be useful for the creation of a test oracle, or maybe even an acceptable reference implementation.
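For illustration only, such individual templates could mirror the structure of the combined query in the current draft. This is a sketch, not proposed spec text; the $this/$predicate/$minCount variable conventions are assumed from the draft:

```sparql
# Sketch: report a violation when the number of values falls below sh:minCount.
SELECT $this ($this AS ?subject) $predicate
WHERE {
    {
        SELECT (COUNT(?value) AS ?count)
        WHERE {
            $this $predicate ?value .
        }
    }
    FILTER (?count < $minCount)
}
```

```sparql
# Sketch: report a violation when the number of values exceeds sh:maxCount.
SELECT $this ($this AS ?subject) $predicate
WHERE {
    {
        SELECT (COUNT(?value) AS ?count)
        WHERE {
            $this $predicate ?value .
        }
    }
    FILTER (?count > $maxCount)
}
```

Each template would be instantiated only when its property is present, so neither query needs a default value or a bound() test.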
The latest version of the spec talks about default values and interpretations and includes the following SPARQL, which combines minCount and maxCount. The spec doesn't assign a default value to $maxCount, so the SPARQL is not actually well-defined when maxCount is absent.

SELECT $this ($this AS ?subject) $predicate
WHERE {
    {
        SELECT (COUNT(?value) AS ?count)
        WHERE {
            $this $predicate ?value .
        }
    }
    FILTER (?count < $minCount || (bound($maxCount) && ?count > $maxCount))
}

I propose that the discussion about defaults should be eliminated and that the SPARQL should be split into one query for minCount and one for maxCount. Can we treat this as editorial?

Thanks.

-- Arthur

On Fri, Sep 18, 2015 at 10:56 PM, Peter F. Patel-Schneider <pfpschneider@gmail.com> wrote:

> On 9/17/2015 1:28, Peter F. Patel-Schneider wrote:

>> > I took a quick look at the version of the document current on 15 September, concentrating mostly on Sections 1-6.
>> >
>> > The document is looking better, but there are still several significant problems.

>> Thanks, see my responses below.
>>
>> As an aside, I find it unhelpful to read hyperboles such as "significant problems" for things that are relatively easy to fix. We are currently just talking about a First Public Working Draft, not a 100% perfect specification.

> I view several of the problems that I found to be significant. You can protest all you want, but don't expect me to stop using words that are neither hyperbolic nor inflammatory.

>> > - With the new emphasis on SPARQL, there should be a part of Section 1 that introduces the use of SPARQL as the definition of SHACL. This would include some of the stuff from the beginning of Section 3.

>> Done.

>> > There needs to be more information on how SPARQL is used in the definition of SHACL in the discussion that is currently at the beginning of Section 3, such as how the results of the queries are combined. This would also discuss the problem with blank nodes.

>> I have attempted to formulate a paragraph on blank nodes, but marked it with a red TODO because the wording may not use the correct terminology. I would appreciate input (I believe Eric may have the right references here).

> Although more needs to be done here, I think that it is acceptable for now.

>> > As well, sh:hasShape needs to be better described.

>> I added a bit on that; not sure what else is missing. The key feature of that definition is the reference to the validateNodeAgainstShape operation, which would describe the details.

> This doesn't work, as validateNodeAgainstShape depends on sh:hasShape.
>
> As well, the recursion-as-error handling doesn't appear to be specified in a reasonable manner. One of the calls doesn't even specify a value for this variable. The calls for "and" and "or" set it naively, without regard for whether there is a negated loop.

>> > Several SPARQL queries provided require that the shapes graph be accessible. As this is not guaranteed, there needs to be an explanation of what is going on. It would also be better to have SPARQL definitions for more of SHACL, such as scopes. (This would require moving the details of using SPARQL to define SHACL earlier in the document.)

>> All done, I believe.

>> > - The handling of results is poorly defined. There is no discussion of how the results of embedded constraints and shapes are to be handled. This needs to be cleaned up before FPWD publication.

>> I have added statements that clarify that these nested results are just temporary.

> I think that much more needs to be done to introduce result handling.

>> > - With the user-friendly syntax, shapes do not necessarily need to be in an RDF graph. I think that this means that the early part of the document should not say "shapes graph", but instead something like "shapes". Then the document can say "SHACL shapes are often encoded in an RDF graph, which is called the shapes graph." Then later on it can say "shapes encoded as an RDF graph" where necessary. I don't know what should be done for constructs that are not going to be part of the user-friendly syntax.

>> This is not my understanding of how SHACL works. I believe the SHACL spec always assumes that the shapes are represented in RDF, and in a dedicated shapes graph, using exactly the specified vocabulary. If someone wants to use another (compact) syntax then these syntaxes need to be translated into RDF triples prior to execution.

> This is an unnecessary step. It should not be required.

>> > - SHACL is not an RDF vocabulary. It is a language that can be encoded in RDF, and that uses a particular vocabulary in the encoding. Any time SHACL shapes are discussed as being part of an RDF graph, careful attention needs to be paid to the wording used so as to not give incorrect information.

>> My understanding of SHACL is that SHACL *defines* an RDF vocabulary. Our whole spec references IRIs from that vocabulary such as sh:minCount. I have tried to make it clear in previous edits that SHACL is not just a vocabulary, and would appreciate specific pointers if you believe this is still misleading.

> I will do another pass on this after FPWD publication.

>> > - What happens with cyclic shape references that do not involve a property constraint? Are these handled the same as cyclic references that do involve a property constraint?

>> Yes, the same way. The recursion stops if it encounters the same shape/focusNode combination. Having said this, the handling of sh:valueShape differs from the others such as sh:AndConstraint: they pass a different value for the recursionIsError argument. The effect of this is that sh:valueShape will handle recursion as "true" while the others will handle it as "failure".

> So recursion inside an "and" is a failure? This does not seem to be reasonable in general.

>> > - All uses of RDFS notions, e.g., subclasses, should be qualified, e.g., SHACL subclasses.

>> To me this feels a bit pedantic. Isn't this handled by section 1.1 (which you wrote), i.e. we do a broad statement in the beginning so that we don't need to repeat the same things over and over again?

> There are going to be many readers of the document that are coming from an RDF and RDFS background. I think that the document needs to hammer in the differences so that such people do not get the wrong idea now and complain bitterly later.

>> > - The relationship between shapes and constraints is poorly explained. A shape has a bunch of constraints, which together serve to define the shape. Constraints are not just validated against the same focus nodes.

>> I have tried to improve the wording.

> I think that the problem is still that the way that SHACL is put together is difficult to describe, and comprehend. For example, shape scopes are used sometimes when shapes are considered but not at other times. A cleaner design would be easier to describe.

>> > - Most of the document is about the definition of SHACL. There is little or no need for MUST, etc., in this definition. Where MUST, etc., should show up is when describing the behaviour of SHACL implementations. For example, a good description of scope combination would be "The scopes of a SHACL shape are considered additively, so, for example, in a shape with two individual scopes both individuals are in the scope of the shape." with no MUST, etc., needed.

>> Ok, I didn't know that (and wonder why MUST exists at all). I have now tried to limit MUST to sentences about the implementations.

> Things are better now. It is guaranteed that more wordsmithing will be needed down the line.

>> > - The description of the various bits of property constraints obscures the independence of some of the bits. For example, splitting sh:minCount and sh:maxCount would eliminate talk about defaults.

>> I mainly mentioned the default value of sh:minCount to clarify it, because we had discussed in the past that ShEx uses 1 as default. min and maxCount are grouped together because the computation of counts is a potentially expensive operation and shouldn't be done twice. You might say this is an implementation detail, but in order to be able to apply this optimization, they need to be in the same template and thus SPARQL query. I think there should be strong reasons if we want to deviate from the template mapping. Eliminating the talk about defaults doesn't strike me as a strong reason. (Another small difference is that the number of results may be different if we split them up. Right now there will always only be one constraint violation, even if both are violated, due to a modeling error.)

>> > There are a few things that would make the document better, but that are certainly not needed immediately.
>> >
>> > - It would be nice to have an option to hide the SPARQL definitions.

>> Yep, will do at some stage.

>> > - It might be better to turn Section 2.3 into the beginning of Section 3.

>> This doesn't make sense to me because 3 is only about the Core constraint types, while 2.3 is about the general mechanism (including a brief reference to the extension mechanism). Also, how can we have a section on Shapes without mentioning Constraints?

>> Details of my recent edits are here:
>> https://github.com/w3c/data-shapes/commit/607e758fbf6f633ab36fcdb4d59df01ccdad1699

>> Thanks again,
>> Holger
Received on Friday, 25 September 2015 21:26:32 UTC