- From: Håvard Ottestad <hmottestad@gmail.com>
- Date: Mon, 6 Jul 2020 16:19:54 +0200
- To: Irene Polikoff <irene@topquadrant.com>
- Cc: Public Shacl W3C <public-shacl@w3.org>, Holger Knublauch <holger@topquadrant.com>
- Message-ID: <CAEKmdN2Em0K-ONwchhAFHNWgsBMtev0nqP2Mu4QhqbOg_13esg@mail.gmail.com>
In my haste to respond, I forgot to account for the sh:class constraint in
example 2. This would amount either to a query with a union or to two
separate queries (one for sh:in and one for sh:class).

Håvard

On Mon, Jul 6, 2020 at 4:05 PM Håvard Ottestad <hmottestad@gmail.com> wrote:

> Hi Holger,
>
> I've typed up some queries that I believe will validate examples 2, 3 and 4
> at least as efficiently as a SPARQL target.
>
> If you agree that these queries are correct (or close enough to being
> correct), then we know that all those examples can be implemented at least
> as efficiently as SPARQL targets.
>
> If there is a way to evaluate all target shapes that is as fast as or
> faster than using SPARQL targets, then I think that sh:targetShape should
> be considered on the same terms as SPARQL targets performance-wise,
> regardless of the poor performance you are currently seeing.
>
> I can think of one case that would either not perform well, or at least be
> very difficult to implement, and that is recursive SHACL, since SPARQL does
> not support recursion. I believe that this should not hold us back from
> considering sh:targetShape.
>
> Here is the list of queries.
>
> ###############################################################################
> # 2. Instances in namespace "company" must have appropriate class and dc:type #
> ###############################################################################
> ex:CompanyShape a sh:NodeShape;
>     sh:targetShape [
>         sh:nodeKind sh:IRI;
>         sh:pattern "^https://company-graph.example.com/resource/company/";
>     ];
>     sh:class ex:Company;
>     sh:property [sh:path dc:type; sh:in ("conglomerate" "collective" "enterprise")];
> .
>
> ### Target query ###
> select ?target where {
>     {
>         ?target ?b ?c.
>         FILTER(isIRI(?target) && regex(str(?target), "^https://company-graph.example.com/resource/company/"))
>     } union {
>         ?a ?b ?target.
>         FILTER(isIRI(?target) && regex(str(?target), "^https://company-graph.example.com/resource/company/"))
>     }
> }
>
> ### Combined query ###
> select ?target ?value where {
>     {
>         ?target ?b ?c.
>         ?target dc:type ?value.
>         FILTER(?value NOT IN ("conglomerate", "collective", "enterprise"))
>         FILTER(isIRI(?target) && regex(str(?target), "^https://company-graph.example.com/resource/company/"))
>     } union {
>         ?a ?b ?target.
>         ?target dc:type ?value.
>         FILTER(?value NOT IN ("conglomerate", "collective", "enterprise"))
>         FILTER(isIRI(?target) && regex(str(?target), "^https://company-graph.example.com/resource/company/"))
>     }
> }
>
> ### Optimized combined query for property shapes ###
> select ?target ?value where {
>     ?target dc:type ?value.
>     FILTER(?value NOT IN ("conglomerate", "collective", "enterprise"))
>     FILTER(isIRI(?target) && regex(str(?target), "^https://company-graph.example.com/resource/company/"))
> }
>
>
> #####################################################################
> # 3. All langStrings must have one of a predefined set of languages #
> #####################################################################
> ex:langStringShape a sh:NodeShape;
>     sh:targetShape [sh:datatype rdf:langString];
>     sh:languageIn ("en" "bg");
> .
>
> ### Target query ###
> select ?target where {
>     ?a ?b ?target.
>     FILTER(DATATYPE(?target) = rdf:langString)
> }
>
> ### Combined query ###
> select ?target where {
>     ?a ?b ?target.
>     FILTER(DATATYPE(?target) = rdf:langString)
>     FILTER(!langMatches(lang(?target), "en") && !langMatches(lang(?target), "bg"))
> }
>
>
> #########################################################################################
> # 4. Steve is very popular, so everyone who knows at least three people must know Steve #
> #########################################################################################
> ex:Personshape a sh:NodeShape;
>     sh:targetShape [sh:path foaf:knows; sh:minCount 3];
>     sh:property [sh:path foaf:knows; sh:hasValue ex:Steve];
> .
>
> ### Target query ###
> select ?target where {
>     ?target foaf:knows ?count_0, ?count_1, ?count_2.
>     FILTER(?count_0 != ?count_1)
>     FILTER(?count_1 != ?count_2)
>     FILTER(?count_2 != ?count_0)
> }
>
> ### Combined query ###
> select ?target ?value where {
>     ?target foaf:knows ?count_0, ?count_1, ?count_2.
>     FILTER(?count_0 != ?count_1)
>     FILTER(?count_1 != ?count_2)
>     FILTER(?count_2 != ?count_0)
>
>     ?target foaf:knows ?value.
>     FILTER NOT EXISTS {?target foaf:knows ex:Steve}
> }
>
>
> Cheers,
> Håvard
>
> PS. I believe that example 2 is actually a bit wrong; I'll comment on the
> PR instead of in this email.
>
> On Mon, Jul 6, 2020 at 1:49 PM Irene Polikoff <irene@topquadrant.com> wrote:
>
>> But if there is no agreement, then I am concerned about using the sh:
>> namespace for this new construct. This does not seem right.
>>
>> TQ has been adding some custom constructs into the dash: namespace, for
>> example. Other namespaces for custom extensions could be used as well.
>>
>> I believe the sh: namespace should be reserved for things that a majority
>> of implementers have reached consensus on. I recognize that currently we
>> only have 2 implementers in this discussion. It would be better to broaden
>> the circle of people looking at this topic.
>>
>> If we have a strong disagreement, then the process to follow is normally:
>>
>> 1. Have a call (or calls) between the concerned parties to see if they
>>    can reach an agreement or some compromise.
>> 2. If not, and the objection is strong (as in “will not implement”), then
>>    typically the feature would not make it into the spec. Implementers can
>>    still do it as a custom extension.
>>
>> In the past, we had some hypothetical and/or philosophical objections
>> that slowed progress by many, many months. It was frustrating. An
>> objection based on implementation experience is, however, different. I
>> believe it needs serious consideration and a resolution, even if it slows
>> progress. Unfortunately, there is no way to avoid some sluggishness when
>> multiple parties need to come to consensus; this is a downside of
>> standards development.
>>
>> Irene
>>
>> > On Jul 6, 2020, at 6:06 AM, Holger Knublauch <holger@topquadrant.com> wrote:
>> >
>> >> On 6/07/2020 16:53, Håvard Ottestad wrote:
>> >>
>> >> Hi Holger,
>> >>
>> >> Could you share the shape that has particularly bad performance so I
>> >> can see if I can think of an optimal solution?
>> > For examples 2, 3, 4 from https://github.com/w3c/shacl/pull/3
>> >>
>> >> My plan has essentially been to convert the targetShape into a SPARQL
>> >> query. This would put the performance in the same realm as SPARQL
>> >> targets.
>> >>
>> >> The benefit of targetShape over SPARQL targets is that it's possible
>> >> to validate the changes to a database efficiently: we are seeing O(c)
>> >> performance, where c is the effective size of the change, instead of
>> >> O(n), which is what we were seeing with SPARQL targets (where n is the
>> >> size of the database).
>> >
>> > The same algorithms can be applied to the dash:HasValueTarget; it's
>> > just as declarative as the other shapes.
>> >
>> > I think we (I at least) had discussed this sufficiently. I think I was
>> > clear on the performance issues. I think we can agree to disagree and
>> > move on. I was hoping that other implementers may have additional input.
>> >
>> > Holger
>> >
>> >>
>> >> Håvard
>> >>
>> >>> On 6 Jul 2020, at 03:02, Holger Knublauch <holger@topquadrant.com> wrote:
>> >>>
>> >>> There have been various discussions around SHACL target extensions,
>> >>> and there is an open Pull Request https://github.com/w3c/shacl/pull/3
>> >>> to add sh:targetShape as a new target type to SHACL-AF. I have
>> >>> meanwhile attempted to implement that feature in our code base and
>> >>> have concluded that the feature is not a good idea for SHACL-AF (or
>> >>> even SHACL Core). The main argument is still about performance:
>> >>>
>> >>> - I stated that the worst-case performance of this general feature is
>> >>>   *catastrophic*, as it needs to perform validation on all subjects
>> >>>   and objects only to determine which nodes it then needs to validate
>> >>>   for real. This means that sh:targetShape is very different from the
>> >>>   other 4 built-in target types (sh:targetClass, sh:targetNode,
>> >>>   sh:targetSubjectsOf, sh:targetObjectsOf) in that it requires
>> >>>   validation before validation (which by itself causes implementation
>> >>>   complexity).
>> >>>
>> >>> - Håvard stated that the alternative, SPARQL-based targets, has bad
>> >>>   performance for his implementation.
>> >>>
>> >>> We do have similar use cases to yours, especially around dependencies
>> >>> across multiple properties. For example:
>> >>>
>> >>> IF ex:country=USA THEN ex:state sh:in [ "AZ", "CA", "FL" ... ]
>> >>> IF ex:country=AU THEN ex:state sh:in [ "NSW", "VIC", "QLD" ... ]
>> >>>
>> >>> We also want a declarative solution that can be used by input forms,
>> >>> so that if the user changes the country, the states drop-down list
>> >>> also changes. So relying on SPARQL queries wouldn't solve our use
>> >>> cases either.
>> >>>
>> >>> The current proposal, based on the new keyword sh:targetShape, was:
>> >>>
>> >>> ex:USAStateShape
>> >>>     sh:targetShape [
>> >>>         a sh:PropertyShape ;
>> >>>         sh:path ex:country ;
>> >>>         sh:hasValue ex:USA ;
>> >>>     ] ;
>> >>>     sh:property [
>> >>>         sh:path ex:state ;
>> >>>         sh:in ( "AZ" "CA" "FL" ... )
>> >>>     ] .
>> >>>
>> >>> I believe the following is better overall:
>> >>>
>> >>> ex:USAStateShape
>> >>>     sh:target [
>> >>>         a dash:HasValueTarget ;
>> >>>         dash:predicate ex:country ;
>> >>>         dash:object ex:USA ;
>> >>>     ] ;
>> >>>     sh:property [
>> >>>         sh:path ex:state ;
>> >>>         sh:in ( "AZ" "CA" "FL" ... )
>> >>>     ] .
>> >>>
>> >>> where dash:HasValueTarget is a SPARQL-based target type
>> >>> (https://w3c.github.io/shacl/shacl-af/#SPARQLTargetType).
>> >>>
>> >>> Implementations of SHACL-AF will already do the right thing, and will
>> >>> be able to do so efficiently. If you cannot use SPARQL efficiently,
>> >>> your platform can simply hard-code this pattern, just like you
>> >>> currently hard-code the common scenarios of the proposed
>> >>> sh:targetShape property to avoid the bad default performance. I
>> >>> expect the difference in your implementation would be marginal, and
>> >>> we would need neither to change the spec nor to open up SHACL to a
>> >>> feature that is very complex to implement efficiently.
>> >>>
>> >>> The downside of using something like dash:HasValueTarget is that it
>> >>> doesn't "cover" all possible use cases. Instead of allowing arbitrary
>> >>> sh:targetShapes, we limit this to hasValue patterns. But those
>> >>> hasValue patterns were the main use cases before we brainstormed that
>> >>> "it would be nice" to also support various other shape types
>> >>> (sh:filterShape etc.). hasValue patterns are trivial to look up. If
>> >>> anyone needs additional patterns, such as the one from the PR, then
>> >>> they can be covered by custom targets, which may also get hard-coded
>> >>> by those that cannot use SPARQL.
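A minimal Python sketch of why hasValue patterns are cheap to look up: with triples indexed by (predicate, object), the focus nodes of a dash:HasValueTarget-style target come from a single dictionary lookup, and the sh:in check then visits only those nodes. Everything here is an illustrative assumption, not code from any SHACL engine: the `TripleIndex` class, the toy ex:alice/ex:bob/ex:carol data, and the three-state allowed set (the thread elides the full list).

```python
from collections import defaultdict


class TripleIndex:
    """Hypothetical in-memory store; triples are (subject, predicate, object) strings."""

    def __init__(self, triples):
        self.subjects_by_po = defaultdict(set)  # (predicate, object) -> {subject}
        self.objects_by_sp = defaultdict(set)   # (subject, predicate) -> {object}
        for s, p, o in triples:
            self.subjects_by_po[(p, o)].add(s)
            self.objects_by_sp[(s, p)].add(o)

    def has_value_target(self, predicate, obj):
        # Meant to mirror a target query such as
        #   SELECT ?this WHERE { ?this $predicate $object }
        # It is one dictionary lookup; no node has to be validated first.
        return set(self.subjects_by_po.get((predicate, obj), ()))

    def values(self, subject, predicate):
        return set(self.objects_by_sp.get((subject, predicate), ()))


def check_in(index, focus_nodes, path, allowed):
    """Return (focus node, value) pairs that violate an sh:in-style constraint."""
    return [
        (node, value)
        for node in sorted(focus_nodes)
        for value in sorted(index.values(node, path))
        if value not in allowed
    ]


# Toy data for the ex:country / ex:state use case (hypothetical).
triples = {
    ("ex:alice", "ex:country", "ex:USA"), ("ex:alice", "ex:state", "CA"),
    ("ex:bob", "ex:country", "ex:USA"), ("ex:bob", "ex:state", "NSW"),
    ("ex:carol", "ex:country", "ex:AU"), ("ex:carol", "ex:state", "NSW"),
}
index = TripleIndex(triples)
usa_nodes = index.has_value_target("ex:country", "ex:USA")
print(check_in(index, usa_nodes, "ex:state", {"AZ", "CA", "FL"}))  # [('ex:bob', 'NSW')]
```

ex:carol is never inspected by the sh:in check because the lookup for (ex:country, ex:USA) does not return her; covering the AU case would be a second target with a different object, as the thread suggests.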
>> >>>
>> >>> BTW the case of
>> >>> http://datashapes.org/constraints.html#HasValueInConstraintComponent
>> >>> can be covered by having multiple dash:HasValueTargets with different
>> >>> dash:objects. A bit more verbose but can reuse the same machinery. If
>> >>> you have long lists, introduce your own dash-like extension backed by
>> >>> a SPARQL query and hard-code against that if performance isn't good.
>> >>>
>> >>> Sorry for moving back and forth on this topic, but getting hands-on
>> >>> experience with an implementation revealed to me just how bad the
>> >>> sh:targetShape solution would become. And I couldn't schedule time for
>> >>> such an implementation earlier due to other commitments.
>> >>>
>> >>> It would be useful to have input from other SHACL implementers (there
>> >>> are about a dozen SHACL engines out there, and counting). We really
>> >>> don't want to rush something through which then becomes a burden for
>> >>> others.
>> >>>
>> >>> Holger
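Example 4 from the quoted queries (everyone who knows at least three people must know Steve) can be simulated in plain Python to contrast a literal two-phase reading of sh:targetShape, which checks the target shape on every subject and object and then validates, with a combined evaluation that starts from foaf:knows triples, the idea behind the combined SPARQL query. This is a hedged sketch over a hypothetical set of (subject, predicate, object) strings, not either engine's implementation, and it returns focus nodes rather than the ?target/?value rows the query produces.

```python
from collections import defaultdict

KNOWS = "foaf:knows"
STEVE = "ex:Steve"


def knows_by_subject(triples):
    """Group the distinct foaf:knows values of each subject."""
    grouped = defaultdict(set)
    for s, p, o in triples:
        if p == KNOWS:
            grouped[s].add(o)
    return grouped


def two_phase(triples):
    """Literal reading of sh:targetShape: validate the target shape
    (sh:path foaf:knows; sh:minCount 3) on every subject and object,
    then validate sh:hasValue ex:Steve on the nodes that conform."""
    knows = knows_by_subject(triples)
    candidates = {s for s, _, _ in triples} | {o for _, _, o in triples}
    targets = {n for n in candidates if len(knows.get(n, ())) >= 3}
    return sorted(n for n in targets if STEVE not in knows[n])


def combined(triples):
    """Combined evaluation: only subjects of foaf:knows triples can satisfy
    sh:minCount 3, so only those subjects are examined."""
    knows = knows_by_subject(triples)
    return sorted(
        s for s, values in knows.items() if len(values) >= 3 and STEVE not in values
    )


# Hypothetical toy graph: ann knows three people but not Steve,
# ben knows three people including Steve, cal knows only two.
triples = {
    ("ex:ann", KNOWS, "ex:bo"), ("ex:ann", KNOWS, "ex:cy"), ("ex:ann", KNOWS, "ex:di"),
    ("ex:ben", KNOWS, "ex:bo"), ("ex:ben", KNOWS, "ex:cy"), ("ex:ben", KNOWS, STEVE),
    ("ex:cal", KNOWS, "ex:bo"), ("ex:cal", KNOWS, "ex:cy"),
}
print(two_phase(triples), combined(triples))  # ['ex:ann'] ['ex:ann']
```

Both functions find the same violation; they differ only in which nodes are inspected, which is what the performance discussion in this thread is about.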
Received on Monday, 6 July 2020 14:20:20 UTC