- From: Vladimir Alexiev <vladimir.alexiev@ontotext.com>
- Date: Thu, 4 Jun 2020 12:31:53 +0300
- To: Public Shacl W3C <public-shacl@w3.org>
- Message-ID: <CAMv+wg52_mHXkP6S8V6fQ0FMy0dQbc7k0hF9_+ZZRBwAWCHV7w@mail.gmail.com>
Hi everyone! (This email is formatted as markdown.)

I have 2 objections to earlier proposals:

- According to https://www.w3.org/TR/shacl-af/#node-expressions-filter-shape, `sh:filterShape` is always used with `$this` as seed and `sh:nodes` as generator. So I don't think it can be used for our case.
- It seems wrong to me to use `sh:target` and `sh:filterShape` in a disconnected manner (the former with just marker classes, the latter to carry the actual target shape).

I thought more about what Holger called `sh:targetNodesConforming`, and I think what we need already exists: target by `NodeShape`. So I think we only need to add a new subsection of https://www.w3.org/TR/shacl-af/#targets but no new classes or properties.

> Separating sh:AllSubjects and sh:AllObjects separately would offer more flexibility too

Both subjects and objects are Nodes in the graph. I think `NodeShape` already gives us enough flexibility to select one or the other (there are 2 related examples below: selecting by IRI pattern, and selecting langString literals). Just like we don't have distinct `SubjectNodeShape` vs `ObjectNodeShape`, I don't think we need such a distinction for targeting either.

Below is a proposal for such a new subsection; please comment.

# NodeShape Targets

Sometimes it is useful to find nodes by shape, and then validate them using another shape. To do this, you can use a `sh:target` that is a `sh:NodeShape`:

```
ex:MyNodeShape a sh:NodeShape;
  sh:target [a sh:NodeShape;
    <NodeShape constructs for target>
  ];
  <NodeShape constructs for validation>
.
```

In the following subsections we show several examples of this design.

## Target by Property and Object

Norwegians must have one norwegianID:

```
ex:NorwegianShape a sh:NodeShape;
  sh:target [a sh:NodeShape;
    sh:property [sh:path ex:nationality; sh:hasValue ex:Norway];
  ];
  sh:property [sh:path ex:norwegianID; sh:minCount 1; sh:maxCount 1];
.
```

## Target Namespace Instances

All instances in a given namespace must have a certain shape:

```
ex:CompanyShape a sh:NodeShape;
  sh:target [a sh:NodeShape;
    sh:nodeKind sh:IRI;
    sh:pattern "^https://company-graph.ontotext.com/resource/company/";
  ];
  sh:class ex:Company;
  sh:property [sh:path dc:type; sh:in ("conglomerate" "collective" "enterprise")];
.
```

## Target All langStrings

All langStrings must have one of a predefined set of languages:

```
ex:langStringShape a sh:NodeShape;
  sh:target [a sh:NodeShape;
    sh:datatype rdf:langString;
  ];
  sh:languageIn ("en" "bg");
.
```

## Target By Cardinality

Let's say a person Steve is very popular, so everyone who knows at least three people must know Steve:

```
ex:Personshape a sh:NodeShape;
  sh:target [a sh:NodeShape;
    sh:property [sh:path foaf:knows; sh:minCount 3];
  ];
  sh:property [sh:path foaf:knows; sh:hasValue ex:Steve];
.
```

## Semantic Type Discrimination

In some datasets, instances are not discriminated by `rdf:type` alone, but also by other traits. Often more than one check needs to be performed.

E.g. in Geonames, all instances have type `gn:Feature`, and are further discriminated by `gn:featureCode`. That's a 2-level classification of some 650 codes that includes everything from continents to mountains to pipelines to hotels. Imagine that you're interested only in countries and top-level administrative divisions (states, provinces and the like).

- A bunch of codes correspond to the concept "country"
- Countries have `gn:countryCode`
- Only the code `gn:ADM1` corresponds to top-level administrative divisions
- Administrative divisions have `gn:parentCountry`

(This does not describe all Geonames fields, only the ones that we need.)
```
gn:Feature a sh:NodeShape, rdfs:Class;
  # implicit: sh:targetClass gn:Feature;
  sh:property [sh:path gn:name; sh:datatype xsd:string; sh:minCount 1; sh:maxCount 1];
  sh:property [sh:path gn:featureClass; sh:nodeKind sh:IRI; sh:minCount 1; sh:maxCount 1];
  sh:property [sh:path gn:featureCode; sh:nodeKind sh:IRI; sh:minCount 1; sh:maxCount 1];
.

ex:CountryShape a sh:NodeShape;
  sh:target [a sh:NodeShape;
    sh:class gn:Feature;
    sh:property [sh:path gn:featureCode;
      sh:in (gn:A.PCLI gn:A.PCLD gn:A.PCLIX gn:A.PCLS gn:A.PCL gn:A.TERR gn:A.PCLF)];
  ];
  sh:property [sh:path gn:countryCode; sh:datatype xsd:string; sh:minCount 1; sh:maxCount 1];
.

ex:ADM1Shape a sh:NodeShape;
  sh:target [a sh:NodeShape;
    sh:class gn:Feature;
    sh:property [sh:path gn:featureCode; sh:hasValue gn:ADM1];
  ];
  sh:property [sh:path gn:parentCountry; sh:node ex:CountryShape; sh:minCount 1; sh:maxCount 1];
.
```

## Targeting and Reference Shapes

In the last example we stated that `gn:parentCountry` must point to something that satisfies `ex:CountryShape`. This means that every time we validate `ex:ADM1Shape`, we need to validate its country (together with the country-specific properties). So the validation of ADM1 must recurse into validation of Country.

This is not always convenient since it's hard to control this recursive process. Furthermore, if Country referred back to `ex:ADM1Shape` of its regions, we'd have a recursive shape and the result would be undefined.

It may therefore be more convenient to check only the **existence** of Country from ADM1, and rely on some other process to check the validity of Country. We could do it like this:

```
ex:CountryReferenceShape a sh:NodeShape;
  sh:class gn:Feature;
  sh:property [sh:path gn:featureCode;
    sh:in (gn:A.PCLI gn:A.PCLD gn:A.PCLIX gn:A.PCLS gn:A.PCL gn:A.TERR gn:A.PCLF)];
.

ex:CountryShape a sh:NodeShape;
  sh:target ex:CountryReferenceShape;
  sh:property [sh:path gn:countryCode; sh:datatype xsd:string; sh:minCount 1; sh:maxCount 1];
.
ex:ADM1ReferenceShape a sh:NodeShape;
  sh:class gn:Feature;
  sh:property [sh:path gn:featureCode; sh:hasValue gn:ADM1];
.

ex:ADM1Shape a sh:NodeShape;
  sh:target ex:ADM1ReferenceShape;
  sh:property [sh:path gn:parentCountry; sh:node ex:CountryReferenceShape; sh:minCount 1; sh:maxCount 1];
.
```

The significant change is in the last property constraint: ADM1 checks `ex:CountryReferenceShape` rather than `ex:CountryShape`. And we reuse `ex:CountryReferenceShape` as both:

- Existence check in `ex:ADM1Shape`
- Targeting shape in `ex:CountryShape`

## Politicians and Parties

Let's say every Party has at least one Politician, every Politician belongs to exactly one Party (ok, that is unrealistic), politicians are defined by a combination of `rdf:type` and `dc:type`, and both Parties and Politicians adhere to one of two politics (Democrat vs Republican).

If we model this with two shapes that refer to each other, we'd have recursive shapes. So again we use two shapes for every entity:

- A "smaller" ReferenceShape that just checks existence in terms of "semantic type discrimination"
- A "bigger" Shape that checks all other properties of the instance, and uses the ReferenceShape for targeting

This eliminates the recursion.

```
ex:PoliticianReferenceShape a sh:NodeShape;
  sh:property [sh:path rdf:type; sh:in (foaf:Person dbo:Person)];
  sh:property [sh:path dc:type; sh:hasValue "politician"];
.

ex:PoliticianShape a sh:NodeShape;
  sh:target ex:PoliticianReferenceShape;
  sh:property [sh:path ex:politics; sh:in ("Democrat" "Republican")];
  sh:property [sh:path ex:party; sh:node ex:PartyReferenceShape; sh:minCount 1; sh:maxCount 1];
.

ex:PartyReferenceShape a sh:NodeShape;
  sh:property [sh:path rdf:type; sh:hasValue foaf:Organization];
  sh:property [sh:path dc:type; sh:hasValue "political party"];
.

ex:PartyShape a sh:NodeShape;
  sh:target ex:PartyReferenceShape;
  sh:property [sh:path ex:politics; sh:in ("Democrat" "Republican")];
  sh:property [sh:path ex:politician; sh:node ex:PoliticianReferenceShape; sh:minCount 1];
.
```
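
To make the targeting behaviour concrete, here is a small hypothetical data graph for the Politicians and Parties shapes. None of these instances come from a real dataset. The `ex:` namespace IRI and the choice of the DC Elements namespace for `dc:` are assumptions. The comments describe the outcome I would expect under the proposed semantics (a node is a focus node of the "bigger" shape only if it conforms to the ReferenceShape used as `sh:target`), which no published spec defines yet.

```
@prefix ex:   <http://example.org/> .              # assumed namespace
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc:   <http://purl.org/dc/elements/1.1/> . # assumed: DC Elements

# Conforms to ex:PoliticianReferenceShape, so it is targeted by ex:PoliticianShape.
# Expected to pass: allowed politics, exactly one party that conforms to ex:PartyReferenceShape.
ex:Alice a foaf:Person;
  dc:type "politician";
  ex:politics "Democrat";
  ex:party ex:PartyA .

# Conforms to ex:PartyReferenceShape, so it is targeted by ex:PartyShape.
# Expected to pass: allowed politics, at least one politician that conforms to ex:PoliticianReferenceShape.
ex:PartyA a foaf:Organization;
  dc:type "political party";
  ex:politics "Democrat";
  ex:politician ex:Alice .

# Targeted by ex:PoliticianShape, but expected to fail: no ex:party (sh:minCount 1).
ex:Bob a foaf:Person;
  dc:type "politician";
  ex:politics "Republican" .

# Does not conform to ex:PoliticianReferenceShape (dc:type is not "politician"),
# so it is not targeted and ex:PoliticianShape reports nothing about it.
ex:Carol a foaf:Person;
  dc:type "journalist" .
```

Note that validating `ex:Alice` only checks `ex:PartyA` against the small ReferenceShape, and vice versa, so neither validation recurses into the other's "bigger" shape.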
Received on Thursday, 4 June 2020 09:32:20 UTC