[data-shapes] IRI introspection (#228)

ajnelson-nist has just created a new issue for https://github.com/w3c/data-shapes:

== IRI introspection ==
There is functionality in SPARQL that I do not believe is available in SHACL 1.0: treating an IRI as a string (`STR(...)`) and building an IRI from a string (`IRI(...)`).  If this is actually available somewhere, or already being drafted for 1.2, I would very much welcome a link.

I would like to write some SHACL shapes that, for various purposes, need to review the spelling of the focus node's IRI.


## Use case 1

Use case 1 is a prescribed-name-form checker.  With an ontology I work with, there is an expectation that the IRIs of `owl:NamedIndividual`s using the ontology end with a UUID, unless some other practice is in place to prevent IRI collisions.  E.g., `http://example.org/kb/Thing-36b67df4-1a42-4588-808a-19dfb79efbeb`.  My understanding is that the only way to enforce this in SHACL 1.0 is a SPARQL constraint, because in SPARQL we can review the IRI as a string with `STR($this)`.  For reference, the shape we use[^1] is [here](https://ontology.unifiedcyberontology.org/documentation/shape-coreucothing-identifier-regex-shape.html)[^2][^3], but to save a click, its SPARQL constraint reads like this:

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .

# This constraint takes effect once attached to a node shape via sh:sparql.
[
    a sh:SPARQLConstraint ;
    rdfs:seeAlso <https://datatracker.ietf.org/doc/html/rfc4122#section-4.1.3> ;
    sh:message "UcoThings are suggested to end with a UUID."@en ;
    sh:select """
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        PREFIX core: <https://ontology.unifiedcyberontology.org/uco/core/>
        SELECT $this
        WHERE {
            FILTER (
                ! REGEX (
                    STR($this),
                    "[0-9a-f]{8}-[0-9a-f]{4}-[0-5][0-9a-f]{3}-[0-9a-f]{4}-[0-9a-f]{12}$",
                    "i"
                )
            )
        }
    """
] .
```


## Use case 2

Use case 2 is inferable class assignment based on the URL form.  For instance, suppose an RDF data model wanted to model GitHub's Issues and Pull Requests as `ex:Issue` and `ex:PullRequest`.  If an `owl:NamedIndividual` follows the pattern `https://github.com/[^/]+/[^/]+/issues/\d+` (with whatever regex escaping is needed to make that work), I should be able to write a `sh:TripleRule` that assigns the type `ex:Issue`.  Again, using SPARQL's `STR($this)`, I can write a `CONSTRUCT` query to handle this, but I don't see an easy way to target nodes based only on IRI spelling.

I should be able to take a graph like this:

```turtle
<https://github.com/w3c/data-shapes/issues/228>
    a ex:Issue ;
    ex:mentions <https://github.com/w3c/data-shapes/issues/227> .
```

and from exactly those triples in the data graph, entail this for the yet-untyped node that's the object of `ex:mentions`:

```turtle
<https://github.com/w3c/data-shapes/issues/227>
    a ex:Issue .
```
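For comparison, the SPARQL workaround mentioned above might look roughly like the following SHACL-AF sketch. It uses `sh:SPARQLRule` rather than `sh:TripleRule`, since that is where `STR($this)` is available, and it assumes an engine that supports SHACL-AF SPARQL rules. The `ex:` namespace IRI and the shape name `ex:IssueFromIRIShape` are hypothetical placeholders, and targeting the objects of `ex:mentions` is just one assumed way to reach the untyped node. The regex uses `[.]` and `[0-9]` instead of `\.` and `\d` to sidestep the double escaping that Turtle strings plus SPARQL strings would otherwise require.

```turtle
@prefix ex: <http://example.org/ontology/> .
@prefix sh: <http://www.w3.org/ns/shacl#> .

# Hypothetical shape name and ex: namespace; a sketch, not a tested rule set.
ex:IssueFromIRIShape
    a sh:NodeShape ;
    # Assumed targeting: focus on objects of ex:mentions,
    # such as the yet-untyped issue 227.
    sh:targetObjectsOf ex:mentions ;
    sh:rule [
        a sh:SPARQLRule ;
        sh:construct """
            PREFIX ex: <http://example.org/ontology/>
            CONSTRUCT {
                $this a ex:Issue .
            }
            WHERE {
                # Only IRIs carry this spelling; STR() exposes it to REGEX.
                FILTER (
                    isIRI($this) &&
                    REGEX(STR($this), "^https://github[.]com/[^/]+/[^/]+/issues/[0-9]+$")
                )
            }
        """
    ] .
```

Run against the two-triple data graph above, a conforming SHACL-AF engine should infer `<https://github.com/w3c/data-shapes/issues/227> a ex:Issue`, which is the entailment described. The catch, as noted, is that the IRI test lives inside SPARQL rather than in SHACL Core targeting.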


## Use case 3

Not my use case: @philharveyonline posted #227, which proposes the generation of new nodes.  Some kind of functionality would be needed to name the created node.  The [SPARQL `IRI(...)` function](https://www.w3.org/TR/sparql11-query/#func-iri) would let him assemble a string and cast that string into a new node.  However, I don't think there's functionality in SHACL 1.0 or SHACL-AF to do this.


## Proposal

For SHACL 1.2 Core, my use cases 1 and 2 would benefit from letting a target node's lexical value be reviewed by `sh:pattern`.  This suggests to me some kind of use of a `sh:PropertyShape`, with either `sh:path` accepting a special BNode-housed predicate like `sh:inversePath` (`[ sh:into sh:this ]`?) or a new sibling property for `sh:path` (`sh:nodeIRI`?).
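To make the first option concrete, here is a sketch of how use case 1 might read under it. `sh:into` and the use of `sh:this` as its object are the hypothetical terms floated above; they are not part of SHACL 1.0, and the assumed semantics (the path yields the focus node itself, so `sh:pattern` checks its IRI string) are only one possible reading. The shape name and `ex:` namespace are hypothetical as well.

```turtle
@prefix ex: <http://example.org/shapes/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .

# HYPOTHETICAL: sh:into is a proposed term, not part of SHACL 1.0.
ex:NamedIndividual-uuid-suffix-shape
    a sh:PropertyShape ;
    sh:targetClass owl:NamedIndividual ;
    # Proposed path: the value node is the focus node itself.
    sh:path [ sh:into sh:this ] ;
    # sh:pattern then tests the STR() form of that value node.
    sh:pattern "[0-9a-f]{8}-[0-9a-f]{4}-[0-5][0-9a-f]{3}-[0-9a-f]{4}-[0-9a-f]{12}$" ;
    sh:flags "i" .
```

The appeal of this shape is that the regex and flags move out of an embedded SPARQL string and into Core constraint components, with no query text to escape.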

I don't grok node expressions well enough yet to know whether the `sh:PropertyShape` approach would work for #227.


[^1]: Disclaimer: Participation by NIST in the creation of the documentation of mentioned software is not intended to imply a recommendation or endorsement by the National Institute of Standards and Technology, nor is it intended to imply that any specific software is necessarily the best available for the purpose.
[^2]: Apologies for the distraction, but in case you notice it in the linked shape, I *do* suspect that query's `a/rdfs:subClassOf*` part is superfluous.  There's no need to discuss that here, as I'll be discussing it with that community soon.
[^3]: A second aside: I am aware that [RFC 9562](https://datatracker.ietf.org/doc/html/rfc9562) obsoletes RFC 4122.  That is another thing I'll be discussing with that community soon.

Please view or discuss this issue at https://github.com/w3c/data-shapes/issues/228 using your GitHub account



Received on Wednesday, 5 February 2025 13:17:29 UTC