Re: SHACL-based data extraction from a knowledge graph

I think you could do it with Jena. Load the data into a Graph, then get the focus nodes for all the shapes you want using VLib.focusNodes. Evaluate each shape on its focus nodes and compile the intersection of all focus nodes that are valid, along with their shapes. Now evaluate the shapes again, on these valid focus nodes only, and record all the triples/quads that are read from the data graph during evaluation.
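A minimal, dependency-free Java sketch of the first pass. It reads "intersection" as: a focus node survives only if it conforms to every shape that targets it. `Triple`, `ToyShape` and `subjectsOfType` are hypothetical stand-ins, not Jena API. In real Jena code, VLib.focusNodes would take the place of the toy target function and Jena's SHACL validator the place of the `conforms` predicate.

```java
import java.util.*;
import java.util.function.BiPredicate;
import java.util.function.Function;
import java.util.stream.Collectors;

// Dependency-free model of the first pass. Triple, ToyShape and subjectsOfType
// are hypothetical stand-ins for Jena's Triple, a SHACL Shape and
// VLib.focusNodes; they only make the control flow concrete.
final class FirstPass {
    record Triple(String s, String p, String o) {}

    // A shape = how to find its focus nodes + how to check one focus node.
    record ToyShape(String name,
                    Function<Set<Triple>, Set<String>> focusNodes,
                    BiPredicate<Set<Triple>, String> conforms) {}

    // Toy equivalent of sh:targetClass: all subjects typed with the class.
    static Set<String> subjectsOfType(Set<Triple> data, String cls) {
        return data.stream()
                .filter(t -> t.p().equals("rdf:type") && t.o().equals(cls))
                .map(Triple::s)
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    // Maps every focus node that conforms to ALL shapes targeting it
    // to the list of those shapes; every other node is dropped.
    static Map<String, List<ToyShape>> validFocusNodes(Set<Triple> data, List<ToyShape> shapes) {
        Map<String, List<ToyShape>> byNode = new LinkedHashMap<>();
        for (ToyShape shape : shapes) {
            for (String node : shape.focusNodes().apply(data)) {
                byNode.computeIfAbsent(node, k -> new ArrayList<>()).add(shape);
            }
        }
        byNode.entrySet().removeIf(e ->
                !e.getValue().stream().allMatch(sh -> sh.conforms().test(data, e.getKey())));
        return byNode;
    }
}
```

For example, with two `ex:Person` nodes of which only one has an `ex:name`, a shape that requires `ex:name` keeps only the named node. These survivors, with their shapes, are the input to the second evaluation.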

That last step requires you to wrap the original data graph object in a custom class implementing Jena's Graph interface, so that it intercepts all read calls and stores the returned triples in an internal set before handing them back to the client.
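A dependency-free Java sketch of the recording wrapper. `TripleSource` is a hypothetical, minimal stand-in for Jena's `Graph` interface. In it, `null` acts as a wildcard, much as `Node.ANY` does in `Graph.find(s, p, o)`. A real wrapper around a Jena graph would have to intercept every read path, such as the `find` overloads and `contains`. Because Jena's `find` returns a lazy iterator, such a wrapper would more naturally record triples as they are iterated than eagerly as done here.

```java
import java.util.*;
import java.util.stream.Collectors;

// Dependency-free model of the recording wrapper. TripleSource is a
// hypothetical stand-in for Jena's Graph interface; null means "any"
// (Jena uses Node.ANY for that in Graph.find).
final class RecordingSketch {
    record Triple(String s, String p, String o) {}

    // Minimal read interface: the only read path intercepted in this sketch.
    interface TripleSource {
        List<Triple> find(String s, String p, String o);
    }

    // Plain in-memory data graph.
    static final class InMemory implements TripleSource {
        private final List<Triple> triples;

        InMemory(Collection<Triple> triples) {
            this.triples = List.copyOf(triples);
        }

        @Override
        public List<Triple> find(String s, String p, String o) {
            return triples.stream()
                    .filter(t -> (s == null || s.equals(t.s()))
                            && (p == null || p.equals(t.p()))
                            && (o == null || o.equals(t.o())))
                    .collect(Collectors.toList());
        }
    }

    // Wrapper: delegates every read to the base graph and remembers
    // every triple it hands back to the caller (e.g. a SHACL validator).
    static final class Recording implements TripleSource {
        private final TripleSource base;
        private final Set<Triple> seen = new LinkedHashSet<>();

        Recording(TripleSource base) {
            this.base = base;
        }

        @Override
        public List<Triple> find(String s, String p, String o) {
            List<Triple> result = base.find(s, p, o);
            seen.addAll(result); // store before returning to the client
            return result;
        }

        // The extraction result: every triple read so far.
        Set<Triple> recorded() {
            return Collections.unmodifiableSet(seen);
        }
    }
}
```

If a check only looks up `ex:name` on the valid focus node `ex:a`, `recorded()` afterwards holds just that one triple, and an unrelated `ex:knows` triple is never exported. The flip side of this design is that the export contains exactly what the validator happened to read. A broad lookup, such as all properties of a node, therefore pulls in more triples than the shape strictly constrains.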

After the second evaluation, restricted to the valid focus nodes, the wrapper's internal set should hold your desired extraction result.

I may be wrong about this approach, but it might just work. If you try it and succeed, please consider contributing the code to Jena; this is not the first time the question has come up.

All the best!
Florian


On 8 March 2022 18:25:13 CET, Thomas Francart <thomas.francart@sparna.fr> wrote:
>Hello!
>
>I am facing the following situation:
>
>   - A large knowledge graph with lots of triples
>   - A need to export multiple RDF datasets from this large Knowledge
>   Graph, each containing a subset of the triples from the graph
>   - Datasets are not limited to a flat list of entities with their
>   properties, but will each contain a small piece of graph
>   - The exact content of each Dataset is specified in SHACL, using
>   standard constraints of cardinalities, sh:node, datatype, languageIn,
>   sh:hasValue, etc. This SHACL will be used as the source for documenting the
>   exact content of each Dataset using [1]
>
>And now the question: can we automate the extraction of data from the
>large knowledge graph based on the SHACL definition of our datasets?
>What we are looking for is a guarantee that the extraction process will
>produce a dataset that conforms to the SHACL definition.
>
>Has anyone done something similar? A naïve approach would be to generate
>a SPARQL query from the SHACL definition of the dataset, but I
>suspect the query will quickly become too complicated.
>
>Thanks!
>Thomas
>
>[1] SHACL Play documentation generator :
>https://shacl-play.sparna.fr/play/doc
>
>
>-- 
>
>*Thomas Francart* -* SPARNA*
>*Data* web | *Information* architecture | Access to
>*knowledge*
>blog : blog.sparna.fr, site : sparna.fr, linkedin :
>fr.linkedin.com/in/thomasfrancart
>tel :  +33 (0)6.71.11.25.97, skype : francartthomas

Received on Tuesday, 8 March 2022 23:55:53 UTC