Re: Announcement: Springer Nature SciGraph - Public datasets release

Following up on our announcement, we would like to add some notes on
the RDF validation approach used in the Springer Nature SciGraph
project [1].

As an integral part of our data publishing architecture (at various
layers: integration, validation and publishing) we rely on the Shapes
Constraint Language (SHACL), which is currently being specified by the
W3C [2], for validating RDF graphs, together with an open-source
prototype SHACL validation implementation by TopQuadrant [3] that is
built on Apache Jena.

This has proved to be a very useful technology for us to constrain
input and output 'views' over our knowledge graph.

For data ingestion we have defined a separate shapes graph for each of
our ETL workflows, along with a corresponding named graph for the data.
We make extensive use of data typing.
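
As a rough illustration of what a datatype constraint checks, here is a
minimal sketch in plain Python. It is not SHACL and not the actual
SciGraph shapes: it only mimics the effect of an sh:datatype constraint
on a property. The class ex:Article, the properties, the sample data
and the validate() helper are all hypothetical.

```python
# Toy stand-in for a SHACL datatype check; every name here is hypothetical.
XSD = "http://www.w3.org/2001/XMLSchema#"

# A "shape": for one target class, the datatype each property value must have.
ARTICLE_SHAPE = {
    "target_class": "ex:Article",
    "properties": {
        "ex:title": XSD + "string",
        "ex:publicationYear": XSD + "gYear",
    },
}

# A data "graph" as (subject, predicate, object) triples.
# Literals are modelled as (lexical form, datatype) pairs.
DATA = [
    ("ex:a1", "rdf:type", "ex:Article"),
    ("ex:a1", "ex:title", ("SHACL in practice", XSD + "string")),
    ("ex:a1", "ex:publicationYear", ("2017", XSD + "string")),  # wrong datatype
]


def validate(triples, shape):
    """Return (subject, property, found, expected) for each datatype violation."""
    # Focus nodes: every subject typed with the shape's target class.
    targets = {
        s for s, p, o in triples
        if p == "rdf:type" and o == shape["target_class"]
    }
    violations = []
    for s, p, o in triples:
        expected = shape["properties"].get(p)
        if s in targets and expected is not None:
            # None means the value is not a literal at all.
            found = o[1] if isinstance(o, tuple) else None
            if found != expected:
                violations.append((s, p, found, expected))
    return violations


report = validate(DATA, ARTICLE_SHAPE)
```

Here report holds one entry, for ex:publicationYear on ex:a1, because
the value is typed as xsd:string rather than xsd:gYear. A real SHACL
engine would return a validation report graph instead of a list.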

For data publishing we define a set of export SPARQL queries which
dynamically query over a corresponding set of export shapes graphs to
restrict the types and relations that are exported. This means that we
can maintain the 'view' of our export data as a set of SHACL shape
declarations. (Note that this mechanism replaces a homegrown method we
had used previously for defining data publishing 'contracts'.)
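
The idea of a shape-driven export can be sketched in plain Python. This
is a toy model under stated assumptions, not the SciGraph queries: the
shapes are reduced to a mapping from target class to allowed
properties, and a filter function stands in for a SPARQL query that
joins the data against the shapes graph (for example via
sh:targetClass and sh:path). All names and data are hypothetical.

```python
# Toy model of a shape-driven export; every name here is hypothetical.

# "Export shapes": target class -> properties allowed in the published view.
EXPORT_SHAPES = {
    "ex:Article": {"ex:title", "ex:doi"},
}

# A data "graph" as (subject, predicate, object) triples.
DATA = [
    ("ex:a1", "rdf:type", "ex:Article"),
    ("ex:a1", "ex:title", "SHACL in practice"),
    ("ex:a1", "ex:doi", "10.0000/example"),
    ("ex:a1", "ex:internalNote", "not for release"),
    ("ex:p1", "rdf:type", "ex:Person"),
    ("ex:p1", "ex:name", "A. Author"),
]


def export_view(triples, shapes):
    """Keep only triples whose subject type and property are declared in shapes."""
    # Collect, per subject, the declared classes it is an instance of.
    types = {}
    for s, p, o in triples:
        if p == "rdf:type" and o in shapes:
            types.setdefault(s, set()).add(o)

    view = []
    for s, p, o in triples:
        for cls in types.get(s, ()):
            # Keep the type statement itself and any property the shape allows.
            if (p == "rdf:type" and o == cls) or p in shapes[cls]:
                view.append((s, p, o))
                break
    return view


VIEW = export_view(DATA, EXPORT_SHAPES)
```

In this sketch, changing what gets published means editing
EXPORT_SHAPES only; the filter itself stays the same, which mirrors the
point that the export 'view' is maintained in the shape declarations
rather than in the queries.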

SHACL thus provides us with some key (previously missing) puzzle
pieces in our semantic tool chest.

The SciGraph Team

[1] http://www.springernature.com/scigraph
[2] http://w3c.github.io/data-shapes/shacl/
[3] https://github.com/TopQuadrant/shacl

Received on Tuesday, 14 March 2017 09:44:44 UTC