- From: Holger Knublauch <holger@topquadrant.com>
- Date: Mon, 04 Aug 2014 08:11:21 +1000
- To: "public-rdf-shapes@w3.org" <public-rdf-shapes@w3.org>
This email describes a use case of constraint checking and data
transformation that should be of interest to the WG. The work was
undertaken by TopQuadrant and partners, including Franz, Inc., for a
large Norwegian oil and gas project. The resulting EPIM ReportingHub
system [1] has been in production since 2012; you can find more
technical presentations about it via Google (e.g. [2]). TopQuadrant has
since worked on two equally large and complex applications, EPIM
EnvironmentHub and EPIM LogisticsHub, all delivered as SaaS with
ongoing support for years to come.
In a nutshell, the system is a server used by oil producers to upload
daily and monthly drilling reports in a pre-defined XML format. The
data is checked against XSD constraints and then converted into a
canonical RDF representation in which each XML element essentially
becomes an instance of a class in an OWL ontology derived from the XSD.
These RDF instances are then validated against basic integrity
constraints, for example to verify that the well bores in the document
correspond to background data (the NPD Fact Pages). The NPD Fact Pages
live in a separate named graph and are based on a different ontology
than the incoming data.
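To make the canonical representation concrete, a single status-info
element from the XML might end up as triples roughly like the following
(the class name, property names and literal values are illustrative
guesses, not the project's actual vocabulary; standard xsd prefix
assumed):

_:statusInfo1 a ddr:StatusInfo ;                        # one instance per XML element
    ddr:dTim "2012-05-09T06:00:00Z"^^xsd:dateTime ;     # child elements become properties
    ddr:presTestTypeRef "leak off test" .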
Here is an example SPIN constraint from that step:
CONSTRUCT {
    _:cv a spin:ConstraintViolation ;
        spin:violationRoot ?this ;
        spin:violationPath ddr:nameWellbore ;
        rdfs:label ?message .
}
WHERE {
    ?this ep-lib:nameWellbore ?wellBoreName .
    BIND (rhspin:wellBoreByName(?wellBoreName) AS ?wellBore) .
    FILTER (!bound(?wellBore)) .
    BIND (fn:concat("[RH-19] Fact Pages validation of the XML file failed with the following error: Unregistered well bore name ", ?wellBoreName) AS ?message) .
}
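When the FILTER detects an unregistered name, the CONSTRUCT template
above produces a violation resource along these lines (the subject of
spin:violationRoot and the well bore name are made-up values, with ex:
as a placeholder prefix):

_:cv a spin:ConstraintViolation ;
    spin:violationRoot ex:statusInfo1 ;     # the ?this instance that failed the check
    spin:violationPath ddr:nameWellbore ;
    rdfs:label "[RH-19] Fact Pages validation of the XML file failed with the following error: Unregistered well bore name 99/9-X-99" .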
The query above makes use of SPIN functions such as
rhspin:wellBoreByName, which are essentially stored procedures: when
such a function is called, a nested SELECT query is executed. This
makes the resulting queries much more readable and modular than
repeating the same logic again and again.
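To illustrate, a definition of something like rhspin:wellBoreByName
could look roughly as follows; this is only a sketch, and the graph
name and matching property inside the SELECT are assumptions:

rhspin:wellBoreByName
    a spin:Function ;
    spin:constraint [
        a spl:Argument ;
        spl:predicate sp:arg1 ;          # the well bore name passed into the function
        spl:valueType xsd:string
    ] ;
    spin:body [
        a sp:Select ;
        sp:text """
            SELECT ?wellBore
            WHERE {
                GRAPH npd:FactPages {                    # hypothetical graph name
                    ?wellBore npdv:wellboreName ?arg1 .  # hypothetical property
                }
            }
            """
    ] .

At call time the engine binds ?arg1 to the supplied name and returns
the first ?wellBore from the nested SELECT; if the Fact Pages contain
no match, ?wellBore stays unbound, which is exactly what the FILTER in
the constraint above checks for.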
Once the first step of constraint checking passes, SPIN is used to
translate from the canonical RDF representation (derived from the XML)
into the target ontology used to store the drilling reports in a triple
store. spin:rule is used to attach those transformation queries to the
source classes. Here is an example rule, again a SPARQL CONSTRUCT. There
are hundreds of rules of similar complexity.
# STEP 116 Transfer presTestType
CONSTRUCT {
    ?dailyDrillingActivityToStatus ep-core:hasPart _:b0 .
    _:b0 a ?pressureTestType .
}
WHERE {
    ?this ep-spin-lib:nameWellbore ?nameWellBore .
    ?this ddr:dTimStart ?dTimStart .
    ?this ddr:statusInfoRef ?statusInfo .
    ?statusInfo ddr:dTim ?dTim .
    ?statusInfo ddr:presTestTypeRef ?presTestType .
    BIND (ep-spin-lib:selectPressureTestType(?presTestType) AS ?pressureTestType) .
    BIND (ep-spin-lib:normalizeString(?nameWellBore) AS ?normalizedWellBoreName) .
    BIND (ep-spin-lib:buildDailyDrillingActivityToStatusURI(?normalizedWellBoreName, ?dTimStart, ?dTim) AS ?dailyDrillingActivityToStatus) .
}
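For completeness, attaching such a rule to its source class looks
roughly like this in SPIN's RDF syntax; the class name is a placeholder
and the query text itself is elided:

ddr:DailyDrillingReport              # hypothetical name for the rule's source class
    spin:rule [
        a sp:Construct ;
        sp:text """
            # STEP 116 Transfer presTestType
            # (the CONSTRUCT/WHERE text shown above goes here verbatim)
            """
    ] .

The SPIN engine runs every spin:rule attached to a class with ?this
bound to each instance of that class, which is how ?this gets its value
in the WHERE clause above.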
At this stage there is a new RDF graph for the uploaded report, and
this graph is validated against another set of SPIN constraints, such
as:
CONSTRUCT {
    _:cv a spin:ConstraintViolation ;
        spin:violationRoot ?this ;
        rdfs:label ?message .
}
WHERE {
    ?this (ep-report:reportOn/ep-activity:onWellBore)/ep-core:temporalPartOf ?wellBore .
    FILTER (!rhspin:currentUserIsOperatorOfWellBore(?wellBore)) .
    BIND (COALESCE(rhspin:npdName(?wellBore), "Unknown well bore") AS ?wellBoreName) .
    BIND (rhspin:companyName() AS ?companyName) .
    BIND (fn:concat("[RH-11] Your company (", ?companyName, ") is not the operator of the BAA or licence associated with well bore ", ?wellBoreName) AS ?message) .
}
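Analogously to spin:rule, these checks are attached to classes of the
target report ontology via spin:constraint, roughly like this (again
with a placeholder class name and the query text elided):

ep-report:DailyDrillingReport        # hypothetical class in the target ontology
    spin:constraint [
        a sp:Construct ;
        sp:text """
            # the [RH-11] CONSTRUCT/WHERE text shown above goes here verbatim
            """
    ] .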
I would like to highlight a few points:
- The ontologies were mostly custom-written for this project. However,
they are the basis for a new ISO 15926-12 standard for life cycle data
integration using OWL.
- The SPIN files above and some ontologies are entirely private to the
specific application, not part of "The Semantic Web".
- However, other components and applications use these ontologies
internally, for example to generate report documents that aggregate
data from other uploaded reports.
- OWL is used as the syntax for those ontologies, mostly to describe
range and cardinality restrictions on properties (see the sketch after
this list). OWL is not used for inferencing; the only kind of
"inferencing" is the SPIN rules that convert from one ontology to
another.
- The complexity of the constraint checks and transformation rules
required something as expressive as SPARQL.
- To keep those queries maintainable, we made heavy use of SPIN
functions that encapsulate reusable blocks of SPARQL code.
- It was natural to associate SPIN constraints and rules with the
ontology, i.e. we did not need a parallel structure of "Shapes" that is
detached from the existing class structure.
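To illustrate the OWL usage mentioned in the list above, a typical
class description looks closer to the following than to anything
requiring a reasoner (all names here are hypothetical):

ep-report:DailyDrillingReport
    a owl:Class ;
    rdfs:subClassOf [
        a owl:Restriction ;
        owl:onProperty ep-report:reportOn ;
        owl:allValuesFrom ep-activity:DrillingActivity   # per-class range restriction
    ] , [
        a owl:Restriction ;
        owl:onProperty ep-report:reportOn ;
        owl:cardinality "1"^^xsd:nonNegativeInteger      # exactly one activity per report
    ] .

Such axioms serve as documentation of the intended structure; as noted
above, no OWL reasoner is involved.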
Regards,
Holger
[1] http://www.epim.no/reportinghub
[2] https://www.posccaesar.org/svn/pub/SemanticDays/2012/Presentations/May9/10_David_Price.pdf