- From: Holger Knublauch <holger@topquadrant.com>
- Date: Mon, 04 Aug 2014 08:11:21 +1000
- To: "public-rdf-shapes@w3.org" <public-rdf-shapes@w3.org>
This email describes a use case of constraint checking and data
transformation that should be of interest to the WG. The work was
undertaken by TopQuadrant and partners, including Franz, Inc., for a
large Norwegian oil and gas project. The resulting EPIM ReportingHub
system [1] has been in production since 2012; you can find more
technical presentations about it via Google (e.g. [2]). TopQuadrant has
since worked on two equally large and complex applications, EPIM
EnvironmentHub and EPIM LogisticsHub, all delivered as SaaS with
ongoing support for years to come.
In a nutshell, the system is a server used by oil producers to upload
daily and monthly drilling reports in a pre-defined XML format. The
data is checked against XSD constraints and then converted into a
canonical RDF representation in which each XML element essentially
becomes an instance of a class in an OWL ontology derived from the XSD.
These RDF instances are then validated against basic integrity
constraints, for example to verify that the well bores in the document
correspond to background data (the NPD Fact Pages). The NPD Fact Pages
live in a separate named graph and are based on a different ontology
than the incoming data.
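To make the canonical representation concrete, a single status-info
element from the XML might end up as triples roughly like the following
(the class name, property names and literal values are illustrative
guesses, not the project's actual vocabulary; standard xsd prefix
assumed):

_:statusInfo1 a ddr:StatusInfo ;                        # one instance per XML element
    ddr:dTim "2012-05-09T06:00:00Z"^^xsd:dateTime ;     # child elements become properties
    ddr:presTestTypeRef "leak off test" .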
Here is an example SPIN constraint from that step:
CONSTRUCT {
    _:cv a spin:ConstraintViolation ;
        spin:violationRoot ?this ;
        spin:violationPath ddr:nameWellbore ;
        rdfs:label ?message .
}
WHERE {
    ?this ep-lib:nameWellbore ?wellBoreName .
    BIND (rhspin:wellBoreByName(?wellBoreName) AS ?wellBore) .
    FILTER (!bound(?wellBore)) .
    BIND (fn:concat("[RH-19] Fact Pages validation of the XML file failed with the following error: Unregistered well bore name ", ?wellBoreName) AS ?message) .
}
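When the FILTER detects an unregistered name, the CONSTRUCT template
above produces a violation resource along these lines (the subject of
spin:violationRoot and the well bore name are made-up values, with ex:
as a placeholder prefix):

_:cv a spin:ConstraintViolation ;
    spin:violationRoot ex:statusInfo1 ;     # the ?this instance that failed the check
    spin:violationPath ddr:nameWellbore ;
    rdfs:label "[RH-19] Fact Pages validation of the XML file failed with the following error: Unregistered well bore name 99/9-X-99" .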
The query above makes use of SPIN functions such as
rhspin:wellBoreByName, which are essentially stored procedures: when
such a function is called, a nested SELECT query is executed. This
makes the resulting queries much more readable and modular than
repeating the same logic again and again.
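To illustrate, a definition of something like rhspin:wellBoreByName
could look roughly as follows; this is only a sketch, and the graph
name and matching property inside the SELECT are assumptions:

rhspin:wellBoreByName
    a spin:Function ;
    spin:constraint [
        a spl:Argument ;
        spl:predicate sp:arg1 ;          # the well bore name passed into the function
        spl:valueType xsd:string
    ] ;
    spin:body [
        a sp:Select ;
        sp:text """
            SELECT ?wellBore
            WHERE {
                GRAPH npd:FactPages {                    # hypothetical graph name
                    ?wellBore npdv:wellboreName ?arg1 .  # hypothetical property
                }
            }
            """
    ] .

At call time the engine binds ?arg1 to the supplied name and returns
the first ?wellBore from the nested SELECT; if the Fact Pages contain
no match, ?wellBore stays unbound, which is exactly what the FILTER in
the constraint above checks for.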
Once the first step of constraint checking passes, SPIN is used to
translate from the canonical RDF representation (derived from the XML)
into the target ontology used to store the drilling reports in a triple
store. spin:rule is used to attach those transformation queries to the
source classes. Here is an example rule, again a SPARQL CONSTRUCT. There
are hundreds of rules of similar complexity.
# STEP 116 Transfer presTestType
CONSTRUCT {
    ?dailyDrillingActivityToStatus ep-core:hasPart _:b0 .
    _:b0 a ?pressureTestType .
}
WHERE {
    ?this ep-spin-lib:nameWellbore ?nameWellBore .
    ?this ddr:dTimStart ?dTimStart .
    ?this ddr:statusInfoRef ?statusInfo .
    ?statusInfo ddr:dTim ?dTim .
    ?statusInfo ddr:presTestTypeRef ?presTestType .
    BIND (ep-spin-lib:selectPressureTestType(?presTestType) AS ?pressureTestType) .
    BIND (ep-spin-lib:normalizeString(?nameWellBore) AS ?normalizedWellBoreName) .
    BIND (ep-spin-lib:buildDailyDrillingActivityToStatusURI(?normalizedWellBoreName, ?dTimStart, ?dTim) AS ?dailyDrillingActivityToStatus) .
}
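For completeness, attaching such a rule to its source class looks
roughly like this in SPIN's RDF syntax; the class name is a placeholder
and the query text itself is elided:

ddr:DailyDrillingReport              # hypothetical name for the rule's source class
    spin:rule [
        a sp:Construct ;
        sp:text """
            # STEP 116 Transfer presTestType
            # (the CONSTRUCT/WHERE text shown above goes here verbatim)
            """
    ] .

The SPIN engine runs every spin:rule attached to a class with ?this
bound to each instance of that class, which is how ?this gets its value
in the WHERE clause above.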
At this stage there is a new RDF graph for the uploaded report, and
this graph is validated against another set of SPIN constraints, such
as:
CONSTRUCT {
    _:cv a spin:ConstraintViolation ;
        spin:violationRoot ?this ;
        rdfs:label ?message .
}
WHERE {
    ?this (ep-report:reportOn/ep-activity:onWellBore)/ep-core:temporalPartOf ?wellBore .
    FILTER (!rhspin:currentUserIsOperatorOfWellBore(?wellBore)) .
    BIND (COALESCE(rhspin:npdName(?wellBore), "Unknown well bore") AS ?wellBoreName) .
    BIND (rhspin:companyName() AS ?companyName) .
    BIND (fn:concat("[RH-11] Your company (", ?companyName, ") is not the operator of the BAA or licence associated with well bore ", ?wellBoreName) AS ?message) .
}
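Analogously to spin:rule, these checks are attached to classes of the
target report ontology via spin:constraint, roughly like this (again
with a placeholder class name and the query text elided):

ep-report:DailyDrillingReport        # hypothetical class in the target ontology
    spin:constraint [
        a sp:Construct ;
        sp:text """
            # the [RH-11] CONSTRUCT/WHERE text shown above goes here verbatim
            """
    ] .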
I would like to highlight a few points:
- The ontologies were mostly custom-written for this project. However,
they are the basis for a new ISO 15926-12 standard for life cycle data
integration using OWL.
- The SPIN files above and some ontologies are entirely private to the
specific application, not part of "The Semantic Web".
- However, other components and applications use these ontologies
internally, for example to generate report documents that aggregate
data from other uploaded reports.
- OWL is used as the syntax for those ontologies, mostly to describe
range and cardinality restrictions on properties (see the sketch after
this list). OWL is not used for inferencing; the only kind of
"inferencing" is the SPIN rules that convert from one ontology to
another.
- The complexity of the constraint checks and transformation rules
required something as expressive as SPARQL.
- To keep those queries maintainable, we made heavy use of SPIN
functions that encapsulate reusable blocks of SPARQL code.
- It was natural to associate SPIN constraints and rules with the
ontology, i.e. we did not need a parallel structure of "Shapes" that is
detached from the existing class structure.
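To illustrate the OWL usage mentioned in the list above, a typical
class description looks closer to the following than to anything
requiring a reasoner (all names here are hypothetical):

ep-report:DailyDrillingReport
    a owl:Class ;
    rdfs:subClassOf [
        a owl:Restriction ;
        owl:onProperty ep-report:reportOn ;
        owl:allValuesFrom ep-activity:DrillingActivity   # per-class range restriction
    ] , [
        a owl:Restriction ;
        owl:onProperty ep-report:reportOn ;
        owl:cardinality "1"^^xsd:nonNegativeInteger      # exactly one activity per report
    ] .

Such axioms serve as documentation of the intended structure; as noted
above, no OWL reasoner is involved.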
Regards,
Holger
[1] http://www.epim.no/reportinghub
[2] https://www.posccaesar.org/svn/pub/SemanticDays/2012/Presentations/May9/10_David_Price.pdf