- From: Holger Knublauch <holger@topquadrant.com>
- Date: Mon, 04 Aug 2014 08:11:21 +1000
- To: "public-rdf-shapes@w3.org" <public-rdf-shapes@w3.org>
This email describes a use case of constraint checking and data transformation that should be of interest to the WG. The work was undertaken by TopQuadrant and partners, including Franz, Inc., for a large Norwegian oil and gas project. The resulting EPIM ReportingHub system [1] has been in production since 2012; more technical presentations about it can be found via Google (e.g. [2]). TopQuadrant has since worked on two equally large applications, EPIM EnvironmentHub and EPIM LogisticsHub, both delivered as SaaS with ongoing support for years to come.

In a nutshell, the system is a server used by oil producers to upload daily and monthly drilling reports in a pre-defined XML format. The data is checked against XSD constraints and then converted into a canonical RDF representation in which each XML element essentially becomes an instance of an OWL ontology derived from the XSD. Those RDF instances are then validated against basic integrity constraints, for example to verify that the well bores in the document correspond to background data (the NPD Fact Pages). The NPD Fact Pages live in a specific named graph and are based on a different ontology than the incoming data. Here is an example SPIN constraint from that step:

    CONSTRUCT {
        _:cv a spin:ConstraintViolation ;
            spin:violationRoot ?this ;
            spin:violationPath ddr:nameWellbore ;
            rdfs:label ?message .
    }
    WHERE {
        ?this ep-lib:nameWellbore ?wellBoreName .
        BIND (rhspin:wellBoreByName(?wellBoreName) AS ?wellBore) .
        FILTER (!bound(?wellBore)) .
        BIND (fn:concat("[RH-19] Fact Pages validation of the XML file failed with the following error: Unregistered well bore name ", ?wellBoreName) AS ?message) .
    }

The query above makes use of SPIN functions such as rhspin:wellBoreByName - essentially stored procedures. When such a SPIN function is called, a nested SELECT query is executed. This makes the resulting queries much more readable and modular than repeating the same logic again and again.

Once the first step of constraint checking passes, SPIN is used to translate from the canonical RDF representation (derived from the XML) into the target ontology used to store the drilling reports in a triple store. spin:rule is used to attach those transformation queries to the source classes. Here is an example rule, again a SPARQL CONSTRUCT; there are hundreds of rules of similar complexity.

    # STEP 116 Transfer presTestType
    CONSTRUCT {
        ?dailyDrillingActivityToStatus ep-core:hasPart _:b0 .
        _:b0 a ?pressureTestType .
    }
    WHERE {
        ?this ep-spin-lib:nameWellbore ?nameWellBore .
        ?this ddr:dTimStart ?dTimStart .
        ?this ddr:statusInfoRef ?statusInfo .
        ?statusInfo ddr:dTim ?dTim .
        ?statusInfo ddr:presTestTypeRef ?presTestType .
        BIND (ep-spin-lib:selectPressureTestType(?presTestType) AS ?pressureTestType) .
        BIND (ep-spin-lib:normalizeString(?nameWellBore) AS ?normalizedWellBoreName) .
        BIND (ep-spin-lib:buildDailyDrillingActivityToStatusURI(?normalizedWellBoreName, ?dTimStart, ?dTim) AS ?dailyDrillingActivityToStatus) .
    }

At this stage there is a new RDF graph for the uploaded report, and that graph is validated against another set of SPIN constraints, such as:

    CONSTRUCT {
        _:cv a spin:ConstraintViolation ;
            spin:violationRoot ?this ;
            rdfs:label ?message .
    }
    WHERE {
        ?this (ep-report:reportOn/ep-activity:onWellBore)/ep-core:temporalPartOf ?wellBore .
        FILTER (!rhspin:currentUserIsOperatorOfWellBore(?wellBore)) .
        BIND (COALESCE(rhspin:npdName(?wellBore), "Unknown well bore") AS ?wellBoreName) .
        BIND (COALESCE(rhspin:npdName(?licence), "Unknown licence") AS ?licenceName) .
        BIND (rhspin:companyName() AS ?companyName) .
        BIND (fn:concat("[RH-11] Your company (", ?companyName, ") is not the operator of the BAA or licence associated with well bore ", ?wellBoreName) AS ?message) .
    }
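For readers less familiar with SPIN, here is a rough sketch of how such a function and a class-level constraint are declared in Turtle. It is simplified for illustration only: the rhspin: and ex: namespace IRIs, the named graph IRI and the property/class names are placeholders, not the real project vocabulary.

    # Sketch only: rhspin:, ex:, the graph IRI and the property/class names
    # below are placeholders, not the actual project namespaces.
    @prefix spin:   <http://spinrdf.org/spin#> .
    @prefix sp:     <http://spinrdf.org/sp#> .
    @prefix spl:    <http://spinrdf.org/spl#> .
    @prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix xsd:    <http://www.w3.org/2001/XMLSchema#> .
    @prefix owl:    <http://www.w3.org/2002/07/owl#> .
    @prefix rhspin: <http://example.org/rhspin#> .
    @prefix ex:     <http://example.org/ontology#> .

    # A SPIN function ("stored procedure"): its spin:body is a nested SELECT,
    # and the first binding of the result variable is the return value.
    rhspin:wellBoreByName
        a spin:Function ;
        spin:constraint [
            a spl:Argument ;
            spl:predicate sp:arg1 ;
            spl:valueType xsd:string ;
            rdfs:comment "NPD name of the well bore"
        ] ;
        spin:returnType rdfs:Resource ;
        spin:body [
            a sp:Select ;
            sp:text """
                PREFIX ex: <http://example.org/ontology#>
                SELECT ?wellBore
                WHERE {
                    # the NPD Fact Pages sit in their own named graph
                    GRAPH <http://example.org/graphs/npd-fact-pages> {
                        ?wellBore ex:npdName ?arg1 .
                    }
                }"""
        ] .

    # Constraints (and, in the same way, spin:rule transformations) are
    # attached directly to the relevant class; ?this is bound to each
    # instance of that class when the query runs.
    ex:DailyDrillingReport
        a owl:Class ;
        spin:constraint [
            a sp:Construct ;
            sp:text """
                PREFIX spin:   <http://spinrdf.org/spin#>
                PREFIX rdfs:   <http://www.w3.org/2000/01/rdf-schema#>
                PREFIX ex:     <http://example.org/ontology#>
                PREFIX rhspin: <http://example.org/rhspin#>
                # simplified version of the [RH-19] check shown earlier
                CONSTRUCT {
                    _:cv a spin:ConstraintViolation ;
                        spin:violationRoot ?this ;
                        rdfs:label "Unregistered well bore name" .
                }
                WHERE {
                    ?this ex:nameWellbore ?wellBoreName .
                    BIND (rhspin:wellBoreByName(?wellBoreName) AS ?wellBore) .
                    FILTER (!bound(?wellBore)) .
                }"""
        ] .

With the lookup factored into the function, every constraint that needs to resolve a well bore name simply calls rhspin:wellBoreByName(...) instead of repeating the GRAPH pattern.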
I would like to highlight a few points:

- The ontologies were mostly custom-written for this project. However, they are the basis for a new ISO 15926-12 standard for life cycle data integration using OWL.
- The SPIN files above and some of the ontologies are entirely private to the specific application, not part of "The Semantic Web".
- However, other components and applications use these ontologies internally, for example to generate report documents that aggregate data from other uploaded reports.
- OWL is used as the syntax for those ontologies, mostly to describe range and cardinality restrictions on properties. OWL is not used for inferencing; the only kind of "inferencing" is the SPIN rules that convert from one ontology to another.
- The complexity of the constraint checks and transformation rules required something as expressive as SPARQL.
- In order to keep those queries maintainable, we made heavy use of SPIN functions that encapsulate reusable blocks of SPARQL code.
- It was natural to associate SPIN constraints and rules with the ontology, i.e. we did not need a parallel structure of "Shapes" detached from the existing class structure.

Regards,
Holger

[1] http://www.epim.no/reportinghub
[2] https://www.posccaesar.org/svn/pub/SemanticDays/2012/Presentations/May9/10_David_Price.pdf
Received on Sunday, 3 August 2014 22:12:54 UTC