- From: Michael Schneider <m_schnei@gmx.de>
- Date: Tue, 03 Dec 2013 00:20:34 +0100
- To: public-rdf-comments@w3.org
Dear Working Group,

please find below my implementation report for my experimental Swertia RDF-Based Reasoner, a system that tries to be a close implementation of the model-theoretic semantics of RDF (unlike the many existing systems that are based more on the RDF entailment rules). I still wasn't able to run the official RDF 1.1 tests, due to lack of time. I also believe that the results for the test suite will not be very good, as many of the tests are about datatype reasoning, which is not supported by my system. Anyway, I still plan to run the tests as soon as I find the time, and also plan to provide the results and the prototypical system, but for now I provide you with my implementation experiences only. I hope this will already be useful for the Working Group.

Best regards,
Michael

= RDF 1.1 Semantics Implementation Report: Swertia =

Swertia [1], the Semantic Web Entailment Regime Translation and Inference Architecture, is intended to become a generic Semantic Web reasoning framework. The goal is to provide reasoning support for all major Semantic Web reasoning standards, including RDF(S), OWL 2 (Direct Semantics, RDF-Based Semantics, RL/RDF rules), SWRL, RIF (RIF BLD, RIF Core, RIF+RDF and RIF+OWL combinations), and Common Logic. Supported reasoning methods are entailment checking, consistency checking and query answering in the form of SPARQL entailment regimes. Internally, Swertia will not provide any reasoning capabilities itself but will provide all necessary means to enable the use of existing reasoners, such as first-order logic (FOL) theorem provers and model finders, to perform reasoning in the supported Semantic Web standards.

The framework itself is still in an early phase and no initial release has been published. However, as part of Swertia, a prototypical reasoner implementation for reasoning in the RDF 1.0 semantics and the OWL 2 RDF-Based Semantics has existed for a while now. While not in wide use, the reasoner has worked quite well for the experimental purposes of the author, has been tested successfully with a comprehensive test suite for RDF-based reasoning [2], and was used for evaluation work in a published paper [3]. For the RDF 1.1 Semantics, an attempt was made to adapt the existing reasoner into a system that supports as much of RDF 1.1 as possible.

== Overview of the Swertia Reasoner ==

The reasoner is mainly a translator of RDF graphs into FOL formulae represented in the TPTP language [4], which is understood (directly or indirectly via translation tools) by the majority of existing FOL reasoning systems. The translator itself only translates the input RDF graphs (the premise and possibly a conjecture graph of an entailment checking task) into corresponding axiom and conjecture TPTP formulae, following RDF Simple semantics: IRIs are translated into constant terms, blank nodes into existential variable terms, literals into function terms (with different functions for plain, language-tagged and typed literals), triples into ternary predicates, and graphs into conjunctions of such predicates, with globally scoped existential quantifiers for all the blank nodes occurring in the graph. The semantics for the different entailment regimes are not treated by the RDF translator itself; rather, the corresponding semantic conditions are directly modeled as sets of FOL axiom formulae (usually one formula per semantic condition).
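To make the translation scheme concrete, the following is a minimal, purely illustrative sketch in Java against the Jena Model API (the class and helper names are mine, not Swertia's; newer Jena releases use the org.apache.jena.* packages, while Jena 2.x used com.hp.hpl.jena.*). It renders a graph as a single TPTP axiom formula under Simple semantics, with one triple/3 atom per triple and one existentially quantified variable per blank node.

import org.apache.jena.rdf.model.*;
import java.util.*;

// Illustrative sketch only, not Swertia code: translate a Jena Model into one TPTP
// axiom formula following the Simple-semantics scheme described above.
public class SimpleGraphToTptp {

    // one fresh FOL variable per distinct blank node, reused on re-occurrence
    private final Map<String, String> bnodeVars = new LinkedHashMap<>();

    // formulaName should be a lowercase TPTP atom, e.g. "premise_graph"
    public String translate(Model graph, String formulaName) {
        List<String> atoms = new ArrayList<>();
        StmtIterator it = graph.listStatements();
        while (it.hasNext()) {
            Statement st = it.nextStatement();
            atoms.add("triple(" + term(st.getSubject()) + ","
                                + term(st.getPredicate()) + ","
                                + term(st.getObject()) + ")");
        }
        String body = atoms.isEmpty() ? "$true" : String.join(" & ", atoms);
        if (!bnodeVars.isEmpty()) {
            // globally scoped existential quantifier over all blank nodes of the graph
            body = "? [" + String.join(",", bnodeVars.values()) + "] : (" + body + ")";
        }
        return "fof(" + formulaName + ", axiom, " + body + ").";
    }

    private String term(RDFNode node) {
        if (node.isAnon()) {
            // fresh variable name for a new blank node, reused on later occurrences
            return bnodeVars.computeIfAbsent(
                    node.asResource().getId().getLabelString(),
                    k -> "B" + (bnodeVars.size() + 1));
        }
        if (node.isLiteral()) {
            // simplified here; the plain/language-tagged/typed distinction is sketched
            // in the section on plain and language-tagged strings below
            return "literal(" + quote(node.asLiteral().getLexicalForm()) + ")";
        }
        return quote(node.asResource().getURI());   // IRI -> constant term (quoted atom)
    }

    // TPTP allows arbitrary constant symbols as single-quoted atoms
    private static String quote(String s) {
        return "'" + s.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }
}

With such a translation, a one-triple graph { _:x <http://example.org/p> "foo" } would, for example, come out roughly as: fof(premise_graph, axiom, ? [B1] : (triple(B1,'http://example.org/p',literal('foo')))).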
For reasoning, the axiom formulae that represent the semantic conditions of the respective entailment regime are combined with the formulae for the translated input graphs and given to an FOL theorem prover for entailment or inconsistency detection and to an FOL model finder for non-entailment or consistency detection. The final reasoning result is the combination of the results of the two systems.

== Support for Basic Semantic Conditions ==

By "basic semantic conditions", I refer to all the semantic conditions that are not specifically about blank nodes, plain or tagged strings, or datatypes and typed literals (I will get to these aspects of the RDF 1.1 Semantics below), but including all the axiomatic triples for RDF and RDFS.

For RDF 1.0 and OWL 2 Full, the Swertia RDF translator itself did not have any particular support for the basic semantic conditions. Rather, all these semantic conditions were represented by FOL formulae. In the past, all the basic semantic conditions of the RDF 1.0 entailment regimes Simple, RDF, and RDFS were easily translated into FOL formulae. For RDF 1.1, I went through all the semantic conditions of the new entailment regimes to see what needed to be changed. I found that hardly anything had changed from the point of view of semantic conditions, except for the order of entailment regimes. In fact, for RDF 1.1, all the original basic semantic conditions turned out to be there again. Hence, I was able to reuse all the original FOL formulae for the basic semantic conditions from the old implementation without change.

== Support for Blank Nodes ==

For RDF 1.0 and OWL 2 Full, the Swertia RDF translator mapped blank nodes into existential variables that apply to the whole target FOL formula. For this, the translator iterated over the input RDF graph, looked up all the occurring blank nodes, and produced a fresh FOL variable name for each new blank node, while for blank nodes that re-appeared in different positions of the graph, the corresponding FOL variable name was reused. This operation was technically easy to implement, takes at most O(n log n) time for a graph of size n (if, for example, a balanced tree representation is used), and requires up to linear-size space for the resulting mapping structure (which needs to be kept throughout the translation process). For RDF 1.1, nothing relevant has changed w.r.t. blank nodes that would have required a change of this treatment. Hence, there were no additional or new implementation issues compared to the old RDF revision, and the adaptation to RDF 1.1 was doable without problems.

== Support for Plain and Language-Tagged Strings ==

The original Swertia RDF translator came with specific support for plain and language-tagged literals in the translation output format TPTP. As the translator's input, the Model representation of the Jena framework [5] was used, which essentially provides an implementation of the RDF 1.0 Abstract Syntax. In particular, Jena Models have direct support for plain and language-tagged literals. For both kinds of plain literals, dedicated FOL function terms have been used in the translation: plain literals were represented by unary function terms with the literal's lexical form represented by a constant term uniquely encoding the string. Language-tagged literals were represented by binary function terms, where the first argument term was represented like that for plain literals, and the second argument term was a corresponding representation of the language tag as a constant term.
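As an illustration of this case distinction (again just a sketch with hypothetical function names, using Jena's Literal API), the mapping could look as follows; note that whether an untagged literal reports a null datatype or xsd:string depends on the Jena version:

import org.apache.jena.rdf.model.Literal;

// Illustrative sketch only: plain literals become unary function terms over an encoding
// of the lexical form, language-tagged literals become binary function terms that add
// the language tag, and typed literals carry the datatype IRI as a further argument.
public class LiteralTerms {

    public static String literalTerm(Literal lit) {
        String lex = quote(lit.getLexicalForm());
        String lang = lit.getLanguage();          // empty string for untagged literals
        if (lang != null && !lang.isEmpty()) {
            return "literal_lang(" + lex + "," + quote(lang) + ")";
        }
        String dt = lit.getDatatypeURI();         // null for RDF 1.0-style plain literals
        if (dt == null) {
            return "literal_plain(" + lex + ")";
        }
        return "literal_typed(" + lex + "," + quote(dt) + ")";
    }

    // same quoting convention as in the graph translation sketch above
    private static String quote(String s) {
        return "'" + s.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }
}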
For RDF 1.1, it was an obvious idea to use the same FOL functions for representing strings and language-tagged strings in the FOL output, because their interpretations (or values) are the same as those of RDF 1.0 plain and language-tagged literals, respectively. However, it was unclear to me what to expect from the input format for the translation, specifically in the case of language-tagged strings. I understand that concrete RDF serialization syntaxes are free to represent language-tagged strings as they like (including the old tagged plain literal format). What I do not understand is how they are represented in the abstract RDF 1.1 model. After all, if I use a framework like Jena, I have to rely on the parsing from the concrete syntax into the internal representation model, and I am unclear what will happen for language-tagged literals. If Jena parses them into the old representation for language-tagged literals, then nothing would need to be changed in my implementation. However, if they are mapped into something else, I would need to make a change to my translator software as well.

According to §3.3 of the "Concepts and Abstract Syntax" document, "a literal is a language-tagged string if and only if its datatype IRI is rdf:langString, and only in this case the third element is present:..." I am not sure if I really understand this. So far, my guess was that a language-tagged string would be a typed literal, where the lexical form is composed of the "plain" lexical form, the "@" sign, and then the language tag, i.e.:

  ( "foo@en" , rdf:langString )

But the above definition sounds to me more as if a language-tagged string is a /triple/ consisting of (1) the lexical form /without/ the language tag; (2) the language tag; and (3) the datatype IRI rdf:langString, i.e.:

  ( "foo", "en", rdf:langString )

It would be good to clarify the situation to make it easier for implementers to decide how to support language-tagged strings. For now, I decided to stick with the original implementation, which mapped Jena representations of language-tagged literals into binary function terms. Therefore, no changes have been made so far.
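One pragmatic way to find out, for a Jena-based pipeline, is simply to parse a tiny graph and inspect what the resulting Literal object reports; a small check along the following lines (the example IRIs are mine, and the value returned by getDatatypeURI() differs between Jena versions):

import org.apache.jena.rdf.model.*;
import java.io.StringReader;

// Check what Jena hands the translator for a parsed language-tagged string.
public class LangTagCheck {
    public static void main(String[] args) {
        String ttl = "<http://example.org/s> <http://example.org/p> \"foo\"@en .";
        Model m = ModelFactory.createDefaultModel();
        m.read(new StringReader(ttl), null, "TURTLE");

        Literal lit = m.listStatements().nextStatement().getObject().asLiteral();
        System.out.println(lit.getLexicalForm());  // "foo" - lexical form without the tag
        System.out.println(lit.getLanguage());     // "en"  - the language tag, separately
        System.out.println(lit.getDatatypeURI());  // rdf:langString in recent Jena versions
    }
}

At least in Jena, then, the lexical form and the language tag remain accessible as separate components, which would correspond to the second reading above.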
== Support for Datatypes and Typed Literals ==

The most obvious deficit of my original translator was its almost complete lack of support for datatype semantics, as support for datatypes has not yet been of much relevance for my work. Nevertheless, there have always been plans to support some level of datatype reasoning, and some initial ideas have been developed. Definitely, I want to support datatypes in the future, because without datatype support at least for rudimentary types like integer numbers, the system, while appropriate for some experimental work, will not be of much practical usefulness.

For RDF 1.1, given the short time of the Call-for-Implementation phase, I have not undertaken any effort to support datatypes in the RDF 1.1 implementation. But at least I have checked for changes in the RDF 1.1 specification concerning datatypes that would have an effect on datatype reasoning, in order to make sure that I will not run into avoidable problems in the future. This is relevant not only for RDF 1.1, which provides pretty rudimentary datatype semantics, but also for expressive semantic extensions, such as OWL 2 Full.

The obvious way to start was to compare the original RDF 1.0 semantics with the new semantics w.r.t. datatypes. If there were no or only marginal changes, this would mean that an implementation that worked for RDF 1.0 should not encounter too many surprises with an implementation for RDF 1.1. Or, put differently: any big problems with the RDF 1.1 semantics would already have been problems for RDF 1.0. Comparing the two semantics, it became clear that, apart from some reordering of the semantic conditions due to the reordering of the entailment regimes, the semantics remained technically almost identical: essentially the same semantic conditions that were present in the old specification in the chapter on datatypes were again present in the new spec, although spread over different places.

The only problem that I found was with the new notion of "identified datatypes". In the original spec, the notion of a datatype map was that of a set of pairs, which stated associations between URIs and the corresponding datatypes. So, for example, if a semantic extension of RDF 1.0 D-entailment was meant to include the xsd:integer datatype, one was able to state that the datatype map D contained the pair consisting of the URI "xsd:integer" and the particular datatype of integers as defined in the XSD Datatypes spec. For implementations, this made the situation sufficiently clear. In RDF 1.1, we only get a set of datatype IRIs, and the actual association with concrete datatypes is not directly supported. So an implementation of a particular semantic extension of RDF 1.1 needs to somehow find out what the associations are. Of course, a definition of a particular semantic extension would tell the identified datatypes for the identifying IRIs in SOME way, but in any case it WILL have to say what the association is, otherwise it would be impossible for an implementation to ever become compliant. In other words, there must _always_ be such an association in order to be useful, because just a set of IRIs can be interpreted in any arbitrary way. Therefore, the RDF spec should support this idea directly in terms of a set of associations, not only as a set of IRIs alone, as it did in the past and as several other W3C standards built on top of it, such as RIF, SPARQL 1.1, and OWL 2, do!
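To illustrate what such an explicit association could look like on the implementation side, here is a small sketch (only one possible choice; it uses Jena's built-in XSD datatype objects as the concrete datatypes):

import org.apache.jena.datatypes.RDFDatatype;
import org.apache.jena.datatypes.xsd.XSDDatatype;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of a datatype map in the RDF 1.0 sense: explicit pairs that
// associate each recognized datatype IRI with a concrete datatype implementation,
// rather than a bare set of IRIs.
public class DatatypeMapExample {
    public static Map<String, RDFDatatype> xsdCoreDatatypeMap() {
        Map<String, RDFDatatype> d = new HashMap<>();
        d.put(XSDDatatype.XSDinteger.getURI(), XSDDatatype.XSDinteger);
        d.put(XSDDatatype.XSDstring.getURI(),  XSDDatatype.XSDstring);
        d.put(XSDDatatype.XSDboolean.getURI(), XSDDatatype.XSDboolean);
        return d;
    }
}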
== Conclusions ==

For the most part, the adaptation of the existing RDF translator was straightforward and little needed to be done. There was some confusion about the representation of language-tagged strings, specifically what their real representation is in the RDF 1.1 abstract syntax. The specification should be clearer about this. As the original RDF translator did not offer explicit support for datatype semantics, and there was only very little time given by the CfI, I decided not to make any implementation effort for datatype semantics, and only had a look at what /would/ have to change if I had datatype support. It turned out that technically the semantics has not changed much. However, one problem (not so much for RDF(S), but for more expressive systems with more datatypes) would in my opinion be that the RDF 1.1 semantics does not support stating explicit associations between datatype IRIs and the corresponding datatypes, but leaves it to other specifications to find a way to specify these relationships. I consider this to be a problem, and it is definitely a deviation from the original RDF specification that should not have been made.

== References ==

[1] Swertia Home: http://swertia.sourceforge.net/ (does not contain any sources or binaries currently)

[2] Schneider, M., Mainzer, K.: A Conformance Test Suite for the OWL 2 RL/RDF Rules Language and the OWL 2 RDF-Based Semantics. In: Proceedings of the 6th International Workshop on OWL: Experiences and Directions (OWLED 2009). CEUR Workshop Proceedings, vol. 529 (2009)

[3] Schneider, M., Sutcliffe, G.: Reasoning in the OWL 2 Full Ontology Language Using First-Order Automated Theorem Proving. In: Proceedings of the 23rd International Conference on Automated Deduction (CADE 2011), pp. 446-460, LNAI 6803 (2011)

[4] TPTP Home and language specification: http://tptp.org/

[5] Jena Home: http://jena.apache.org/