- From: Markus Krötzsch <markus.kroetzsch@cs.ox.ac.uk>
- Date: Thu, 10 Nov 2011 10:59:41 +0000
- To: Sampo Syreeni <decoy@iki.fi>
- CC: Alexandre Riazanov <alexandre.riazanov@gmail.com>, Semantic Web List <semantic-web@w3.org>
Dear Sampo:

some quite interesting remarks that call for a reply .. see inline.

On 09/11/11 19:15, Sampo Syreeni wrote:
> On 2011-11-03, Markus Krötzsch wrote:
>
>> It is true that DL (and thus OWL DL) is conceptually based on
>> unary/binary relations but this was hardly the historical reason for
>> RDF being defined this way in the first place.
>>
>> However, somewhat ironically, OWL ontologies are a good example of
>> pieces of data that suffer a lot from forced triplification. [...]
>
> I don't believe this is a theoretical obstacle to either of "triplified"
> reifications of general n-ary data, or alternatively the n-ary
> representations of triples. That's because I believe there is a natural
> and well-defined isomorphism (perhaps a pair of functors) between the
> two models, which fully retains all of the logical aspects of the model.
>
> If and when so, you then just prove whatever you want to on either side,
> and it applies on the other by automation. After that, the question of
> which data model is the best reduces to questions about programmer
> productivity, maintainability of shared interfaces, and the efficiency
> of actual (physical as well) implementations. Not a logical problem, but
> a practical one.
>
> True, realizing and proving such an equivalence is a hard task, since it
> obviously requires someone to restrict what is done on the triple side
> to what can be done in its image on the n-ary, relational one. But
> really, it shouldn't *that* hard, and once you've done the job, you
> suddenly have two rather different and possibly complementary theories
> which bear on the same problem: relational design principles like the
> theory of normalization on the one hand, and then the hard logical
> theory of description logics and the like on the other.
>
> Personally I'm reasonably sure that attacking the logical problems
> starting from both fronts at the same time will lead to more fruitful
> and easier to prove theorems in the long run.
> At the same time, having a
> single, well-settled isomorphism between the two models in place would
> grant a lot more leeway for API, storage, optimizer and like builders to
> find the optimum balance e.g. between how to access the traditional
> OLTP/OLAP-like databases, and the more involved, deductive ones. In my
> mind, this sort of thinking leads to a clearer separation of the levels
> of abstraction, the way my favourite relational model tried to do from
> the start, but at the same time extends to semi-structured data.

Let me first comment on this observation, which I would summarise as
"The information of any n-ary relational structure can be captured by a
triple-based structure, and it should be no theoretical obstacle to
translate between the two."

In principle, this is true, and it is what is already done: there is a
standard translation between the n-ary OWL data model and the
triple-based RDF data model [1]. This is (one possible version of) an
isomorphism of the kind you suggest. So this is solved.

Unfortunately, this does not really solve the underlying problem. You
are fine as long as you have an OWL ontology with n-ary statements. This
is easy to translate into triples, and tools can use these triples in
place of the original. As you say, this is just an implementation issue
and may actually add more freedom to tool design.

But not every set of triples can be translated back into OWL. Since RDF
is the main exchange format on the data web, you can find a lot of
OWL-like RDF documents online that do not translate back into OWL
axioms. To address this, it was necessary to develop an alternative,
RDF-Based Semantics for interpreting OWL. This semantics is tolerant of
noise of all kinds, but there is no algorithm for finding all entailed
inferences (i.e., there cannot ever be one, for principled reasons).
Moreover, the RDF-Based Semantics only partially agrees with the
DL-based "Direct Semantics", which is not so good for interoperability.
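The asymmetry described above (an n-ary statement is easy to turn into triples, but an arbitrary set of triples need not turn back into statements) can be sketched roughly as follows. This is only an illustration: the `ex:relation` / `ex:argN` vocabulary and the helpers `to_triples` and `from_triples` are hypothetical and are not the OWL 2 mapping of [1]; they merely mimic its general shape.

```python
from itertools import count

_fresh = count()  # source of fresh blank-node labels (illustrative only)

def to_triples(relation, args):
    """Encode one n-ary statement as triples about a fresh blank node."""
    node = f"_:s{next(_fresh)}"
    triples = [(node, "ex:relation", relation)]
    triples += [(node, f"ex:arg{i}", arg) for i, arg in enumerate(args, start=1)]
    return triples

def from_triples(triples):
    """Regroup triples by subject and rebuild the n-ary statements.

    Raises ValueError if some group cannot be read as a statement.
    """
    groups = {}
    for s, p, o in triples:  # triples may arrive in any order
        groups.setdefault(s, {}).setdefault(p, []).append(o)
    statements = []
    for node, props in groups.items():
        if len(props.get("ex:relation", [])) != 1:
            raise ValueError(f"{node}: expected exactly one ex:relation")
        arity = len(props) - 1  # every other property must be ex:arg1..ex:argN
        args = []
        for i in range(1, arity + 1):
            values = props.get(f"ex:arg{i}", [])
            if len(values) != 1:
                raise ValueError(f"{node}: missing or duplicate ex:arg{i}")
            args.append(values[0])
        statements.append((props["ex:relation"][0], tuple(args)))
    return statements
```

Calling `from_triples(to_triples(r, args))` gives back the original statement in any triple order, whereas dropping a single `ex:arg2` triple from the encoding makes `from_triples` raise `ValueError`. Note also that the decoder has to buffer all triples before it can emit anything, since the pieces of one statement may be scattered across the input.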

Summing up, OWL is based on n-ary axioms that are inspired by features
in Description Logics, but since these axioms are decomposed and mixed
up on the Web, it is often not possible to translate them back into
axioms to which DL methods would be applicable. This also affects tool
interoperability on an API level, since only tools that are based on
triple decompositions of axioms can be sure to process any OWL document
without losing information. If axioms were encoded as the n-ary
statements that people originally entered when editing the ontologies in
their editors, then many of the reasons for not being able to apply the
Direct Semantics would vanish (not all, but many).

>
>> OWL has a native (functional style) syntax that is quite easy to
>> parse, whereas its RDF serialisation requires multiple passes over the
>> data to group triples that belong to the same axioms (because the
>> triples that form a single OWL statement can be distributed over a
>> whole file, in random order).
>
> Extending the above analogy of mine, that might suggest a binary RDF
> serialization which groups and orders triples for more efficient
> (semi-)serial computation and communication. But it certainly does not
> affect the logical quality of the overall theory we're dealing with.
> Thus, decoupling of different levels of abstraction:
> storage/transmission/processing on the one hand, and the logical
> underpinnings on the other.

Yes, this would work if all stored data made sense on all levels of
abstraction. But on the Web, a transmission format that does not
syntactically enforce that the data is meaningful on higher levels will
always lead to data where this is not the case, effectively forcing all
tools to work at the lowest level of representation and losing the
hoped-for independence between processing and encoding.

I do not claim to have a solution ready for this, since the
interoperability issues with a relational model are not necessarily
smaller (e.g., how do you enforce that a relation has a constant arity
across all its uses on the Web?). Maybe the best way is to promote a
greater awareness of data quality on the data producer side, which is
the mission of the Pedantic Web group [2].

>
>> So one can actually say that OWL users, while preferring to model
>> information in a *semantic* world of binary relations, are not very
>> well served with a *syntax* that requires n-ary statements to be
>> encoded in triples which do not allow have a reasonable meaning unless
>> they can be re-assembled appropriately.
>
> Precisely so. And this is again one thing the relational world learned
> long before RDF came along: semantics and syntax should be decoupled,
> but then once you start to implement stuff for real, the syntax must be
> adviced by the semantics, unless we want implementations with unbounded
> buffers, unnecessary sorting/merging/joining and so on. This is all
> covered within the relational literature, even in the distributed DB
> plus distributed DBMS setting. Thus, what *I* think we need is a
> clearcut isomorphism from the triple/EAV model to the relational, n-ary
> one, and then just wholesale application of knowledge via that
> isomorphism in both ways.

For OWL, this isomorphism is [1], but only for a subset of RDF. For
general RDB models, there are various efforts to achieve something
similar, again for only a subset of RDF. Such isomorphisms do not work
well in all contexts, especially not in situations where new data is
created: to create a new 4-ary relation in triples, e.g., one needs to
create new *individual objects*, i.e., one has to add to the active
domain of the database. This can be a problem (for one thing, if you do
this recursively then you don't know if it will ever stop).
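The point about new individuals can be made concrete with a toy sketch. Everything in it is an assumption made for illustration: the `ex:` vocabulary, the helper names, and in particular the deliberately artificial rule "every individual gets a 4-ary record", which is not taken from OWL or from any of the cited documents.

```python
from itertools import count

def reify(relation, args, fresh):
    """Encode an n-ary fact as triples about a newly minted individual."""
    node = f"_:n{next(fresh)}"
    triples = {(node, "ex:relation", relation)}
    triples |= {(node, f"ex:arg{i}", a) for i, a in enumerate(args, start=1)}
    return node, triples

def individuals(triples):
    """Active domain: every term in subject or object position."""
    return {s for s, _, _ in triples} | {o for _, _, o in triples}

def saturate(triples, max_rounds):
    """Apply the toy rule for at most max_rounds rounds.

    Toy rule (hypothetical): every individual x that has no record yet gets
    the 4-ary fact ex:record(x, "src", "date", "agent"), encoded via reify().
    Returns the size of the active domain after each round.
    """
    fresh = count()
    triples = set(triples)
    described = set()
    sizes = []
    for _ in range(max_rounds):
        todo = individuals(triples) - described
        for x in sorted(todo):
            _, new = reify("ex:record", (x, "src", "date", "agent"), fresh)
            triples |= new
            described.add(x)
        sizes.append(len(individuals(triples)))
    return sizes
```

Starting from a single triple such as `("a", "ex:knows", "b")`, the domain size returned by `saturate` grows in every round, because each reified fact mints a node to which the rule applies again. Only the `max_rounds` cut-off makes the loop stop; a relational encoding of the same 4-ary facts would not need the extra nodes at all.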

We have proved recently that there are Semantic Web related tasks where
ternary relations (triples) are not sufficient to compute inferences,
even if the inferences are triples [3] (again, this is a principled
result: no algorithm [of the general kind considered in the paper] that
uses only triples without inventing new individuals can ever solve this
problem).

Markus

[1] http://www.w3.org/TR/owl2-mapping-to-rdf/
[2] http://pedantic-web.org/
[3] http://korrekt.org/page/Efficient_Rule-Based_Inferencing_for_OWL_EL

--
Dr. Markus Krötzsch
Department of Computer Science, University of Oxford
Room 306, Parks Road, OX1 3QD Oxford, United Kingdom
+44 (0)1865 283529
http://korrekt.org/
Received on Thursday, 10 November 2011 11:00:07 UTC