- From: Sampo Syreeni <decoy@iki.fi>
- Date: Wed, 9 Nov 2011 21:15:31 +0200 (EET)
- To: Markus Krötzsch <markus.kroetzsch@cs.ox.ac.uk>
- cc: Alexandre Riazanov <alexandre.riazanov@gmail.com>, Semantic Web List <semantic-web@w3.org>
- Message-ID: <Pine.LNX.4.64.1111092048230.18966@lakka.kapsi.fi>
On 2011-11-03, Markus Krötzsch wrote:

> It is true that DL (and thus OWL DL) is conceptually based on
> unary/binary relations but this was hardly the historical reason for
> RDF being defined this way in the first place.
>
> However, somewhat ironically, OWL ontologies are a good example of
> pieces of data that suffer a lot from forced triplification. [...]

I don't believe this is a theoretical obstacle to either "triplified" reifications of general n-ary data or, alternatively, n-ary representations of triples. That's because I believe there is a natural and well-defined isomorphism (perhaps a pair of functors) between the two models, which fully retains all of the logical aspects of the model. If so, you then just prove whatever you want to on either side, and it applies on the other automatically.

After that, the question of which data model is the best reduces to questions about programmer productivity, maintainability of shared interfaces, and the efficiency of actual (physical as well) implementations. Not a logical problem, but a practical one.

True, realizing and proving such an equivalence is a hard task, since it obviously requires someone to restrict what is done on the triple side to what can be done in its image on the n-ary, relational one. But really, it shouldn't be *that* hard, and once you've done the job, you suddenly have two rather different and possibly complementary theories which bear on the same problem: relational design principles like the theory of normalization on the one hand, and the hard logical theory of description logics and the like on the other. Personally I'm reasonably sure that attacking the logical problems from both fronts at the same time will lead to more fruitful and easier-to-prove theorems in the long run.

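To make the kind of mapping meant here a little more concrete, the following is a minimal sketch of one possible round trip between n-ary tuples and triples. It is an illustration under stated assumptions, not a proof of the isomorphism: the function names, the `Supplies` relation, and the `relation/row_id` and `relation#column` naming scheme are all hypothetical, and the inverse is only defined on triple sets that lie in the image of the forward mapping, which is exactly the restriction mentioned above.

```python
from collections import defaultdict


def row_to_triples(relation, row_id, row):
    """Reify one n-ary tuple as a set of (subject, predicate, object) triples.

    Assumes relation names contain neither "/" nor "#", and that row ids
    and values are plain strings (hypothetical conventions for this sketch).
    """
    subject = f"{relation}/{row_id}"
    triples = {(subject, "rdf:type", relation)}
    triples |= {(subject, f"{relation}#{col}", val) for col, val in row.items()}
    return triples


def triples_to_rows(triples):
    """Invert row_to_triples by regrouping triples on their subject.

    Only meaningful for triple sets produced by row_to_triples.
    """
    relation_of = {}
    columns = defaultdict(dict)
    for s, p, o in triples:
        if p == "rdf:type":
            relation_of[s] = o
        else:
            columns[s][p.split("#", 1)[1]] = o
    return {
        (rel, s.split("/", 1)[1]): dict(columns[s])
        for s, rel in relation_of.items()
    }


# Usage: a ternary "Supplies" relation survives the round trip unchanged.
rows = {
    ("Supplies", "1"): {"supplier": "acme", "part": "bolt", "project": "bridge"},
    ("Supplies", "2"): {"supplier": "acme", "part": "nut", "project": "tower"},
}
triples = set()
for (rel, rid), row in rows.items():
    triples |= row_to_triples(rel, rid, row)
assert triples_to_rows(triples) == rows
```

Anything asserted on the triple side that does not fit this image (say, a column predicate attached to a subject with no type triple) has no n-ary counterpart under this particular scheme.
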
At the same time, having a single, well-settled isomorphism between the two models in place would grant a lot more leeway for builders of APIs, storage engines, optimizers and the like to find the optimum balance, e.g. between how to access the traditional OLTP/OLAP-like databases and the more involved, deductive ones. In my mind, this sort of thinking leads to a clearer separation of the levels of abstraction, the way my favourite relational model tried to do from the start, but at the same time extends to semi-structured data.

> OWL has a native (functional style) syntax that is quite easy to
> parse, whereas its RDF serialisation requires multiple passes over the
> data to group triples that belong to the same axioms (because the
> triples that form a single OWL statement can be distributed over a
> whole file, in random order).

Extending the above analogy of mine, that might suggest a binary RDF serialization which groups and orders triples for more efficient (semi-)serial computation and communication. But it certainly does not affect the logical quality of the overall theory we're dealing with. Hence the decoupling of different levels of abstraction: storage/transmission/processing on the one hand, and the logical underpinnings on the other.

> So one can actually say that OWL users, while preferring to model
> information in a *semantic* world of binary relations, are not very
> well served with a *syntax* that requires n-ary statements to be
> encoded in triples which do not allow have a reasonable meaning unless
> they can be re-assembled appropriately.

Precisely so. And this is again one thing the relational world learned long before RDF came along: semantics and syntax should be decoupled, but once you start to implement stuff for real, the syntax must be informed by the semantics, unless we want implementations with unbounded buffers, unnecessary sorting/merging/joining and so on.

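As a rough illustration of the grouping and ordering idea, here is a small sketch in which the writer clusters triples by subject once, so that a reader can process them in a single pass while buffering only one subject at a time. The function names are hypothetical, and grouping by subject is a deliberate simplification: a real OWL axiom in RDF can span several subjects (blank nodes, for instance), so an actual serialization would need a coarser grouping key than this.

```python
from itertools import groupby
from operator import itemgetter


def write_grouped(triples):
    """Writer side: order triples so that each subject's triples are adjacent."""
    return sorted(triples)


def read_grouped(stream):
    """Reader side: one pass over a subject-clustered stream.

    Yields (subject, [(predicate, object), ...]) and never holds more than
    one subject's triples in memory. Assumes the stream is already clustered.
    """
    for subject, group in groupby(stream, key=itemgetter(0)):
        yield subject, [(p, o) for _, p, o in group]


# Usage: triples arrive scattered, the writer clusters them, the reader streams.
scattered = [
    ("_:a1", "owl:onProperty", "ex:hasPart"),
    ("ex:Car", "rdfs:subClassOf", "_:a1"),
    ("_:a1", "rdf:type", "owl:Restriction"),
    ("ex:Car", "rdf:type", "owl:Class"),
]
for subject, properties in read_grouped(write_grouped(scattered)):
    print(subject, properties)
```

The logical content of the triple set is the same before and after sorting; only the cost of consuming it changes, which is the point about keeping the levels of abstraction apart.
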
This is all covered within the relational literature, even in the distributed DB plus distributed DBMS setting. Thus, what *I* think we need is a clear-cut isomorphism from the triple/EAV model to the relational, n-ary one, and then just wholesale application of knowledge via that isomorphism in both directions.

--
Sampo Syreeni, aka decoy - decoy@iki.fi, http://decoy.iki.fi/front
+358-50-5756111, 025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2

Received on Wednesday, 9 November 2011 19:16:00 UTC