Re: relational data as a bona fide member of the SM from Sampo Syreeni on 2011-11-09 (semantic-web@w3.org from November 2011)

From: Sampo Syreeni <decoy@iki.fi>
Date: Wed, 9 Nov 2011 21:15:31 +0200 (EET)
To: Markus Krötzsch <markus.kroetzsch@cs.ox.ac.uk>
cc: Alexandre Riazanov <alexandre.riazanov@gmail.com>, Semantic Web List <semantic-web@w3.org>
Message-ID: <Pine.LNX.4.64.1111092048230.18966@lakka.kapsi.fi>

On 2011-11-03, Markus Krötzsch wrote:

> It is true that DL (and thus OWL DL) is conceptually based on 
> unary/binary relations but this was hardly the historical reason for 
> RDF being defined this way in the first place.
>
> However, somewhat ironically, OWL ontologies are a good example of 
> pieces of data that suffer a lot from forced triplification. [...]

I don't believe this is a theoretical obstacle either to the "triplified" 
reification of general n-ary data or, conversely, to the n-ary 
representation of triples. That's because I believe there is a natural 
and well-defined isomorphism (perhaps a pair of functors) between the 
two models, which fully retains all of the logical aspects of the model.
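
To make the shape of such a mapping concrete, here is a minimal sketch in 
Python. Everything in it is an assumption made for illustration: the 
function names reify/unreify, the "_:" row-node naming scheme and the ex: 
identifiers are all hypothetical, and it only covers the simple case of one 
flat relation with atomic values. It is not a proof of the isomorphism, 
just a round trip under those restrictions.

```python
def reify(relation, columns, rows):
    """Turn each n-ary row into n+1 triples hanging off a fresh row node."""
    triples = set()
    for i, row in enumerate(rows):
        node = f"_:{relation}{i}"  # hypothetical blank-node naming scheme
        triples.add((node, "rdf:type", relation))
        for column, value in zip(columns, row):
            triples.add((node, column, value))
    return triples


def unreify(relation, columns, triples):
    """Rebuild the n-ary rows from triples.

    Only triple sets lying in the image of reify() are accepted: every
    row node must carry each column exactly once.
    """
    nodes = {s for s, p, o in triples if p == "rdf:type" and o == relation}
    rows = set()
    for node in nodes:
        values = {}
        for s, p, o in triples:
            if s == node and p in columns:
                if p in values:
                    raise ValueError(f"{node} has two values for {p}")
                values[p] = o
        if set(values) != set(columns):
            raise ValueError(f"{node} is missing a column")
        rows.add(tuple(values[c] for c in columns))
    return rows


columns = ("ex:giver", "ex:receiver", "ex:year")
rows = [("alice", "bob", "2011"), ("carol", "dave", "2010")]
triples = reify("ex:Gift", columns, rows)
round_trip = unreify("ex:Gift", columns, triples)
```

The check in unreify() is the interesting part: the mapping is only 
invertible on triple sets that look like reified rows, so anything else on 
the triple side has to be rejected or handled separately.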

If and when that is so, you just prove whatever you want on either side, 
and it carries over to the other automatically. After that, the question of 
which data model is the best reduces to questions about programmer 
productivity, maintainability of shared interfaces, and the efficiency 
of actual (physical as well) implementations. Not a logical problem, but 
a practical one.

True, realizing and proving such an equivalence is a hard task, since it 
obviously requires someone to restrict what is done on the triple side 
to what can be done in its image on the n-ary, relational one. But 
really, it shouldn't be *that* hard, and once you've done the job, you 
suddenly have two rather different and possibly complementary theories 
which bear on the same problem: relational design principles like the 
theory of normalization on the one hand, and then the hard logical 
theory of description logics and the like on the other.

Personally I'm reasonably sure that attacking the logical problems 
starting from both fronts at the same time will lead to more fruitful 
and easier to prove theorems in the long run. At the same time, having a 
single, well-settled isomorphism between the two models in place would 
grant a lot more leeway for builders of APIs, storage, optimizers and the 
like to find the optimum balance, e.g. between access to traditional 
OLTP/OLAP-like databases and to the more involved, deductive ones. In my 
mind, this sort of thinking leads to a clearer separation of the levels 
of abstraction, the way my favourite relational model tried to do from 
the start, but at the same time extends to semi-structured data.

> OWL has a native (functional style) syntax that is quite easy to 
> parse, whereas its RDF serialisation requires multiple passes over the 
> data to group triples that belong to the same axioms (because the 
> triples that form a single OWL statement can be distributed over a 
> whole file, in random order).

Extending the above analogy of mine, that might suggest a binary RDF 
serialization which groups and orders triples for more efficient 
(semi-)serial computation and communication. But it certainly does not 
affect the logical quality of the overall theory we're dealing with. 
Thus, decoupling of different levels of abstraction: 
storage/transmission/processing on the one hand, and the logical 
underpinnings on the other.
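
As a rough illustration of that point, the following Python sketch orders 
triples so that everything about one subject is contiguous, and then reads 
them back in a single pass while buffering only one group at a time. It 
assumes, as a simplification, that a statement is just the set of triples 
sharing a subject; real OWL axioms can span several blank nodes, so this is 
not an OWL parser. The function names and ex: identifiers are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def serialize_grouped(triples):
    """Order triples so that all triples with the same subject are adjacent."""
    return sorted(triples)  # sorts by (subject, predicate, object)


def read_statements(stream):
    """Read a subject-grouped stream in one pass.

    Because each subject's triples arrive together, only the current
    group needs to be held in memory.
    """
    for subject, group in groupby(stream, key=itemgetter(0)):
        yield subject, [(p, o) for _, p, o in group]


# The same data in arbitrary order, as it might appear in a plain RDF file.
scattered = [
    ("_:a2", "ex:y", "2"),
    ("_:a1", "ex:x", "1"),
    ("_:a2", "rdf:type", "ex:A"),
    ("_:a1", "rdf:type", "ex:A"),
]
statements = list(read_statements(serialize_grouped(scattered)))
```

The set of triples is identical before and after reordering, so nothing 
changes logically; only the cost of reassembling the statements does.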

> So one can actually say that OWL users, while preferring to model 
> information in a *semantic* world of binary relations, are not very 
> well served with a *syntax* that requires n-ary statements to be 
> encoded in triples which do not allow have a reasonable meaning unless 
> they can be re-assembled appropriately.

Precisely so. And this is again one thing the relational world learned 
long before RDF came along: semantics and syntax should be decoupled, 
but then once you start to implement stuff for real, the syntax must be 
informed by the semantics, unless we want implementations with unbounded 
buffers, unnecessary sorting/merging/joining and so on. This is all 
covered within the relational literature, even in the distributed DB 
plus distributed DBMS setting. Thus, what *I* think we need is a 
clear-cut isomorphism from the triple/EAV model to the relational, n-ary 
one, and then just wholesale application of knowledge via that 
isomorphism in both ways.
-- 
Sampo Syreeni, aka decoy - decoy@iki.fi, http://decoy.iki.fi/front
+358-50-5756111, 025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2
Received on Wednesday, 9 November 2011 19:16:00 UTC