- From: Sampo Syreeni <decoy@iki.fi>
- Date: Sat, 26 Nov 2011 01:45:10 +0200 (EET)
- To: Frank Manola <fmanola@acm.org>
- cc: Alexandre Riazanov <alexandre.riazanov@gmail.com>, Semantic Web List <semantic-web@w3.org>
On 2011-11-03, Frank Manola wrote:

> From my point of view, a major reason for focusing on unary and binary
> predicates (the logical forms that underlie RDF triples) is that it's
> easier to deal with the problems of integrating heterogeneous data (a
> key issue in the semantic web) if the data is in (or is mapped to
> being in) that form, as opposed to data in arbitrary arity relations
> [...]

(I'm pretty sure RDF doesn't much mind about unaries.)

From my angle n-ary relations are not much of a problem. It's just that it's easier to do propositional and predicate logic with (implicitly) variable arity relations. (The arity then comes from the number of preceding quantifiers.) With no type theory in between, you don't really see the difference between subject-predicate-object relations, and the ones where you have an n-ary relation between a (sometimes composite) key and a number of functionally related attributes (i.e. a potential ton of functionally dependent ones). That's precisely what happens in RDF as well. The only real difference is that RDF insists upon having a single field key, which is then owl:equal to the composite key. And when that key is a blank node, then hey, it's suddenly a surrogate key in RM terms.

> (for example, with n-aries you need a schema to interpret any tuples
> you encounter "in the wild", otherwise you don't know what the
> "columns" mean).

True. RM in its basic form cannot handle either incomplete knowledge or de novo, unexpected schema. Because of that, it tried to handle both of those in its later versions, in its own way. Incomplete schema by way of nulls (which have been reinterpreted successfully as "relation marks"), and even earlier by exposing the schema as well as relations, so as to make relational manipulation of the schema possible as well. Obviously no RDBMS makes it natural to implement all of the data models RDF admits. Even the underlying relational model does not, I grant you that.
But then on the other hand, RM was designed from the start to be something like description logics: tractable at the cost of not being fully universal. In the case of DL the governing principle was logical consistency for reasoning systems. In the case of RM it was practical realizability of 1970s transactional production databases, coder productivity on top of them as against the CODASYL mess, and in all a limitation to "something we can already solve". RM started out as a very pragmatically minded business, even if it took a while to come to fruition, you know.

> one of the major approaches to developing the mappings between the
> various relational schemas was by interpreting the various local
> schemas in terms of unary and binary relations for just this reason
> [...]

Been there, done that. And you know, at the time, I was rather amazed to discover that Ora Lassila, as one of my own firm, had been influential in getting the first RDF draft through. That was in fact before I had really delved into relational theory myself.

> (compound keys had to be dealt with in this way too, because the same
> combinations of columns didn't necessarily constitute the keys in
> otherwise corresponding relations in the different local schemas).

My thinking about both single and compound keys is that, actually, you need object identifiers to make them come together. This is full-blown heresy for a relational guy like me. But still, hear me out.

A key is a key, no? It's supposed to be used as a unit, always and everywhere. Thus, the domain of keys in all is fully isomorphic to the domain of object identifiers from the ORM side of the picture. It's just an identity which can be used wherever and whenever to refer to an object. The relational model then has its counterpart to this kind of a thing: the surrogate key. RM says you're never supposed to relinquish the internal value of a surrogate key to the user of a database.
Just as you wouldn't divulge the physical address of a blank RDF node to anybody, ever, over any interface. At most you'd let them operate on them using a query language, and by extension, you'd make *darn* sure that even if you broke the separation of concerns/protocol layers by passing on something lower-minded, you'd never ever take responsibility for what happens if people started relying on their embedded semantics. And so on.

I hope you can start to see how similar RM and RDF actually are. The only real differences happen because of a) fixed versus variable arity (just make each RDF attribute an object of the key as subject, with predicate naming the column), b) the local versus global naming of the objects (this is where RM falls short and RDF reigns supreme; so why not name things within the RM by URIs as well?), and c) RDF can say anything about anything.

The last part is a bit tricky. In my mind it tells me that even within RM, every single "row" of a relation ought to have a synonym of sorts, which attaches to one or more of its keys. Sort of like an OID, or a surrogate, synthetic key, which maps 1-1 to all of the relation's inborn keys. I mean, in that way, it's not too difficult to see how URIs can variously be attached or not to every single relation one can imagine. In full accordance with RDF, on the other hand. That is why we can have a full isomorphism between the triple/RDF/EAV model and the relational model, on two different hands. Do give me a counter-example or minimal pair, since I'm prolly making myself too clear right now.

> Mind you, if you're NOT worried about integrating heterogeneous data,
> RDF introduces extra pain of its own (figuring out all those
> identifiers, for one thing), but if you ARE worried about integrating
> heterogeneous data, I think you want those identifiers around.

To amplify, we *definitely* want those identifiers around. My above analysis was mostly about how RM and RDF could be fit together in the most natural way.
It didn't say anything about the value of publicly shared keys, which is what I think RDF's URIs are. And what RM sadly lacks.

> I don't quite understand your argument. Indeed, interoperability is
> the target. Syntactic interoperability is not a problem as long as you
> use the same or convertible syntaxes.

I'm something of a pragmatist myself. I wouldn't want to have to parse RDF/XML myself, for example. Even NTriples. I'd rather like a parser which doesn't have to deal with whitespace, even. Something that could be proven to be correct, even, which is very, very hard if there are any alternatives in the syntax of the parsed language/protocol.

> Semantic interoperability requires shared understanding of the
> identifiers being used, which has nothing to do with arity.

There we're in full mutual understanding. Thus I'll refrain from commenting on that point from now on.

> Reinterpreting legacy relational schemas is a related, but separate
> issue. Binary predicates are often handy to represent attributes, but
> it does not mean n-ary predicates cannot be helpful in the same
> (although I could not recall a real example) and other KR tasks.

At the same time, I've painted the triple/EAV/RDF representation of n-ary relations as a sort of a reification already. From the mathematical logic point of view, that is far from trivial. And I sort of think many of the W3C standards around RDF have become much too difficult for common consumption by programmers, precisely because of this divide. That's not a happy circumstance to me, because I'd like the Semantic Web to "just fucking work". So to speak.

> The original question (I thought) was why there weren't relational
> approaches applied in Semantic-Web-like contexts (where, as you say,
> interoperability is the target).

Yes, I think so. Though don't ever think it's the only point. (I don't think your point would be any more simplistic.)
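
As a rough sketch of the triple/EAV reading of an n-ary relation discussed above (key as subject, column name as predicate, attribute value as object), the following Python shows one row going to triples and back. Everything named here is an illustrative assumption rather than anything from the thread: the `enrolment` relation, its columns, and the surrogate identifier `_:enrolment1` are hypothetical, and plain tuples stand in for a real RDF library.

```python
from collections import defaultdict


def row_to_triples(subject, row):
    """Emit one (subject, predicate, object) triple per column of the row."""
    return {(subject, column, value) for column, value in row.items()}


def triples_to_rows(triples):
    """Regroup triples by subject, rebuilding one dict-shaped row per subject."""
    rows = defaultdict(dict)
    for subject, predicate, obj in triples:
        rows[subject][predicate] = obj
    return dict(rows)


# Hypothetical relation enrolment(student, course, grade) whose
# composite key is (student, course).
row = {"student": "s1", "course": "c42", "grade": "A"}

# Hypothetical surrogate identifier standing in for the composite key;
# in RDF terms it plays the role of a blank node or a minted URI.
subject = "_:enrolment1"

triples = row_to_triples(subject, row)
recovered = triples_to_rows(triples)
```

The round trip only works this simply because each column is single-valued per row; a predicate with several objects for one subject would need a list-valued column or a separate relation.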
> I cited the integration of heterogeneous relational databases to argue
> that, in this case, where relations were already being used by all
> parties, and interoperability was the target, those doing the
> integration found that using unaries and binaries helped [...]

Actually that is then one case where relational theory long preceded RDF, despite its lack of shared identifiers. (Which, mind you, would have made the effort so much easier.) There is an entire literature of relational model mapping, which eventually landed at restricted second order logic, as the model for that sort of thing which eventually closes upon itself. (And no, most of the dependency or query theories don't close as neatly. Template dependencies seem to, but then there's no efficient realization of them. Unlike indices for inclusion dependencies, in foreign keys... My eventual point is that RM theory is well-advanced, as is that of description logics and the like. I'd like to see some cross-fertilization instead of mutual bickering, for a change.)

> (I agree that shared understanding of the identifiers is necessary
> for semantic interoperability, but in RDF+OWL, at least the
> identifiers are *there*; those putting the data on the Web had to
> create them).

Fully agreed. That is one of the novel innovations of the distributed semantic web. TimBL would prolly agree, as per his early writings on the topic.

> All that RDF is doing is starting from the unaries and binaries.

Where's the unary, by the way? I'm only seeing ternaries, and not even binaries. What binaries there are, are named ones, and thus actually ternaries. For example, it's very difficult to express in RDF one of the basic ideas presented in Codd's RM/T. That is, the unary statement that "we have this thing called <x>". Without at the same time saying anything more about it, like "<x> is also <class-wise==y> related to the class <z>".
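
To make the "only ternaries" point concrete, here is a minimal sketch, again with plain tuples rather than an RDF library. It assumes the closest RDF gets to a bare "x exists" is a typing triple against `rdfs:Resource`; the identifier `ex:x` and the helper names are hypothetical.

```python
RDF_TYPE = "rdf:type"
RDFS_RESOURCE = "rdfs:Resource"


def assert_exists(graph, thing):
    """Record the bare existence of `thing` as a typing triple (still ternary)."""
    graph.add((thing, RDF_TYPE, RDFS_RESOURCE))


def mentioned(graph):
    """Everything that appears as the subject of at least one triple."""
    return {subject for subject, _, _ in graph}


graph = set()
assert_exists(graph, "ex:x")  # hypothetical identifier for the thing <x>
```

Even this weakest possible statement already carries a predicate and a class, which is exactly the extra baggage the unary form in RM/T avoids.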
If you look at RM/T, or even RM/V2, you can see definite connections towards object-relational-mapping. Which I've adopted wholesale in my own relational modelling discipline. And then some. (I've actually constructed counter-examples in working relational databases towards Codd's original vision.) But even when I can work RDF at the same time, many of the natural constructions I use in that sort of work don't seem to translate naturally into the ternary which RDF relies upon.

> Nor is it an argument that you can't do semantic integration using
> n-ary relations.

I again repeat: certainly not. On the contrary I think there is a natural isomorphism between the two models, there. It's just that, somehow, people don't see it too clearly, and could then perhaps benefit from reading more closely both the underlying theory of RDF/EAV/DL, and RM/FOPL as well.

> I think it's *easier* to do that integration with the RDF approach,
> [...]

It absolutely is. But then it's also much less efficient and clean to do something with the integrated data. Unless you turn it back into RM. If you do, we're in perfect harmony. If you don't, I'll bet you or the fellow who inherits your integration framework will be in a world of hurt. :)

For the most part I've used EAV/CR as an example of a middle-ground. Because one of the papers having to do with it seemed to say: "keep the funky stuff as EAV, put the rest in proper relational tables, and then use metadata and a middleware solution to distinguish between the two". Now I'm not so sure whether they go with that sort of a sensible solution anymore.

> There have certainly been attempts to provide more general KRs
> (allowing n-ary predicates) for data/knowledge exchange; [...]

N-ary is hard. So then let's do what mathematicians do so well: let's define a full isomorphism between n-ary and what we already know how to handle in DL, so as to reduce (at least part of) the problem to what we already know.
That will reveal many aspects of RM, too, to be at a reified level which escapes the formalism. So be it. But let's at least do that, no?

> Perhaps someone with more experience with those languages can chip in
> here (Pat?) and cite their experiences in using them to integrate
> large amounts of data, [...]

Absolutely. Though now I have three separate Pats in mind. One whom I've bumped heads against in the past. The one you prolly refer to. And then the sadomasochist, gender changing guru who I've truly never had the privilege of meeting even online. ;)

> This isn't a pragmatic vs. theoretical issue, it's a question of what
> problem you're trying to solve. RDF is based on the open world
> assumption because it's designed with the Web in mind, and the Web,
> unlike a relational database, is open.

My real problem is that it could be based on both. Via explicit metadata. It could be just a syntax for expressing, among other things, that certain things are to be taken with an open and other ones with a closed syntax. Yet it's fixed as being open, no matter what. That then means that certain kinds of relational databases (my favourite; my job) can't be expressed in it at all. Or if I'm mistaken, how do you express the kinds of closed world semantics I usually work with, in RDF?

> I don't have a problem with the OWA in general. The problem is the OWA
> is there even when you don't want it, specifically when you want to be
> able to specify a piece of data completely and unambiguously. With
> OWA, you cannot compute the length of a list because somebody else can
> redefine the list somewhere.

Yes. Though, perhaps it's just suitable that your assertion of a closed world within the syntax might not be believed. Or might, then. There perhaps the biggest problem is that the trust portion of the layer pie hasn't developed as rapidly as it could/should have.

In sum, I think we're on the same tracks. On more than one topic. Me likes.
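
The open versus closed world difference in the list-length example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the facts, the `ex:hasMember` predicate, and the helper names are all hypothetical, and "unknown" is modelled simply as `None`.

```python
def holds_closed(facts, triple):
    """Closed world: a triple that is not recorded is taken to be false."""
    return triple in facts


def holds_open(facts, triple):
    """Open world: a triple that is not recorded is merely unknown (None)."""
    return True if triple in facts else None


def members(facts, lst):
    """All recorded members of `lst` via the hypothetical ex:hasMember predicate."""
    return {obj for subj, pred, obj in facts
            if subj == lst and pred == "ex:hasMember"}


facts = {
    ("ex:list1", "ex:hasMember", "ex:a"),
    ("ex:list1", "ex:hasMember", "ex:b"),
}
probe = ("ex:list1", "ex:hasMember", "ex:c")

closed_answer = holds_closed(facts, probe)     # definite "no"
open_answer = holds_open(facts, probe)         # "don't know"
known_count = len(members(facts, "ex:list1"))  # exact under CWA, a lower bound under OWA
```

Under the closed reading `known_count` is the length of the list; under the open reading it is only the number of members seen so far, since someone else may assert more.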
-- Sampo Syreeni, aka decoy - decoy@iki.fi, http://decoy.iki.fi/front +358-50-5756111, 025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2
Received on Friday, 25 November 2011 23:45:42 UTC