- From: Sampo Syreeni <decoy@iki.fi>
- Date: Tue, 28 Jul 2020 21:44:06 +0300 (EEST)
- To: Antoine Zimmermann <antoine.zimmermann@emse.fr>
- cc: "Peter F. Patel-Schneider" <pfpschneider@gmail.com>, semantic-web@w3.org
- Message-ID: <alpine.DEB.2.21.2007282104100.19446@lakka.kapsi.fi>
On 2020-07-21, Antoine Zimmermann wrote:

>> I was puzzled by the relationship between the general datatype and
>> the specific datatypes. [...]

It's been a while, and I have my Issues with RDF, and whatnot, but...

Blank nodes as such have a long and rich history. They essentially surfaced into the data management practice with Edgar Codd's seminal work in relational databases. There, they were called "surrogate keys", and in RM/V2 it was made rather clear any surrogate key should not be exposed to the user of the database. Only the logical operation of equality comparison should be permitted on a value like that.

That's a blank node, if you think about it. Through-and-through. An anonymous thingy you can only compare for equality, autogenerated for its innate identifier not-to-be-seen, but-to-be-seen-in-higher-level reification and serialization; and you should *never* touch the identifier, even if you do get to see it.

So if you're about to abolish blank nodes, you're about to demolish the rightful notion of surrogate keys and everything that follows from it. The notion isn't without its problems -- it's for example ripe to break the idea of natural keys and so some of the naturalness of relational normalization -- but it's still here from *beyond* time, and needs to be represented within RDF.

Now, if you look at the relational theory of yore, all of the trouble with blanks/surrogates was already anticipated. Of *course* they are a nasty, self-multiplying piece of shit within the model. Of *course* they break the nice natural keying, semantic, closed world interpretation of your model, if used wrong; and they will be, *always*, just as every relational database out there now has "autoincrementing primary keys". (All of them at the very *least* ought to feign to be real surrogates, so that you couldn't see their values. But no.)

But you can't really do without them either.
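To make the "equality only" bit concrete, here is a minimal sketch in Python. Everything in it (the `Surrogate` class, the `repaint` helper, the triple layout) is made up for illustration; it is not how any RDF library or relational engine actually does it, just one way to model an identifier you can compare but never read.

```python
class Surrogate:
    """Opaque identity token (hypothetical): equality and hashing only.

    Python's default object semantics already give identity-based
    __eq__/__hash__ and no ordering, which is all a surrogate needs.
    """
    __slots__ = ()

    def __repr__(self):
        # Deliberately reveals nothing that could be used as a value.
        return "<surrogate>"


def repaint(facts, subject, colour):
    """Return a new fact set in which `subject` has exactly one paint colour."""
    kept = {f for f in facts if not (f[0] == subject and f[1] == "paint")}
    return kept | {(subject, "paint", colour)}


car = Surrogate()
other_car = Surrogate()

# Two entities with identical attributes are still distinct entities.
facts = {
    (car, "paint", "red"),
    (car, "engine", "V6"),
    (other_car, "paint", "red"),
}

# Every describing attribute may change; the identity does not.
facts = repaint(facts, car, "blue")

# Equality is permitted, ordering is not.
try:
    car < other_car
    ordering_allowed = True
except TypeError:
    ordering_allowed = False
```

The point of the sketch is only that `car` stays the same thing after its paint changes, and that nothing but `==` can be asked of it.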
If you somehow want to express the idea that "my car just got painted and found a new engine, despite the fact that my car is now identified by its paint and engine"... And we often do. We often do want to think in terms of external-level entities which aren't fully defined by their internal attributes in the database/knowledge graph. Which is what blank nodes/surrogate keys (in the proper sense) are for.

Then, once you admit such a construct into your data model, things get messy. Because it *will* cause the database/graph designer/modeller to take the easy way out. If it's admitted, every relation in your relational database will have an autogenerated primary key. Pretty much everything in your RDF graph will eventually be a subject of some comment as a blank node, even if it could be referenced by named attributes. Should be, because that's how you do passable first order predicate logic; even Datalog, as a very limited and as such productive subset; kinda like the relational model, and its algebra.

However, if you really want to make it hard on yourself... The only, final problem in telling a data model with blanks/surrogates from one without, or in reasoning with either, is in *no* way separate from the general, nasty problem of graph isomorphism and normalization up to it.

Because essentially a blank/surrogate stands in for something else: logically speaking either a natural, full key in a closed world, or, for ease, more often some qualities of an open world which are left logically unexpressed within our model; "that blue car, which is still a blue car logically, but 'somehow' 'different'".

Whatever you do, eventually this resolves into trying to make your logical data model self-consistent, and at the same time homomorphic to whatever real world concepts you are trying to describe. Your logical descriptors are going to boil down into predicates. Including any and all blanks/surrogates, only used for equality comparison, opaquely.
Because under any sane reduction, relevant to either RDF/SW, or the relational model, reduction over graph isomorphism takes those kinds of spurious keys away again. Then we're left with "just" the harder problem of graph isomorphism with the rest of the dataset -- your real dataset.

--
Sampo Syreeni, aka decoy - decoy@iki.fi, http://decoy.iki.fi/front
+358-40-3751464, 025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2
Received on Tuesday, 28 July 2020 18:44:26 UTC