Re: Blank nodes must DIE! [ was Re: Blank nodes semantics - existential variables?] from Sampo Syreeni on 2020-07-28 (semantic-web@w3.org from July 2020)

From: Sampo Syreeni <decoy@iki.fi>
Date: Tue, 28 Jul 2020 21:44:06 +0300 (EEST)
To: Antoine Zimmermann <antoine.zimmermann@emse.fr>
cc: "Peter F. Patel-Schneider" <pfpschneider@gmail.com>, semantic-web@w3.org
Message-ID: <alpine.DEB.2.21.2007282104100.19446@lakka.kapsi.fi>
On 2020-07-21, Antoine Zimmermann wrote:

>> I was puzzled by the relationship between the general datatype and 
>> the specific datatypes. [...]

It's been a while, and I have my Issues with RDF, and whatnot, but...

Blank nodes as such have a long and rich history. They essentially 
surfaced in data management practice with Edgar Codd's seminal 
work in relational databases. There, they were called "surrogate keys", 
and in RM/V2 it was made rather clear any surrogate key should not be 
exposed to the user of the database. Only the logical operation of 
equality comparison should be permitted on a value like that.

That's a blank node, if you think about it. Through and through. An 
anonymous thingy you can only compare for equality, autogenerated, with 
its innate identifier not to be seen, except in higher-level 
reification and serialization; and even if you do see the identifier, 
you should *never* touch it.
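
To make the "equality only" point concrete, here is a minimal sketch in 
Python. The class name Surrogate and its internals are made up for 
illustration; nothing here comes from Codd or from any RDF library. The 
value is autogenerated, hidden from printing, and supports nothing but 
equality (plus hashing, so it can sit in sets and dicts).

```python
import itertools


class Surrogate:
    """Hypothetical opaque identifier: equality and hashing only."""

    __slots__ = ("_id",)
    _counter = itertools.count()  # autogenerated, never chosen by the user

    def __init__(self):
        self._id = next(Surrogate._counter)

    def __eq__(self, other):
        if not isinstance(other, Surrogate):
            return NotImplemented
        return self._id == other._id

    def __hash__(self):
        return hash(self._id)

    def __repr__(self):
        # Deliberately hide the underlying value.
        return "<surrogate>"


a = Surrogate()
b = Surrogate()
same = (a == a)
different = (a != b)

# Ordering is not defined, so "a < b" raises TypeError.
try:
    a < b
    ordering_allowed = True
except TypeError:
    ordering_allowed = False
```

Python can't truly stop anyone from poking at `_id`, which is rather the 
same complaint as with autoincrementing primary keys.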

So if you're about to abolish blank nodes, you're about to demolish the 
rightful notion of surrogate keys and everything that follows from it. 
The notion isn't without its problems -- it's for example ripe to break 
the idea of natural keys and so some of the naturalness of relational 
normalization -- but it's still here from *beyond* time, and needs to be 
represented within RDF.

Now, if you look at the relational theory of yore, all of the trouble 
with blanks/surrogates was already anticipated. Of *course* they are a 
nasty, self-multiplying piece of shit within the model. Of *course* they 
break the nice natural keying, semantic, closed world interpretation of 
your model, if used wrong; and they will be, *always*, just as every 
relational database out there now has "autoincrementing primary keys". 
(All of them at the very *least* ought to feign to be real surrogates, 
so that you couldn't see their values. But no.)

But you can't really do without them either. If you somehow want to 
express the idea that "my car just got painted and found a new engine, 
despite the fact that my car is now identified by its paint and 
engine"... And we often do. We often do want to think in terms of 
external-level entities which aren't fully defined by their internal 
attributes in the database/knowledge graph. Which is what blank 
nodes/surrogate keys (in the proper sense) are for.
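
The car example can be sketched as follows, in Python, with facts as 
plain (subject, predicate, object) tuples. The attribute names and 
values are invented for illustration, and object() merely stands in 
for an opaque surrogate. The point is only that the attribute-based 
"natural key" changes while the thing being talked about does not.

```python
# `car` is an opaque stand-in; all we ever do with it is compare identity.
car = object()


def describe(facts, node):
    """Return the attribute-based 'natural key' of a node."""
    return frozenset((p, o) for s, p, o in facts if s is node)


# Hypothetical facts before and after the repaint and engine swap.
before = {(car, "paint", "red"), (car, "engine", "old")}
after = {(car, "paint", "blue"), (car, "engine", "new")}

key_before = describe(before, car)
key_after = describe(after, car)

# The natural key changed completely...
same_natural_key = (key_before == key_after)

# ...yet both sets of facts are still about the very same subject.
same_car = all(s is car for s, _, _ in before | after)
```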

Then, once you admit such a construct into your data model, things get 
messy. Because it *will* cause the database/graph designer/modeller to 
take the easy way out. If it's admitted, every relation in your 
relational database will have an autogenerated primary key. Pretty much 
everything in your RDF graph will eventually be a subject of some 
comment as a blank node, even if it could be referenced by named 
attributes. Should be, because that's how you do passable first order 
predicate logic; even Datalog, as a very limited and as such productive 
subset; kinda like the relational model, and its algebra.

However, if you really want to make it hard on yourself... The only, 
final problem in telling a data model with blanks/surrogates from one 
without, or in reasoning with either, is in *no* way separate from the 
general, nasty problem of graph isomorphism and normalization up to it. 
Because essentially a blank/surrogate stands in for something else: 
logically speaking, either a natural, full key in a closed world or, 
for convenience and often more likely, some qualities of an open world 
that go unexpressed within our model; "that blue car, which is still a 
blue car logically, but 'somehow' 'different'".

Whatever you do, eventually this comes down to trying to make your 
logical data model self-consistent, and at the same time homomorphic to 
whatever real world concepts you are trying to describe. Your logical 
descriptors are going to boil down into predicates. Including any and 
all blanks/surrogates, only used for equality comparison, opaquely. 
Because under any sane reduction, relevant to either RDF/SW, or 
the relational model, reduction over graph isomorphism takes those kinds 
of spurious keys away again. Then we're left with "just" the harder 
problem of graph isomorphism with the rest of the dataset -- your real 
dataset.
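
As a rough illustration of "comparison up to blank relabeling", here is 
a brute-force sketch in Python. It assumes graphs are sets of 3-tuples 
of strings and that blanks are spelled with a leading "_:"; the function 
names are made up, and this is not how any real RDF library does it. It 
simply tries every bijection between the two sets of blank labels, which 
is factorial in the number of blanks and so only good for toy graphs.

```python
from itertools import permutations


def is_blank(term):
    """Assumed convention: blank labels start with '_:'."""
    return isinstance(term, str) and term.startswith("_:")


def blanks(graph):
    """Sorted list of the distinct blank labels used in a graph."""
    return sorted({t for triple in graph for t in triple if is_blank(t)})


def isomorphic(g1, g2):
    """True if g1 equals g2 under some one-to-one renaming of blanks."""
    b1, b2 = blanks(g1), blanks(g2)
    if len(g1) != len(g2) or len(b1) != len(b2):
        return False
    for perm in permutations(b2):
        mapping = dict(zip(b1, perm))
        renamed = {tuple(mapping.get(t, t) for t in triple) for triple in g1}
        if renamed == g2:
            return True
    return False


g1 = {("_:a", "paint", "blue"), ("_:a", "engine", "_:e")}
g2 = {("_:x", "paint", "blue"), ("_:x", "engine", "_:y")}
g3 = {("_:x", "paint", "blue"), ("_:y", "engine", "_:x")}

same_shape = isomorphic(g1, g2)
other_shape = isomorphic(g1, g3)
```

Here g1 and g2 differ only in their blank labels, while g3 wires the 
blanks together differently, so no renaming can make it equal to g1.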
-- 
Sampo Syreeni, aka decoy - decoy@iki.fi, http://decoy.iki.fi/front
+358-40-3751464, 025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2
Received on Tuesday, 28 July 2020 18:44:26 UTC